1. Learn a Programming Language

It doesn’t matter what language you pick, but it is important to learn at least one. You will be able to use that language to write automation scripts.

The guide recommends Python or Go as a programming language and Bash or PowerShell Core for shell scripting in DevOps. Python is versatile and widely used for automation and tooling, while Go is central to container technologies such as Kubernetes and Docker. Bash is the natural fit for Linux environments, and PowerShell Core is cross-platform, suitable for both Windows and Linux.

◇Python

Python is a high-level, interpreted programming language known for its simplicity, readability, and versatility. It supports multiple programming paradigms, including procedural, object-oriented, and functional programming. Python’s extensive standard library and vast ecosystem of third-party packages make it suitable for a wide range of applications, from web development and data analysis to artificial intelligence and scientific computing. Its clean syntax and dynamic typing allow for rapid development and prototyping. Python’s “batteries included” philosophy provides rich built-in functionalities, while its cross-platform compatibility ensures code portability. With strong community support and continuous development, Python has become one of the most popular programming languages, widely used in academia, industry, and open-source projects for tasks ranging from simple scripting to complex software development.

◇Go

Go, also known as Golang, is a statically typed, compiled programming language developed by Google. It emphasizes simplicity, efficiency, and built-in concurrency support. Go features fast compilation, garbage collection, and a robust standard library. Its syntax is clean and concise, promoting easy readability and maintainability. Go’s goroutines and channels provide powerful tools for concurrent programming. The language is particularly well-suited for system programming, network services, and cloud-native applications. Go’s efficient memory usage and quick execution make it popular for building scalable server-side applications and microservices. With its focus on simplicity and performance, Go has gained significant adoption in DevOps tooling, containerization technologies, and cloud infrastructure projects.

2. Operating System

Operating systems (OS) are fundamental software that manage computer hardware and software resources, providing common services for computer programs. They act as an intermediary between applications and hardware, handling tasks like memory management, process scheduling, file system management, and device control. Common desktop operating systems include Microsoft Windows, macOS, and various Linux distributions. Mobile devices typically run iOS or Android. Server environments often use Linux distributions like Ubuntu Server, Red Hat Enterprise Linux, or Windows Server. Each OS type offers distinct features, user interfaces, and compatibility with different software and hardware. Operating systems play a crucial role in system security, performance optimization, and providing a consistent user experience across diverse computing devices and environments.

◇Ubuntu/Debian(Linux)

Ubuntu and Debian are both popular Linux distributions, with Debian serving as the upstream base for Ubuntu. Debian is known for its stability, extensive package repository, and rigorous testing process, making it a favored choice for servers and systems requiring long-term support. Ubuntu, derived from Debian, aims to provide a more user-friendly experience with regular releases, a focus on ease of use, and commercial support options. It features a more streamlined installation process, extensive documentation, and an active community. Both distributions use the Debian package management system (APT) and share many underlying technologies, but Ubuntu emphasizes a more polished desktop experience and rapid release cycle.

◇RHEL/Derivatives(Linux)

Red Hat Enterprise Linux (RHEL) is a popular distribution of the Linux operating system that is designed for enterprise-level use. It is developed and maintained by Red Hat, Inc., and it is available under a subscription-based model. There are several distributions of Linux that are based on RHEL, or that have been derived from RHEL in some way. These distributions are known as RHEL derivatives. Some examples of RHEL derivatives include: AlmaLinux, CentOS, CloudLinux, Oracle Linux, and Scientific Linux. RHEL derivatives are often used in enterprise environments because they offer the stability and reliability of RHEL, but with the added benefit of being free or lower-cost alternatives.

◇FreeBSD(Unix)

FreeBSD is a free, open-source Unix-like operating system descended from the Berkeley Software Distribution (BSD). Known for its stability, performance, and advanced networking capabilities, FreeBSD is popular for server environments, embedded systems, and as a basis for network appliances. It features a monolithic kernel, a comprehensive set of userland utilities, and a ports collection for easy software installation. FreeBSD supports a wide range of architectures and includes advanced features like the ZFS file system, jails for containerization, and the pf packet filter. While less common in desktop environments, it’s widely used in internet infrastructure, storage systems, and by companies requiring a robust, customizable OS with a permissive license.

3. Terminal Knowledge

A terminal is simply a text-based interface to the computer; it is used to interact with your computer system via a CLI (command-line interface).

Scripting: Bash

Bash (Bourne Again Shell) is a powerful Unix shell and command language interpreter, serving as the default shell for most Linux distributions and macOS. It provides a command-line interface for interacting with the operating system, executing commands, and automating tasks through shell scripts. Bash supports variables, control structures, functions, and command substitution, making it versatile for system administration, DevOps tasks, and general-purpose scripting. Its ability to pipe commands, redirect input/output, and utilize a vast array of built-in commands and utilities makes it an essential tool for developers and system administrators in managing and automating workflows in Unix-like environments.
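
As a quick illustration of those features, here is a minimal sketch of a script (the paths and file names are made up) that uses variables, command substitution, a function, and a loop:

```bash
#!/usr/bin/env bash
# Minimal sketch: archive an application's log files (paths are illustrative).
set -euo pipefail

LOG_DIR="/var/log/myapp"            # variable
BACKUP_DIR="/tmp/log-backups"
STAMP="$(date +%F)"                 # command substitution

archive_logs() {                    # function
  mkdir -p "$BACKUP_DIR"
  for f in "$LOG_DIR"/*.log; do     # control structure (loop)
    [ -e "$f" ] || continue         # skip if nothing matches the glob
    gzip -c "$f" > "$BACKUP_DIR/$(basename "$f").$STAMP.gz"
  done
}

archive_logs
echo "Archived logs to $BACKUP_DIR"
```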

Editors: Vim, Nano, Emacs

Text editors are software tools used for creating, editing, and managing text files. They range from simple editors with basic features to complex Integrated Development Environments (IDEs). Popular text editors include:

  • Notepad: A basic editor for Windows, suitable for simple text files.

  • Vim: A highly configurable and powerful editor known for its efficiency and modal interface.

  • Emacs: A versatile editor with extensive customization options and a wide range of plugins.

  • Sublime Text: A feature-rich editor with a focus on speed and a user-friendly interface.

  • Visual Studio Code: A modern, open-source editor with built-in support for debugging, extensions, and integration with various development tools.

◇Process Monitoring

Process monitoring is the continuous observation and analysis of processes within an IT system or organization to ensure optimal performance, efficiency, and compliance. It involves tracking key metrics, resource utilization, and behaviors of individual processes or applications running on a system. This practice helps identify anomalies, bottlenecks, or potential issues before they impact overall system performance or user experience. Process monitoring tools typically provide real-time data on CPU usage, memory consumption, I/O operations, and thread activity. They often include features for alerting, logging, and visualization of process data. In modern IT environments, process monitoring is crucial for maintaining system stability, optimizing resource allocation, troubleshooting performance issues, and supporting capacity planning in complex, distributed systems.

lsof (list open files) prints, on its standard output, information about files opened by processes, which makes it handy for answering questions such as which process is holding a file or listening on a network port.
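
A few common invocations (the port, PID, and user name are placeholders) show how lsof and related tools are used day to day:

```bash
# Which process is listening on port 8080?
lsof -i :8080

# All files (including sockets) opened by process 1234
lsof -p 1234

# Files opened by a given user
lsof -u deploy

# Complementary views of running processes
ps aux --sort=-%mem | head   # top memory consumers
top                          # interactive, real-time view
```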

◇Performance Monitoring

Performance monitoring is the systematic observation and measurement of an IT system’s operational efficiency and effectiveness. It involves collecting, analyzing, and reporting on key performance indicators (KPIs) across various components including applications, networks, servers, and databases. This process uses specialized tools to track metrics such as response time, throughput, resource utilization, and error rates. Performance monitoring helps identify bottlenecks, predict potential issues, and optimize system resources. It’s crucial for maintaining service level agreements (SLAs), ensuring user satisfaction, and supporting capacity planning. In modern IT environments, performance monitoring often incorporates real-time analytics, AI-driven insights, and automated alerting systems, enabling proactive management of complex, distributed systems and supporting continuous improvement in IT operations and service delivery.

◇Networking Tools

Networking tools are essential software utilities used for monitoring, analyzing, troubleshooting, and managing computer networks. They include a wide range of applications such as Wireshark for deep packet analysis, Nmap for network scanning and security auditing, Ping for testing basic connectivity, Traceroute for visualizing network paths, Netstat for displaying network connections, Tcpdump for command-line packet capture, Iperf for performance testing, Netcat for various network operations, Nslookup/Dig for DNS queries, and PuTTY for remote access via SSH or telnet. These tools collectively enable network administrators and security professionals to diagnose issues, optimize performance, conduct security assessments, and maintain the overall health and efficiency of network infrastructures, ranging from small local networks to large-scale enterprise environments.
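
A few representative commands (hosts, interfaces, and addresses are examples) show how these tools are typically invoked:

```bash
ping -c 4 example.com              # basic reachability test
traceroute example.com             # path the packets take
dig example.com A +short           # DNS lookup
nmap -sT -p 1-1024 192.0.2.10      # TCP scan of the first 1024 ports
sudo tcpdump -i eth0 'port 443'    # capture HTTPS traffic on eth0
ss -tulpn                          # listening sockets (modern netstat)
iperf3 -c 192.0.2.10               # throughput test against an iperf3 server
```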

◇Text Manipulation

Text manipulation tools are utilities or software that enable users to modify, process, and transform text data efficiently. These tools are often used in scripting, data cleaning, and automation tasks. Common text manipulation tools include sed (stream editor) for search and replace, awk for pattern scanning and data extraction, and grep for searching text using regular expressions. Other popular tools include cut, sort, tr, and uniq for various text processing functions. These command-line tools are commonly used in UNIX/Linux environments to handle large text files, automate workflows, and perform complex text transformations.
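
For example, a short pipeline can combine several of these tools to summarize a hypothetical access log:

```bash
# Count the ten most frequent client IPs in an access log,
# assuming the IP is the first whitespace-separated field.
grep -v '^#' access.log \
  | awk '{print $1}' \
  | sort \
  | uniq -c \
  | sort -rn \
  | head -n 10

# Replace every occurrence of "http://" with "https://" in place
sed -i 's|http://|https://|g' links.txt

# Extract the username column from /etc/passwd and uppercase it
cut -d: -f1 /etc/passwd | sort | tr '[:lower:]' '[:upper:]'
```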

4. Version Control Systems

Version control systems (VCS) are tools that track changes to code and files over time, enabling multiple users to collaborate on projects, maintain history, and manage different versions of codebases. They help in tracking modifications, merging changes, and resolving conflicts. There are two main types of VCS: centralized and distributed. Centralized systems (like Subversion and CVS) rely on a single central repository, while distributed systems (like Git and Mercurial) allow each user to have a complete copy of the repository, including its history. Distributed VCSs, such as Git, are particularly popular for their flexibility, branching capabilities, and robust support for collaborative workflows.

◇Git

Git is a distributed version control system designed to track changes in source code during software development. It allows multiple developers to work on the same project simultaneously, maintaining a complete history of modifications. Git features local repositories on each developer’s machine, enabling offline work and fast operations. It supports non-linear development through branching and merging, facilitating parallel work streams. Git’s distributed nature enhances collaboration, backup, and experimentation. Key concepts include commits, branches, merges, and remote repositories. With its speed, flexibility, and robust branching and merging capabilities, Git has become the standard for version control in modern software development, powering platforms like GitHub and GitLab.
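
A minimal local workflow, with illustrative repository and branch names, looks like this:

```bash
git init -b main my-project && cd my-project   # create a local repository
echo "hello" > README.md
git add README.md
git commit -m "Initial commit"

git switch -c feature/login                    # create and switch to a branch
# ...edit files...
git commit -am "Add login stub"

git switch main
git merge feature/login                        # merge the feature branch

# Publish to a remote (the URL is hypothetical)
git remote add origin git@github.com:example/my-project.git
git push -u origin main
```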

5. VCS Hosting

When working on a team, you often need a remote place to put your code so others can access it, create their own branches, and create or review pull requests. These services often include issue tracking, code review, and continuous integration features. A few popular choices are GitHub, GitLab, Bitbucket, and AWS CodeCommit.

◇GitHub

GitHub is a web-based platform for version control and collaboration using Git. It provides cloud-based Git repository hosting, offering features like bug tracking, task management, and project wikis. GitHub facilitates code review through pull requests, supports issue tracking, and enables social coding with features like forking and starring repositories. It offers both public and private repositories, making it popular for open-source projects and private development. GitHub’s ecosystem includes integrations with various development tools and CI/CD platforms. With features like GitHub Actions for automation, GitHub Packages for package management, and GitHub Pages for web hosting, it serves as a comprehensive platform for software development workflows, fostering collaboration among developers worldwide.

◇GitLab

GitLab is a web-based DevOps lifecycle tool that provides a Git repository manager with wiki, issue tracking, and CI/CD pipeline features. It offers a complete DevOps platform, delivered as a single application, covering the entire software development lifecycle from planning to monitoring. GitLab supports both cloud-hosted and self-hosted options, catering to various organizational needs. Key features include integrated CI/CD, container registry, package registry, and security scanning tools. It emphasizes innersource methodologies, allowing teams to collaborate more effectively within an organization. GitLab’s built-in DevOps capabilities, coupled with its focus on a single, integrated platform, make it popular for organizations seeking to streamline their development processes and implement DevOps practices efficiently.

6. Containers

Containers are lightweight, portable, and isolated environments that package applications and their dependencies, enabling consistent deployment across different computing environments. They encapsulate software code, runtime, system tools, libraries, and settings, ensuring that the application runs the same regardless of where it’s deployed. Containers share the host operating system’s kernel, making them more efficient than traditional virtual machines. Popular containerization platforms like Docker provide tools for creating, distributing, and running containers. This technology supports microservices architectures, simplifies application deployment, improves scalability, and enhances DevOps practices by streamlining the development-to-production pipeline and enabling more efficient resource utilization.

◇Docker

Docker is an open-source platform that automates the deployment, scaling, and management of applications using containerization technology. It enables developers to package applications with all their dependencies into standardized units called containers, ensuring consistent behavior across different environments. Docker provides a lightweight alternative to full machine virtualization, using OS-level virtualization to run multiple isolated systems on a single host. Its ecosystem includes tools for building, sharing, and running containers, such as Docker Engine, Docker Hub, and Docker Compose. Docker has become integral to modern DevOps practices, facilitating microservices architectures, continuous integration/deployment pipelines, and efficient resource utilization in both development and production environments.
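
A minimal sketch of that workflow, assuming a project with a Dockerfile in the current directory and a hypothetical registry, looks like this:

```bash
# Build an image from the Dockerfile in the current directory
docker build -t myapp:1.0 .

# Run it in the background, mapping container port 8080 to host port 80
docker run -d --name myapp -p 80:8080 myapp:1.0

# Inspect running containers and follow the application's logs
docker ps
docker logs -f myapp

# Push the image to a registry (repository name is hypothetical)
docker tag myapp:1.0 registry.example.com/team/myapp:1.0
docker push registry.example.com/team/myapp:1.0
```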

7. Web Services & Proxy Configuration

Learn how to set up:

  • Forward Proxy
  • Reverse Proxy
  • Load Balancer
  • Firewall
  • Caching Server
  • Web Server

◇Forward Proxy

A forward proxy, often simply called a proxy, is a server that sits between client devices and the internet, forwarding requests from clients to web servers. It acts on behalf of clients, potentially providing benefits like anonymity, security, and access control. Forward proxies can cache frequently requested content, filter web traffic, bypass geographical restrictions, and log user activity. They’re commonly used in corporate networks to enforce internet usage policies, enhance security by hiding internal network details, and improve performance through caching. Unlike reverse proxies, which serve resources on behalf of servers, forward proxies primarily serve client-side needs, acting as an intermediary for outbound requests to the wider internet.

◇Reverse Proxy

A reverse proxy is a server that sits between client devices and backend servers, intercepting requests from clients and forwarding them to appropriate backend servers. It acts on behalf of the servers, providing benefits such as load balancing, caching, SSL termination, and security. Reverse proxies can distribute incoming traffic across multiple servers to improve performance and reliability, cache frequently requested content to reduce server load, handle SSL encryption and decryption to offload this task from backend servers, and provide an additional layer of security by hiding server details. Common uses include improving web application performance, enabling microservices architectures, and enhancing security in web hosting environments. Popular reverse proxy software includes NGINX, HAProxy, and Apache with mod_proxy.

◇Caching Server

A caching server, also known as a cache server or caching proxy, is a dedicated network server that saves web pages and other Internet content locally to reduce bandwidth usage, server load, and perceived lag. It works by intercepting requests from clients, saving the responses from web servers, and serving the cached content for subsequent requests for the same information. Caching servers can significantly improve response times and reduce network traffic, especially for frequently accessed content. They are commonly used in content delivery networks (CDNs), enterprise networks, and by Internet service providers to optimize performance, reduce costs, and enhance user experience by serving content from a location closer to the end user.

◇Firewall

A firewall is a network security device that monitors and filters incoming and outgoing network traffic based on an organization’s previously established security policies. It acts as a barrier between a private internal network and the public Internet. A firewall’s main purpose is to allow non-threatening traffic in and to keep dangerous traffic out.

◇Load Balancer

A load balancer acts as a traffic cop sitting in front of your servers, routing client requests across all servers capable of fulfilling them in a way that maximizes speed and capacity utilization and ensures that no single server is overworked. If one of the servers goes down, the load balancer redirects traffic to the remaining online servers.

◇Nginx(Web Servers)

NGINX is a high-performance, open-source web server, reverse proxy, and load balancer. Known for its efficiency in handling concurrent connections, NGINX uses an event-driven, asynchronous architecture that consumes minimal resources. It excels at serving static content, proxying requests to application servers, and load balancing across multiple backends. NGINX is widely used for its ability to improve website performance, scalability, and security. It supports various protocols including HTTP, HTTPS, SMTP, and WebSocket, and offers features like SSL/TLS termination, caching, and content compression. Popular in both small-scale and enterprise environments, NGINX is a key component in many modern web architectures, particularly in microservices and containerized deployments.
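
As a sketch, the following snippet writes a minimal configuration that reverse proxies and load balances two hypothetical backends, then reloads NGINX (paths assume a typical Linux package install):

```bash
sudo tee /etc/nginx/conf.d/app.conf > /dev/null <<'EOF'
upstream app_backends {
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

server {
    listen 80;
    server_name app.example.com;

    location / {
        proxy_pass http://app_backends;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
EOF

sudo nginx -t && sudo systemctl reload nginx   # validate, then reload
```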

8. Networking & Protocols

Networking protocols are standardized rules and procedures that govern how data is transmitted, received, and interpreted across computer networks. They define the format, timing, sequencing, and error control in data communication. Key protocols include:

◇FTP/SFTP(File Transfer)

FTP (File Transfer Protocol) is a standard network protocol used for transferring files between a client and a server on a computer network. It operates on a client-server model, typically using separate control and data connections between the client and server. FTP allows users to upload, download, and manage files on remote systems, supporting both authenticated and anonymous access. While widely used for its simplicity and compatibility, FTP has security limitations as it transmits data and credentials in plain text. As a result, more secure alternatives like SFTP (SSH File Transfer Protocol) and FTPS (FTP Secure) have gained popularity for sensitive data transfers. Despite its age, FTP remains in use for various file transfer needs, especially in legacy systems and where security is less critical.

SFTP (SSH File Transfer Protocol) is a secure file transfer protocol that provides file access, transfer, and management functionalities over a secure shell (SSH) data stream. It’s designed as an extension of SSH to offer secure file transfer capabilities. SFTP encrypts both commands and data in transit, protecting against eavesdropping and man-in-the-middle attacks. Unlike traditional FTP, SFTP uses a single connection and doesn’t separate control and data channels. It supports features like resuming interrupted transfers, directory listings, and remote file removal. SFTP is widely used in enterprise environments for secure file transfers, automated scripts, and as a more secure alternative to FTP. Its integration with SSH makes it a preferred choice for system administrators and developers working with remote systems securely.

◇DNS(Domain Services)

DNS (Domain Name System) is a hierarchical, decentralized naming system for computers, services, or other resources connected to the Internet or a private network. It translates human-readable domain names (like www.example.com) into IP addresses (like 192.0.2.1) that computers use to identify each other on the network. DNS serves as the internet’s phone book, enabling users to access websites using easy-to-remember names instead of numerical IP addresses. The system comprises DNS servers, resolvers, and records (like A, CNAME, MX), working together to route internet traffic efficiently. DNS is crucial for internet functionality, affecting everything from web browsing and email to load balancing and service discovery in modern cloud architectures.
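
The dig utility is a convenient way to inspect these records (the domain names are examples):

```bash
dig example.com A +short          # IPv4 address records
dig example.com AAAA +short       # IPv6 address records
dig example.com MX +short         # mail exchangers
dig www.example.com CNAME +short  # alias records
dig -x 192.0.2.1 +short           # reverse lookup of an IP address
dig example.com +trace            # follow delegation from the root servers
```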

◇HTTP(Web Communication)

HTTP (Hypertext Transfer Protocol) is the foundation of data communication on the World Wide Web. It’s an application-layer protocol that enables the transfer of various types of data, primarily web pages and their components, between clients (usually web browsers) and servers. HTTP operates on a request-response model, where clients send requests for resources, and servers respond with the requested data or error messages. It’s stateless by design, meaning each request is independent of previous ones. HTTP supports various methods (GET, POST, PUT, DELETE, etc.) for different types of operations. While originally designed for plain-text transmission, HTTPS, its secure version using encryption, is now widely adopted to protect data in transit.
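
curl makes the request-response model easy to see (URLs and payloads are illustrative):

```bash
# GET a page and show the status line and response headers
curl -i https://example.com/

# Send a POST request with a JSON body
curl -X POST https://api.example.com/users \
  -H "Content-Type: application/json" \
  -d '{"name": "alice"}'

# Fetch only the headers (HEAD request)
curl -I https://example.com/

# Show verbose protocol details, including the request that was sent
curl -v https://example.com/ -o /dev/null
```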

◇HTTPS(Web Communication)

HTTPS (Hypertext Transfer Protocol Secure) is the secure version of HTTP, encrypting data exchanged between a client and a server. It uses SSL/TLS protocols to provide authentication, data integrity, and confidentiality. HTTPS prevents eavesdropping, tampering, and man-in-the-middle attacks by encrypting all communications. It uses digital certificates to verify the identity of websites, enhancing trust and security. HTTPS is crucial for protecting sensitive information like login credentials and financial data. It has become the standard for secure web communication, with major browsers marking non-HTTPS sites as “not secure.” HTTPS also provides SEO benefits and is essential for many modern web features and progressive web applications.

◇SSL/TLS(Web Communication)

Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are cryptographic protocols used to provide security in internet communications. These protocols encrypt the data that is transmitted over the web, so anyone who intercepts the packets cannot interpret the data. One important difference is that SSL is now deprecated due to security flaws, and most modern web browsers no longer support it, whereas TLS is still secure and widely supported, so TLS should be used.
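
You can inspect a server’s TLS setup and certificate with openssl (the hostname is an example):

```bash
# Open a TLS connection and print the certificate chain and negotiated protocol
openssl s_client -connect example.com:443 -servername example.com </dev/null

# Show just the certificate's subject, issuer, and validity dates
openssl s_client -connect example.com:443 -servername example.com </dev/null 2>/dev/null \
  | openssl x509 -noout -subject -issuer -dates
```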

◇SSH(Web Communication)

SSH (Secure Shell) is a cryptographic network protocol used to securely access and manage remote machines over an unsecured network. It provides encrypted communication, ensuring confidentiality and integrity, and allows for secure file transfers, command execution, and tunneling. SSH is widely used for remote administration of servers, cloud infrastructure, and networking devices, typically employing key-based authentication or passwords. Tools like OpenSSH are commonly used to establish SSH connections, providing a secure alternative to older, less secure protocols like Telnet.
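
A typical key-based setup looks like this (host and user names are placeholders):

```bash
# Generate a modern key pair (stored in ~/.ssh/id_ed25519 and id_ed25519.pub)
ssh-keygen -t ed25519 -C "deploy@workstation"

# Copy the public key to the remote host's authorized_keys
ssh-copy-id deploy@server.example.com

# Log in and run a command remotely
ssh deploy@server.example.com 'uptime'

# Copy a file securely, and forward a remote port to your machine
scp ./app.tar.gz deploy@server.example.com:/tmp/
ssh -L 8080:localhost:80 deploy@server.example.com   # local port forwarding
```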

◇OSI Model

The OSI (Open Systems Interconnection) model is a conceptual framework that standardizes the functions of a telecommunication or computing system into seven abstraction layers. These layers, from bottom to top, are: Physical, Data Link, Network, Transport, Session, Presentation, and Application. Each layer serves a specific purpose in the process of data communication, with lower layers handling more hardware-oriented tasks and upper layers dealing with software and user-interface aspects. The model helps in understanding how data moves through a network, troubleshooting network issues, and designing network protocols and hardware. While not strictly adhered to in real-world implementations, the OSI model remains a valuable educational tool and reference point for network engineers and developers, providing a common language for discussing network operations and architecture.

◇White/Grey Listing(Email Protocols)

Whitelisting involves creating a list of trusted entities (such as IP addresses, email addresses, or applications) that are explicitly allowed to access a system or send messages. Anything not on the whitelist is denied by default. Whitelisting offers a high level of security by limiting access to only known and approved entities, but it can be inflexible and require frequent updates to accommodate legitimate changes. Greylisting is a more flexible approach used primarily in email filtering. When an email is received from an unknown sender, the server temporarily rejects it with a “try again later” response. Legitimate mail servers will retry sending the email after a short delay, while spammers, which often do not retry, are blocked. This method reduces spam by taking advantage of the fact that spammers usually do not follow retry mechanisms. Greylisting can be less intrusive than whitelisting, but it may introduce slight delays in email delivery for first-time senders.

◇SMTP(Email Protocols)

Email remains one of the most heavily used services on the internet. Most internet systems use SMTP (Simple Mail Transfer Protocol) to transfer mail from one user to another. SMTP is a push protocol used to send mail, whereas POP (Post Office Protocol) or IMAP (Internet Message Access Protocol) is used to retrieve mail on the receiver’s side. SMTP is an application layer protocol: the client that wants to send mail opens a TCP connection to the SMTP server, traditionally on port 25, and transfers the message across that connection. The SMTP server is always listening; once the TCP connection is established, the client sends the mail immediately.
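
The exchange itself is plain text; the sketch below shows a minimal session (addresses are examples, and real servers usually require STARTTLS and authentication on port 587):

```bash
# Connect to the mail server's SMTP port (plain-text illustration only)
nc mail.example.com 25
# S: 220 mail.example.com ESMTP
# C: EHLO client.example.org
# C: MAIL FROM:<alice@example.org>
# C: RCPT TO:<bob@example.com>
# C: DATA
# C: Subject: Test
# C:
# C: Hello from SMTP.
# C: .
# C: QUIT
```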

◇DMARC(Email Protocols)

DMARC (Domain-based Message Authentication, Reporting, and Conformance) is an email authentication protocol that builds upon SPF and DKIM to protect against email spoofing and phishing attacks. It allows domain owners to specify how email receivers should handle messages that fail authentication checks. DMARC provides a feedback mechanism for domain owners to receive reports on email authentication results, helping them monitor and improve their email security. By implementing DMARC policies, organizations can enhance their email deliverability, protect their brand reputation, and reduce the likelihood of their domain being used in fraudulent email campaigns. DMARC is widely adopted by major email providers and is considered a crucial component of modern email security strategies.

◇IMAP(Email Protocols)

IMAP (Internet Message Access Protocol) is a standard email protocol that allows email clients to access messages stored on a mail server. Unlike POP3, IMAP keeps emails on the server, enabling access from multiple devices while maintaining synchronization. It supports folder structures, message flagging, and partial message retrieval, making it efficient for managing large volumes of emails. IMAP allows users to search server-side, reducing bandwidth usage. It’s designed for long-term mail storage on the server, ideal for users who need to access their email from various devices or locations. IMAP’s synchronization capabilities and server-side management features make it the preferred protocol for most modern email systems, especially in mobile and multi-device environments.

◇SPF(Email Protocols)

Sender Policy Framework (SPF) is used to authenticate the sender of an email. With an SPF record in place, Internet Service Providers can verify that a mail server is authorized to send email for a specific domain. An SPF record is a DNS TXT record containing a list of the IP addresses that are allowed to send email on behalf of your domain.
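
An SPF record can be inspected with a DNS TXT lookup; a typical value, with example addresses, looks like this:

```bash
dig example.com TXT +short
# "v=spf1 ip4:192.0.2.0/24 include:_spf.mailprovider.example -all"
#   v=spf1     SPF version
#   ip4:...    network allowed to send mail for the domain
#   include:   also trust the named provider's SPF record
#   -all       reject everything else
```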

◇POP3S(Email Protocols)

POP3, the Post Office Protocol version 3, is an Internet standard protocol used by local email client software to retrieve emails from a remote mail server over a TCP/IP connection; it runs on port 110, or port 995 for the TLS-secured variant POP3S. Email servers hosted by Internet service providers use POP3 to hold emails intended for their subscribers and make them available for retrieval. Periodically, these subscribers use email client software to check their mailbox on the remote server and download any emails addressed to them. Once the email client has downloaded the emails, they are usually deleted from the server, although some email clients allow users to specify that mails be copied or saved on the server for a period of time.

◇Domain Keys(Email Protocols)

DomainKeys is an email authentication method designed to verify the domain of an email sender and ensure message integrity. Developed by Yahoo, it was a precursor to DKIM (DomainKeys Identified Mail). DomainKeys uses public key cryptography to allow email systems to verify that a message was sent by an authorized sender and hasn’t been tampered with in transit. The sending server signs outgoing emails with a private key, and receiving servers can verify the signature using a public key published in the sender’s DNS records. While largely superseded by DKIM, DomainKeys played a crucial role in the evolution of email authentication techniques aimed at combating email spoofing and phishing attacks.

9. Cloud Providers

Cloud providers offer a layer of APIs to abstract infrastructure and provision it within security and billing boundaries. The cloud runs on servers in data centers, but the abstractions cleverly give the appearance of interacting with a single “platform” or large application. The ability to quickly provision, configure, and secure resources with cloud providers has been key to both the tremendous success and the complexity of modern DevOps.

◇AWS

Amazon Web Services (AWS) has been the market-leading cloud computing platform since 2011, ahead of Azure and Google Cloud. AWS offers over 200 services with data centers located all over the globe. It is a broadly adopted online platform that provides scalable and cost-effective cloud computing solutions, with on-demand offerings such as compute power, database storage, and content delivery.

◇Azure

Microsoft Azure is a comprehensive cloud computing platform offering a wide array of services for building, deploying, and managing applications through Microsoft-managed data centers. It provides Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) solutions, supporting various programming languages, tools, and frameworks, including both Microsoft-specific and third-party systems. Azure’s services span computing, analytics, storage, networking, and more, enabling businesses to scale and transform their operations, leverage AI and machine learning, and implement robust security measures, all while potentially reducing IT costs through its pay-as-you-go pricing model.

◇Google Cloud

Google Cloud is Google’s cloud computing service offering, providing over 150 products/services to choose from. It consists of a set of physical assets, such as computers and hard disk drives, and virtual resources, such as virtual machines (VMs), that are contained in Google’s data centers. It runs on the same infrastructure that Google uses internally for its end-user products, such as Search, Gmail, Google Drive, and YouTube.

10. Serverless

Serverless is a cloud-computing execution model where the cloud provider dynamically manages the infrastructure, allowing developers to focus solely on writing code. In this model, resources are automatically allocated and scaled based on demand, and billing is based on actual usage rather than pre-purchased capacity. Serverless architectures are often used for event-driven workloads and microservices, improving development efficiency and reducing operational overhead. Popular platforms for serverless computing include AWS Lambda, Azure Functions, and Google Cloud Functions.

◇AWS Lambda

AWS Lambda is a serverless compute service that allows you to run code without provisioning or managing servers. It automatically scales, executes code in response to triggers, and charges only for the compute time consumed. Lambda supports multiple programming languages and integrates seamlessly with other AWS services, making it ideal for building microservices, automating tasks, and processing data streams with minimal operational overhead.

◇Cloudflare

Cloudflare is an internet company that provides a range of services to help protect and accelerate websites and applications. At its core, Cloudflare is a content delivery network (CDN) and a reverse proxy cloud provider: it acts as an intermediary between a website’s origin server and its visitors, caching content and filtering out malicious traffic. Cloudflare was founded in July 2009 by Matthew Prince, Lee Holloway, and Michelle Zatlyn. The company was venture-capital funded and submitted its S-1 filing for an IPO on the New York Stock Exchange in August 2019. It opened for public trading on September 13, 2019, at $15 per share.

11. Provisioning

Provisioning refers to the process of setting up and configuring the necessary IT infrastructure to support an application or service. This includes allocating and preparing resources such as servers, storage, networking, and software environments. Provisioning can be done manually, but in modern DevOps practices, it’s typically automated using tools like Terraform, Pulumi, or CloudFormation. These tools allow for infrastructure-as-code, where the entire provisioning process is defined in version-controlled scripts or templates. This approach enables consistent, repeatable deployments across different environments, reduces human error, and facilitates rapid scaling and disaster recovery.

◇Terraform

Terraform is an open-source infrastructure as code (IaC) tool developed by HashiCorp, used to define, provision, and manage cloud and on-premises infrastructure using declarative configuration files. It supports multiple cloud providers like AWS, Azure, and Google Cloud, as well as various services and platforms, enabling infrastructure automation across diverse environments. Terraform’s state management and modular structure allow for efficient scaling, reusability, and version control of infrastructure. It is widely used for automating infrastructure provisioning, reducing manual errors, and improving infrastructure consistency and repeatability.
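
A minimal sketch, assuming AWS credentials are already configured and using an illustrative bucket name, looks like this:

```bash
# Write a tiny configuration and apply it (resource names are illustrative)
cat > main.tf <<'EOF'
terraform {
  required_providers {
    aws = {
      source = "hashicorp/aws"
    }
  }
}

provider "aws" {
  region = "us-east-1"
}

resource "aws_s3_bucket" "artifacts" {
  bucket = "example-devops-artifacts-12345"   # must be globally unique
}
EOF

terraform init      # download providers
terraform plan      # preview the changes
terraform apply     # create the bucket
```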

12. Configuration Management

Configuration management is a systems engineering process for establishing consistency of a product’s attributes throughout its life. In the technology world, configuration management is an IT management process that tracks individual configuration items of an IT system. IT systems are composed of IT assets that vary in granularity; an IT asset may be a piece of software, a server, or a cluster of servers. Software configuration management is a systems engineering process that tracks and monitors changes to a software system’s configuration metadata. In software development, configuration management is commonly used alongside version control and CI/CD infrastructure. This section focuses on configuration management as it applies to software assets and their use in modern, agile CI/CD environments.

◇Ansible

Ansible is an open-source automation tool used for configuration management, application deployment, and task automation. It simplifies the process of managing and orchestrating infrastructure by using a declarative language to define desired states and configurations. Ansible operates using YAML files, called playbooks, which describe the tasks to be executed on remote systems. It employs an agentless architecture, meaning it uses SSH or other remote protocols to execute tasks on target machines without requiring additional software to be installed. Ansible is widely used for automating repetitive tasks, ensuring consistency, and managing large-scale deployments across various environments.
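
A minimal sketch, assuming SSH access to the hosts in a hypothetical inventory file, installs and starts NGINX on a group of Debian/Ubuntu web servers:

```bash
cat > playbook.yml <<'EOF'
- name: Configure web servers
  hosts: webservers
  become: true
  tasks:
    - name: Install nginx
      ansible.builtin.apt:
        name: nginx
        state: present
        update_cache: true

    - name: Ensure nginx is running and enabled
      ansible.builtin.service:
        name: nginx
        state: started
        enabled: true
EOF

# inventory.ini is assumed to define a [webservers] group
ansible-playbook -i inventory.ini playbook.yml
```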

13. CI/CD Tools

CI/CD is a method to frequently deliver apps to customers by introducing automation into the stages of app development. The main concepts attributed to CI/CD are continuous integration, continuous delivery, and continuous deployment. CI/CD is a solution to the problems integrating new code can cause for development and operations teams. Specifically, CI/CD introduces ongoing automation and continuous monitoring throughout the lifecycle of apps, from integration and testing phases to delivery and deployment. Taken together, these connected practices are often referred to as a “CI/CD pipeline” and are supported by development and operations teams working together in an agile way with either a DevOps or site reliability engineering (SRE) approach.

◇CircleCI

CircleCI is a popular continuous integration and continuous delivery (CI/CD) platform that automates the build, test, and deployment processes of software projects. It supports a wide range of programming languages and integrates with various version control systems, primarily GitHub and Bitbucket. CircleCI uses a YAML configuration file to define pipelines, allowing developers to specify complex workflows, parallel job execution, and custom environments. It offers features like caching, artifact storage, and Docker layer caching to speed up builds. With its cloud-based and self-hosted options, CircleCI provides scalable solutions for projects of all sizes, helping teams improve code quality, accelerate release cycles, and streamline their development workflows.

◇GitLab CI

GitLab CI is an integrated continuous integration and delivery platform within the GitLab ecosystem. It automates the process of building, testing, and deploying code changes through pipelines defined in YAML files. GitLab CI offers features like parallel execution, container registry integration, and auto-DevOps, enabling teams to implement robust CI/CD workflows directly from their GitLab repositories without additional tools or infrastructure.

◇GitHub Actions

GitHub Actions is a continuous integration and continuous delivery (CI/CD) platform integrated directly into GitHub repositories. It allows developers to automate software workflows, including building, testing, and deploying applications. Actions are defined in YAML files and triggered by various GitHub events such as pushes, pull requests, or scheduled tasks. The platform provides a marketplace of pre-built actions and supports custom actions. GitHub Actions offers matrix builds, parallel job execution, and supports multiple operating systems and languages. It integrates seamlessly with GitHub’s ecosystem, facilitating automated code review, issue tracking, and project management. This tool enables developers to implement DevOps practices efficiently within their GitHub workflow, enhancing productivity and code quality.
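
A minimal sketch of a workflow file that runs on every push and pull request (the test command is a placeholder for your project’s own):

```bash
mkdir -p .github/workflows
cat > .github/workflows/ci.yml <<'EOF'
name: CI

on: [push, pull_request]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: make test        # placeholder for your project's test command
EOF

git add .github/workflows/ci.yml
git commit -m "Add CI workflow"
```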

14. Secret Management

Secret management refers to the secure handling, storage, and distribution of sensitive information such as passwords, API keys, and certificates within an organization’s IT infrastructure. It involves using specialized tools and practices to protect secrets from unauthorized access while ensuring they are available to authorized systems and users when needed. Secret management solutions typically offer features like encryption at rest and in transit, access controls, auditing, rotation policies, and integration with various platforms and services. These systems aim to centralize secret storage, reduce the risk of exposure, automate secret lifecycle management, and provide seamless integration with applications and DevOps workflows. Effective secret management is crucial for maintaining security, compliance, and operational efficiency in modern, complex IT environments.

◇Vault

HashiCorp Vault is a tool designed for securely managing secrets and protecting sensitive data, such as passwords, API keys, and encryption keys. It provides centralized secrets management, access control, and auditing features. Vault supports various authentication methods and dynamic secrets, allowing it to generate secrets on-the-fly and manage their lifecycle. It also offers robust encryption capabilities, both for data at rest and in transit. Vault is widely used in DevOps environments to ensure secure and scalable management of sensitive information, integrating with various infrastructure and application platforms.
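
A quick way to see the workflow is Vault’s dev server, which is for local experimentation only (secret paths and values are examples):

```bash
# Start an in-memory dev server (never use dev mode in production);
# it prints a root token and configures the local CLI token helper.
vault server -dev &
export VAULT_ADDR='http://127.0.0.1:8200'

# Store and retrieve a secret with the KV secrets engine
vault kv put secret/myapp db_user=app db_password=s3cr3t
vault kv get secret/myapp
vault kv get -field=db_password secret/myapp
```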

15. Infrastructure Monitoring

Monitoring refers to the practice of making the performance and status of infrastructure visible. This section covers common tools used for monitoring. It is a very vendor-heavy space - use caution when studying materials exclusively from a given product or project, as there are many conflicting opinions and strategies in use. There is no single solution for the most complex internet-facing applications, so understanding the pros and cons of these tools will help you plan how to monitor a system for a given goal.

◇Prometheus

Prometheus is an open-source systems monitoring and alerting toolkit designed for reliability and scalability. It features a multi-dimensional data model, a flexible query language (PromQL), and an efficient time series database. Prometheus collects metrics from configured targets at given intervals, evaluates rule expressions, displays results, and can trigger alerts when specified conditions are observed. It operates on a pull model, scraping metrics from HTTP endpoints, and supports service discovery for dynamic environments. Prometheus is particularly well-suited for monitoring microservices and containerized environments, integrating seamlessly with systems like Kubernetes. Its ecosystem includes various exporters for third-party systems and a built-in alert manager. Widely adopted in cloud-native architectures, Prometheus is a core component of modern observability stacks, often used alongside tools like Grafana for visualization.
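
A minimal scrape configuration and a sample PromQL query, assuming an application that exposes metrics on localhost:8080, might look like this:

```bash
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 15s      # how often to pull metrics

scrape_configs:
  - job_name: "myapp"
    static_configs:
      - targets: ["localhost:8080"]   # app exposing /metrics
EOF

# Validate the config and start Prometheus with it
promtool check config prometheus.yml
prometheus --config.file=prometheus.yml

# Example PromQL (run in the web UI or via the HTTP API):
#   rate(http_requests_total[5m])   -> per-second request rate over 5 minutes
```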

◇Grafana

Grafana is an open-source analytics and interactive visualization web application. It connects to various data sources, including time-series databases, relational databases, and cloud services, to create customizable dashboards. Grafana excels at visualizing time-series data for infrastructure and application analytics, supporting a wide range of chart types and plugins. It features alerting capabilities, user authentication, and role-based access control. Grafana is commonly used for monitoring system metrics, application performance, and business analytics. Its flexibility and ability to combine data from multiple sources make it popular in DevOps environments for creating comprehensive monitoring solutions. Grafana’s user-friendly interface and extensive customization options enable users to create visually appealing and informative dashboards for real-time data visualization and analysis.

◇Datadog

Datadog is a monitoring and analytics platform for large-scale applications. It encompasses infrastructure monitoring, application performance monitoring, log management, and user-experience monitoring. Datadog aggregates data across your entire stack with 400+ integrations for troubleshooting, alerting, and graphing.

16. Log Management

Log management is the process of handling log events generated by all software applications and infrastructure on which they run. It involves log collection, aggregation, parsing, storage, analysis, search, archiving, and disposal, with the ultimate goal of using the data for troubleshooting and gaining business insights, while also ensuring the compliance and security of applications and infrastructure.

◇Loki

Loki is a horizontally-scalable, highly-available, multi-tenant log aggregation system designed by Grafana Labs. It’s purpose-built to be cost-effective and easy to operate, making it particularly well-suited for storing and querying logs from Kubernetes clusters. Loki indexes metadata about logs rather than the full text, which allows it to be more resource-efficient than traditional log management systems. It uses the same querying language as Prometheus (LogQL), making it easier for users familiar with Prometheus to adopt. Loki integrates seamlessly with Grafana for visualization and is often used alongside Prometheus and Grafana in cloud-native observability stacks. Its design focuses on simplicity, making it an attractive option for organizations looking for efficient log management in containerized environments.

◇Elastic Stack

The Elastic Stack, formerly known as ELK Stack, is a set of open-source tools for searching, analyzing, and visualizing data in real-time. It consists of four main components: Elasticsearch (a distributed search and analytics engine), Logstash (a data processing pipeline), Kibana (a data visualization and management tool), and Beats (lightweight data shippers). Together, these tools enable users to collect data from various sources, process and enrich it, store it in a searchable format, and create interactive visualizations and dashboards. The Elastic Stack is widely used for log analytics, application performance monitoring, security information and event management (SIEM), and business intelligence applications, offering scalability and flexibility for handling large volumes of diverse data.

17. Container Orchestration

Container orchestration is the process of managing and automating the lifecycle of containers, including their deployment, scaling, and networking across multiple hosts. It is a critical technology for running complex containerized applications in production environments.

By leveraging tools like Kubernetes, Docker Swarm, and Apache Mesos, organizations can ensure high availability, scalability, and reliability for their applications. Container orchestration simplifies operations by automating routine tasks and providing a robust foundation for microservices, cloud-native development, and DevOps practices.

◇Kubernetes

Kubernetes is an open source container management platform, and the dominant product in this space. Using Kubernetes, teams can deploy images across multiple underlying hosts, defining their desired availability, deployment logic, and scaling logic in YAML. Kubernetes evolved from Borg, an internal Google platform used to provision and allocate compute resources (similar to the Autopilot and Aquaman systems of Microsoft Azure). The popularity of Kubernetes has made it an increasingly important skill for the DevOps Engineer and has triggered the creation of Platform teams across the industry. These Platform engineering teams often exist with the sole purpose of making Kubernetes approachable and usable for their product development colleagues.
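
A minimal sketch of such a YAML definition, plus the kubectl commands to apply and scale it (image and names are illustrative):

```bash
cat > deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: web
          image: nginx:1.27
          ports:
            - containerPort: 80
EOF

kubectl apply -f deployment.yaml     # create or update the Deployment
kubectl get pods -l app=web          # watch the replicas come up
kubectl scale deployment web --replicas=5
```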

18. Application Monitoring

Application monitoring involves the continuous observation and analysis of software applications to ensure they perform optimally, identify issues, and provide insights into their operation. This process includes tracking metrics such as response times, error rates, resource utilization (CPU, memory, and disk), and transaction performance. Application monitoring tools collect and analyze data to detect anomalies, provide alerts for potential problems, and offer detailed insights into application behavior and performance. By monitoring applications, organizations can proactively address issues, optimize performance, and improve user experience, ultimately ensuring reliability and efficiency in their software systems.

◇OpenTelemetry

OpenTelemetry is an open-source observability framework for cloud-native software, providing a standardized way to collect and export telemetry data such as metrics, logs, and traces. It aims to make observability a built-in feature of cloud-native applications by offering a vendor-neutral, unified set of APIs, libraries, agents, and instrumentation. OpenTelemetry simplifies the implementation of observability across different languages and platforms, enabling developers to instrument their code once and send data to multiple backends. It supports automatic instrumentation for many popular frameworks and libraries, reducing the effort required to add observability to applications. By providing a consistent approach to data collection and export, OpenTelemetry facilitates better interoperability between observability tools and platforms in modern, distributed software environments.

19. Artifact Management

In software development, artifacts are various outputs produced throughout the development lifecycle, including source code, binaries, documentation, configuration files, build outputs, and test results. These artifacts are essential for managing, deploying, and maintaining applications, as they provide the necessary resources and documentation for development, testing, and production environments. They help track the progress of a project, ensure consistency, and facilitate the efficient delivery and operation of software systems.

◇Artifactory

Artifactory is a universal DevOps solution for hosting, managing, and distributing binaries and artifacts. Any type of software in binary form – such as application installers, container images, libraries, configuration files, etc. – can be curated, secured, stored, and delivered using Artifactory. The name “Artifactory” reflects the fact that it can host any type of “artifact” needed in your software development “factory.” In software development, an artifact is any object produced during the software development and delivery process. Artifacts include the files used to install and run applications, as well as any complementary information necessary to configure or manage software. Artifactory serves as the central hub for your DevOps processes. All artifacts, dependencies, packages, etc. ultimately get put into and pulled from Artifactory.

20. GitOps

GitOps is a paradigm for managing infrastructure and application deployments using Git as the single source of truth. It extends DevOps practices by using Git repositories to store declarative descriptions of infrastructure and applications. Changes to the desired state are made through pull requests, which trigger automated processes to align the actual state with the desired state. GitOps relies on continuous deployment tools that automatically reconcile the live system with the desired state defined in Git. This approach provides benefits such as version control for infrastructure, improved auditability, easier rollbacks, and enhanced collaboration. GitOps is particularly well-suited for cloud-native applications and Kubernetes environments, offering a streamlined method for managing complex, distributed systems.

◇ArgoCD

Argo CD is a continuous delivery tool for Kubernetes that is based on the GitOps methodology. It is used to automate the deployment and management of cloud-native applications by continuously synchronizing the desired application state with the actual application state in the production environment. In an Argo CD workflow, changes to the application are made by committing code or configuration changes to a Git repository. Argo CD monitors the repository and automatically deploys the changes to the production environment using a continuous delivery pipeline. The pipeline is triggered by changes to the Git repository and is responsible for building, testing, and deploying the changes to the production environment. Argo CD is designed to be a simple and efficient way to manage cloud-native applications, as it allows developers to make changes to the system using familiar tools and processes, and it provides a clear and auditable history of all changes to the system. It is often used in conjunction with tools such as Helm to automate the deployment and management of cloud-native applications.
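
With the Argo CD CLI, registering an application to be kept in sync with a Git repository looks roughly like this (the repository, path, and namespace are examples):

```bash
argocd login argocd.example.com              # authenticate against the API server

argocd app create my-service \
  --repo https://github.com/example/deploy-configs.git \
  --path my-service \
  --dest-server https://kubernetes.default.svc \
  --dest-namespace my-service

argocd app sync my-service                   # reconcile cluster state with Git
argocd app get my-service                    # show sync and health status
```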

21. Service Mesh

A service mesh is a dedicated infrastructure layer that manages communication between microservices in a distributed application. It provides features like load balancing, service discovery, encryption, observability, and traffic management, allowing services to communicate securely and efficiently. By abstracting network-related concerns from the application code, a service mesh enhances reliability and security while simplifying the management of microservice interactions. Popular service mesh implementations include Istio, Linkerd, and Consul.

◇Istio

Istio is an open source service mesh platform that provides a way to control how microservices share data with one another. It includes APIs that let Istio integrate into any logging platform, telemetry, or policy system. Istio is designed to run in a variety of environments: on-premise, cloud-hosted, in Kubernetes containers, in services running on virtual machines, and more.

◇Consul

Consul is a service mesh solution providing a full featured control plane with service discovery, configuration, and segmentation functionality. Each of these features can be used individually as needed, or they can be used together to build a full service mesh. Consul requires a data plane and supports both a proxy and native integration model. Consul ships with a simple built-in proxy so that everything works out of the box, but also supports 3rd party proxy integrations such as Envoy.

22. Cloud Design Patterns

Cloud design patterns are reusable solutions to common problems encountered in cloud computing architectures. These patterns address challenges related to scalability, reliability, security, and performance in distributed systems. They provide best practices for designing and implementing cloud-native applications, covering aspects such as data management, messaging, resiliency, and deployment. Examples include the Circuit Breaker pattern for handling faults, the CQRS pattern for separating read and write operations, and the Sidecar pattern for deploying components of an application into a separate process or container. By leveraging these patterns, developers can create more robust, efficient, and maintainable cloud applications that better utilize the benefits of cloud platforms.

◇Availability

Availability is the percentage of time that a system is functional and working as intended, generally referred to as uptime. Availability can be affected by hardware or software errors, infrastructure problems, malicious attacks, and system load. Many cloud providers typically offer their users a service level agreement (SLA) that specifies the exact percentages of promised uptime/downtime. Availability is related to reliability in this sense. For example, a company might promise 99.99% uptime for their services.
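
To make that concrete, the downtime allowed by an SLA can be computed directly; a 99.99% uptime target leaves roughly 52.6 minutes of downtime per year:

```bash
# Minutes of allowed downtime per year for a 99.99% availability target
awk 'BEGIN { print 365 * 24 * 60 * (1 - 0.9999) }'
# -> 52.56
```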

◇Data Management

Data management is the key element of cloud applications, and influences most of the quality attributes. Data is typically hosted in different locations and across multiple servers for reasons such as performance, scalability, or availability, and this can present a range of challenges. For example, data consistency must be maintained, and data will typically need to be synchronized across different locations. Additionally, data should be protected at rest, in transit, and via authorized access mechanisms to maintain the security assurances of confidentiality, integrity, and availability. Refer to the Azure Security Benchmark Data Protection Control for more information.

◇Design and Implementation

Good design encompasses factors such as consistency and coherence in component design and deployment, maintainability to simplify administration and development, and reusability to allow components and subsystems to be used in other applications and in other scenarios. Decisions made during the design and implementation phase have a huge impact on the quality and the total cost of ownership of cloud hosted applications and services.

◇Management and Monitoring

DevOps management and monitoring entails overseeing the entire development process from planning, development, integration and testing, deployment, and operations. It involves a complete and real-time view of the status of applications, services, and infrastructure in the production environment. Features such as real-time streaming, historical replay, and visualizations are critical components of application and service monitoring.