Python for DevOps: The Complete Roadmap (No Fluff, Just What Works)
Complete roadmap for learning Python for DevOps
A lot of people reach out to me with one question: how do I learn Python for DevOps? How much should I learn, and what exactly?
I am writing this to answer all of these questions once and for all.
After teaching 1000+ DevOps engineers, I’ve noticed something: most Python tutorials teach you to build calculators and games. But in DevOps, you’re automating infrastructure, parsing logs, and making API calls to AWS.
This roadmap focuses on what actually matters. Let’s dive in.
Week 1-2: Foundation (Don’t Skip This)
1. Python Syntax Basics
What to learn:
Variables, operators, basic math
Print statements and input
Comments (trust me, future you will thank present you)
Indentation (Python is picky about this)
Why it matters in DevOps: Every script starts here. You’ll write hundreds of automation scripts in your career. Get the basics right.
Quick win exercise: Write a script that asks for a server name and environment (dev/staging/prod) and prints: “Deploying to server-name in environment”
2. Data Structures (This Is Where DevOps Gets Real)
Strings
Learn:
.split(), .join(), .strip(), .replace()
f-strings: f"Server {name} is {status}"
Multi-line strings with triple quotes
Real DevOps use:
Parsing log files (you’ll do this daily)
Building dynamic Terraform files
Formatting kubectl output
Creating Ansible playbooks
Example:
log_line = "2024-02-16 ERROR user login failed"
parts = log_line.split()
timestamp = parts[0]
level = parts[1]
Lists
Learn:
Creating, accessing, slicing lists
.append(), .extend(), .remove()
List comprehensions: [x for x in items if condition]
Real DevOps use:
Storing multiple EC2 instance IDs
Batch operations on servers
Managing Docker container names
Processing multiple CloudWatch alarms
Example:
servers = ['web-1', 'web-2', 'db-1']
web_servers = [s for s in servers if 'web' in s]
Dictionaries (Your Best Friend in DevOps)
Learn:
Creating dicts: server = {'name': 'web-1', 'ip': '10.0.1.5'}
.get(), .keys(), .values(), .items()
Nested dictionaries
Dict comprehensions
Real DevOps use:
Most AWS API responses come back as nested dictionaries
Kubernetes manifest parsing
Configuration management
Storing server metadata
Example:
ec2_instance = {
    'InstanceId': 'i-1234567',
    'State': {'Name': 'running'},
    'Tags': [{'Key': 'Name', 'Value': 'web-server'}]
}
Tuples
Learn:
Immutable sequences
When to use tuple vs list
Unpacking:
name, age = ('John', 30)
Real DevOps use:
Database query results
Returning multiple values from functions
Fixed configuration data
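The tuple section has no example, so here is a minimal sketch of two of the uses listed above: returning multiple values from a function and holding fixed configuration data. The function name parse_host_port and the sample address are made up for illustration.

```python
def parse_host_port(address, default_port=22):
    """Split 'host:port' into a (host, port) tuple; fall back to default_port."""
    host, sep, port = address.partition(':')
    if not sep:
        return host, default_port
    return host, int(port)

# Unpack the returned tuple straight into two variables
host, port = parse_host_port('web-1.internal:8080')
print(host, port)

# Tuples are immutable, which suits fixed configuration data
ENVIRONMENTS = ('dev', 'staging', 'prod')
print('prod' in ENVIRONMENTS)
```

A list would work here too, but a tuple signals that the set of environments is not meant to change at runtime.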
Sets
Learn:
Unique elements only
.union(), .intersection(), .difference()
Real DevOps use:
Finding unique IPs in logs
Comparing security groups across environments
Detecting configuration drift between prod and staging
Example:
prod_ips = {'10.0.1.5', '10.0.1.6', '10.0.1.7'}
staging_ips = {'10.0.1.5', '10.0.1.8'}
only_in_prod = prod_ips - staging_ips
Practice project: Parse an nginx access log file. Count requests per IP address. Store in a dictionary. Print top 5 IPs.
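One possible starting point for this practice project, as a rough sketch. It assumes the client IP is the first whitespace-separated field on each line (true for nginx's default log formats, but check your own log_format). The sample lines are made up, and Counter is a dictionary subclass, so the "store in a dictionary" requirement still holds.

```python
from collections import Counter

def top_ips(lines, n=5):
    """Count requests per client IP and return the n most frequent as (ip, count) pairs."""
    counts = Counter()
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            counts[parts[0]] += 1  # assumption: the client IP is the first field
    return counts.most_common(n)

# Made-up sample lines; with a real file, pass the open file object instead
sample = [
    '10.0.1.5 - - [16/Feb/2024:10:30:45 +0000] "GET / HTTP/1.1" 200 612',
    '10.0.1.6 - - [16/Feb/2024:10:30:46 +0000] "GET /health HTTP/1.1" 200 2',
    '10.0.1.5 - - [16/Feb/2024:10:30:47 +0000] "POST /login HTTP/1.1" 401 128',
]
for ip, count in top_ips(sample):
    print(f"{ip}: {count} requests")
```

Because top_ips only iterates over lines, the same function works on a list, an open file, or a generator that filters lines first.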
Week 3: Type Conversion & Data Manipulation
3. Working with Different Data Types
Learn:
int(), str(), float(), list(), dict()
type() to check types
isinstance() for validation
Why this matters: Environment variables, CLI output, and values read from files all arrive as strings, and some API fields do too. But you need numbers for math and comparisons. You’ll convert types constantly.
Example:
cpu_usage = "85.6"  # From monitoring API
if float(cpu_usage) > 80:
    send_alert()  # send_alert() is a placeholder for your own alerting function
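Since environment variables are always strings, a small conversion helper tends to pay off quickly. This is only a sketch: the helper name env_int and the MAX_RETRIES variable are invented for the example.

```python
import os

def env_int(name, default):
    """Read an environment variable as an int, falling back to default when unset."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        # Re-raise with a message that names the offending variable
        raise ValueError(f"{name} must be an integer, got {raw!r}") from None

os.environ['MAX_RETRIES'] = '3'  # simulate a value set by the deployment environment
retries = env_int('MAX_RETRIES', default=5)
print(retries + 1)                 # arithmetic works because retries is an int
print(isinstance(retries, int))    # isinstance() confirms the type
```

Failing loudly on a bad value is a deliberate choice here: a script that silently treats "abc" as 0 is much harder to debug in production.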
Week 4-5: File Operations & JSON (Critical for DevOps)
4. File Management
Learn:
open(), read(), write(), readlines()
File modes: 'r', 'w', 'a'
Context managers: with open() as f:
Working with paths using pathlib
Real DevOps use:
Reading config files
Writing deployment logs
Updating hosts files
Managing SSH keys
Example:
with open('/var/log/app.log', 'r') as f:
    for line in f:
        if 'ERROR' in line:
            print(line.rstrip())
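The example above uses plain open(). For the pathlib item on the list, here is a small sketch that builds paths, writes, appends, and reads back. It works in a temporary directory so it can run anywhere, and the file names are placeholders.

```python
import tempfile
from pathlib import Path

# A temporary directory keeps the sketch self-contained
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp) / 'logs'          # the / operator joins path segments
    log_dir.mkdir(parents=True, exist_ok=True)

    log_file = log_dir / 'deploy.log'
    log_file.write_text('INFO start\nERROR disk full\n')

    with log_file.open('a') as f:         # 'a' appends instead of overwriting
        f.write('INFO done\n')

    error_lines = [line for line in log_file.read_text().splitlines() if 'ERROR' in line]
    print(log_file.name, log_file.suffix, error_lines)
```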
JSON Management (Core DevOps Skill)
Learn:
json.load() (from file) vs json.loads() (from string)
json.dump() (to file) vs json.dumps() (to string)
Pretty printing: json.dumps(data, indent=2)
Why JSON is everywhere in DevOps:
Terraform state files
AWS CLI output
Kubernetes manifests
CI/CD pipeline configs
API responses
Example:
import json
# Reading Terraform state
with open('terraform.tfstate', 'r') as f:
    state = json.load(f)
    resources = state['resources']
# Writing config
config = {'region': 'us-east-1', 'instance_type': 't3.micro'}
with open('config.json', 'w') as f:
    json.dump(config, f, indent=2)
Practice project: Read a JSON config file with server details. Update the instance count. Write it back to the file.
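A rough sketch of how this project could look. The key name instance_count and the file name servers.json are assumptions, so adapt them to whatever your config actually contains. The demo writes to a temporary directory so it leaves nothing behind.

```python
import json
import tempfile
from pathlib import Path

def set_instance_count(path, new_count):
    """Load a JSON config, set instance_count, and write the file back."""
    path = Path(path)
    config = json.loads(path.read_text())
    config['instance_count'] = new_count   # hypothetical key name
    path.write_text(json.dumps(config, indent=2) + '\n')
    return config

with tempfile.TemporaryDirectory() as tmp:
    config_path = Path(tmp) / 'servers.json'
    config_path.write_text(json.dumps({'region': 'us-east-1', 'instance_count': 2}))

    updated = set_instance_count(config_path, 4)
    on_disk = json.loads(config_path.read_text())   # re-read to confirm the write
    print(on_disk)
```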
Week 5-6: System Operations
5. OS Module & Subprocess
OS Module:
os.getcwd(), os.chdir(), os.listdir()
os.path.exists(), os.path.join()
os.environ for environment variables
os.mkdir(), os.remove()
Subprocess Module:
subprocess.run() to execute shell commands
Capturing output with capture_output=True
Checking return codes
subprocess.Popen() for advanced use
Real DevOps use:
Running AWS CLI commands
Executing kubectl commands
Checking if files exist before operations
Managing environment variables in deployments
Example:
import subprocess
# Run AWS CLI command
result = subprocess.run(
    ['aws', 'ec2', 'describe-instances'],
    capture_output=True,
    text=True
)
if result.returncode == 0:
    print(result.stdout)
else:
    print(f"Error: {result.stderr}")
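An alternative to checking returncode by hand is check=True, which makes subprocess.run() raise CalledProcessError on a non-zero exit. The sketch below uses the Python interpreter itself (sys.executable) as a stand-in for a real CLI such as aws or kubectl, purely so it runs on any machine. The wrapper name run is made up.

```python
import subprocess
import sys

def run(cmd):
    """Run a command and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout.strip()

# sys.executable stands in for a real CLI such as aws or kubectl
print(run([sys.executable, '-c', 'print("ok")']))

try:
    run([sys.executable, '-c', 'import sys; sys.exit(3)'])
except subprocess.CalledProcessError as e:
    failed_code = e.returncode
    print(f"Command failed with exit code {failed_code}")
```

Passing the command as a list (rather than one string with shell=True) avoids shell quoting problems when arguments come from user input or config files.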
Week 6-7: API Calls (You’ll Do This Daily)
6. Making API Calls with Requests
Learn:
requests.get(), requests.post(), requests.put(), requests.delete()
Understanding status codes (200, 404, 500, etc.)
Headers and authentication
Request payload (JSON body)
Query parameters
Handling timeouts
Real DevOps use:
AWS API calls
GitHub API for CI/CD
Slack/Teams notifications
Datadog/Prometheus metrics
Webhook integrations
Example:
import os
import requests
# List issues for a repository via the GitHub API
# (assumes the token is stored in a GITHUB_TOKEN environment variable)
github_token = os.environ['GITHUB_TOKEN']
response = requests.get(
    'https://api.github.com/repos/user/repo/issues',
    headers={'Authorization': f'token {github_token}'},
    timeout=10
)
if response.status_code == 200:
    issues = response.json()
    for issue in issues:
        print(issue['title'])
else:
    print(f"Failed with status: {response.status_code}")
Status codes you must know:
200: Success
201: Created
400: Bad request (check your payload)
401: Unauthorized (check your API key)
404: Not found
500: Server error (not your fault)
Practice project: Write a script that posts a message to Slack when a deployment completes. Include deployment status and timestamp.
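Here is one way this project might start, as a sketch rather than a finished tool. Slack incoming webhooks accept a JSON body with a text field. To keep the block runnable without extra packages it posts with the standard library's urllib.request; with requests installed, requests.post(webhook_url, json=payload, timeout=10) does the same job. The function names and message wording are made up.

```python
import json
import urllib.request
from datetime import datetime, timezone

def build_slack_payload(service, status, finished_at=None):
    """Build the JSON body for a Slack incoming webhook."""
    finished_at = finished_at or datetime.now(timezone.utc)
    stamp = finished_at.strftime('%Y-%m-%d %H:%M:%S UTC')
    return {'text': f"Deployment of {service} finished: {status} at {stamp}"}

def post_to_slack(webhook_url, payload, timeout=10):
    """POST the payload to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode('utf-8'),   # supplying data makes this a POST
        headers={'Content-Type': 'application/json'},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status

payload = build_slack_payload('web-app', 'SUCCESS')
print(payload['text'])
# post_to_slack(webhook_url, payload)  # needs a real webhook URL, so not called here
```

Keeping the message building separate from the network call makes the formatting easy to test without hitting Slack.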
Week 7-8: Control Flow & Functions
7. Conditionals & Loops
Conditionals:
if, elif, else
Comparison operators: ==, !=, >, <, in
Logical operators: and, or, not
Loops:
for loops (most common in DevOps)
while loops (use sparingly)
break and continue
enumerate() for index and value
zip() for parallel iteration
Example:
# Check multiple servers (check_server_status() is a placeholder for your own check)
servers = ['web-1', 'web-2', 'db-1']
for index, server in enumerate(servers, 1):
    status = check_server_status(server)
    if status == 'down':
        print(f"Alert: Server {index} ({server}) is down!")
        break
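The example above covers enumerate() and break. For zip() and continue, a tiny sketch with made-up status data:

```python
servers = ['web-1', 'web-2', 'db-1']
statuses = ['up', 'down', 'up']   # made-up results, one per server

down = []
for server, status in zip(servers, statuses):   # walk both lists in step
    if status == 'up':
        continue          # nothing to do, move on to the next server
    down.append(server)

print(down)
```

Unlike break, continue keeps the loop going, so every server gets checked.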
8. Functions (Write Reusable Code)
Learn:
Defining functions with def
Parameters and arguments
Return values
Default parameters
*args and **kwargs
Lambda functions (for simple operations)
Example:
def deploy_to_server(server_name, environment='staging'):
    print(f"Deploying to {server_name} in {environment}")
    # Deployment logic here
    return True
# Usage
success = deploy_to_server('web-1', 'production')
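*args, **kwargs, and lambdas are on the list but not in the example, so here is a short sketch. The function name tag_resources and the sample IDs and CPU numbers are invented for illustration.

```python
def tag_resources(*resource_ids, **tags):
    """Return a mapping of each resource ID to its own copy of the given tags."""
    return {rid: dict(tags) for rid in resource_ids}

# Any number of positional IDs, any number of keyword tags
tagged = tag_resources('i-111', 'i-222', env='prod', team='platform')
print(tagged)

# A lambda as a sort key: order servers by CPU usage, highest first
servers = [('web-1', 42.0), ('web-2', 91.5), ('db-1', 67.3)]
busiest_first = sorted(servers, key=lambda s: s[1], reverse=True)
print(busiest_first[0])
```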
Week 8-9: Code Organization
9. Building Custom Modules
Learn:
Creating .py files as modules
import and from ... import
if __name__ == "__main__":
Organizing code into functions
Package structure
Why this matters: You’ll reuse the same code across multiple automation scripts. Write once, import everywhere.
Example structure:
devops_tools/
├── aws_utils.py
├── slack_notifier.py
└── config_parser.py
# In aws_utils.py
def get_running_instances():
    instances = []  # AWS logic goes here
    return instances
# In your script
from devops_tools.aws_utils import get_running_instances
instances = get_running_instances()
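The if __name__ == "__main__": guard from the list deserves a quick illustration. Below is a sketch of what a module like slack_notifier.py could contain. The function names are hypothetical, not taken from a real project.

```python
# slack_notifier.py (hypothetical module contents)

def format_message(service, status):
    """Build the notification text; other scripts can import this function."""
    return f"[{status.upper()}] {service}"

def main():
    # Runs only when the file is executed directly: python slack_notifier.py
    print(format_message('web-app', 'deployed'))

if __name__ == "__main__":
    main()
```

When another script does from slack_notifier import format_message, the guard is false and main() does not run, so importing the module has no side effects.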
Week 9-10: Production-Ready Code
10. Logging & Exception Handling
Logging (Stop using print!):
logging module basics
Log levels: DEBUG, INFO, WARNING, ERROR, CRITICAL
Writing logs to files
Log formatting
Example:
import logging
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    filename='deployment.log'
)
logging.info("Starting deployment")
logging.error("Deployment failed: Connection timeout")
Exception Handling:
try, except, finally
Catching specific exceptions
Raising custom exceptions
Never use bare except:
Example:
import json
import logging
import requests
api_url = 'https://api.example.com/health'  # placeholder URL
try:
    response = requests.get(api_url, timeout=5)
    response.raise_for_status()
    data = response.json()
except requests.Timeout:
    logging.error("API request timed out")
except requests.HTTPError as e:
    logging.error(f"HTTP error: {e}")
except json.JSONDecodeError:
    logging.error("Invalid JSON response")
finally:
    logging.info("Request attempt completed")
Practice project: Create a server health check script with proper logging and exception handling. Log to both console and file.
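A possible skeleton for this project. logging.basicConfig() with filename= writes to the file only, so logging to both console and file needs two handlers attached to one logger. The health check itself is faked here, and HealthCheckError, build_logger, and check_server are names made up for the sketch.

```python
import logging
import tempfile
from pathlib import Path

class HealthCheckError(Exception):
    """Raised when a server fails its health check."""

def build_logger(log_path):
    """Return a logger that writes to both the console and log_path."""
    logger = logging.getLogger('healthcheck')
    logger.setLevel(logging.INFO)
    logger.handlers.clear()  # avoid duplicate handlers if called twice
    formatter = logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    for handler in (logging.StreamHandler(), logging.FileHandler(log_path)):
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger

def check_server(name, healthy):
    # Stand-in for a real check (HTTP request, ping, ...)
    if not healthy:
        raise HealthCheckError(f"{name} did not respond")
    return 'healthy'

log_path = Path(tempfile.mkdtemp()) / 'healthcheck.log'
logger = build_logger(log_path)

for name, healthy in [('web-1', True), ('db-1', False)]:
    try:
        logger.info("%s is %s", name, check_server(name, healthy))
    except HealthCheckError as e:
        logger.error("Health check failed: %s", e)
```

Catching the specific HealthCheckError, rather than a bare except, means a typo or an unexpected bug still surfaces as a traceback instead of being logged as a failed check.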
Week 10-11: Object-Oriented Basics
11. OOP Concepts (Just What You Need)
Learn:
Classes and objects
__init__() method
Instance variables vs class variables
Methods
Basic inheritance
Why OOP in DevOps: You won’t write complex OOP, but you’ll use AWS SDK (boto3), Kubernetes client, and other libraries that are object-oriented.
Example:
class Server:
    def __init__(self, name, ip, environment):
        self.name = name
        self.ip = ip
        self.environment = environment
        self.status = 'unknown'
    def check_health(self):
        # Health check logic
        self.status = 'healthy'
        return self.status
    def deploy(self, version):
        print(f"Deploying v{version} to {self.name}")
# Usage
web_server = Server('web-1', '10.0.1.5', 'production')
web_server.check_health()
web_server.deploy('2.1.0')
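Class variables and basic inheritance are on the list but not in the example. A stripped-down sketch follows; it defines its own small Server class, separate from the one above, and the port numbers are only illustrative.

```python
class Server:
    default_port = 22            # class variable: shared by every instance

    def __init__(self, name):
        self.name = name         # instance variable: unique per object

    def describe(self):
        return f"{self.name} (port {self.default_port})"

class WebServer(Server):
    default_port = 443           # the subclass overrides the class variable

    def describe(self):
        # Reuse the parent's behaviour, then extend it
        return super().describe() + " [web]"

print(Server('db-1').describe())
print(WebServer('web-1').describe())
```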
Additional Critical Topics
12. Regular Expressions (Regex)
Learn:
re.search(), re.match(), re.findall()
Common patterns: \d (digit), \w (word character), \s (whitespace)
Groups with ()
Character classes []
Real DevOps use:
Parsing log files (this is huge)
Extracting IPs from text
Validating email/domain formats
Searching CloudWatch logs
Example:
import re
log = "2024-02-16 10:30:45 ERROR Connection failed from 192.168.1.100"
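# Extract IP address (re.search returns None when nothing matches)
ip = re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', log)
if ip:
    print(ip.group())  # 192.168.1.100
# Find all error logs; log_file_content stands in for the full text of a log file
log_file_content = log
errors = re.findall(r'ERROR.*', log_file_content)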
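Groups are where regex starts to earn its keep for log parsing. The sketch below uses named groups to turn a log line into a dictionary. The pattern assumes the "date time LEVEL message" layout of the sample line above, and real logs will need their own pattern.

```python
import re

# Assumed layout: YYYY-MM-DD HH:MM:SS LEVEL message
LOG_PATTERN = re.compile(
    r'(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) '
    r'(?P<level>[A-Z]+) (?P<message>.*)'
)

def parse_line(line):
    """Return the named fields as a dict, or None if the line does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None

entry = parse_line("2024-02-16 10:30:45 ERROR Connection failed from 192.168.1.100")
print(entry['level'], '-', entry['message'])
```

Compiling the pattern once with re.compile() avoids re-parsing it for every line when you process a large file.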
13. Working with YAML
Learn:
pyyaml library
yaml.safe_load() and yaml.dump()
Handling multi-document YAML
Why it matters: Kubernetes manifests, Ansible playbooks, GitHub Actions, and Docker Compose all use YAML.
Example:
import yaml
with open('deployment.yaml', 'r') as f:
    config = yaml.safe_load(f)
config['spec']['replicas'] = 5
with open('deployment.yaml', 'w') as f:
    yaml.dump(config, f)
14. Environment Variables & Config Management
Learn:
os.environ.get()
.env files with python-dotenv
Config files vs environment variables
Never hardcode secrets
Example:
import os
from dotenv import load_dotenv
load_dotenv()
AWS_ACCESS_KEY = os.environ.get('AWS_ACCESS_KEY_ID')
DB_PASSWORD = os.environ.get('DB_PASSWORD')
if not AWS_ACCESS_KEY:
    raise ValueError("AWS_ACCESS_KEY_ID not set")
15. Working with Dates & Times
Learn:
datetime module
Parsing timestamps
Time zones with pytz (or the standard-library zoneinfo on Python 3.9+)
Date formatting
Real DevOps use:
Log rotation
Backup naming
Cron job scheduling
Report generation
Example:
import os
from datetime import datetime, timedelta
# Create backup with timestamp
now = datetime.now()
backup_name = f"db_backup_{now.strftime('%Y%m%d_%H%M%S')}.sql"
# Check if file is older than 7 days
file_age = datetime.now() - datetime.fromtimestamp(os.path.getmtime('old_log.txt'))
if file_age > timedelta(days=7):
    os.remove('old_log.txt')
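For the "parsing timestamps" and "time zones" items, here is a sketch using only the standard library. It assumes the log timestamps are in UTC, which is an assumption about your log source, and the helper names are invented.

```python
from datetime import datetime, timedelta, timezone

def parse_utc(stamp):
    """Parse 'YYYY-MM-DD HH:MM:SS' and mark it as UTC (an assumption about the source)."""
    return datetime.strptime(stamp, '%Y-%m-%d %H:%M:%S').replace(tzinfo=timezone.utc)

def is_older_than(stamp, days, now=None):
    """True if the timestamp lies more than `days` days before `now`."""
    now = now or datetime.now(timezone.utc)
    return now - parse_utc(stamp) > timedelta(days=days)

# A fixed reference time keeps the example deterministic
reference = datetime(2024, 2, 24, 12, 0, 0, tzinfo=timezone.utc)
print(is_older_than('2024-02-16 10:30:45', days=7, now=reference))
print(parse_utc('2024-02-16 10:30:45').isoformat())
```

Working with timezone-aware datetimes throughout avoids the TypeError Python raises when you subtract a naive datetime from an aware one.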
16. Command Line Arguments
Learn:
sys.argv basics
argparse module for proper CLI tools
Required vs optional arguments
Help messages
Example:
import argparse
parser = argparse.ArgumentParser(description='Deploy application')
parser.add_argument('environment', choices=['dev', 'staging', 'prod'])
parser.add_argument('--version', required=True)
parser.add_argument('--rollback', action='store_true')
args = parser.parse_args()
print(f"Deploying version {args.version} to {args.environment}")
17. Working with CSV Files
Learn:
csv module
DictReader and DictWriter
Handling different delimiters
Real DevOps use:
Cost reports
Server inventory
Log analysis exports
Data migration
Example:
import csv
with open('servers.csv', 'r', newline='') as f:
    reader = csv.DictReader(f)
    for row in reader:
        print(f"Server: {row['name']}, IP: {row['ip']}")
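The reading side is shown above. For DictWriter and a non-default delimiter, here is a sketch that writes tab-separated data and reads it back. It uses an in-memory io.StringIO buffer in place of a real file, and the server data is made up.

```python
import csv
import io

servers = [
    {'name': 'web-1', 'ip': '10.0.1.5'},
    {'name': 'db-1', 'ip': '10.0.1.9'},
]

buffer = io.StringIO()  # stands in for open('servers.tsv', 'w', newline='')
writer = csv.DictWriter(buffer, fieldnames=['name', 'ip'], delimiter='\t')
writer.writeheader()
writer.writerows(servers)

# Read it back with the same delimiter
buffer.seek(0)
rows = list(csv.DictReader(buffer, delimiter='\t'))
print(rows)
```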
18. Multithreading Basics (For Parallel Operations)
Learn:
threading module
ThreadPoolExecutor for concurrent tasks
When to use threads vs processes
Real DevOps use:
Checking health of 100 servers simultaneously
Parallel API calls
Batch operations
Example:
from concurrent.futures import ThreadPoolExecutor
def check_server(server):
    # Health check logic
    return f"{server} is healthy"
servers = ['web-1', 'web-2', 'web-3', 'db-1']
with ThreadPoolExecutor(max_workers=4) as executor:
    results = executor.map(check_server, servers)
    for result in results:
        print(result)
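With executor.map(), an exception from one task surfaces while you iterate and cuts the loop short. When some servers are expected to fail, submit() plus as_completed() lets you handle each result on its own. The failure for db servers below is simulated. As a rule of thumb, threads suit I/O-bound work like network checks, while CPU-bound work is usually better served by processes (ProcessPoolExecutor).

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def check_server(server):
    # Stand-in for a network call; the db failure is simulated
    if server.startswith('db'):
        raise ConnectionError(f"{server} unreachable")
    return 'healthy'

servers = ['web-1', 'web-2', 'db-1']
results = {}
with ThreadPoolExecutor(max_workers=3) as executor:
    # Map each future back to the server it belongs to
    futures = {executor.submit(check_server, s): s for s in servers}
    for future in as_completed(futures):
        server = futures[future]
        try:
            results[server] = future.result()
        except ConnectionError as e:
            results[server] = f"failed: {e}"

for server in servers:
    print(server, '->', results[server])
```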
10 Real-World DevOps Projects
Project 1: EC2 Instance Manager
Goal: Manage EC2 instances via script
Features:
List all running instances
Start/stop instances by tag
Get instance details (IP, type, state)
Send Slack notification on state change
Skills practiced: boto3, API calls, error handling, logging
Project 2: Log Parser & Alert System
Goal: Parse application logs and send alerts
Features:
Read log files line by line
Use regex to find ERROR and CRITICAL logs
Count errors per hour
Send email if errors > threshold
Generate daily summary report
Skills practiced: File I/O, regex, datetime, email sending
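To get started on the "count errors per hour" feature, something like this sketch could work. It assumes each line begins with a "YYYY-MM-DD HH:MM:SS LEVEL" prefix. The sample lines and the threshold value are made up.

```python
import re
from collections import Counter

# Capture the date and hour, then require ERROR or CRITICAL as the level
LINE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}):\d{2}:\d{2} (ERROR|CRITICAL)\b')

def errors_per_hour(lines):
    """Count ERROR and CRITICAL lines, keyed by 'YYYY-MM-DD HH'."""
    counts = Counter()
    for line in lines:
        match = LINE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

sample = [
    '2024-02-16 10:30:45 ERROR Connection failed',
    '2024-02-16 10:45:01 CRITICAL Disk full',
    '2024-02-16 11:02:13 INFO Recovered',
    '2024-02-16 11:15:00 ERROR Timeout',
]
THRESHOLD = 1  # made-up alert threshold
for hour, count in sorted(errors_per_hour(sample).items()):
    flag = ' <- alert' if count > THRESHOLD else ''
    print(f"{hour}:00  {count}{flag}")
```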
Project 3: Backup Automation Script
Goal: Automated database backups to S3
Features:
Dump MySQL/PostgreSQL database
Compress with timestamp
Upload to S3
Delete local backups older than 7 days
Log all operations
Send failure notifications
Skills practiced: subprocess, boto3, datetime, exception handling
Project 4: Kubernetes Pod Monitor
Goal: Monitor pod health and restart unhealthy pods
Features:
List all pods in namespace
Check pod status
Restart pods if not Running
Log pod events
Send metrics to Datadog
Skills practiced: subprocess (kubectl), JSON parsing, API calls
Project 5: Multi-Cloud Cost Reporter
Goal: Generate cost reports from AWS and Azure
Features:
Fetch cost data from AWS Cost Explorer API
Fetch Azure cost data
Combine into single report
Generate CSV and send via email
Create visualizations (optional: matplotlib)
Skills practiced: Multiple APIs, CSV handling, data manipulation
Project 6: CI/CD Pipeline Trigger
Goal: Trigger deployments based on conditions
Features:
Check GitHub for new releases
Validate release notes format
Trigger Jenkins/GitLab pipeline via API
Post status to Slack
Rollback on failure
Skills practiced: GitHub API, webhook handling, error recovery
Project 7: SSL Certificate Expiry Checker
Goal: Monitor SSL certificates across multiple domains
Features:
Read domains from config file
Check SSL expiry date for each
Alert if expiring in < 30 days
Generate report with all cert details
Store results in JSON
Skills practiced: ssl module, datetime, file I/O, notifications
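The core of this project is turning a certificate's notAfter field into "days left". A sketch with the standard library is below. fetch_not_after needs network access, so the demo only exercises the date arithmetic with a made-up expiry date and a fixed "now". Both function names are hypothetical.

```python
import socket
import ssl
from datetime import datetime, timezone

def days_until_expiry(not_after, now=None):
    """Days left before a certificate's notAfter date, e.g. 'Jun  1 12:00:00 2030 GMT'."""
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return (expires - now).days

def fetch_not_after(hostname, port=443, timeout=5):
    """Connect to the host and return the notAfter field of its certificate."""
    context = ssl.create_default_context()
    with socket.create_connection((hostname, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=hostname) as tls:
            return tls.getpeercert()['notAfter']

# Offline demonstration with a made-up expiry date and a fixed "now"
now = datetime(2030, 5, 15, 12, 0, 0, tzinfo=timezone.utc)
remaining = days_until_expiry('Jun  1 12:00:00 2030 GMT', now=now)
print(remaining, 'days left', '(ALERT)' if remaining < 30 else '')
```

In the full project you would loop over the domains from your config file, call fetch_not_after for each, and wrap the call in try/except so that one unreachable host does not stop the whole run.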
Project 8: Infrastructure Drift Detector
Goal: Compare actual infrastructure vs Terraform state
Features:
Read Terraform state file
Query actual AWS resources
Compare and find differences
Generate drift report
Send to teams if drift detected
Skills practiced: JSON parsing, boto3, data comparison, dictionaries
Project 9: Automated Security Group Auditor
Goal: Audit AWS security groups for compliance
Features:
List all security groups
Check for open 0.0.0.0/0 rules
Flag SSH (22) open to internet
Check for unused security groups
Generate compliance report
Auto-remediate (optional)
Skills practiced: boto3, sets for comparison, conditional logic
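The "flag SSH open to the internet" check can be prototyped on plain dictionaries before any AWS call is involved. The sample data below is made up but shaped like the SecurityGroups list that boto3's describe_security_groups returns. This sketch only looks at IPv4 ranges, so a real auditor would also need to check Ipv6Ranges for ::/0.

```python
def find_open_ssh(security_groups):
    """Return IDs of groups that allow port 22 from 0.0.0.0/0 (IPv4 only)."""
    flagged = []
    for group in security_groups:
        for rule in group.get('IpPermissions', []):
            from_port = rule.get('FromPort')
            to_port = rule.get('ToPort')
            all_traffic = rule.get('IpProtocol') == '-1'  # -1 means all protocols and ports
            covers_ssh = all_traffic or (
                from_port is not None and to_port is not None
                and from_port <= 22 <= to_port
            )
            open_to_world = any(
                r.get('CidrIp') == '0.0.0.0/0' for r in rule.get('IpRanges', [])
            )
            if covers_ssh and open_to_world:
                flagged.append(group['GroupId'])
                break  # one offending rule is enough to flag the group
    return flagged

# Made-up data shaped like the 'SecurityGroups' list in the API response
groups = [
    {'GroupId': 'sg-open', 'IpPermissions': [
        {'IpProtocol': 'tcp', 'FromPort': 22, 'ToPort': 22,
         'IpRanges': [{'CidrIp': '0.0.0.0/0'}]}]},
    {'GroupId': 'sg-office', 'IpPermissions': [
        {'IpProtocol': 'tcp', 'FromPort': 22, 'ToPort': 22,
         'IpRanges': [{'CidrIp': '203.0.113.0/24'}]}]},
]
print(find_open_ssh(groups))
```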
Project 10: Multi-Server Deployment Tool
Goal: Deploy application to multiple servers
Features:
Read server list from YAML
SSH to each server (using paramiko)
Pull latest code from Git
Restart services
Health check after deployment
Rollback if health check fails
Parallel deployment with threading
Skills practiced: YAML, SSH (paramiko), threading, exception handling, logging
Learning Path Summary
Weeks 1-2: Basics + Data Structures
Week 3: Type conversion
Weeks 4-5: Files + JSON
Weeks 5-7: OS operations + API calls
Weeks 7-9: Control flow + Functions + Modules
Weeks 9-10: Logging + Error handling
Weeks 10-11: OOP basics
Week 12: Additional topics (regex, YAML, CLI args)
Weeks 13-16: Build all 10 projects
My Advice After Teaching 1000+ Engineers
Don’t just watch tutorials. Type every line of code yourself.
Break things. Intentionally make mistakes. See what errors look like.
Read real code. Check out the boto3 source code and Ansible modules. See how pros write Python.
Automate your daily tasks. That manual task you do every day? Automate it. That’s the best learning.
Join DevOps communities. Share your scripts. Get feedback. Learn from others.
Start small, ship fast. Don’t wait to write the perfect script. Write a working one. Improve it later.
Document as you code. Future you (and your team) will be grateful.
What’s Next?
After completing this roadmap:
Deep dive into boto3 (AWS SDK for Python)
Learn Ansible (Python-based)
Explore CDK for Terraform (CDKTF) with Python
Build monitoring tools with Prometheus Python client
Contribute to open-source DevOps tools
Remember: Python is a tool, not the goal. The goal is to solve real DevOps problems faster and more reliably.
Stop reading. Start coding.
See you in production,
Akhilesh
P.S. If you build any of these projects, tag me on LinkedIn. I’d love to see what you create.

