What I Do
- Service-Level Objectives: SLI, SLO, SLA, Error Budget, Burn Rate
- Distributed Systems: Architecture, hybrid environments, high availability
- Configuration Management: Puppet, Hiera, Terraform, Ansible
- Container Computing: Docker, Kubernetes
- Cloud Services: Azure, Alibaba Cloud, AWS, GCP, OpenStack
- Distributed Messaging: RabbitMQ, Pulsar
- Proxies & Load Balancing
- Monitoring: Prometheus, Kibana, Grafana, Elasticsearch
- Logging: Splunk, syslog, ELK Stack, systemd journal
- Source Control: GitHub Enterprise
- CI/CD: Jenkins, Argo, GitHub Actions, Atlantis
- Linux: Bash, debugging, performance tuning
- Networking: Troubleshooting, packet loss, routing
- Programming: Language-agnostic
Practices
- Cultivate both broad and specialized expertise in the technologies I work with.
- Stay up-to-date with technology trends and industry standards.
- Communicate complex ideas clearly to diverse audiences.
- Build strong relationships with cross-functional teams.
- Co-own operations and reliability with partner teams.
- Deeply understand the services I support and their goals.
- Explore new technologies through demos, experiments, and labs.
- Break down complex tasks into manageable units.
- Participate in on-call rotations to resolve incidents.
- Lead blameless postmortems to improve service reliability.
- Collaborate to foster positive relationships across the company.
- Use systems knowledge to triage problems and optimize resources.
- Champion automation to reduce manual work and increase velocity.
- Use configuration management to keep services consistent.
- Build IaC configurations for cloud infrastructure.
- Support and improve build pipelines.
- Adopt containers and Kubernetes for new and existing services.
- Apply “everything-as-code” methodologies where applicable.
- Automate or eliminate repetitive tasks.
- Improve workflows to simplify team operations.
- Troubleshoot incidents using metrics, logs, and other data.
- Embrace older technologies for their reliability and low maintenance.
- Spread DevOps philosophies across teams.
- Collaborate with engineers across software, network, cloud, and systems.
Outlook
- Shape the future of service management using Kubernetes.
- Support global data platforms across multiple clouds.
- Manage and improve service migrations between clouds and data centers.
- Work with partner teams to influence product operations.
- Enhance monitoring and logging for better observability.
- Establish Service-Level Objectives to improve service reliability.
- Design and conduct stress tests to align scale expectations with reality.