What I Do
I work on making applications cloud-native so that they cost less to run,
are more reliable, and perform better.
- Service-Level Objectives: SLI, SLO, SLA, Error Budget, Burn Rate
- Distributed Systems: architecture, hybrid environments, high availability
- Configuration Management: Puppet, Hiera, Terraform, Ansible
- Container Computing: Docker, Kubernetes
- Cloud Services: Azure, Alibaba Cloud, AWS, GCP, OpenStack
- Distributed Messaging: RabbitMQ, Pulsar
- Proxies and Load Balancing
- Monitoring: Prometheus, Kibana, Grafana, Elasticsearch
- Logging: Splunk, syslog, ELK Stack, systemd journal
- Source Control: GitHub Enterprise
- CI/CD: Jenkins, Argo, GitHub Actions, Atlantis
- Linux: bash, debugging, performance tuning
- Networking: troubleshooting, packet loss, routing
- Programming: language-agnostic
Practices
- Cultivate broad and specialized expertise in relevant areas
- Stay up-to-date with technology trends and industry standards
- Communicate complex ideas clearly to diverse audiences
- Build strong relationships with cross-functional teams
- Co-own operations and reliability with partner teams
- Deeply understand the services I support and their goals
- Explore new technologies through demos, experiments, and labs
- Break down complex tasks into manageable units
- Participate in on-call rotations to resolve incidents
- Lead blameless postmortems to improve service reliability
- Collaborate to foster positive relationships across the company
- Use systems knowledge to triage problems and optimize resources
- Champion automation to reduce manual work and increase velocity
- Use configuration management to keep services consistent
- Manage cloud infrastructure with Infrastructure as Code (IaC)
- Support and improve build pipelines
- Adopt containers and Kubernetes for new and existing services
- Apply everything-as-code methodologies where applicable
- Automate or eliminate repetitive tasks
- Improve workflows to simplify team operations
- Troubleshoot incidents using metrics, logs, and other data
- Embrace older technologies for their reliability and low maintenance
- Spread DevOps philosophies across teams
- Collaborate with engineers across software, network, cloud, and systems
Outlook
- Shape the future of service management using Kubernetes
- Support global data platforms across multiple clouds
- Improve and manage service migrations across clouds and data centers
- Work with partner teams to influence product operations
- Enhance monitoring and logging for better observability
- Establish Service-Level Objectives to improve reliability
- Design and conduct stress tests to align scale expectations with reality