A Site Reliability Engineer at Canonical deploys, manages, and optimizes open source infrastructure such as OpenStack and Kubernetes. The role focuses on automation, monitoring, and incident management to ensure high availability and performance for mission-critical cloud services.
Key Responsibilities
Deploy and operate OpenStack, Kubernetes, storage solutions, and open source applications using DevOps practices
Identify and resolve incidents, monitor applications, and anticipate potential issues to maintain high quality standards
Automate infrastructure and operations through software engineering and scripting, primarily using Python
Manage and maintain Linux environments and cloud infrastructure across physical and public cloud estates
Work with mission-critical services for global customers, ensuring reliability and performance
Requirements
Degree in software engineering or computer science
Python software development experience
Operational experience in Linux environments
Experience with Kubernetes deployment or operations
Ability to work in operations with mission-critical services for global brand-name customers
Excellent interpersonal skills, curiosity, flexibility, and accountability
Ability to travel internationally twice a year, for company events up to two weeks long
Benefits & Perks
Compensation adjusted every 6 months based on performance, with annual bonuses
Global remote work environment
Twice-yearly in-person team sprints at interesting locations worldwide
Personal learning and development budget of USD 2,000 per year
Recognition rewards
Annual holiday leave
Maternity and paternity leave
Employee Assistance Programs
Opportunity to travel to new locations to meet colleagues
Travel upgrades and Priority Pass for long-haul company events
Ready to Apply?
Join Canonical and make an impact in open source cloud infrastructure