This Site Reliability Engineering (SRE) role is responsible for the reliability, availability, and scalability of cloud infrastructure and services in a cloud-focused environment. The work spans automation, monitoring, incident response, and collaboration with engineering teams.
Key Responsibilities
Ensure the reliability and availability of core cloud services through operational frameworks, monitoring, and automation.
Lead incident response and root cause analysis to facilitate rapid recovery and prevent recurring issues.
Architect and implement automation and Infrastructure-as-Code to optimize deployments and service management.
Collaborate with engineering teams to influence service architecture and embed SRE best practices for highly available cloud-native systems.
Develop and enhance observability systems, including metrics, logging, tracing, and alerting, to monitor system health.
Requirements
Strong experience designing, operating, and improving highly available cloud services, including a deep understanding of service uptime, Service Level Objectives (SLOs), and production operational excellence.
Expertise with public cloud platforms such as AWS, Azure, or GCP, and hands-on experience with cloud-native architectures.
Proficiency in Infrastructure-as-Code and automation using tools such as Terraform, Ansible, CloudFormation, Puppet, or similar.
Practical experience running containerized environments and orchestration systems such as Kubernetes.
Ability to build and operate observability stacks, including metrics, logging, tracing, and actionable alerting, using tools like ELK, Prometheus, or OpenTelemetry.
Experience managing on-call processes using tools like PagerDuty.
Strong programming skills in languages such as Python, Go, Java, Ruby, or similar.
Deep understanding of Linux systems and networking fundamentals.
Knowledge of modern software delivery practices.
Benefits & Perks
Flexible time off
Wellness resources
Company-sponsored team events
Ready to Apply?
Join Pure Storage and make an impact on the reliability of cloud infrastructure and services.