This role leads the reliability engineering and platform teams responsible for the scalability, reliability, and operational excellence of Pure Storage's cloud services, with a focus on automation, incident management, and infrastructure modernization.
Key Responsibilities
Lead and develop SRE and Platform teams, setting strategy for reliability, scalability, and operability of cloud services
Own reliability engineering, including defining and evolving SLIs, SLOs, error budgets, and operational practices
Build and operate internal platform tooling for developer workflows, observability, automation, and incident response
Manage and harden core cloud infrastructure, including Kubernetes and Infrastructure as Code (IaC) across control and data planes
Lead capacity planning, cost optimization, disaster recovery, and multi-region readiness
Champion incident management processes, including postmortems, systematic toil reduction, and continuous improvement
Requirements
Proven leadership experience running SRE, Production Engineering, and Platform functions for SaaS or cloud services at scale, including building high-performance, inclusive teams.
Hands-on software development experience with fluency in engineering fundamentals, including design reviews, automated testing, CI/CD, and version control, and the ability to contribute production-grade code.
Deep understanding of SRE foundations, including SLIs, SLOs, error budgets, incident management, capacity planning, change and release management, and reliability reviews.
Practical cloud expertise, with a preference for Azure, including experience with modern SRE toolchain components such as containers, Kubernetes, Infrastructure as Code (Terraform, Bicep, CloudFormation), CI/CD, and observability tools like OpenTelemetry, Prometheus, Grafana, ELK, and Azure Monitor.
Strong systems thinking and architectural acumen, including resilience reviews, failure mode analysis, chaos engineering, disaster recovery testing, and data-driven stakeholder communication.
Experience in leading and developing teams, setting strategy and execution for reliability, scalability, and operability across cloud platforms.
Experience operating and hardening core cloud infrastructure services, including Kubernetes and Infrastructure as Code (IaC) across control and data planes, with responsibilities including capacity planning, cost optimization, disaster recovery, and multi-region readiness.
Ability to champion incident management, continuous improvement, blameless postmortems, and systematic toil reduction, with measurable impact on mean time to recovery (MTTR).
Work authorization and physical presence requirements: the role is based primarily in the Prague, Czech Republic office, and the candidate is expected to work from that office except when on PTO, work travel, or other approved leave.
Benefits & Perks
Relocation package
Flexible time off
Wellness resources
Company-sponsored team events
Ready to Apply?
Join Pure Storage and make an impact on the reliability of its cloud services.