The role involves designing, deploying, and managing large-scale bare metal Kubernetes clusters in data centers, ensuring platform reliability, security, and performance, while collaborating with cross-functional teams and mentoring engineers.
Key Responsibilities
Design, deploy, and operate large-scale bare metal Kubernetes clusters across data centers
Lead technical design and implementation of cluster features, including lifecycle management, networking, and storage
Ensure platform reliability, performance, security, and observability through monitoring and incident management
Implement security best practices such as RBAC, network policies, and secrets management
Collaborate with engineering teams to onboard workloads and provide technical leadership on cross-team projects
Mentor junior and mid-level engineers in Kubernetes, automation, and production operations
Requirements
At least 6 years of experience in infrastructure, SRE, or platform engineering roles, including at least 3 years running Kubernetes in production, with significant experience on bare metal servers.
Strong proficiency in Linux systems administration, including networking, performance tuning, and security hardening.
Deep understanding of Kubernetes internals such as the API server, etcd, controllers, scheduler, and kubelet, as well as key concepts including Pods, Deployments, Services, Ingress, ConfigMaps, Secrets, and the Horizontal Pod Autoscaler (HPA).
Hands-on experience with Kubernetes networking and CNI plugins, preferably Cilium, including configuring Services, Ingress, NetworkPolicies, and L4/L7 load balancing.
Proficiency with Infrastructure as Code (IaC) and automation tools such as Ansible, Terraform, or equivalent.
Strong experience with observability stacks including Prometheus, Elastic Stack (ELK), Grafana, Fluentd, and Fluent Bit for cluster and workload monitoring and logging.
Solid scripting or programming skills in Python, Go, or similar languages for automation, tooling, and integration work.
Excellent communication and documentation skills, with the ability to collaborate effectively across distributed teams and write clear technical documentation and runbooks.
Benefits & Perks
Flexible time off
Wellness resources
Company-sponsored team events
Support for growth and development
Inclusive and diverse work environment
Accommodations for disabilities
Ready to Apply?
Join Pure Storage and make an impact