The role involves designing and building automation tools and scalable, reliable systems that improve software development, testing, and delivery for data storage products, with a focus on high availability, security, and system optimization.
Key Responsibilities
Design and build developer platforms and automation services to improve software delivery reliability, scalability, and velocity
Architect and implement high-availability solutions, disaster recovery, failover, and scaling strategies
Develop automation tooling and frameworks in Python, Go, or Rust to enhance developer productivity and system reliability
Extend and optimize CI frameworks for complex, distributed build and test environments
Apply Linux/Unix fundamentals to design resilient systems, debug issues, and optimize performance
Embed security into CI/CD systems through access control, patch automation, and system hardening
Perform capacity planning, system optimization, and resource utilization management
Build observability and debugging tools for metrics, logs, and traces to facilitate troubleshooting
Standardize monitoring, logging, and incident management practices across engineering teams
Collaborate with developers to integrate CI/CD systems into the full development lifecycle
Requirements
Eight (8) years of software engineering experience, with a strong background in designing scalable architectures and building reliable systems end-to-end.
Deep expertise with CI/CD platforms and modern automation practices for large-scale systems.
Proficiency in one or more modern programming languages such as Python or Rust, with a track record of building automation, developer tooling, or scalable services.
Strong Linux/Unix fundamentals, including advanced troubleshooting, debugging, and performance tuning.
Hands-on experience with containerization and orchestration technologies such as Kubernetes; experience with Docker Swarm, Nomad, or similar is also valuable.
Proven experience in observability, including designing and integrating monitoring, logging, tracing, and alerting into CI/CD pipelines and production systems.
Ability to design and implement high-availability solutions, disaster recovery, failover, and scaling strategies.
Experience in applying security principles to CI/CD systems, including access control, patch automation, and system hardening.
Experience in capacity planning and system optimization to ensure efficient utilization of infrastructure and resources.
Experience in performing risk analysis and mitigation through proactive vulnerability assessments and automation of remediation processes.
Ability to build observability and debugging tools that surface metrics, logs, and traces to accelerate troubleshooting and root cause analysis.
Experience in standardizing and advancing monitoring, logging, and incident management practices across engineering teams.
Benefits & Perks
Salary range: 175,000 - 263,000 USD
Work primarily in-office in Santa Clara
Potential incentive pay and/or equity
Flexible time off
Wellness resources
Company-sponsored team events
Ready to Apply?
Join Pure Storage and make an impact in data storage