The role involves designing and developing automated, self-healing systems for hardware infrastructure, ensuring high availability and resilience of storage technology through advanced monitoring, troubleshooting, and system certification processes.
Key Responsibilities
Design and implement automated recovery systems for hardware failures to ensure high fleet availability.
Build and maintain health certification pipelines to validate system integrity across OS, storage, and network components.
Develop monitoring tools and dashboards to track hardware health, degradation trends, and recovery performance.
Investigate complex system failures and develop code-based preventative measures.
Collaborate with platform and DevOps teams to improve system resilience and reduce downtime.
Requirements
Proficiency in Python or Go, with a focus on writing clean, production-grade code for automation and distributed systems.
Deep expertise in Linux environments, including systemd, storage subsystems, kernel logs, and troubleshooting unresponsive or failing systems.
Solid understanding of networking protocols such as DNS, DHCP, and SSH, and experience managing failure modes of physical hardware or large-scale virtualized clusters.
Experience building CI/CD pipelines and treating infrastructure as code so that environments are reproducible and resilient.
Ability to design and implement automated recovery loops that diagnose and repair hardware failures to maintain 24/7 fleet availability.
Experience architecting health certification pipelines and building services that validate the integrity of OS, storage, and network components.
Experience developing monitoring tools and real-time dashboards for hardware degradation trends, recovery velocity, and fleet capacity insights.
Ability to investigate complex system failures, such as kernel panics and intermittent network faults, and translate findings into code-based preventative measures.
Location requirement: willingness to work from the Bengaluru office in line with company policy.
Benefits & Perks
Flexible time off
Wellness resources
Company-sponsored team events
Support for growth and development
Inclusive and diverse work environment
Ready to Apply?
Join Pure Storage and make an impact.