The Senior Software Engineer for Testbed Health designs and implements automated, self-healing systems that ensure hardware resilience, develops monitoring tools, and troubleshoots complex system failures in a large-scale, distributed storage environment.
Key Responsibilities
Design and implement automated recovery systems for hardware failures
Build and maintain health certification pipelines to validate system integrity
Develop monitoring tools and dashboards for fleet observability and hardware health insights
Investigate and resolve complex system failures, implementing preventative measures
Requirements
Proficiency in Python or Go with a focus on writing clean, production-grade code designed for automation and distributed systems.
Deep expertise in Linux environments, including systemd, storage subsystems, kernel logs, and troubleshooting unresponsive or failing systems.
Solid understanding of networking protocols such as DNS, DHCP, and SSH, plus experience managing the failure modes of physical hardware or large-scale virtualized clusters.
Experience building CI/CD pipelines and managing infrastructure as code to ensure environments are reproducible and resilient.
Ability to design and implement automated recovery loops that diagnose and repair hardware failures to maintain 24/7 fleet availability.
Experience in architecting health certification pipelines and building services that validate system integrity of OS, storage, and network components.
Experience in developing monitoring tools and real-time dashboards for fleet observability, hardware degradation trends, recovery velocity, and capacity insights.
Proven ability to lead investigations into complex system failures such as kernel panics and network flakes, and translate findings into code-based preventative measures.
Location requirement: Willingness to work primarily from the Bengaluru office in accordance with company policies.
Benefits & Perks
Flexible time off
Wellness resources
Company-sponsored team events
Ready to Apply?
Join Pure Storage and make an impact in large-scale data storage.