In this role, you will lead deep-dive debugging and root-cause analysis of complex storage system issues, with a focus on firmware integration, to ensure the reliability and performance of large-scale storage platforms for hyperscale customers.
Key Responsibilities
Perform deep-dive debugging and root-cause analysis on complex storage platform issues.
Analyze failures across operating system, firmware, and hardware layers.
Design and refine automated tools for log analysis and system health monitoring.
Participate in cross-functional debug meetings and provide technical evidence for issue resolution.
Contribute to designing safety checks, observability features, and failure prevention mechanisms.
Validate the integration and reliability of new firmware builds and hardware SKUs before deployment.
Requirements
Extensive experience in systems, storage, or low-level software engineering with a detective mindset for solving non-deterministic bugs.
Proven ability to navigate and resolve issues across the operating system, platform software, and NVMe flash storage firmware layers.
Strong skills in Python, Go, or C for automating data collection and forensic analysis, along with mastery of Linux command-line tools.
Familiarity with large-scale test environments and CI/CD pipelines (e.g., Jenkins) to validate fixes and monitor for regressions.
Ability to perform deep-dive debugging and root-cause analysis on issues that span large-scale storage platforms, including firmware integration within the broader storage stack.
Experience in analyzing complex failures from high-level operating system layers down to low-level device firmware and hardware interactions.
Ability to analyze and resolve high-stakes escalations by correlating evidence across platform logs, system metrics, and hardware telemetry.
Experience in designing safety checks, observability features, and guardrails that proactively prevent known failure modes in storage solutions.
Experience in supporting the reliability of new firmware builds and hardware SKUs by validating their integration at scale before deployment.
Experience in participating in cross-functional debug meetings and daily war rooms, providing technical evidence and reproduction steps to drive fixes.
Ability to translate complex technical forensics into clear, actionable bug reports and investigation summaries for both executive and engineering audiences.
Benefits & Perks
Flexible time off
Wellness resources
Company-sponsored team events
Support for growth and development
Inclusive and diverse work environment
Ready to Apply?
Join Pure Storage and make an impact in large-scale data storage