This role involves supporting data center hardware operations by validating, troubleshooting, and recovering servers, storage, and network equipment, while developing automation workflows and providing technical guidance to ensure high fleet reliability and operational excellence.
Key Responsibilities
Provide post-repair support to validate hardware and return it to the production fleet
Own the end-to-end validation and recovery of server, storage, and network hardware
Design and implement automated workflows using Ansible and Python to streamline deployment processes
Perform deep-dive troubleshooting for complex hardware and software failures to identify root causes
Create and maintain technical documentation such as runbooks and infrastructure diagrams
Provide technical guidance and cross-training to datacenter operations teams to improve service delivery
Requirements
Work in a local datacenter, interfacing directly with the datacenter operations team to provide post-repair support and ensure hardware is validated and returned to the production fleet.
Take initiative and provide leadership as the representative of the Infrastructure and Shared Services (ISS) team in the datacenter.
Own the end-to-end validation and recovery of server, storage, and network hardware, ensuring assets are returned to the production fleet with high confidence and minimal downtime.
Design and implement scalable automation for repetitive tasks using Ansible and Python to eliminate manual bottlenecks in the deployment lifecycle.
Lead deep-dive troubleshooting for complex hardware and software failures, performing thorough analysis to prevent systemic issues and improve overall fleet health.
Architect comprehensive technical runbooks and infrastructure diagrams, using tools such as Lucidchart or Visio, that standardize complex processes for the global DevOps and R&D communities.
Act as a domain expert, providing technical guidance and cross-training to Datacenter Operations teams to elevate service delivery SLAs and operational excellence.
Possess advanced hands-on proficiency in Linux administration and deep knowledge of enterprise hardware from vendors such as Cisco, Brocade, and Supermicro, including BIOS configurations and storage networking.
Demonstrated ability to write production-grade scripts in Python and manage infrastructure via configuration management platforms like Ansible or Puppet.
Proven track record of managing high-volume ticketing queues in Jira and driving complex technical projects to completion in a fast-paced datacenter environment.
Exceptional communication skills with the ability to translate complicated hardware failures into actionable insights for both local operations and remote engineering teams.
Experience working with virtualization (VMware ESXi), containerization (Docker), and automated deployment tools to support a hybrid infrastructure model.
Ability and willingness to work onsite at the Bluffdale, UT office in accordance with company policies.
Benefits & Perks
Salary range: 104,000 - 156,000 USD
Potential for incentive pay and/or equity
Work from the Bluffdale, UT office
Flexible time off
Wellness resources
Company-sponsored team events
Ready to Apply?
Join Pure Storage and make an impact on datacenter hardware operations