The role involves supporting and automating storage management within a data storage company, focusing on hardware troubleshooting, scripting, and collaboration with datacenter teams to ensure reliable and efficient storage operations.
Key Responsibilities
Assist in automating storage management and validation processes
Troubleshoot hardware failures, apply repairs, and validate hardware for production
Identify and resolve code-related issues in automated validation processes
Communicate with datacenter operations teams to resolve hardware issues
Deploy, troubleshoot, and repair server hardware to ensure optimal operation
Monitor data storage utilization, I/O capacities, and alerts
Create and maintain monitors for data stores
Respond to storage-related alerts and escalations during business hours
Review and implement code fixes, and create tickets for resolution
Maintain technical documentation, troubleshooting manuals, and run books
Provide mentoring and training to the datacenter operations team
Work on-site in the Utah office in accordance with company policies
Requirements
Strong hands-on working knowledge of Linux and Python.
Understanding of networking and storage infrastructure.
Managing a Jira ticket queue, troubleshooting hardware failures, applying appropriate repair steps, and validating hardware before returning it to the production fleet.
Identify code-related problems in the validation process, create tickets for the appropriate team to implement a fix, and perform code review for bugs.
Extensive communication with datacenter operations teams to resolve hardware issues.
Deploying, troubleshooting, and repairing servers, and ensuring the server fleet operates at maximum capacity.
Proactively monitoring data storage utilization, I/O capacities, and alerts, and creating monitors for existing data stores.
Being on call during business hours for storage-related alerts and escalations.
Maintain and contribute to technical documentation, troubleshooting manuals, and run books on an internal wiki.
Continuously review, learn, and understand internal services and tools that apply to IT workflows.
Provide mentoring and training to the Data Center Operations team when necessary.
Familiarity with hardware from vendors such as Cisco, Brocade, and Supermicro.
Resourcefulness and problem-solving aptitude.
Scripting skills in Python or Ansible are desirable.
Work from the Utah office in compliance with Pure Storage policies, unless on PTO, work travel, or other approved leave.
Basic knowledge of Pure Storage products such as FlashArray and FlashBlade.
Familiarity with automated booting in a Linux environment.
Familiarity with automation tools such as Jenkins and Docker.
Familiarity with the Google suite of tools, including Sheets and Docs.
Familiarity with VMware ESXi.
Educational background in Computer Science or Computer Hardware Engineering, or at least five years of industry experience.
Excellent interpersonal and teamwork skills.
Good written and verbal communication skills.
Must be detail-oriented and a well-organized self-starter.
Ability to accept constructive criticism.
Ability to explain complicated hardware issues, hardware configurations, and BIOS configurations.
Excellent problem-solving skills related to server hardware.
Ability to take ownership of issues arising from hardware and software problems and follow them to resolution.
Proven experience as a Site Reliability Engineer (SRE) or DevOps engineer, or in a similar role.
Benefits & Perks
Salary range: 101,000 - 152,000 USD annually
Potential eligibility for incentive pay and equity
Work from the Utah office in compliance with company policies
Flexible time off
Wellness resources
Company-sponsored team events
Recognition as one of Fortune's Best Large Workplaces and Best Workplaces for Millennials
Inclusive and diverse work environment with Employee Resource Groups
Ready to Apply?
Join Pure Storage and make an impact in data storage