The Staff Site Reliability Engineer at Redwood Materials is responsible for designing, implementing, and maintaining highly available and scalable systems, automating infrastructure management, and ensuring system resilience to support the company's rapid growth in battery recycling and energy storage solutions.
Key Responsibilities
Collect business and technical requirements and collaborate with cross-functional teams to establish Service Level Objectives (SLOs).
Design and implement highly available, scalable on-premises and hybrid systems using platform technologies such as vSphere, Kubernetes, Linux, and Windows.
Coordinate work across IT, Software, Industrial Controls, and Engineering teams to ensure system requirements are met.
Identify opportunities to automate deployment and management of IT infrastructure to improve efficiency and recovery times.
Develop integrations that improve data visibility across components and make systems easier to use.
Support deployed systems by responding to incidents, troubleshooting issues, and participating in on-call rotations.
Lead post-incident reviews and implement improvements to prevent recurrence of failures.
Requirements
Bachelor's degree in information technology or a related field.
At least 2 years of experience in a Site Reliability Engineering (SRE) or related role.
At least 5 years of experience in an IT systems or related role.
Experience administering IT infrastructure such as VMware, Active Directory, Windows Server, Linux, networking, load balancing, and cloud infrastructure (including AWS and Azure).
Expertise in scripting, coding, automation, and integration using tools and formats such as Python, Ansible, Chef, Puppet, REST, YAML, and JSON.
Ability to collect business and technical requirements and work with cross-functional teams to establish Service Level Objectives (SLOs).
Ability to design effective on-premises and hybrid systems with high availability and scalability, using platform technologies including vSphere, Kubernetes, Linux, and Windows.
Ability to coordinate work across IT, Software, Industrial Controls, and Engineering Business teams to implement complete systems that meet business needs.
Ability to identify opportunities to automate the deployment and management of IT infrastructure systems to reduce manual effort and speed recovery.
Ability to develop integrations that streamline data visibility across components to deliver complete, efficient systems providing excellent utility and ease of use.
Experience supporting deployed systems by responding to incidents, leading fast triage, troubleshooting issues, and participating in an on-call rotation.
Experience leading post-incident reviews and driving improvements to eliminate repeat failure modes.
Physical ability to perform essential job functions safely and successfully, including climbing, standing, stooping, or typing, consistent with ADA, FMLA, and other standards.
Ability to maintain regular, punctual attendance consistent with ADA, FMLA, and other standards.
Ability to work in challenging conditions which may include exposure to noise, dust, chemicals, and temperature extremes, while protected by PPE, for extended periods.
Willingness to work occasional weekends or nights, or to be on call, as a regular part of the job.
Willingness to undertake occasional travel as required.
This is a full-time position.
Benefits & Perks
Compensation will be commensurate with experience
Full-time position
Occasional weekend, night, or on-call work
Occasional travel requirements
Ready to Apply?
Join Redwood Materials and make an impact in renewable energy