Job Description
The role involves leading reliability engineering efforts for a microservices platform, ensuring system performance, resilience, and scalability through SRE principles, automation, and collaboration with engineering teams.
Key Responsibilities
- Ensure the health, performance, and resilience of the platform using SRE principles
- Lead reliability efforts for microservices on Kubernetes, including observability, automation, and incident prevention
- Develop and enforce SLOs, SLAs, and error budgets to improve system reliability
- Own high-priority incident escalations, perform technical analysis, and restore services within SLOs
- Automate manual processes to enhance availability, latency, and performance of production services
- Collaborate with engineering teams to conduct post-incident reviews and implement systemic reliability improvements
Requirements
- Bachelor's or Master's degree in Computer Science, Software Engineering, or a related technical field, or equivalent hands-on experience.
- Minimum of 8 years of experience in software engineering or Site Reliability Engineering (SRE) roles.
- Deep experience with at least one major cloud platform: AWS, GCP, or Azure.
- Proficiency in Java, the Spring framework, and Python or a similar scripting language in a Linux environment.
- Prior experience contributing to Site Reliability Engineering initiatives or similar operational roles.
- Demonstrated ability to lead projects and influence engineering culture.
- Knowledge of SRE principles, including SLI and SLO design, error budgets, and toil reduction strategies.
- Excellent written and verbal communication skills in English.
- Own high-priority application incident escalations, perform deep technical analysis, and restore service within defined SLOs.
- Develop and enforce SLOs, SLAs, and error budgets to drive reliability-focused development.
- Engineer solutions to enhance the availability, latency, and performance of production services, automating manual processes to eliminate toil and scale operational efficiency.
- Collaborate closely with platform and application engineering teams to conduct post-incident reviews, extract insights, and implement systemic changes that improve overall reliability.
Benefits & Perks
Base salary range: $195,000 - $235,000 USD
Total compensation package including bonus, commission, equity, and benefits (health, dental, life, 401k, and paid time off)
Hybrid working options
Generous paid time off (PTO)
Company equity (RSUs)
Extensive parental leave
Dedicated volunteer days
Gym subsidies
Counseling and well-being programs
Internal mobility and mentorship opportunities
Clear career paths and dedicated learning programs
Ready to Apply?
Join Celonis and make an impact in renewable energy