A Site Reliability Engineer at Samsara designs, maintains, and secures scalable, highly available cloud infrastructure. The role ensures the stability, security, and performance of the systems that support IoT-based physical operations across various industries.
Key Responsibilities
Design, maintain, and ensure the security, availability, and resilience of cloud infrastructure.
Provision infrastructure using Infrastructure as Code (IaC) tools like Terraform, Ansible, or Puppet.
Develop and maintain secure, standardized base components such as golden images and container images.
Embed reliability best practices into development and deployment pipelines to improve system stability.
Participate in a 24/7 on-call rotation to investigate outages and perform root cause analysis.
Define Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and develop observability tools to monitor service reliability.
Create and manage monitoring dashboards, metrics pipelines, and alerting systems.
Lead the technical delivery of engineering projects, including solution design, scope evaluation, and trade-off analysis.
Requirements
Minimum of 5 years of experience as a DevOps Engineer, Security Engineer, Software Engineer, or Site Reliability Engineer
At least 5 years of hands-on experience working with Unix or Linux operating systems
At least 3 years of hands-on experience in public cloud network engineering
At least 3 years of experience in Identity and Access Management (IAM)
Strong ability to automate tasks through scripting languages such as Python, Bash, Go, or similar
Experience with Infrastructure as Code tools such as Terraform, Ansible, or Puppet
Experience in implementing and managing containerized applications using Docker and orchestration platforms like Kubernetes, with security considerations in mind
Experience with AWS services including EC2, EKS, ECS, ECR, and IAM
Understanding of security weaknesses, exploits, attacks, and mitigations
Experience in designing and maintaining highly available, resilient, and secure infrastructure across public cloud platforms
Experience with infrastructure provisioning using Infrastructure as Code (IaC) to establish reliable developer self-service pathways
Experience in developing and maintaining standardized, secure, and reliable base components, such as golden images, AMIs, and container images, for fault-tolerant provisioning
Experience in embedding reliability best practices into development and deployment pipelines to drive system stability and service quality
Experience in participating in a 24/7 on-call rotation, investigating outages, and contributing to root cause analysis and post-mortems
Ability to define Service Level Indicators (SLIs) and Service Level Objectives (SLOs), and to build observability tooling including monitoring dashboards, metrics pipelines, and alerting configurations
Experience in leading the technical delivery of engineering projects, including designing solutions, evaluating options, and making trade-off decisions
Strong communication skills and a desire to collaborate across teams
Excellent problem-solving skills and the ability to troubleshoot network and security-related issues
Experience with performance and cost optimization as applied to cloud infrastructure
Familiarity with data privacy regulations and compliance
Benefits & Perks
Full-time employees receive a competitive total compensation package
Employee-led remote and flexible working
Health benefits
Flexible working model that supports remote, hybrid, and in-office work
Ready to Apply?
Join Samsara and make an impact on IoT-based physical operations across industries