© 2026 NextGenEnergyJobs. All rights reserved.

Senior Cloud Infrastructure Engineer

Gatik
Mountain View, California
Full Time
Posted March 18, 2026
Application opens on company website

Job Description

The Senior Cloud Infrastructure Engineer at Gatik is responsible for designing, building, and maintaining large-scale, high-performance cloud and Kubernetes infrastructure to support autonomous vehicle AI systems, including data pipelines, model management, and distributed training environments.

Key Responsibilities

  • Architect and manage large-scale compute and data infrastructure supporting autonomous driving systems.
  • Build and optimize Kubernetes clusters for GPU and TPU workloads, including GPU scheduling and resource utilization.
  • Implement Infrastructure as Code using tools like Terraform and Helm.
  • Deploy and monitor autonomous AI agents for cluster health and hardware failure triage.
  • Develop and maintain large-scale data pipelines using Apache Airflow, Kafka, and Spark.
  • Automate deployment workflows with GitOps tools such as ArgoCD and GitLab CI/CD.
  • Maintain infrastructure and model performance observability using Prometheus, Grafana, and OpenTelemetry.
  • Design and manage ML model lifecycle tracking and automated workflows for training and deployment.
  • Support deployment of models into simulation and production environments using Triton, Ray Serve, and ONNX Runtime.
  • Enable scalable multi-node training of models with frameworks like PyTorch Distributed, Ray, and Horovod.
  • Optimize low-level networking for large-scale training, including NCCL tuning and InfiniBand configuration.
  • Collaborate with researchers to fine-tune multi-node GPU clusters for high-performance model training.

Requirements

  • Five or more years of experience in Cloud Infrastructure, DevOps, or MLOps supporting high-scale compute environments.
  • Deep expertise in Kubernetes, Helm, and container orchestration.
  • Strong background in orchestration tooling such as Apache Airflow, Argo Workflows, MLflow, and Terraform.
  • Practical experience supporting distributed systems frameworks like Ray and PyTorch Distributed.
  • Proficiency in Python and Bash scripting, with a solid understanding of IAM and RBAC.
  • Experience supporting training systems that enable scaling models across multi-node setups using PyTorch Distributed, Ray Train, and Horovod.
  • Experience in optimizing low-level communication protocols such as NCCL tuning, InfiniBand, or RoCE v2 to minimize latency for large-scale training.
  • Experience partnering with researchers to fine-tune performance across multi-node GPU clusters for FSDP and DeepSpeed workloads.
  • Onsite role requiring work five days a week at the Mountain View, CA office.

Benefits & Perks

  • Salary range of $180,000 to $240,000
  • Onsite work schedule: five days a week at the Mountain View, CA office
  • A culture emphasizing collaboration, respect, agility, diversity, inclusion, and opportunities for growth
