A Senior Data Engineer responsible for designing, developing, and maintaining scalable data platforms and pipelines on cloud technologies such as AWS, supporting analytics, machine learning, and data governance initiatives at a growing energy solutions company.
Key Responsibilities
Build, automate, and manage near-real-time, scalable data ingestion pipelines for master data management, deep learning, and predictive analytics.
Design and maintain cloud-native big data environments on AWS, ensuring security, scalability, and performance.
Lead data governance and data profiling efforts to ensure data quality and proper metadata documentation.
Collaborate with Data Scientists, BI developers, and Product Managers to design data models, schemas, and processing logic.
Develop and optimize ETL processes for data validation, transformation, and feature modeling using Spark, Python, SQL, and AWS technologies.
Implement best practices for code development, testing, performance optimization, monitoring, and incident response to ensure data usability and quality.
Define and automate SLAs for data availability and correctness, and respond to data delivery alerts.
Requirements
A bachelor’s degree in computer science or information technology plus a minimum of 8 years of relevant experience.
High proficiency in programming languages commonly used in ETL development, such as PL/SQL, SQL, and Python.
Ability to write efficient SQL queries and stored procedures, develop scripts for data transformations, and use programming frameworks and libraries to create and enhance ETL mappings and workflows.
Expertise in utilizing AWS services, including but not limited to Amazon S3, AWS Glue, the AWS Glue Data Catalog, Amazon Redshift, Redshift Spectrum, and Amazon Athena, to build scalable, reliable, and performant data pipelines and analytics solutions.
Ability to build, automate, and manage near-real-time, scalable data ingestion pipelines for master data management, deep learning, and predictive analytics.
Experience in building and maintaining cloud-native big data environments on AWS that are secure, scalable, flexible, and highly performant, using appropriate SQL, NoSQL, and NewSQL technologies.
Proficiency in working with relational databases such as Postgres, Oracle, MySQL, or SQL Server, including knowledge of database design, optimization techniques, and advanced querying capabilities.
Experience in performance tuning and optimizing database operations.
Ability to lead data governance and data profiling efforts to ensure data quality and proper metadata documentation for data lineage.
Knowledge of data security best practices and familiarity with data governance frameworks.
Experience in designing and developing ETL (extract, transform, load) processes to validate and transform data, calculate metrics, model features, and populate data models using Spark, Python, SQL, and other technologies in the AWS environment.
Demonstrated ability to lead by example, including best practices for code development and optimization, unit testing, CI/CD, performance testing, capacity planning, documentation, monitoring, alerting, and incident response to ensure data availability, data quality, and usability.
Ability to define SLAs for data availability and correctness, automate data availability and quality monitoring, and respond to alerts when data delivery SLAs are not being met.
Excellent communication skills to clearly communicate progress across organizations and levels, from individual contributor to executive, and to identify and clarify critical issues for decision-making.
Benefits & Perks
Salary range of $140,000–$165,000 annually, with a target compensation of $156,750, based on experience and qualifications
Hybrid work opportunity with flexible remote work and office locations in Oakland, CA, Orange, CA, Portland, OR, Chicago, IL, and Boston, MA
Generous retirement package
Medical, dental, and vision insurance
Pre-tax contribution plans
Employee Stock Ownership Plan (ESOP)
Ready to Apply?
Join Energy Solutions and make an impact in renewable energy.