• Own and optimise our platform architecture (improving reliability, scalability, maintainability, etc.)
• Build new data sources and data pipelines that deliver key data and insights to the business (this will include integrating with third-party systems and ingesting data through APIs)
• Scope and deliver new data engineering projects in collaboration with business stakeholders
• Develop and deploy ML infrastructure to support our ever-growing range of AI requirements and use cases
• Collaborate closely with our dev teams to build seamless integrations between our back-end databases and our data platform
• Shape the direction of our growing team and coach team members on best practices
• First and foremost, a passion for decarbonisation
• Ability to work with ambiguity and own problems end-to-end
• A passion for writing high-quality code and building lean processes
• Experience with distributed data processing
• Experience with monitoring, testing and data quality
• Experience building robust pipelines from diverse sources, e.g. Parquet files, SQL and NoSQL databases, and API endpoints
It would be helpful to have experience with, or knowledge of, the following:
• Python
• AWS
• Kubernetes
• Spark & distributed computing
• dbt
• Terraform
• Data warehousing (we use Databricks, but experience with other platforms is fine)
• MLOps implementation (desirable but not required)
• Airflow (or other orchestration tooling)
• Data governance