This role involves designing, building, and operating a scalable, secure, and interoperable data platform for efficient data ingestion, modeling, querying, and publishing, in support of data-driven decision-making and innovation across the organization.
Key Responsibilities
Design, build, and operate core data platform services for data ingestion, modeling, querying, and publishing.
Architect multi-engine environments (e.g., Trino, Dremio, ClickHouse, Postgres) with a focus on interoperability, operability, and security.
Drive platform evolution through experimentation, performance measurement, and cost optimization.
Develop documentation, defaults, and self-service interfaces to facilitate knowledge sharing and ease of use.
Manage lakehouse components, including table layouts, catalog integration, and lifecycle maintenance.
Implement robust batch and streaming data ingestion patterns with clear SLAs and observability (see the sketch after this list).
Configure connectors and permissions to enable multi-engine access while maintaining guardrails.
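For illustration only, a minimal Airflow sketch of the ingestion pattern described above, with a per-task SLA and a failure callback for observability. The orders_batch_ingestion DAG, the extract_orders helper, and all schedules and thresholds are hypothetical placeholders, not a prescribed implementation.

from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

def alert_on_failure(context):
    # Hypothetical observability hook: route the failed task to alerting.
    print(f"Ingestion task failed: {context['task_instance'].task_id}")

def extract_orders():
    # Hypothetical batch step: pull a window of records from an
    # API-based source and land it in the lakehouse raw zone.
    pass

with DAG(
    dag_id="orders_batch_ingestion",  # hypothetical pipeline name
    start_date=datetime(2024, 1, 1),
    schedule="@hourly",
    catchup=False,
    default_args={
        "retries": 2,
        "retry_delay": timedelta(minutes=5),
        "sla": timedelta(minutes=30),  # the "clear SLA" per task
        "on_failure_callback": alert_on_failure,  # observable failure mode
    },
) as dag:
    PythonOperator(task_id="extract_orders", python_callable=extract_orders)

The same per-task defaults (retries, SLA, failure callback) extend naturally to streaming jobs, where consumer-lag thresholds typically play the SLA role.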
Requirements
Six (6) years of experience in the design and development of data pipeline automation, specifically extracting data from API-based sources.
Technical expertise in developing complex automation frameworks, data modeling, and ETL processes using SQL, Python, dbt, Apache Airflow, or similar technologies.
End-to-end proficiency across the data stack, with deep knowledge of Python, SQL, and ETL methodologies.
Hands-on familiarity with the big data ecosystem, including technologies such as Trino, Kafka, and Iceberg.
Ability to design, build, and operate core data platform services that enable teams to ingest, model, query, and publish data seamlessly.
Ability to architect a multi-engine environment (e.g., Trino/Starburst, Dremio, ClickHouse, Postgres/pgLake) with an Iceberg-based lakehouse, prioritizing interoperability, operability, and security.
Experience in driving platform evolution through experimentation, measuring performance, reliability, and cost trade-offs to standardize high-impact solutions.
Experience in establishing durable corporate knowledge by developing strong defaults, comprehensive documentation, and intuitive self-service interfaces.
Experience managing core lakehouse components, including table layout conventions, catalog integration, and critical lifecycle maintenance patterns such as compaction and optimization (a sketch follows this list).
Experience implementing robust ingestion patterns for both batch and streaming data, ensuring clear SLAs and highly observable failure modes.
Experience configuring connectors and catalog permissions to enable multi-engine access while maintaining strict guardrails for a consistent user experience (an example configuration follows this list).
Willingness and ability to work on-site from the Prague office, in compliance with company policies.
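A minimal sketch of the lifecycle maintenance patterns mentioned above (compaction and snapshot expiration), assuming a Spark session wired to an Iceberg catalog registered as lake; the sales.orders table and the retention window are hypothetical.

from pyspark.sql import SparkSession

# Assumes an Iceberg catalog registered in Spark as "lake";
# the table name and retention window below are placeholders.
spark = SparkSession.builder.appName("iceberg-maintenance").getOrCreate()

# Compaction: rewrite small files so scans stay fast for every engine.
spark.sql("CALL lake.system.rewrite_data_files(table => 'sales.orders')")

# Lifecycle maintenance: expire old snapshots to bound metadata growth.
spark.sql(
    "CALL lake.system.expire_snapshots(table => 'sales.orders', retain_last => 10)"
)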
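And a sketch of the multi-engine access side: a hypothetical Trino catalog file (e.g., etc/catalog/lake.properties) pointing at a shared Iceberg REST catalog so other engines can read the same tables, with a read-only guardrail; the URI is a placeholder.

# Hypothetical Trino catalog: shares Iceberg tables with other engines
# via a common REST catalog.
connector.name=iceberg
iceberg.catalog.type=rest
iceberg.rest-catalog.uri=http://catalog.example.com:8181
# Guardrail: expose this catalog read-only to interactive users.
iceberg.security=READ_ONLY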
Benefits & Perks
Flexible time off
Wellness resources
Company-sponsored team events
Ready to Apply?
Join Pure Storage and make an impact.