At Sword, we’re building AI to heal billions and unlock humanity’s full potential.
Requirements
Proven experience designing and operating data platforms at scale - warehouse, data lake, or lakehouse architectures in production.
Hands-on experience with a modern lakehouse table format - Iceberg strongly preferred; Delta Lake or Hudi also welcome. You understand how the format works under the hood: metadata layout, snapshots, manifests, compaction, copy-on-write vs. merge-on-read.
Clear mental model of catalogs (REST, Polaris, Glue, Unity, Hive) - their trade-offs, and how they keep compute decoupled from storage.
Exposure to at least one vendor lakehouse or query platform - Snowflake, Starburst, or Databricks - at the level where you can reason about its architecture, not just use its UI.
Strong experience with a distributed processing engine - Flink strongly preferred; Spark also fine. You can reason about its internals, tune a running job, and debug a pipeline that’s silently degrading.
Familiarity with durable execution - Temporal, Restate, or similar - or at minimum a solid mental model of what durable execution means and why it matters for data workflows.
Production experience building and operating APIs (REST or gRPC) at scale - good instincts about contracts, versioning, retries, rate limiting, and observability.
Solid understanding of Kafka and event-driven architectures (producers/consumers, partitioning, delivery semantics).
Comfortable in regulated environments (healthcare, fintech, gov) where audit, compliance, and data governance are part of every design.
Platform mindset: you design self-service, API-first platforms and treat systems and agents - not only humans - as legitimate consumers.