The Storage Benchmarking Engineer will design, execute, and analyze performance benchmarks covering both industry-standard storage tools and suites (fio, vdbench, SPEC SFS 2020, IO500, SPC-1/SPC-2) and the emerging class of AI/ML storage workloads (MLPerf Storage, DLIO, and GPU-driven training/inference data pipelines). Because AI has made storage a first-order bottleneck in the GPU data path, this role sits at the intersection of high-performance storage and large-scale AI infrastructure.
This position demands strong end-to-end performance troubleshooting across the entire stack — compute (including GPUs), network (including RDMA/InfiniBand and high-speed Ethernet), and storage — together with close collaboration across engineering, product management, marketing, and sales. The ideal candidate has hands-on experience with both classic storage benchmarks and AI data-pipeline benchmarking, a track record engaging benchmark standards organizations and communities, and exceptional communication and writing skills.
• Configure and scale the HPC/AI lab environment so that all systems (GPU servers, high-speed fabrics, and storage) run at maximum efficiency across a variety of test harnesses. Build robust automation so labs can be rapidly configured and reconfigured to meet the demands of different benchmarks.
• Design and execute storage performance benchmarks using industry-standard tools and methodologies, including fio, vdbench, SPEC SFS 2020, IO500, and SPC-1/SPC-2 (or similar).
• Design and execute AI/ML storage benchmarks, including MLPerf Storage, DLIO, and representative AI workloads: model training and checkpointing, inference and data ingest, RAG/vector-database access patterns, and GPU-driven I/O paths (e.g., GPUDirect Storage, NFS over RDMA). Characterize storage behavior against reference architectures such as NVIDIA DGX SuperPOD and DGX BasePOD.
• Perform end-to-end performance troubleshooting and debugging across compute, GPU, network, and storage components to pinpoint and resolve bottlenecks and achieve best-in-class results.
• Develop and maintain automated benchmarking workflows using tools such as Ansible, Python, or Bash to ensure rapid provisioning and repeatable, reproducible results.
• Analyze benchmark results, generate detailed reports, and deliver actionable insights to engineering teams for product optimization.
• Collaborate with engineering, product management, marketing, and sales to align benchmarking efforts with product goals and customer needs.
• Engage directly with benchmark standards organizations (e.g., SPEC, SNIA, MLCommons) and communities to influence methodologies, drive submissions, and stay ahead of industry and AI infrastructure trends.
• Deliver high-impact presentations to internal teams, customers, and external stakeholders, translating complex technical data into clear narratives.
• Write technical marketing documents, whitepapers, and performance summaries to support product launches and customer engagement.
• Maintain comprehensive documentation of benchmarking processes, configurations, and results.
• We are primarily an in-office environment, and you will be expected to work from the Santa Clara office in compliance with Everpure's policies, unless you are on PTO, work travel, or other approved leave.