Site Reliability Engineer

Description

Join a core engineering group as Lead Site Reliability Engineer, designing and scaling the Linux platforms that underpin ML/AI-driven trading. You will architect and own reliability for massive simulation, HPC, and production workloads, keeping trading systems ultra-reliable and ultra-fast. This is a hands-on leadership role that puts equal weight on technical depth, strategic decision-making, and platform SRE excellence.

Key Responsibilities

  • Lead SRE practices for Linux platforms powering low-latency, high-throughput trading workloads.
  • Architect, optimize, and tune Linux for performance, resilience, and minimal latency.
  • Drive incident response, root cause analysis, and continuous reliability improvement across production systems.
  • Oversee system automation and reproducibility—build, deploy, and fleet-manage bare-metal Linux and containerized stacks.
  • Manage and enhance Kubernetes clusters, network configuration, and large-scale orchestration.
  • Set observability standards; expand monitoring, alerting, and performance metrics across platforms.
  • Analyze networking, kernel-level performance, and distributed systems—solving core challenges in a multi-petabyte, multi-cluster environment.
  • Build Python tools for automation, reliability engineering, and performance analysis.
  • Design highly distributed systems.

Required Skills

  • Small, autonomous Linux SRE team with direct ownership and impact.
  • Collaborative engagement with quants, researchers, and trading experts to deliver robust platforms.
  • A culture built on deep technical ownership, learning, and high standards of performance engineering.
  • Ultra-reliable, high-performance trading infrastructure where every engineering optimization affects performance.
  • Next-generation simulation and HPC compute pipelines, supporting ML/AI workflows at scale.
  • Integration and continuous improvement of internal and open-source tools for automation and reliability.
  • Strategic platform direction: shaping foundational systems for critical infrastructure in an elite trading environment.

Preferred Qualifications

The ideal candidate comes from a top-tier tech environment (FAANG, elite trading, hyperscale infra). They have experience building technology 0→1, owning systems end-to-end, and working close to the metal. They will operate across everything from bare-metal Linux to modern build and observability stacks.

  • Deep expertise in Linux, scripting (Python), DevOps, and Kubernetes.

Apply Today

Thank you for your interest in this opportunity. Please complete the form below and upload any relevant documents. A member of our team will review your application and be in touch soon.
