Shakudo

Head of Site Reliability Engineering

Posted 18 Days Ago
Toronto, ON
Senior level
Lead the Site Reliability Engineering team, ensuring uptime, reliability, and performance. Architect cloud-native infrastructures and foster operational excellence.
About the Job & Shakudo

At Shakudo, we are building the world’s first operating system for data and AI. We use the term operating system in the truest sense of the word. Like iOS, Windows and Linux, Shakudo’s end-to-end OS offers ever-evolving, automatically operated, best-of-breed open-source components tailored to each business's unique needs.

The Role

We are hiring a Head of Site Reliability Engineering to lead the reliability, availability, and performance strategy of our platform. This role is ideal for someone who thrives on solving infrastructure challenges, scaling cloud-native systems, and building high-performance teams. You will work cross-functionally with engineering, product, and customer success to make Shakudo’s platform rock-solid and resilient for our customers around the world.

What You’ll Do

  • Build and lead the SRE function at Shakudo, setting goals and technical direction and shaping team culture
  • Own uptime, reliability, and incident response for our platform
  • Architect scalable infrastructure using Kubernetes, cloud-native tooling, and automation frameworks
  • Lead the design of observability, monitoring, and alerting systems to proactively detect and prevent issues
  • Create and enforce best practices for CI/CD, disaster recovery, and service-level objectives (SLOs)
  • Partner closely with engineering and product to ensure new features are reliable and production-ready
  • Mentor engineers and help instill a culture of operational excellence

What We're Looking For

  • 8+ years of experience in infrastructure, DevOps, or SRE roles with increasing responsibility
  • Proven experience scaling distributed systems in a high-availability, production environment
  • Expertise with Kubernetes, Terraform, containerization, and at least one major cloud provider (AWS preferred)
  • Strong knowledge of system design, networking, and reliability principles
  • Experience with observability tools (e.g., Prometheus, Grafana, Datadog) and incident response practices
  • Strong leadership and communication skills, with a hands-on, collaborative approach

Nice to Have

  • Experience supporting data pipelines, ML workloads, or complex orchestration systems
  • Familiarity with the data/ML tooling ecosystem (e.g., Airflow, dbt, Spark, Dremio)
  • Previous experience in a startup or high-growth environment

Shakudo is an equal opportunity employer and encourages candidates of all backgrounds to apply. We foster diversity and inclusivity and welcome applications from a broad range of backgrounds and experiences.

Top Skills

AWS
Datadog
Grafana
Kubernetes
Prometheus
Terraform
