Orion Innovation

Senior Site Reliability Engineer

Reposted 25 Days Ago
In-Office
Toronto, ON
Senior level
The Senior Site Reliability Engineer will ensure system reliability and performance, implement observability solutions, and maintain documentation in classified environments.
The summary above was generated by AI

Orion Innovation is a premier, award-winning, global business and technology services firm. Orion delivers game-changing business transformation and product development rooted in digital strategy, experience design, and engineering, with a unique combination of agility, scale, and maturity. We work with a wide range of clients across many industries including financial services, professional services, telecommunications and media, consumer products, automotive, industrial automation, professional sports and entertainment, life sciences, ecommerce, and education.

Job Overview

The Sr. SRE will be responsible for the reliability, scalability, and performance of systems supporting classified government projects in an air-gapped deployment. This role leverages advanced monitoring and DevOps tools to ensure uptime and compliance in a disconnected environment.

Key Responsibilities

  • Design and maintain highly reliable systems using RKE2, Kubernetes, Ingress, Kong, Artifactory, and Sonar.
  • Implement observability solutions with Prometheus, Grafana, Splunk, and Elastic to monitor system health in an air-gapped setting.
  • Ensure compliance and performance optimization across multi-tenant deployments.
  • Conduct code quality analysis and security assessments using Sonar.
  • Collaborate with the Lead and Infra/Security Specialists to resolve incidents and improve system resilience.
  • Develop and maintain documentation for system configurations and recovery procedures in a classified environment.

Required Skills and Qualifications

  • Expertise in RKE2, Kubernetes, Ingress, Kong, Artifactory, Prometheus, Grafana, Splunk, Elastic, and Sonar.
  • Strong background in site reliability engineering and system observability.
  • Experience working in air-gapped environments with a focus on classified data protection.
  • Proficiency in troubleshooting and optimizing complex, multi-tenant infrastructures.

Preferred Qualifications

  • SRE or DevOps certifications (e.g., CKAD, CKA).
  • Prior experience with government or defense-related SRE roles.
  • Must be eligible for up to a Top Secret Security Clearance.

Orion is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, creed, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, citizenship status, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

Candidate Privacy Policy

Orion Systems Integrators, LLC and its subsidiaries and its affiliates (collectively, “Orion,” “we” or “us”) are committed to protecting your privacy. This Candidate Privacy Policy (orioninc.com) (“Notice”) explains:

  • What information we collect during our application and recruitment process and why we collect it;
  • How we handle that information; and
  • How to access and update that information.

Your use of Orion services is governed by any applicable terms in this notice and our general Privacy Policy.


Top Skills

Artifactory
Elastic
Grafana
Ingress
Kong
Kubernetes
Prometheus
RKE2
Sonar
Splunk
