Glia

Senior Site Reliability Engineer

Sorry, this job was removed at 02:12 p.m. (EST) on Saturday, Dec 06, 2025

Remote

Hiring Remotely in Canada

About Glia

Glia is the leading AI customer service solution for banks and credit unions. Our platform unifies AI and human agents across every voice and digital conversation through our proprietary ChannelLess® Architecture. With AI for All™, organizations overcome the tradeoff between efficiency and experience by using AI to automate conversations and elevate service operations.

Valued at over $1 billion and named a Deloitte Technology Fast 500™ company for five consecutive years, Glia powers over 700 financial institutions and maintains an industry-leading 72 NPS. We're also certified as a Great Place to Work, with 98% employee satisfaction.

The Team

You'll be joining our dedicated Infrastructure Team, which is responsible for the reliability, scalability, and performance of Glia’s cloud-native core infrastructure serving the conversational AI. Our team focuses on operational excellence and proactive problem-solving to ensure our systems are always available and performing optimally.

All SREs on the team report to a dedicated Engineering Manager. Our work is driven by Objectives and Key Results, defined quarterly in collaboration with the Director of Engineering. All projects are planned, led, and executed by our engineers. Our SRE team is located primarily in Vancouver and Toronto and works in the Pacific Time zone (PT). We are optimized for remote collaboration and welcome candidates from anywhere in Canada.

The Work

As a Senior Site Reliability Engineer, your primary focus will be on the health and performance of our production services. Responsibilities will include:

Defining, measuring, and reporting on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services.
Partnering with development teams to establish error budgets and the operational consequences of their consumption.
Writing software to automate production operations, eliminating manual toil and improving system resilience.
Leading the incident response process for complex outages, including conducting blameless postmortems to drive systemic improvements.
Engineering and improving deployment systems and CI/CD pipelines to increase release velocity while maintaining production stability.
Conducting deep dives into system performance, engaging in capacity planning, and performing production readiness reviews.
Developing and maintaining operational runbooks and incident response playbooks.
Participating in a periodic on-call rotation as an escalation point for critical service interruptions.

Our Tech Stack

Infrastructure: AWS, Kubernetes (AWS EKS), Linkerd, EFK
Persistence: Amazon Aurora Serverless for Postgres, RabbitMQ
Cache: Amazon ElastiCache for Valkey
Monitoring & Observability: Datadog, with a focus on dashboards and alerts for system health.
CI/CD: GitHub Actions, ArgoCD, Jenkins, Helm, with a focus on automation and pipeline optimization.
Infrastructure as Code: Terraform

Additionally, our Engineering teams use:

Backend: Python, Elixir, Node.js, and Ruby
Frontend: JavaScript and React.js
Native mobile SDKs: Java and Swift

Candidate Requirements

5+ years of relevant experience in Site Reliability Engineering or a closely related discipline (e.g., DevOps, Platform Engineering, Infrastructure).
Deep, practical understanding of Site Reliability Engineering (SRE) principles (SLOs, error budgets, toil reduction).
Demonstrable experience analyzing and troubleshooting large-scale distributed systems.
Expert-level proficiency with AWS and Kubernetes (EKS), particularly in areas of observability, networking, and auto-scaling.
Strong software development skills in a language like Python or Go, used to build operational tools, services, or automation.
Experience with modern observability platforms (e.g., Datadog, Prometheus) and a deep understanding of metrics, logging, and tracing.
Expertise in designing and operating robust CI/CD pipelines for a microservices architecture (e.g., using ArgoCD, GitHub Actions, Helm).
A systematic, data-driven approach to problem-solving and root cause analysis.

We are insatiably curious and hungry for knowledge here at Glia. Even if you don’t meet all the requirements exactly, we encourage you to apply as long as you are passionate about mastering your craft and developing your skills.

*Glia is an equal-opportunity employer. Glia does not discriminate against any employee or applicant because of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), or any other basis protected by law.

The Glia Talent Acquisition team uses @glia.com and @gliatalent.com mailboxes for coordinating interviews, providing updates, and sending documents. Our hiring process involves an introduction, practical and team interviews, and a decision and offer. For more information, visit our Recruitment Privacy Notice page or contact our talent team via [email protected]

*Want to know more about working at Glia? Check out Glia's Career FAQs.
