Calix

Staff AI Ops Engineer

Reposted 2 Days Ago

Remote

Hiring Remotely in Canada

Senior level

The role involves designing and maintaining infrastructure for machine learning applications, deploying ML pipelines, optimizing resources on GCP, and ensuring system observability.

The Calix platform enables Communication Service Providers (CSPs) of all sizes to transform and future-proof their businesses. Through real-time data, automation, and actionable insights delivered via Calix One — our cloud-first, AI-powered platform — CSPs can simplify operations, collapse cost, and accelerate innovation. Calix One brings together the automation of everything and the experience of one, empowering customers to deliver differentiated subscriber experiences while driving acquisition, loyalty, and revenue growth. This is the Calix mission: to enable CSPs of all sizes to simplify, innovate, and grow, strengthening both their businesses and the communities they serve.
We’re at the forefront of a once-in-a-generation change in the broadband industry. Join us as we innovate, help our customers reach their potential, and connect underserved communities with unrivaled digital experiences.

Calix is where passionate innovators come together with a shared mission: to reimagine broadband experiences and empower communities like never before. As a true pioneer in broadband technology, we ignite transformation by equipping service providers of all sizes with an unrivaled platform, state-of-the-art cloud technologies, and AI-driven solutions that redefine what’s possible. Every tool and breakthrough we offer is designed to simplify operations and unlock extraordinary subscriber experiences through innovation.

Calix is seeking a highly skilled Staff AI Ops Engineer with hands-on GCP experience to join our cutting-edge AI/ML team. In this role, you will be responsible for building, scaling, and maintaining the infrastructure that powers our machine learning and generative AI applications. You will work closely with data scientists, ML engineers, and software developers to ensure our ML/AI systems are robust, efficient, and production-ready.

This is a remote position that can be based anywhere in the United States or Canada. Please note that the recruitment and hiring process includes an in-person meeting.

Key Responsibilities:

Design, implement, and maintain scalable infrastructure for ML and GenAI applications
Deploy, operate, and troubleshoot production ML/GenAI pipelines/services
Build and optimize CI/CD pipelines for ML model deployment and serving
Scale compute resources across CPU/GPU architectures to meet performance requirements
Implement container orchestration with Kubernetes
Architect and optimize cloud resources on GCP for ML training and inference
Set up and maintain runtime frameworks and job management systems (Airflow, Kubeflow, MLflow, etc.)
Establish monitoring, logging, and alerting for system observability
Optimize system performance and resource utilization for cost efficiency
Develop and enforce AIOps best practices across the organization

Qualifications:

Bachelor's degree in Computer Science, Information Technology, or a related field (or equivalent experience)
8+ years of overall software engineering experience
3+ years of focused experience in DevOps/AIOps or similar ML infrastructure roles
Proficiency in infrastructure as code (IaC) using Terraform
Strong experience with containerization and orchestration using Docker and Kubernetes
Demonstrated expertise in cloud infrastructure management on GCP
Proficiency with workflow management tools such as Airflow and Kubeflow
Strong CI/CD expertise with experience implementing automated testing and deployment pipelines
Experience scaling distributed compute architectures across CPU and GPU hardware
Solid understanding of system performance optimization techniques
Experience implementing comprehensive observability solutions for complex systems
Knowledge of monitoring and logging tools (Prometheus, Grafana, ELK stack)
Strong proficiency in Python
Familiarity with ML frameworks such as PyTorch and ML platforms like Vertex AI
Excellent problem-solving skills and ability to work independently
Strong communication skills and ability to work effectively in cross-functional teams

The base pay range for this position varies by geographic location. More information about the pay range specific to the candidate's location and other factors will be shared during the recruitment process. Individual pay is determined by location of residence and multiple other factors, including job-related knowledge, skills, and experience.

San Francisco Bay Area:

156,400 - 265,700 USD Annual

All Other US Locations:

136,000 - 231,000 USD Annual

As part of the total compensation package, this role may be eligible for a bonus.

Top Skills

Airflow

Docker

ELK Stack

GCP

Grafana

Kubeflow

Kubernetes

MLflow

Prometheus

Python

PyTorch

Terraform

Vertex AI
