The Senior Infrastructure Engineer will design and maintain scalable systems, optimize AI workflows, automate infrastructure, and collaborate with research teams.
About the Role
We’re looking for a Senior Infrastructure Engineer to help us design, build, and scale the foundational architecture that powers our next-generation AI systems. This role is ideal for someone who thrives in a fast-paced, engineering-driven environment and finds joy in creating robust, elegant systems from scratch.
What You’ll Do
- Build and maintain stable, scalable, and highly available compute infrastructure, spanning cloud (AWS) and bare metal environments.
- Design and operate efficient storage solutions for large-scale AI training datasets and checkpoints.
- Develop high-performance online inference systems, optimizing for diverse GPU environments (e.g., H100, B200).
- Automate infra workflows to maximize reliability, observability, and performance across our platform.
- Collaborate closely with AI researchers and backend engineers to support evolving model deployment and experimentation needs.
- Lead and contribute to internal tooling, CI/CD pipelines (e.g., GitHub Actions), and monitoring infrastructure (e.g., Grafana, Prometheus, OpenTelemetry).
What We’re Looking For
- 3+ years of experience in DevOps / SRE / Infra.
- Strong programming ability in Python or Golang (must be proficient in at least one).
- Production-level experience with Kubernetes in daily operations.
- Deep understanding of modern DevOps / SRE / Infra principles, especially around scalability, automation, and fault-tolerance.
- Hands-on experience with AWS services (e.g., EC2, S3, EKS, IAM, RDS, CloudFront).
- Ability to work independently, lead projects, and become a subject matter expert.
- Strong communication skills and a collaborative, self-motivated mindset.
Nice to Have
- A tinkerer’s spirit: you enjoy hacking, experimenting, and building things for fun or learning. Examples we value:
  - Running your own home lab or mini data center.
  - Building clever side projects or open-source tools.
  - Writing high-quality technical articles or contributing to public repos.
  - Developing lightweight, efficient tooling for infra monitoring or ops.
- Contributions to the open-source community or prior experience maintaining open- or closed-source systems.
- Familiarity with model training workflows (e.g., LLMs, GPUs, large data IO).
- Interest in working directly with AI researchers and understanding model performance trade-offs.
- Brain: We value intelligence and the pursuit of knowledge. Our team is composed of some of the brightest minds in the industry.
- Heart: We care deeply about our work, our users, and each other. Empathy and passion drive us forward.
- Gut: We trust our instincts and are not afraid to take bold risks. Innovation requires courage.
- Taste: We have a keen eye for quality and aesthetics. Our products are not just functional but also beautiful.
- Competitive salary, equity, and benefits package.
- Opportunity to work with a talented and passionate team at the forefront of AI and 3D technology.
- Flexible work environment, with options for remote and on-site work.
- Opportunities for fast professional growth and development.
- An inclusive culture that values creativity, innovation, and collaboration.
- Unlimited, flexible time off.
Benefits
- Competitive salary, benefits and stock options.
- 401(k) plan for employees.
- Comprehensive health, dental, and vision insurance.
- The latest and best office equipment.
Top Skills
AWS
CI/CD
GitHub Actions
Go
Grafana
Kubernetes
OpenTelemetry
Prometheus
Python