Extreme Networks

Staff Cloud Operations Engineer – Monitoring Lead (9810)

Posted 4 Days Ago
Hybrid
Ontario, ON
Senior level
The Staff Cloud Operations Engineer leads the design and implementation of monitoring strategies for cloud infrastructure, ensuring system health and operational excellence while analyzing performance data to identify improvements.
The summary above was generated by AI
There has never been a better time to join Extreme. With several acquisitions extending our portfolio and go-to-market strategy, we have seen enormous opportunity and growth within the region.
Aside from being a Technology Leader in the Gartner Magic Quadrant, we also adamantly promote an internal culture that truly embraces diversity, inclusion, and equality in the workplace. Having Diversity and Inclusion as part of our core values and beliefs, we’re proud to foster an environment where every Extreme employee can thrive because of their differences, not despite them.
 
Staff Cloud Operations Engineer – Monitoring Lead
 
We are seeking a highly skilled and experienced Staff Cloud Operations Engineer – Monitoring Lead to join our growing Cloud Operations team. In this critical role, you will be responsible for designing, implementing, and optimizing our comprehensive monitoring and alerting strategy across our cloud infrastructure and applications. You will drive proactive identification of issues, ensure system health, and contribute significantly to our operational excellence and reliability goals. We're looking for the best and the brightest 'A' players who want to make a difference doing a job they love.

  • Lead the design, implementation, and continuous improvement of our end-to-end monitoring and alerting framework for cloud infrastructure (AWS, Azure, GCP), applications, and services.
  • Define key performance indicators (KPIs), service level indicators (SLIs), and service level objectives (SLOs) for critical systems.
  • Evaluate, select, and integrate monitoring tools (e.g., Prometheus, Grafana, Datadog, Splunk, CloudWatch, Azure Monitor, GCP Operations Suite) to meet evolving needs.
  • Develop and implement automation scripts and tools (e.g., Python, Bash, PowerShell) to streamline monitoring deployment, configuration, and incident remediation.
  • Build and maintain dashboards, alerts, and reports that provide actionable insights into system performance, health, and availability.
  • Analyze monitoring data to identify performance bottlenecks, resource inefficiencies, and potential cost optimization opportunities.
  • Collaborate with engineering teams to implement performance improvements and cost-saving measures.
  • Create and maintain comprehensive documentation for monitoring systems, procedures, and best practices.
  • Proactively identify areas for improvement in our cloud operations and monitoring capabilities.
  • Provide 24/7 support for cloud services.
  • Participate in cloud security and compliance implementation.

Ideal Qualifications:

  • BS-level technical degree required; Computer Science or Engineering background preferred.
  • 8+ years of progressive experience in Cloud Operations, DevOps, or Site Reliability Engineering roles, with a strong focus on monitoring.
  • Deep expertise with at least one major public cloud platform (AWS, Azure, or Google Cloud Platform).
  • Proven experience as a technical lead or senior contributor in a monitoring-focused role.
  • Working knowledge of container-based architecture and deployment (Docker, Kubernetes).
  • Extensive experience with various monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk, ELK Stack, vendor-specific monitoring solutions).
  • Excellent problem-solving, analytical, and troubleshooting skills.
  • Working knowledge of Elasticsearch, PostgreSQL, Redis, Ignite, Kafka and RabbitMQ.
  • Comfortable working within a distributed team located in multiple time zones.

Top Skills

AWS
Azure
Azure Monitor
Bash
CloudWatch
Datadog
GCP
GCP Operations Suite
Grafana
PowerShell
Prometheus
Python
Splunk
