Senior Site Reliability Engineer

4 weeks ago

Argentina · Glia · Full-time

About Glia
Our award-winning technology powers conversations with customers for some of the world's largest enterprises. We believe that combining the human touch with technology is the best way to create amazing customer experiences. When human abilities such as problem-solving, creative thinking, and relationship building are enhanced with technology... magical moments happen.

The Team
You'll be joining our dedicated Infrastructure Team, which is responsible for the reliability, scalability, and performance of the cloud-native core infrastructure behind Glia's conversational AI. Our team focuses on operational excellence and proactive problem-solving to ensure our systems are always available and performing optimally.

All SREs on the team report to a dedicated Engineering Manager. Our work is driven by Objectives and Key Results (OKRs), defined quarterly in collaboration with the Director of Engineering. All projects are planned, led, and executed by our engineers.

Our SRE team is located primarily in Vancouver and Toronto and works in the Pacific Time zone (PT). We are optimized for remote collaboration and welcome candidates from anywhere in Canada.

The Work
As a Senior Site Reliability Engineer, your primary focus will be the health and performance of our production services. Responsibilities will include:
- Defining, measuring, and reporting on Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for key services.
- Partnering with development teams to establish error budgets and the operational consequences of consuming them.
- Writing software to automate production operations, eliminating manual toil and improving system resilience.
- Leading the incident response process for complex outages, including conducting blameless postmortems to drive systemic improvements.
- Engineering and improving deployment systems and CI/CD pipelines to increase release velocity while maintaining production stability.
- Conducting deep dives into system performance, engaging in capacity planning, and performing production readiness reviews.
- Developing and maintaining operational runbooks and incident response playbooks.
- Participating in a periodic on-call rotation as an escalation point for critical service interruptions.

Our Tech Stack
- Infrastructure: AWS, Kubernetes (Amazon EKS), Linkerd, EFK
- Persistence: Amazon Aurora Serverless for PostgreSQL, RabbitMQ
- Cache: Amazon ElastiCache for Valkey
- Monitoring & Observability: Datadog, with a focus on dashboards and alerts for system health
- CI/CD: GitHub Actions, Argo CD, Jenkins, Helm, with a focus on automation and pipeline optimization
- Infrastructure as Code: Terraform

Additionally, our engineering teams use:
- Backend: Python, Elixir, Node.js, and Ruby
- Frontend: JavaScript and React
- Native mobile SDKs: Java and Swift

Candidate Requirements
- 5+ years of relevant experience in Site Reliability Engineering or a closely related discipline (e.g., DevOps, Platform Engineering, Infrastructure).
- Deep, practical understanding of Site Reliability Engineering (SRE) principles (SLOs, error budgets, toil reduction).
- Demonstrable experience analyzing and troubleshooting large-scale distributed systems.
- Expert-level proficiency with AWS and Kubernetes (EKS), particularly in observability, networking, and autoscaling.
- Strong software development skills in a language such as Python or Go, applied to building operational tools, services, or automation.
- Experience with modern observability platforms (e.g., Datadog, Prometheus) and a deep understanding of metrics, logging, and tracing.
- Expertise in designing and operating robust CI/CD pipelines for a microservices architecture (e.g., using Argo CD, GitHub Actions, Helm).
- A systematic, data-driven approach to problem-solving and root cause analysis.

We are insatiably curious and hungry for knowledge here at Glia.
Even if you don't meet all the requirements exactly, we encourage you to apply as long as you are passionate about mastering your craft and developing your skills.

Glia is an equal-opportunity employer. Glia does not discriminate against any employee or applicant because of race, creed, color, religion, gender, sexual orientation, gender identity/expression, national origin, disability, age, genetic information, veteran status, marital status, pregnancy or related condition (including breastfeeding), or any other basis protected by law.

The Glia Talent Acquisition team uses @glia.com and @gliatalent.com mailboxes for coordinating interviews, providing updates, and sending documents. Our hiring process involves an introduction, practical and team interviews, and a decision and offer. For more information, visit our Recruitment Privacy Notice page or contact our talent team.

Want to know more about working at Glia? Check out Glia's Career FAQs.

Senior Site Reliability Engineer

1 week ago

Argentina · MAS Global Consulting · Full-time

Who We Are: At MAS Global Consulting, we are a premium digital engineering partner delivering technology solutions to some of the world's most innovative companies, from high-growth startups to Fortune 500 enterprises. With a people-first culture and a commitment to excellence, we combine nearshore talent, agile delivery, and technical depth to build...
Site Reliability Engineer: Scale, Reliability

2 weeks ago

Argentina · Capchase · Full-time

Join a forward-thinking company as a Site Reliability Engineer, where you'll play a crucial role in building scalable, high-performing systems. This position offers the opportunity to shape the future of reliability engineering while ensuring the availability, latency, and performance of our systems. You'll collaborate with a diverse team to define the...
Site Reliability Engineer

2 weeks ago

Argentina · AgileEngine · Full-time

AgileEngine is an Inc. 5000 company that creates award-winning software for Fortune 500 brands and trailblazing startups across 17+ industries. We rank among the leaders in areas like application development and AI/ML, and our people-first culture has earned us multiple Best Place to Work...
Senior Devops

2 weeks ago

Chubut, Argentina · Ingenierojob · Full-time

Senior DevOps / Site Reliability Engineer (Azure) (Ref-Lch), ARS 1,200,000 - 1,500,000. We are looking for a highly skilled Senior DevOps / Site Reliability Engineer with deep experience in Azure cloud, CI/CD automation, and secure workload identity. This role is ideal for someone who masters modern...
Site Reliability Engineer

7 days ago

Argentina · Capchase · Full-time

Capchase provides flexible payment solutions to B2B software, cloud, and AI companies. Our core product, Capchase Pay, offers a buy-now-pay-later payment option for B2B SaaS, hardware, and cloud purchases, helping companies sell more and collect cash faster. Founded in 2020 and headquartered...
Lead Site Reliability Engineer

4 weeks ago

Argentina · EPAM Systems · Full-time

We are expanding our Enterprise Technology team and seeking a Lead Site Reliability Engineer to oversee and enhance enterprise applications and their infrastructure. You will leverage your expertise in Site Reliability Engineering, cloud platforms like AWS and Azure, and CI/CD to ensure robust, secure, and scalable...
Site Reliability Engineer

2 weeks ago

Argentina · Semperti · Full-time

At Semperti, we are looking for a semi-senior (SSR) SRE to join the team! The Site Reliability Engineer (SRE) will be responsible for helping our clients ensure the availability of their systems, as well as leading the adoption of new tools that help improve their processes, working with numerous technologies...
Site Reliability Engineer

1 day ago

Argentina · Ciklum · Full-time

Ciklum is looking for a Site Reliability Engineer to join our team in Argentina. We are a custom product engineering company that supports both multinational organizations and scaling startups to solve their most complex business challenges. With a global team of over 4,000 highly skilled developers, consultants, analysts, and product owners, we...
Senior Azure DevOps

2 weeks ago

Chubut, Argentina · Ingenierojob · Full-time

A leading technology firm is seeking a Senior DevOps / Site Reliability Engineer to enhance their Azure cloud infrastructure. The role involves architecting and maintaining cloud environments, ensuring secure deployments, and optimizing reliability. Candidates should have solid experience with Azure and CI/CD automation, and demonstrate strong communication...
Senior Site Reliability Engineer

4 weeks ago

Argentina · Cloudbeds · Full-time

What Makes Us Unique: At Cloudbeds, we're not just building software, we're transforming hospitality. Our intelligently designed platform powers properties across 150 countries, processing billions in bookings annually. From independent properties to hotel groups, we help hoteliers transform operations and uplevel their commercial strategy through a...
