Principal Site Reliability Engineer
1 day ago
Groupon is a marketplace where customers discover new experiences and services every day and local businesses thrive. To date we have worked with over a million merchant partners worldwide, connecting over 16 million customers with deals across various categories. In a world often dominated by e-commerce giants, we stand out as one of the few platforms uniquely committed to helping local businesses succeed on a performance basis.
Groupon is on a radical journey to transform our business through a relentless pursuit of results. Even with thousands of employees spread across multiple continents, we still maintain a culture that inspires innovation, rewards risk-taking and celebrates success. The impact here can be immediate due to our scale and the speed of our transformation. We're a "best of both worlds" kind of company. We're big enough to have the resources and scale, but small enough that a single person has a surprising amount of autonomy and can make a meaningful impact.
About the Role
Groupon is modernizing its global platform — and reliability is at the center of that transformation. We're looking for a Principal Site Reliability Engineer to lead the evolution from reactive maintenance to predictive, AI-driven resilience.
You'll design intelligent, self-healing systems that prevent incidents before they happen, ensuring our customers enjoy fast, secure, and reliable experiences across millions of daily interactions.
Remote work model
Key Responsibilities:
- Architect and maintain self-healing systems with 99.9%+ availability targets.
- Use AI/ML to automate infrastructure governance and detect configuration or IaC anti-patterns.
- Implement adaptive SLIs/SLOs that evolve automatically from real-time data.
- Build AIOps-based observability and auto-remediation pipelines.
- Apply predictive modeling to forecast failures before they impact users.
- Lead chaos, performance, and resilience testing programs.
- Map platform and service behavior to revenue impact and drive improved revenue resilience through better infrastructure performance.
- Mentor engineers and drive reliability standards across teams.
- Partner with platform, data, and product teams to ensure stability aligns with business goals.
- Support major incident response and incident reviews, and participate in on-call rotations.
Key Requirements:
- 10+ years in software/systems engineering, including 5+ years in SRE or platform reliability.
- Strong experience with GCP (preferred) or AWS, Kubernetes, and Terraform.
- Proficiency in Python or Go for automation and tooling.
- Deep understanding of observability stacks (Prometheus, Grafana, OpenTelemetry) and service meshes (Istio, Envoy).
- Hands-on AIOps experience: anomaly detection, predictive analytics, ML-assisted operations.
- Strong communication and influencing skills — data over hierarchy.
Nice to Have:
- Experience with MLOps or large-scale data infrastructure.
- Exposure to FinOps or cloud cost optimization.
- Previous leadership of global incident response or SRE transformation programs.
What Success Looks Like
- 99.9%+ uptime sustained through predictive rather than reactive responses.
- Faster MTTR via automated detection and auto-remediation.
- Reliability insights used in leadership decisions.
- Mentorship leading to stronger reliability practices across teams.
We Are Interested In
- Technologists who see reliability as a product, not just a metric.
- Engineers who use AI/ML as a tool for scale and insight.
- Leaders who can balance innovation speed with operational excellence.
- Engineers who understand the entire e-commerce stack and how it impacts revenue.
What We Offer:
- The opportunity to work with cutting-edge technologies in a transformative environment.
- Professional growth and leadership development pathways tailored to your aspirations.
- A chance to leave a lasting impact by shaping the future of reliable and scalable systems.
Join us to push the boundaries of platform reliability and drive meaningful change in a fast-evolving digital world.
Groupon is an AI-First Company
We're committed to building smarter, faster, and more innovative ways of working—and AI plays a key role in how we get there. We encourage candidates to leverage AI tools during the hiring process where it adds value, and we're always keen to hear how technology improves the way you work. If you're passionate about AI or curious to explore how it can elevate your role—you'll be right at home here.
Groupon's purpose is to build strong communities through thriving small businesses. You can learn more about the world's largest local e-commerce marketplace in the latest Groupon news, and read about our DEI approach. If all of this sounds like a great fit for you, click apply and join us on a mission to become the ultimate destination for local experiences and services.
Beware of Recruitment Fraud: Groupon follows a merit-based recruitment process without charging job seekers any fees. We've noticed an increase in recruitment fraud, including fake job postings and fraudulent interviews and job offers aimed at stealing personal information or money. Be cautious of individuals falsely representing Groupon's Talent Acquisition team with fake job offers. If you encounter any suspicious job offers or interview calls demanding money, recognize these as scams. Groupon is not responsible for losses from such dealings. For legitimate job openings (and a sneak peek into life at Groupon), always check our official career website at Groupon Careers.
-
Site Reliability Engineer III
5 days ago
Remote - Argentina | Smartek S.R.L | Full-time
Join the team leading the next evolution of virtual care. At Teladoc Health, you are empowered to bring your true self to work while helping millions of people live their healthiest lives. Here you will be part of a high-performance culture where colleagues embrace challenges, drive transformative solutions, and create opportunities for growth. Together,...
-
AI Engineer
5 days ago
Remote, Argentina | Motionpoint Corp | Full-time
Marketfully.AI delivers AI-powered content intelligence that transforms how global brands create, adapt, and optimize multilingual marketing at scale. Unlike generic AI writing tools, Marketfully.AI combines cutting-edge GenAI with human expertise and deep localization capabilities to help enterprise marketing teams make smarter content decisions across...
-
FlipaClip - Senior iOS Engineer
1 week ago
Remote, Argentina | Silver | Full-time
This position is open to candidates based anywhere within LATAM. FlipaClip: Our Senior iOS Engineer is responsible for implementing new features and resolving issues and bugs, focusing on the overall stability and usability of the app through a deep and clear understanding of the growing iOS ecosystem's tools and libraries. Essential Duties and...
-
Solution Data Architect
1 week ago
Remote, Argentina | Fusemachines | Full-time
About Fusemachines: Fusemachines is a leading AI strategy, talent, and education services provider. Founded by Sameer Maskey Ph.D., Adjunct Associate Professor at Columbia University, Fusemachines has a core mission of democratizing AI. With a presence in 4 countries (Nepal, United States, Canada, and Dominican Republic) and more than 400 full-time employees....
-
Principal Software Engineer
1 day ago
Argentina, Remote | Sezzle | Full-time
The salary range for this role is $6,000 - $12,500 per month (gross, in USD). About Sezzle: With a mission to financially empower the next generation, Sezzle is revolutionizing the shopping experience beyond payments, blending cutting-edge tech with seamless, interest-free installment plans that make shopping smarter and more accessible. We're not just...
-
DevOps Engineer
1 day ago
Argentina - Remote | Particle41 | Full-time
As a DevOps Engineer (GCP) at Particle41, you will play a crucial role in enhancing our software development and IT operations processes. The ideal candidate will have a strong background in both software development and IT operations, with a focus on automating and streamlining processes to achieve efficient and reliable software delivery for our customers....
-
Data Engineer
5 days ago
Argentina - Remote | Particle41 | Full-time
Particle41 is seeking a talented and versatile Data Engineer to join our innovative team. As a Data Engineer, you will play a key role in designing, building, and maintaining robust data pipelines and infrastructure to support our clients' data needs. You will work on end-to-end data solutions, collaborating with cross-functional teams to ensure...
-
Sr. QA Engineer
7 days ago
Remote (Argentina) | Qu POS | Full-time
About the Role: We are seeking a Senior QA Automation Engineer with a strong background in backend testing, a passion for quality, and hands-on expertise in automation frameworks. You will be responsible for building and maintaining robust test suites for web services and APIs, ensuring the reliability, performance, and correctness of critical systems. The...
-
Infrastructure Systems Engineer
2 weeks ago
Argentina (Remote) | DataXstream | Full-time
Based in Williamsburg, VA, DataXstream stands as a proud and dedicated SAP partner with over two decades of experience. We are relentlessly focused on innovating, rebuilding, and perfecting the most robust and user-friendly Order Management software available for the SAP ecosystem. As we continue to grow our impact and our team, we're seeking passionate...
-
Principal, Global Feasibility
7 days ago
ARG-Remote, Argentina | Syneos Health | Full-time
Syneos Health is a leading fully integrated biopharmaceutical solutions organization built to accelerate customer success. We translate unique clinical, medical affairs and commercial insights into outcomes to address modern market realities. Our Clinical Development model brings the customer and the patient to the center of...