Cloud Reliability Engineer, Coverage

2 months ago


Córdoba, Argentina Avature Full-time
Avature’s Coverage team is dedicated to maintaining and improving the quality of our monitoring tools and practices, as applied during on-call shifts and other incident-detection efforts. The team's scope ranges from managing and continuously improving server and service monitoring and alerting to maintaining a holistic view of system reliability.

As a Cloud Reliability Engineer, you’ll strive to implement tools and processes that improve observability, monitoring, and incident management, minimize emergency response time, and provide a pain-free experience for the teams involved in incident management.

Your challenges and objectives:

  • Understand Avature’s infrastructure and processes.
  • Contribute to defining standards with a DevOps/SRE mindset, and advocate for them.
  • Identify and address weaknesses in our infrastructure to ensure service availability.
  • Develop strategies to mitigate and prevent interruptions in critical services.
  • Occasionally troubleshoot ongoing incidents.
  • Work with development and engineering teams to apply SRE practices from the early stages of the software development life cycle.

Your day-to-day activities:

  • Participate in defining and implementing SRE policies and practices.
  • Collaborate with other infrastructure and development teams to continuously improve their services’ monitoring and observability.
  • Take part in incident management, conducting post-mortem analyses and proposing preventive measures to avoid future disruptions.
  • Occasionally troubleshoot ongoing incidents.
  • Research techniques and analyze metrics to streamline how teams access information about their systems.

About you:

  • Knowledge of observability: logs (ELK stack), metrics (e.g., Prometheus, Grafana), and tracing (e.g., Jaeger, OpenTelemetry).
  • Experience creating and maintaining fault-tolerant, distributed systems.
  • Solid experience in Linux system administration.
  • Analytical and troubleshooting skills.
  • An infrastructure-as-code mindset.
  • Software development (Python, Golang) and configuration management (Puppet, Ansible) skills.
  • Knowledge of incident management and related tools, such as Splunk On-Call.

About us:

Avature is a market-leading enterprise SaaS solution provider for global talent acquisition and talent management. We have a strong commitment to high-quality engineering and customer service and are recognized innovators in the very large company market. We currently work with over 650 companies worldwide, including 110 of the Fortune 500, all of the Big Four consulting firms, the largest banks and manufacturers in the world, and five governments.

We design, build, implement, and support our product ourselves. With 26 releases a year and a strong commitment to innovation and quality engineering, our private cloud platform has become the product of choice for very large global organizations.

At Avature, we value opportunities to learn and grow within a dynamic, creative, and collaborative environment. We encourage autonomy and empower our people to approach challenges innovatively while bringing their unique perspective to the table. We offer a career development program that supports continuous learning and thoughtful leadership and that meaningfully impacts each individual’s professional trajectory.

What we offer:

  • A fast-paced, energetic, and engaging environment.
  • Flexible hours.
  • Work remotely or come to the office as often as you want.
  • Four salary reviews per year.
  • Option to earn part of your salary in US dollars.
  • Three weeks of vacation from the first year.
  • Four weeks of paternity leave.
  • OSDE 310 health coverage (family plan).
  • Four days a year to attend professional development events.
  • End-of-year week off (December 26 to 31).
  • Internet service expenses.
  • Birthdays off.

An organizational culture that empowers everyone to be themselves is key to thriving in business, but more importantly, it is a pathway to a more equitable society. Avature fosters a diverse and inclusive environment and celebrates that each unique person brings something different to our team. We are committed to considering all qualified applicants equally and to promoting equal opportunities within our organization.

  • Córdoba, Córdoba, Argentina Avature Full-time

    About the Role: Avature's Coverage team is dedicated to maintaining and improving the quality of our monitoring tools and practices as applied during on-call shifts or other related incident-spotting endeavors. As a Cloud Reliability Engineer, you'll strive to implement tools and processes that improve observability, monitoring, and incident management,...


  • Córdoba, Córdoba, Argentina Techunting Full-time

    Job Title: Principal Site Reliability Engineer. We are seeking a highly skilled Principal Site Reliability Engineer to lead our SRE team and optimize infrastructure for high-performance applications. The ideal candidate will have a strong background in software development, automation, and cloud management. Key Responsibilities: Lead the SRE team to ensure high...


  • Córdoba, Córdoba, Argentina Avature Full-time

    About the Role: Avature's Coverage team is focused on enhancing and sustaining the quality of our monitoring systems and methodologies, particularly during on-call duties and related incident detection efforts. The team's responsibilities encompass the management and ongoing enhancement of our server and service monitoring, as well as a comprehensive view of...


  • Córdoba, Córdoba, Argentina Avature Full-time

    Overview: At Avature, we are focused on enhancing the reliability and quality of our monitoring systems and practices during on-call duties and related incident management activities. Our Coverage team plays a crucial role in overseeing and continuously refining our server management, service monitoring, and alerting protocols to ensure a comprehensive view...

  • Site Reliability Engineer

    4 weeks ago


    Córdoba, Córdoba, Argentina Internetwork Expert Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Internetwork Expert. As a Site Reliability Engineer, you will play a critical role in ensuring the high availability, performance, and security of our internet-scale systems. Key Responsibilities: Design and implement scalable and highly available systems to handle...


  • Córdoba, Córdoba, Argentina Avature Full-time

    Overview: Avature's Coverage team is focused on enhancing the quality of our monitoring tools and practices, particularly during on-call shifts and incident detection efforts. The team's responsibilities encompass the management and ongoing enhancement of our server and service monitoring, as well as a comprehensive view of system reliability. Your...

  • Cloud Ops Engineer

    4 months ago


    Córdoba, Argentina Avature Full-time

    The Cloud Ops team is a key part of Cloud Services, the area in charge of our infrastructure. As a Cloud Engineer, you’ll be responsible for maintaining the resilience, availability, and reliability of the sophisticated infrastructure our systems rely on. Collaborating with teams from several engineering specializations, you’ll be in direct contact with the...


  • Córdoba, Córdoba, Argentina Intuition Machines, Inc. Full-time

    Intuition Machines, Inc. leverages advanced AI/ML technologies to develop enterprise-grade security solutions. Our innovative research impacts systems that cater to hundreds of millions globally, supported by a diverse team operating from various locations. You may recognize our flagship product, the hCaptcha security suite. Our methodology is...


  • Córdoba, Argentina Techunting Full-time

    We are seeking a highly skilled Principal Site Reliability Engineer with extensive experience leading SRE teams and optimizing infrastructure for high-performance applications. The ideal candidate will have a strong background in software development, automation, and cloud management. Requirements: 5+ years of experience leading an SRE team; 7+ years of experience...


  • Córdoba, Córdoba, Argentina Intuition Machines, Inc. Full-time

    Intuition Machines, Inc. leverages AI and machine learning to develop cutting-edge security solutions for enterprises. Our innovative research is applied to systems that cater to hundreds of millions of users globally, supported by a distributed team. One of our flagship products is the hCaptcha security suite, and our operational philosophy emphasizes...


  • Córdoba, Córdoba, Argentina Intuition Machines, Inc. Full-time

    Intuition Machines, Inc. specializes in utilizing AI and machine learning to develop enterprise-level security solutions. Our innovations are applied to systems that cater to hundreds of millions of users globally, supported by a distributed team. You may recognize our flagship product, the hCaptcha security suite. Our methodology is straightforward: minimal...

  • Site Reliability Engineer

    3 weeks ago


    Córdoba, Córdoba, Argentina Intuition Machines, Inc. Full-time

    About the Role: We are seeking a highly skilled Site Reliability Engineer to join our team at Intuition Machines, Inc. As a Site Reliability Engineer, you will play a critical role in ensuring the performance, availability, and security of our internet-scale systems. Key Responsibilities: Design and implement solutions to enhance system performance, availability,...


  • Córdoba, Argentina Intuition Machines, Inc. Full-time

    Intuition Machines uses AI/ML to build enterprise security products. We apply our research to systems that serve hundreds of millions of people, with a team distributed around the world. You are probably familiar with our best-known product, the hCaptcha security suite. Our approach is simple: low overhead, small teams, and rapid iteration. As a Site...


  • Córdoba, Córdoba, Argentina Avature Full-time

    Avature is a sophisticated SaaS platform that relies on a robust infrastructure, combining AWS and our datacenters to stay performant, reliable, and secure. Cloud Engineers are fundamental, providing the tools and environments used throughout the development cycle. The Public Cloud Infrastructure team is in charge of covering all foundational aspects of our...

  • Site Reliability Engineer

    2 weeks ago


    Córdoba, Córdoba, Argentina Internetwork Expert Full-time

    About the Role: At Internetwork Expert, we're pushing the boundaries of AI/ML-powered security solutions. As a Site Reliability Engineer, you'll be at the forefront of building high-performance, secure, and cost-effective systems that serve millions of users worldwide. Our flat organization and customer-focused approach mean you'll work...


  • Córdoba, Córdoba, Argentina Intuition Machines, Inc. Full-time

    We have a flat and highly customer-focused process, so you should be comfortable interacting directly with engineers at our large enterprise customers and startups alike when necessary, in conjunction with product, customer success, and sales teams. What you’ll do: Work with large-scale systems (handling millions of requests per second, serving...

  • Cloud Engineer

    4 months ago


    Córdoba, Argentina Avature Full-time

    Avature is a flexible product, and building it requires a variety of tools and an adaptable approach. Cloud Engineers are fundamental to that process, as they provide the tools and environments used throughout the development cycle. Additionally, as a bridge between our infrastructure and development teams, they are key to enabling collaboration towards...