Senior Site Reliability Engineer, Observability
1 week ago
**About Us**
The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact on the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load.
This job would be perfect for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience focusing on observability. The entire engineering team is expanding, and you would have plenty of opportunities to build, learn, and grow.
We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we are looking forward to speaking with you, even if you don't match 100% of the job requirements: those describe people we've usually had a great time working with, but they're not a tick-box exercise.
**Your Impact**
- Build and orchestrate a modern OpenTelemetry (OTel)-based observability platform.
- Support multiple telemetry types, such as metrics, logs, and traces.
- Define and support modern observability governance and address problems at scale.
- Ensure reliability, security, and performance exceed our defined SLAs
- Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
- Lead the design and deployment of monitoring/observability services that detect issues and alert the team when action is needed.
- Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline.
- Oversee the availability, performance, and supportability of our observability infrastructure.
- Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
- Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release.
- Champion reliability and security by taking the time to do your work right the first time
**Requirements**:
- 7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before
- Ability to develop software outside of the scope of typical infrastructure requirements and configurations
- Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
- Expert knowledge in all aspects of designing, developing, and managing large real-time systems
- Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution like the ELK Stack, Splunk, or the Grafana stack.
- Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them
- Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews
**Desired Qualifications**
- Excitement for blockchain, Web 3.0, and similar decentralized technologies.
- Experience running any infrastructure in the blockchain/web3 space
- Ability to scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
- Experience working remotely in a distributed team
- A strong desire to grow and challenge yourself. We would expect you to constantly find ways to improve and automate services to reduce toil
**Some of the tools and services we use daily or almost daily are**:
- AWS; Terraform/Terragrunt; Kubernetes, Calico and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer
- We expect you to be comfortable with most of those tools and very proficient in several of them.
- _All roles with Chainlink Labs are global and remote-based. Unless otherwise stated, we ask that you try to overlap some working hours with Eastern Standard Time (EST)._
**Commitment to Equal Opportunity**