Senior Site Reliability Engineer, Observability

2 weeks ago

Buenos Aires, Buenos Aires C.F., Argentina Chainlink Labs Full-time

About Chainlink
Chainlink is the industry-standard oracle platform bringing the capital markets onchain and powering the majority of decentralized finance (DeFi). The Chainlink stack provides the essential data, interoperability, compliance, and privacy standards needed to power advanced blockchain use cases for institutional tokenized assets, lending, payments, stablecoins, and more. Since inventing decentralized oracle networks, Chainlink has enabled tens of trillions in transaction value and now secures the vast majority of DeFi.

Many of the world's largest financial services institutions have also adopted Chainlink's standards and infrastructure, including Swift, Euroclear, Mastercard, Fidelity International, UBS, S&P Dow Jones Indices, FTSE Russell, WisdomTree, ANZ, and top protocols such as Aave, Lido, GMX and many others. Chainlink leverages a novel fee model where offchain and onchain revenue from enterprise adoption is converted to LINK tokens and stored in a strategic Chainlink Reserve. Learn more at

The Observability Team enables Chainlink development and empowers engineers to continue building and supporting crucial products and services that have a profound impact in the blockchain industry. Reliability is vital to the success of our company. As a Senior SRE, you will help us accelerate and enable other engineering teams by increasing self-service and decreasing cognitive load.

This job would be perfect for someone who has a strong DevOps mentality, is passionate about building and maintaining a mature GitOps environment, and has experience focusing on observability. The entire engineering team is expanding, and you would have plenty of opportunities to build, learn, and grow.

We all have different backgrounds and are determined to help you succeed no matter where you are or who you are. If you think you would do a great job at Chainlink, we are looking forward to speaking with you, even if you don't match 100% of the job requirements: those describe people we've usually had a great time working with, but they're not a tick-box exercise.

Your Impact

Build and orchestrate a modern OTel-based observability platform
Support multiple telemetry types, like metrics, logs and traces.
Define and support modern observability governance, and solve observability problems at scale.
Ensure reliability, security, and performance exceed our defined SLAs
Work with engineers from across the company to help troubleshoot issues, deploy new products and services, and increase velocity while decreasing cognitive load
Lead the design and deployment of monitoring/observability services that detect issues and alert the team when action is needed.
Ingest, aggregate, transform, and utilize data from a multitude of sources in our real-time data pipeline.
Oversee the availability, performance, and supportability of our observability infrastructure.
Create processes around alert response operations and support the team to ensure the reliable delivery of oracle data.
Make recommendations to ensure sufficient metrics are collected to create alerts with every new feature release.
Champion reliability and security by taking the time to do your work right the first time

Requirements

7+ years of relevant professional experience. You have probably worked on a DevOps, infrastructure, SRE, and/or platform team before
Ability to develop software outside of the scope of typical infrastructure requirements and configurations
Experience programming in C, C++, Java, Python, Go, Perl, or Ruby
Expert knowledge in all aspects of designing, developing, and managing large real-time systems
Experience with monitoring and logging. You know how to export metrics using Prometheus, have built a Grafana dashboard or two, and have experience with a centralized logging solution like the ELK Stack, Splunk, or the Grafana stack.
Experience with distributed systems and container orchestration. You have maintained or even built Kubernetes clusters before and feel comfortable deploying completely new services on them
Strong communication skills. You can give and receive constructive feedback, and you do not shy away from planning meetings and code reviews

Desired Qualifications

Excitement for blockchain, Web 3.0, and similar decentralized technologies.
Experience running any infrastructure in the blockchain/web3 space
Ability to scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
Experience working remotely in a distributed team
A strong desire to grow and challenge yourself. We would expect you to constantly find ways to improve and automate services to reduce toil

Some of the tools and services we use daily or almost daily are:

AWS; Terraform/Terragrunt; Kubernetes, Calico and ArgoCD; Prometheus and Grafana; GitHub Actions; Packer
We expect you to be comfortable with most of those tools and very proficient in several of them.

All roles with Chainlink Labs are global and remote-based. Unless otherwise stated, we ask that you try to overlap some working hours with Eastern Standard Time (EST).
We carefully review all applications and aim to provide a response to every candidate within two weeks after the job posting closes.
The closing date is listed on the job advert, so we encourage you to take the time to thoughtfully prepare your application.
We want to fully consider your experience and skills, and you will hear from us regarding the status of your application shortly after the closing date.

Commitment to Equal Opportunity
Chainlink Labs is an equal opportunity employer. All qualified applicants will receive equal consideration for employment in compliance with applicable laws, regulations, or ordinances. If you need assistance or accommodation due to a disability or special need when applying for a role or in our recruitment process, please contact us via this form.

Global Data Privacy Notice for Job Candidates and Applicants
Information collected and processed as part of your Chainlink Labs Careers profile, and any job applications you choose to submit is subject to our Privacy Policy. By submitting your application, you are agreeing to our use and processing of your data as required.
