Cloud SRE Engineer

3 weeks ago


Microcentro, Argentina | Capgemini Engineering | Full-time

**Cloud SRE Engineer**:
We are seeking a talented and experienced Cloud Site Reliability Engineer (SRE) with expertise in Splunk to join our dynamic team. In this role, you will be responsible for ensuring the reliability, availability, and performance of our cloud-based systems while leveraging Splunk to gain insights and drive improvements in system monitoring, troubleshooting, and optimization.

**Responsibilities**:

- Design, build, and maintain highly available and scalable cloud-based infrastructure and services.
- Implement and manage monitoring, alerting, and logging solutions using Splunk to ensure real-time visibility into system performance and health.
- Develop and maintain automation scripts and tools for deployment, configuration management, and infrastructure provisioning.
- Collaborate with cross-functional teams to define service-level objectives (SLOs) and service-level indicators (SLIs) and establish reliable incident response processes.
- Analyze system metrics, logs, and traces to identify trends, performance bottlenecks, and areas for optimization.
- Drive continuous improvement initiatives to enhance system reliability, scalability, and efficiency.
- Participate in the on-call rotation and respond to incidents in a timely manner, following established incident management procedures.
- Stay up to date with industry best practices, emerging technologies, and trends in cloud computing and SRE methodologies.

**Qualifications**:

- Bachelor's degree in Computer Science, Engineering, or a related field. Master's degree preferred.
- 3 years of experience in Cloud Site Reliability Engineering (SRE) or a related role.
- Strong proficiency in cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Extensive hands-on experience with Splunk, including the Search Processing Language (SPL), data ingestion, dashboards, and alerting.
- Proficiency in scripting languages such as Python, Bash, or PowerShell.
- Solid understanding of networking, security, and distributed systems concepts.
- Experience with containerization and orchestration technologies such as Docker, Kubernetes, or ECS.
- Excellent problem-solving skills and a proactive approach to identifying and resolving technical issues.
- Strong communication skills and ability to collaborate effectively with cross-functional teams.
- A Splunk certification, such as Splunk Certified Admin or Splunk Certified Architect, is a plus.