We are seeking an experienced Lead Site Reliability Engineer to spearhead our infrastructure reliability initiatives and guide a team of talented engineers. In this role, you will shape technical strategy, mentor team members and drive operational excellence across our cloud-based platforms and distributed services.

Responsibilities
Lead the design and evolution of resilient, scalable infrastructure across multiple cloud providers
Mentor and guide a team of engineers, fostering technical growth and best practices
Define reliability standards, SLOs and operational policies for production environments
Architect automation frameworks to streamline deployments and infrastructure management
Oversee CI/CD strategy and ensure efficient software delivery workflows
Coordinate incident response efforts and lead post-mortem analyses to prevent recurrence
Partner with engineering leadership to align reliability goals with business priorities
Champion observability practices to enhance system visibility and proactive issue detection
Provide technical direction for microservices and event-driven architecture initiatives
Evaluate emerging tools and technologies to enhance the reliability ecosystem
Drive capacity planning, cost optimization and performance tuning across platforms

Requirements
5+ years of experience in DevOps or Site Reliability Engineering
Expertise in AWS, Azure and GCP
Competency in Kubernetes, Terraform and Ansible
Skills in GitHub and Jenkins
Knowledge of microservices, APIs and event-driven processing
Strong written and verbal English communication skills (B2+)

We offer
Learning Culture - We want you to be the best version of yourself, which is why we offer unlimited access to learning platforms, a wide range of internal courses, and all the knowledge you need to grow professionally.
Health Coverage - Health and wellness are important, which is why we cover you and up to four family members under a premier health plan. We have a couple of options, so you can choose what is best for you and your family.
Visual Benefit - Seeing your work for us would be a sight for sore eyes. We want your vision to always be at 100%, which is why we offer up to $200,000 COP for any visual health expenses.
Life Insurance Plan - We have partnered with MetLife to offer a full-coverage life insurance plan, so your family is covered even if you are gone.
Medical Leave Coverage - We are one of the few companies that cover 100% of your medical leave, for up to 90 days. Your health is the most important thing to us.
Professional Growth Opportunities - We have designed a highly competitive and complete development process, where you will have all the tools to get where you have always wanted to be, personally and professionally.
Stock Option Purchase Plan - As an EPAMer you can be more than just an employee: you will also have the opportunity to purchase stock at a reduced price and become a part owner of our organization.
Additional Income - Besides your regular salary, you will also have the chance to earn extra income by referring talent, being a technical interviewer, and in many other ways.
Community Benefit - You will be part of a worldwide community of over 50,000 employees, where you can learn, challenge yourself, stand out, and share your knowledge and experience with multicultural teams!

We accept CVs in English only.
Lead Site Reliability Engineer
EPAM SYSTEMS
Remote
Posted 12 days ago