Overview
At BairesDev®, we've been leading the way in technology projects for over 15 years. We deliver cutting-edge solutions to giants like Google and the most innovative startups in Silicon Valley. Our diverse team of 4,000+, composed of the world's Top 1% of tech talent, works remotely on roles that drive significant impact worldwide. When you apply for this position, you're taking the first step in a process that goes beyond the ordinary. We aim to align your passions and skills with our vacancies, setting you on a path to exceptional career development and success.

Incident Response Engineer
As an Incident Response Engineer, you will own the entire incident lifecycle to ensure high system availability and minimal downtime. You will serve as the primary point of contact during production outages, coordinating technical response efforts and driving issues to resolution through clear communication and expert troubleshooting.

What You'll Do
Manage the end-to-end incident lifecycle, including detection, triage, mitigation, and resolution of production issues.
Use PagerDuty and monitoring tools to maintain fast response times and meet established SLOs/SLIs.
Facilitate blameless post-mortems to identify root causes and prevent recurrence of critical system failures.
Develop and maintain comprehensive runbooks and automated response scripts to reduce mean time to recovery (MTTR).
Partner with SRE and engineering teams to improve system observability and design more resilient architectures.
Oversee incident communication channels, providing timely updates to technical stakeholders and leadership during active events.

What We Are Looking For
4+ years of experience in Infrastructure, Site Reliability Engineering, or Incident Management.
Proven expertise in owning the incident lifecycle, including detection, response, and post-mortem analysis.
Advanced proficiency in using PagerDuty for incident orchestration and communication.
Hands-on experience building runbooks and defining SLOs/SLIs to improve system reliability.
Deep understanding of Linux systems and modern observability practices.
Advanced proficiency in English.

How We Make Your Work (and Your Life) Easier
100% remote work (from anywhere).
Excellent compensation in USD or your local currency, if preferred.
Hardware and software setup for you to work from home.
Flexible hours: create your own schedule.
Paid parental leave, vacations, and national holidays.
Innovative and multicultural work environment: collaborate and learn from the global Top 1% of talent.
Supportive environment with mentorship, promotions, skill development, and diverse growth opportunities.

Join a global team where your unique talents can truly thrive and make a significant impact! Apply now!
Incident Response Engineer | Ref#288484
BAIRESDEV
Bogotá, Distrito Capital
Posted 23 days ago