We are seeking a highly skilled Cloud Engineer with deep expertise in monitoring distributed microservices deployed in Azure environments. The ideal candidate will lead the design and implementation of end-to-end observability solutions using Splunk, ensuring robust performance, scalability, and reliability across our microservices architecture. This role will drive monitoring strategies, incident response automation, and operational excellence for mission-critical applications.

Responsibilities
- Design, implement, and maintain comprehensive monitoring solutions for distributed microservices running on Azure Kubernetes Service (AKS)
- Leverage Splunk for log ingestion, custom dashboards, alerting, and advanced analytics
- Integrate Istio service mesh observability (telemetry, tracing, logging) into monitoring frameworks
- Implement Twistlock (Prisma Cloud) policies for container and workload security monitoring
- Configure and monitor Azure-native services: APIM, Cosmos DB, SQL Server, and Azure Networking
- Use Terraform to manage infrastructure as code, ensuring observability is embedded in deployments
- Work with Azure DevOps (AzDO) to build and optimize CI/CD pipelines with integrated monitoring hooks
- Utilize Azure Chaos Studio to test system resilience and incorporate findings into monitoring improvements
- Support automated API and performance testing using Karate Labs in conjunction with observability tooling
- Collaborate with development, security, and operations teams to define SLAs, SLOs, and SLIs
- Participate in incident response, root cause analysis, and continuous improvement processes

Required Qualifications
- 5+ years of experience in DevOps, Site Reliability Engineering, or Cloud Operations roles
- 3+ years of hands-on experience with Splunk in a microservices environment
- Proven experience with Azure Kubernetes Service (AKS) and Istio
- Strong understanding of Azure services, including Cosmos DB, SQL Server, APIM, and virtual networking
- Experience implementing Twistlock, Terraform, and Azure DevOps (AzDO) pipelines
- Familiarity with Azure Chaos Studio and Karate Labs for testing and validation
- Strong scripting and automation skills (e.g., PowerShell, Bash, Python)
- Excellent troubleshooting, communication, and collaboration skills

Preferred Qualifications
- Azure certifications (e.g., AZ-400, AZ-104, AZ-305)
- Experience with other observability tools (Prometheus, Grafana, OpenTelemetry)
- Knowledge of DevSecOps practices and secure CI/CD pipelines

About VRIZE INC
VRIZE is a Global Digital & Data Engineering company, committed to delivering end-to-end Digital solutions and services to its customers worldwide. We offer business-friendly solutions across industry verticals that include Banking, Financial Services, Healthcare & Insurance, Manufacturing, and Retail. The company has strategic business alliances with industry leaders such as Adobe, IBM Sterling Commerce, IBM, Microsoft, Docker, Sisense, Competera, Snowflake, and Tableau.

VRIZE is headquartered in Tampa, Florida, with a team of 410 employees globally. Currently, 100% of our clients are in the United States. Delivery centers are distributed across the US, Canada, Serbia, and India. Our continued success depends on remaining at the forefront of disruptive developments in information technology, and leaders and team members joining us are expected to replicate this success.

VRIZE is an equal-opportunity employer. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, marital status, age, national origin, ancestry, disability, medical condition, pregnancy, genetic information, gender, sexual orientation, gender identity or expression, veteran status, or any other status protected under federal, state, or local law. Individuals with disabilities are provided reasonable accommodation.
Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: IT Services and IT Consulting