Caseware is one of Canada's original fintech companies. We have led the global audit and accounting software industry for over 30 years, with more than 500,000 users across 130 countries and our software available in 16 languages. While you might not have heard of us yet, over 36,000 accounting and audit professionals list Caseware as a skill on their LinkedIn profiles.

Why This Role Matters

As a leader in cloud-native SaaS, we are accelerating our shift to an AI-first future, embedding generative AI and autonomous agents across our platform to deliver smarter, faster user experiences. We are looking for a visionary AI Test Architect to build our next-generation "Quality Intelligence" platform. This platform will use generative AI for automated test creation, self-healing execution, and predictive defect analytics, and it will rigorously validate the AI features we build in-house for our global audience.

As our foundational AI Test Architect, you'll design scalable, ethical frameworks that ensure reliability, safety, and compliance while accelerating release velocity (targeting *****% faster cycles through AI-augmented testing). Your work will reduce risk in production AI agents; minimize hallucinations, bias, and security exposures; and empower the entire engineering organization to adopt AI-augmented quality practices that supplement our existing, mature testing frameworks.

This high-impact role sits at the intersection of Platform Engineering, AI, and Quality, shaping how we build trustworthy intelligence at scale.

Location: This is a fully remote position based in Colombia.
You will report to: Jai Joshi.

What You'll Be Doing

AI-Driven Quality Strategy & Architecture

Architect a comprehensive "Quality Intelligence" platform that uses generative AI to predict defect hotspots, intelligently optimize regression suites, auto-generate tests, and enable self-healing automation.
Define an enterprise-wide AI-first testing strategy, including evaluation paradigms for non-deterministic outputs, continuous monitoring for drift and hallucination, and integration across the full SDLC.
Establish governance for ethical AI testing, aligned with emerging standards.

LLM & Agent Evaluation Frameworks

Design and implement advanced benchmarks, red teaming protocols, and adversarial testing for internal AI agents and generative features, focusing on hallucination rates, bias, fairness, prompt injection, jailbreaks, and goal misalignment.
Build statistically rigorous evaluation pipelines using tools such as Langfuse, LangSmith, DeepEval, RAGAS, or Arize Phoenix to measure faithfulness, context precision, and safety compliance.
Architect test harnesses for agentic workflows, tool calling, planning, multi-agent simulations, and post-deployment observability.

Cross-Functional Leadership & Evangelism

Collaborate with product, data science, ML engineering, and security teams to build quality guardrails into AI feature design from day one.
Evangelize and mentor: upskill traditional QA engineers into AI-augmented testers through workshops, playbooks, and communities of practice.
Drive organization-wide adoption of AI quality best practices, including dashboards for DORA metrics and AI-specific indicators.
Define and implement AI-specific quality telemetry integrated with tools such as Langfuse.
Establish feedback loops for model iteration, A/B testing guardrails, and proactive risk mitigation in production.

Challenges You'll Architect Solutions For

Building reliable evaluation for non-deterministic, agentic AI in a fast-moving SaaS landscape.
Scaling self-healing and generative test automation without introducing new flakiness or security debt.
Balancing innovation speed with rigorous red teaming and ethical safeguards for customer-facing AI.
Success in the First 6-12 Months

Launch the foundation of the "Quality Intelligence" platform, with AI-augmented pipelines covering 70%+ of critical paths.
Establish red teaming and red-teaming-as-code processes that reduce high-severity AI risks by 40%+.
Upskill 50%+ of QA and engineering teams on AI testing fundamentals, delivering measurable gains in velocity and safety.
Establish a baseline faithfulness score of 90%+ for all RAG-powered features.

What You Will Bring

8+ years in Quality Engineering or Test Architecture within cloud-native SaaS environments, including 2+ years focused on AI, ML, or LLM testing and validation.
Deep expertise in AWS (serverless, microservices, and IaC with Terraform and CloudFormation) and GitHub CI/CD ecosystems.
Proficiency in architecting LLM-based applications and testing frameworks, with experience in LangChain, LangGraph, LangSmith, or equivalent.
Mastery of modern automation tools such as Playwright and Cypress, with hands-on experience integrating self-healing AI plugins or generative test tools.
Experience with LLM evaluation tools such as Amazon Bedrock Evaluations, Prompt Management, and Guardrails, as well as DeepEval, RAGAS, Arize Phoenix, and Langfuse.
Experience with red teaming frameworks and tools such as Cobalt Strike, Sliver, and Nmap, and knowledge of adversarial testing methodologies.
Proven leadership: mentoring teams, defining standards, and driving cross-functional change in ambiguous, high-growth settings.
Bachelor's or Master's degree in Computer Science, AI, ML, or a related field; relevant certifications are a strong plus.
Strong English communication and collaboration skills.

What's in it for you

Innovation is at our core: we work with cutting-edge technology in accounting and financial reporting, constantly pushing boundaries to create impactful software solutions.
We are committed to a collaborative culture where your ideas are valued and knowledge sharing is encouraged within a supportive, inclusive team.
Work-life balance is important to us: we offer flexible work options, remote opportunities, and generous time-off policies.
We offer a competitive salary and comprehensive benefits such as health insurance and retirement plans.
Your contributions directly affect how our clients manage financial processes and drive their success.
Recognition and rewards matter to us: we celebrate hard work through recognition programs, performance bonuses, and opportunities for career growth.
We embrace global opportunities: work on international projects and collaborate with a diverse, global team.

EEO Statement

One of Caseware's core values is Many Voices, One Team. With that in mind, we're dedicated to building teams as diverse as our customers in an equitable and inclusive way. We welcome and encourage candidates of all backgrounds to apply. Should you require accommodations or have any questions at any point during the application or interview process, please email our People Operations team.
Test Architect
CASEWARE
Bogotá, Distrito Capital
Posted 22 days ago