The integration of Artificial Intelligence into Quality Assurance is profoundly transforming both its processes and the role of QA within the software development lifecycle. This article examines the current state of AI adoption in QA — its benefits, risks, and implementation costs — as well as the emergence of new metrics designed to assess the effectiveness and reliability of these systems.
It also addresses the evolution of the QA role toward a more strategic profile, embedded within a quality model assisted by intelligent systems, where human intervention remains an essential factor for oversight, validation, and results control.
The Origins and Evolution of QA, and the Rise of AI
With the emergence of software and digital applications, quality control adopted a predominantly reactive approach focused almost exclusively on defect detection. However, the growing complexity of systems exposed the limitations of this model, driving a shift toward a more preventive and collaborative approach to quality assurance. This transition was supported by practices such as shift-left testing, test automation, and continuous testing within CI/CD environments — establishing QA as a core discipline within the software development lifecycle.
Against this backdrop, the rise of Artificial Intelligence introduced a new paradigm in how quality processes are conceived. This is not merely an incremental evolution, but a structural shift in the way validation processes are designed, prioritized, and executed.
The Impact of AI on the SDLC and QA
The impact of AI, however, has not been confined to QA alone. Its integration has unfolded progressively and transversally, affecting both development and validation phases — generating a direct impact on the final quality of software.
On one hand, development teams have incorporated generative AI tools for code generation, such as Copilot or Claude, significantly increasing delivery speed. Yet this advancement also introduces new risks related to the quality and maintainability of generated code, due to potential inconsistencies with the broader application context.
On the other hand, QA teams have integrated AI across multiple stages of the testing process, transforming the way quality assurance strategies are designed, executed, and maintained.
According to various industry reports — including QA and Software Testing in 2025 (based on over 100 development teams) and BrowserStack’s State of AI in Software Testing 2026 (based on over 250 technical leaders) — more than 60% of organizations have already incorporated AI into parts of their testing workflows, particularly in regression, smoke testing, and risk-based prioritization.
AI adoption is also extending to other areas of the SDLC, such as business analysis — where it supports requirements and feature definition — and design, facilitating the generation of interfaces and prototypes in tools like Figma. This reflects an increasingly transversal impact across the entire software development lifecycle.
As a result, there is a growing sense across the industry that AI has become a standard part of the tool stack for all stakeholders in the software development lifecycle. This adoption is generating impact at both operational and strategic levels, redefining processes, roles, and quality metrics.
Benefits
Following several years of generative AI model adoption, the following key benefits can be identified within the QA domain:
- Test Case Generation: Automatic generation of test cases from code, functional requirements, or user stories.
- Example: Given a user story such as "the user should be able to reset their password," the system automatically generates cases covering valid/invalid passwords, expired sessions, multiple failed attempts, field format validations, and more (see the first code sketch below).
- Test Prioritization: Intelligent test prioritization based on criticality, change impact, and risk analysis.
- Example: Following a change to the checkout flow, the system automatically prioritizes tests related to tax calculations, discounts, and payment gateways.
- Log Analysis & Processing: Analysis, rewriting, and summarization of logs, along with detection of duplicate test cases or incidents.
- Example: In an execution that has generated hundreds of log lines, the system groups repeated errors, summarizes the issue into a single incident, and reduces noise and manual analysis time.
- Self-Healing Tests: Automatic test maintenance, adapting to changes in interfaces or system flows.
- Example: If a button changes from id="submit-btn" to id="submit-button", the system automatically updates the selector without requiring manual intervention (see the second code sketch below).
- Root Cause Analysis: Automated failure analysis and support in identifying root causes.
- Example: Faced with a login test failure, the system correlates backend logs, authentication changes, and database errors — suggesting a token service issue as the root cause.
- LLM-based Evaluation: Automated results evaluation using LLM models capable of analyzing test outputs, system responses, and logs to determine their validity or relevance based on defined criteria.
- Example: Rather than validating only status codes, an LLM assesses whether an API error message is contextually coherent with the nature of the failure.
- Agentic Testing Systems: Autonomous agent-based systems capable of planning, exploring applications, generating scenarios, executing tests, and reporting results iteratively — adapting their behavior based on outcomes.
- Example: An autonomous agent explores an application, identifies critical flows, dynamically generates tests, executes scenarios, and adjusts its strategy based on results.
Taken together, these advances accelerate the testing cycle across its various phases — analysis, design, execution, and reporting — particularly in well-structured environments with sufficient context available.
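To make the first of these benefits more concrete, here is a minimal sketch of LLM-assisted test case generation from a user story. It is not tied to any particular vendor's API: call_llm is a stub that returns canned output so the example runs end to end, and the prompt wording, JSON keys, and filtering rule are illustrative assumptions. Generated cases are treated strictly as candidates pending human review.

```python
import json

def call_llm(prompt: str) -> str:
    # Stub: replace with a real call to whichever model provider the team uses.
    # Returns canned output here so the sketch runs end to end.
    return json.dumps([
        {"title": "Reset with valid email", "steps": ["request link", "set new password"],
         "expected_result": "password updated and confirmation shown"},
        {"title": "Reset with expired link", "steps": ["open stale link"],
         "expected_result": "error message, no password change"},
    ])

def generate_test_cases(user_story: str) -> list[dict]:
    """Ask the model for candidate test cases and parse them into a reviewable list."""
    prompt = (
        "You are a QA assistant. Given the user story below, return test cases as a JSON "
        "array of objects with keys 'title', 'steps' and 'expected_result'. Cover happy "
        "paths, negative cases and edge cases.\n\nUser story: " + user_story
    )
    raw = call_llm(prompt)
    try:
        cases = json.loads(raw)
    except json.JSONDecodeError:
        return []  # malformed model output goes back to a human, not into the suite
    # Generated cases are candidates only: they still require human review (see Risks below).
    return [c for c in cases
            if isinstance(c, dict) and {"title", "steps", "expected_result"} <= c.keys()]

for case in generate_test_cases("The user should be able to reset their password"):
    print("-", case["title"])
```

Keeping the output machine-parseable and filtering malformed entries is what allows generated cases to flow into review tooling rather than directly into the suite.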
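The self-healing behavior can likewise be approximated, in its simplest form, as an ordered fallback over candidate locators. The sketch below is a toy model against a hypothetical fake page, not a description of how any specific tool implements healing; production tools typically rely on DOM similarity, attribute weighting, and execution history.

```python
# Simplified illustration of self-healing locators: try fallbacks in order and report
# which one matched, so the primary locator can be updated in the test repository.

def find_with_healing(find, locators):
    """Try each (strategy, value) pair in order and return the element plus the locator used."""
    for strategy, value in locators:
        element = find(strategy, value)
        if element is not None:
            return element, (strategy, value)
    raise LookupError(f"no locator matched: {locators}")

# Fake page: the old id 'submit-btn' has been renamed to 'submit-button'.
fake_dom = {("id", "submit-button"): "<button>", ("text", "Submit"): "<button>"}
find = lambda strategy, value: fake_dom.get((strategy, value))

element, used = find_with_healing(
    find,
    [("id", "submit-btn"),      # original selector, now stale
     ("id", "submit-button"),   # healed candidate
     ("text", "Submit")],       # last-resort fallback
)
print("matched via", used)  # -> matched via ('id', 'submit-button')
```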
Risks
That said, AI integration also introduces significant new risks and limitations:
- Incomplete Test Cases: Generation of incomplete or incorrect test cases due to biases in training data. Some reports indicate that between 20% and 40% of automatically generated tests require manual review or correction.
- Example: The system generates tests for a registration form but omits critical scenarios such as security validations, due to biases in the training data.
- Scenario Complexity: Difficulty modeling complex scenarios, particularly in critical systems.
- Example: In a banking system, the model may fail to correctly represent flows that depend on multiple regulatory conditions, intermediate states, or external systems.
- Contextual Understanding Gaps: Difficulty detecting defects arising from business logic, system integration, or contextual coherence.
- Example: A test passes at the technical level even though a discount was applied incorrectly, because the system does not understand the business logic associated with that promotion.
- False Positives/Negatives: Inaccurate defect detection — either reporting non-existent errors or failing to identify real failures under certain conditions.
- Example: The system accepts an incorrect data result as valid because it is structurally and formally well-formed.
- Excessive Dependency: Potential erosion of technical knowledge within teams due to over-reliance on automated tooling.
- Automation Bias: A tendency to accept AI-generated results without sufficient validation. Research suggests that as many as 30–40% of incorrect decisions made by AI systems go unchallenged.
- ROI: Difficulty objectively measuring the return on investment.
- Hallucinations: Model hallucinations — the generation of incorrect but apparently coherent results. Estimated rates range from 5% to 30% in complex tasks, depending on context.
- Non-Functional Testing: Limited capacity to deliver value in performance, scalability, security, or observability testing compared to functional testing.
These risks reflect a still-significant gap between the theoretical potential of AI and its actual performance in complex or critical contexts — where human oversight remains an essential element.
The Emergence of New Metrics
In this new landscape — where the integration of Large Language Models (LLMs) enables test case generation to be automated at scale — it becomes necessary to introduce new metrics capable of evaluating these non-deterministic systems. Such measurement approaches must go beyond simply quantifying how much is being tested and focus instead on the real utility of that testing.
Unlike traditional testing, where outcomes are binary (pass/fail), AI-based systems require metrics that capture degrees of adequacy, coherence, and usefulness of the generated responses.
Some of the most relevant and emerging proposals include (a few of them are sketched in code below):
- Test Effectiveness Rate (TER): The proportion of tests that detect real defects relative to the total executed.
- Signal-to-Noise Ratio: The relationship between relevant results (valid defects) and generated noise (false positives or redundant tests).
- AI-generated Test Reliability: The degree of confidence in automatically generated test cases, assessed through cross-validation, golden datasets, or model-assisted review.
- Defect Detection Efficiency (DDE): The ability to detect defects in early stages of the development cycle.
- Actual Coverage vs. Generated Coverage: The difference between the theoretical coverage generated by AI and the effective coverage of critical functionalities.
- Test Maintenance Overhead: The effort required to maintain, correct, or filter automatically generated tests.
- LLM Evaluation Score: Assessment of the quality of generated responses using evaluator models (LLM-as-a-judge), based on criteria such as relevance, coherence, and correctness.
- Hallucination Rate: The proportion of AI-generated responses containing incorrect or unverifiable information.
- Task Success Rate: The percentage of tasks correctly completed by autonomous systems or AI-based assistants.
- Consistency Score: The degree of stability of generated responses when faced with equivalent or slightly modified inputs.
These metrics reflect a paradigm shift in quality evaluation — moving from a deterministic model based on coverage and execution, to a probabilistic model centered on the reliability, consistency, and utility of AI-assisted systems.
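None of these metrics has a standardized formula yet, so the following is a minimal sketch of one straightforward interpretation of three of them: Test Effectiveness Rate, Signal-to-Noise Ratio, and a naive Consistency Score. The sprint figures are hypothetical and serve only to illustrate the calculations.

```python
def test_effectiveness_rate(tests_detecting_real_defects: int, tests_executed: int) -> float:
    """TER: share of executed tests that surfaced a real defect."""
    return tests_detecting_real_defects / tests_executed if tests_executed else 0.0

def signal_to_noise(valid_defects: int, false_positives: int, redundant_tests: int) -> float:
    """Relevant findings relative to the noise produced alongside them."""
    noise = false_positives + redundant_tests
    return valid_defects / noise if noise else float("inf")

def consistency_score(answers: list[str]) -> float:
    """Naive consistency: share of repeated runs that agree with the most common answer."""
    if not answers:
        return 0.0
    most_common = max(set(answers), key=answers.count)
    return answers.count(most_common) / len(answers)

# Hypothetical sprint figures, for illustration only.
print(test_effectiveness_rate(12, 180))                                            # ~0.07
print(signal_to_noise(valid_defects=12, false_positives=30, redundant_tests=18))   # 0.25
print(consistency_score(["refund approved", "refund approved", "refund denied"]))  # ~0.67
```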
Adapting the QA Role in an AI-Assisted Environment
Beyond its impact on development and QA processes and on validation metrics, AI adoption is driving a significant transformation that directly affects the competencies and responsibilities of QA professionals.
Traditionally, the QA role focused on requirements analysis, test case design, test execution, and defect reporting. In the current context, this role is evolving toward a more strategic profile — oriented toward the oversight, validation, and governance of automated systems.
This consolidates the human-in-the-loop paradigm, in which the QA professional takes on supervisory, validation, and audit functions that may vary depending on the seniority of the profile.
Differential Impact by Experience Level
Junior profiles (testers)
AI acts as an accelerator for learning and productivity, enabling:
- Assisted test case generation
- Standardization of defect reports
- Increased execution speed
- Reduced technical barrier to entry
Mid-level profiles (analysts)
Value is centered on:
- Improved requirements analysis
- Supervision and validation of AI-generated scenarios
- Incorporation of business knowledge into models
- Identification of edge cases and complex dependencies
Senior profiles (leads)
AI facilitates:
- Definition and optimization of quality strategies
- Advanced metrics analysis and new KPI development
- Filtering of noise generated by large-scale automation
- Alignment between technical quality and business objectives
Transversal capabilities
Across all levels, a new key competency is emerging: the ability to craft effective prompts and provide adequate context to AI systems.
Knowledge of DevOps practices is also gaining relevance — enabling the integration of these systems into CI/CD pipelines and supporting selective test execution, where systems themselves determine which tests to run based on code changes, dependencies, and defect history, and prioritize them according to risk.
Feedback loops allow these systems to learn continuously from results, progressively optimizing coverage, prioritization, and testing effectiveness.
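To illustrate selective, risk-based execution, the following is a deliberately small sketch: the mapping from changed paths to test suites and the defect-history weights are hypothetical, whereas a real pipeline would typically derive them from coverage data, dependency graphs, and historical results, refreshing them through the feedback loops described above.

```python
# Minimal sketch of selective, risk-based test execution in a CI pipeline.
# CHANGE_MAP and DEFECT_HISTORY are hypothetical stand-ins for data a real system
# would derive from coverage analysis, dependency graphs and defect history.

CHANGE_MAP = {
    "checkout/": ["taxes", "discounts", "payment_gateway"],
    "auth/":     ["login", "password_reset"],
}

DEFECT_HISTORY = {"payment_gateway": 9, "discounts": 4, "taxes": 2,
                  "login": 1, "password_reset": 0}

def select_suites(changed_files: list[str]) -> list[str]:
    """Pick the suites affected by the change set, ordered by historical defect risk."""
    selected = set()
    for path in changed_files:
        for prefix, suites in CHANGE_MAP.items():
            if path.startswith(prefix):
                selected.update(suites)
    return sorted(selected, key=lambda suite: DEFECT_HISTORY.get(suite, 0), reverse=True)

print(select_suites(["checkout/cart.py", "checkout/pricing.py"]))
# -> ['payment_gateway', 'discounts', 'taxes']
```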
However, this advanced automation demands constant oversight to prevent biases, incorrect decisions, or loss of control over the quality process. As a result, the QA professional evolves into an orchestrator of quality in AI-assisted environments.
New Role: QA for AI Systems and Agents
Yet the transformation of QA from functional tester to quality orchestrator is not the only role-level shift the industry is experiencing.
The proliferation of AI-based systems introduces a new dimension in QA: the need to validate non-deterministic systems.
Unlike traditional software — where expected behavior is fixed and verifiable through deterministic assertions — AI systems generate probabilistic and variable outputs for the same input. As a result, QA must validate not so much the accuracy of a specific response, but the adequacy of behavior within an acceptable range. This involves assessing aspects such as:
- Coherence and relevance of responses
- Robustness against diverse or adversarial inputs
- Consistency of results when faced with equivalent inputs
- Presence of biases in generated responses
- Model degradation over time (model drift)
In this context, LLM evaluation frameworks become especially relevant — combining the use of golden datasets, automated evaluation through evaluator models (LLM-as-a-judge), and human validation.
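A minimal sketch of such an evaluation loop is shown below. The golden dataset, the system under test, and the judge function are all placeholders: the judge stands in for whichever evaluator model is used, and responses scoring below a threshold are escalated to human review, keeping the human-in-the-loop step explicit.

```python
# Minimal sketch of an LLM evaluation loop: golden dataset + LLM-as-a-judge + human escalation.
# `system_under_test` and `judge` are placeholders; both return stubbed values so the sketch runs.

GOLDEN_SET = [
    {"input": "How do I reset my password?",
     "reference": "Explain the reset-link flow from the login page."},
]

def system_under_test(prompt: str) -> str:
    return "You can request a reset link from the login page."  # stubbed model output

def judge(prompt: str, answer: str, reference: str) -> float:
    """Placeholder evaluator: should return a 0..1 score for relevance, coherence, correctness."""
    return 0.8  # stubbed score; replace with a real evaluator-model call

def evaluate(threshold: float = 0.7) -> list[dict]:
    results = []
    for case in GOLDEN_SET:
        answer = system_under_test(case["input"])
        score = judge(case["input"], answer, case["reference"])
        results.append({
            "input": case["input"],
            "score": score,
            "needs_human_review": score < threshold,  # human-in-the-loop escalation
        })
    return results

print(evaluate())
```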
In short, a new QA role is emerging — one in which the object under test is no longer a conventional application but a non-deterministic model, and the validation focus shifts from expected outputs to the adequacy of behavior within a variable yet acceptable range.
Costs and Challenges of AI Adoption in QA
All of this AI adoption and the transformation it drives across development and QA processes represents a significant investment — not only at the technological level, but also organizationally, operationally, and in terms of talent. This transformation, closely tied to the evolution of the QA role, introduces new demands that must be addressed from a strategic perspective.
Technical Costs
- Integration of AI tools into existing pipelines
- Architectural adaptation to support advanced automation
- Management of more complex infrastructures (processing, storage, observability)
- Need for additional tooling to monitor, audit, and validate AI systems
Operational Costs
- Increased process complexity
- Continuous oversight of automated systems
- Management of noise generated by large-scale automation
- Maintenance of models, prompts, and associated configurations
Organizational and Talent Costs
- Need for upskilling in new competencies (prompt engineering, AI literacy, DevOps)
- Greater demand for technically proficient profiles capable of validating AI-generated results
- Risk of technological dependency and loss of internal knowledge if not properly managed
Economic Costs
- Licensing fees for specialized AI-based tools
- Computational costs associated with advanced model usage
- Investment in team training and upskilling
- Potential increase in senior profiles required for oversight and validation
Various industry studies indicate that initial implementation costs can be significantly higher than those of traditional frameworks, particularly during integration phases. Furthermore, the lack of specialized talent and the difficulty of integrating with legacy systems rank among the main barriers to adoption, which ultimately depends on model maturation, organizational adaptation, and team learning curves.
Accordingly, AI adoption in QA must be approached as a medium-to-long-term strategic investment, not as an immediate cost optimization.
Substitution or Complementarity?
With all of the above in mind, let us address one of the most recurring debates in the industry: will Artificial Intelligence replace QA professionals?
Current evidence points clearly toward a scenario of complementarity. AI acts as a co-pilot that automates repetitive, low-value tasks, allowing professionals to focus on higher-complexity activities such as exploratory testing, complex scenario validation, user experience evaluation, and contextual analysis, and to take on a more strategic role centered on validation, oversight, and decision-making.
In fact, academic research indicates that AI adoption in testing still lags behind its use in development, revealing a testing gap in which human capabilities remain critical to guaranteeing the final quality of software.
Ultimately, far from disappearing, the role is evolving: the greater the automation, the greater the need for oversight, technical judgment, and business understanding.
As Margarita Simonova notes in the Forbes Technology Council piece The State of Testing in 2025: AI suggests, but the decision still belongs to humans.
Conclusion
Artificial Intelligence has established itself as a transformative force in QA, redefining both the processes and the roles associated with quality assurance.
Far from representing a threat, its adoption constitutes an opportunity to evolve toward a more efficient, strategic, and contextually aligned model — one suited to the growing complexity of modern software development.
In a context characterized by the acceleration of code generation and the mass production of software, QA takes on an even more critical role as a guarantor of quality. The effective integration of AI will enable professionals not only to increase their productivity, but also to reinforce their positioning as key actors within the SDLC.
Nevertheless, a realistic perspective is essential in the current climate of heightened expectations around AI. While its capabilities are significant, its implementation is far from fully autonomous or free of limitations. Issues such as inconsistent output generation, lack of business context, the presence of biases, and the need for constant oversight demonstrate that these technologies still require substantial human intervention.
In this sense, the value of AI lies not in replacing the QA professional, but in amplifying their capabilities. The gap between expected potential and current reality stems largely from the quality of integration, the adequacy of context provided, and the critical capacity of teams to interpret and validate AI-generated results.
In this new landscape, competitive advantage will not reside merely in adopting AI, but in the ability to integrate it critically, efficiently, and in alignment with product quality objectives. Because, ultimately, quality is not a property of software — it is the result of the decisions made by those who build and validate it.
References:
BrowserStack. (2026). State of AI in Software Testing 2026. Retrieved from https://www.browserstack.com/blog/inside-the-state-of-ai-in-software-testing-2026/
CopilotQA. (2025). QA and Software Testing in 2025: Trends, Challenges, and AI Adoption. Retrieved from https://copilotqa.com/qa-and-software-testing-in-2025/
Forbes Technology Council. (2025). The State of Testing in 2025: The AI Adoption Gap. Retrieved from https://www.forbes.com/councils/forbestechcouncil/2025/12/15/the-state-of-testing-in-2025-the-ai-adoption-gap/
Forbes Technology Council. (2025). AI Is About to Reshape Millions of Software QA Jobs. Retrieved from https://www.forbes.com/councils/forbestechcouncil/2025/10/06/ai-is-about-to-reshape-millions-of-software-qa-jobs/
Wifitalents. (2025). AI in Quality Assurance Testing: Statistics and Trends. Retrieved from https://wifitalents.com/ai-quality-assurance-testing-industry-statistics/
Anthropic. (2024). Understanding AI Hallucinations and Model Behavior. Retrieved from https://www.anthropic.com/research
Financial Times. (2025). AI hallucinations become a growing concern for enterprises. Retrieved from https://www.ft.com/content/e074d3a9-7fd8-447d-ac0a-e0de756ac5c5
arXiv. (2026). An Empirical Study on AI-Assisted Software Testing in Real-World Repositories. Retrieved from https://arxiv.org/abs/2603.13724
arXiv. (2026). The Testing Gap: Adoption of AI in Software Development vs Quality Assurance. Retrieved from https://arxiv.org/abs/2601.21305
arXiv. (2025). Challenges and Limitations of AI in Software Testing: A Systematic Review. Retrieved from https://arxiv.org/abs/2504.04921



