In an era where software systems grow ever more complex and distributed, ensuring quality at every layer of infrastructure has become both critical and challenging. Forward-thinking leaders are turning to artificial intelligence (AI) and machine learning (ML) to meet this challenge.
One such innovator is Varun Mukka, an Engineering Architect in Infrastructure Quality known for integrating AI/ML into test automation and continuous verification. Mukka's approach exemplifies how intelligent automation can transform traditional quality assurance from a slow, reactive task into a proactive, data-driven discipline.
At a time when fewer than half of companies have adopted AI in their testing workflows, he stands at the cutting edge of a major industry shift.
Mukka served as an Engineering Architect in Infrastructure Quality at Okta. He has extensive experience in software quality engineering and has pioneered the use of AI and ML to enhance testing processes.
Mukka leads efforts to embed machine learning into test automation frameworks and to champion continuous verification practices that ensure systems remain reliable even as they scale. By blending classical quality engineering with emerging AI techniques, he has driven innovative changes in how organizations verify complex cloud infrastructure.
The field of infrastructure quality assurance itself has evolved rapidly alongside DevOps. Traditionally, testing ensured new code met requirements, but modern systems demand continuous verification—the ongoing validation of system behavior in real-time.
Continuous verification (CV) has been described as a proactive discipline in its own right, in contrast with older, reactive testing methods. This proactive philosophy, combined with AI's ability to detect patterns and anomalies, is gaining momentum across the industry.
According to Capgemini's World Quality Report 2023, 72% of organizations were actively exploring AI for their QA initiatives heading into 2024. Against this backdrop, Mukka's work in marrying AI with infrastructure quality positions him as a visionary leader in quality engineering.
As an Engineering Architect, Mukka recognized early on the untapped potential of AI to improve how organizations ensure infrastructure reliability. He began experimenting with machine learning in test automation well before it was common practice.
"I realized early on that AI could transform how we approach quality assurance, turning a traditionally reactive process into a proactive one," he says, reflecting on the motivation behind his initiatives. "By leveraging machine learning to analyze patterns and predict issues, we can address potential failures before they occur and fundamentally improve reliability."
Embracing this mindset required challenging the status quo of manual testing. Mukka championed the idea that AI algorithms could sift through log data, user behaviors, and system metrics far faster than humans, uncovering subtle issues in complex cloud environments.
His forward-looking perspective set the stage for significant changes in his organization's testing culture. Even today, only a minority of companies actively use AI for test automation, yet those that do report improved defect detection and efficiency, which makes Mukka's early efforts all the more pioneering.
"We started small by introducing AI-driven checks in areas where traditional scripts struggled," Mukka explains. "When those early projects caught critical bugs that we would have missed otherwise, it validated our approach."
"It wasn't about replacing engineers—it was about augmenting our capabilities to deliver a more robust infrastructure." Indeed, industry surveys now confirm that AI in testing is gaining traction, with adoption rising from 7% in 2023 to 16% in 2025 as organizations witness the benefits.
By leading this change early, Mukka positioned his team ahead of the curve in infrastructure quality innovation.
One of Mukka's core focus areas is continuous verification—continuously checking that systems behave as expected even after deployment. He stresses that in today's fast-paced DevOps pipelines, testing cannot stop at release.
"Continuous verification means we're always validating our infrastructure and services in real-time," Mukka explains. "Instead of testing once and assuming things will remain good, we use live data and AI models to constantly check system health."
"It's about being proactive: catching issues due to environment changes or integrations before users notice anything wrong." In contrast to traditional QA that validates known requirements, continuous verification employs ongoing monitoring and experimentation to uncover unknown failure modes.
Mukka implemented this by integrating testing hooks into production and staging environments—for example, using ML algorithms to watch metrics for anomalies post-deployment. This approach aligns with modern SRE (Site Reliability Engineering) practices where success isn't just deploying without errors, but ensuring the software stays error-free under real-world conditions.
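The core idea of a post-deployment anomaly check can be sketched in a few lines. The following is a minimal illustration, not Okta's actual system: it compares live metric samples against a pre-deployment baseline and flags values whose z-score exceeds a threshold. The function name, metric values, and threshold are all hypothetical.

```python
from statistics import mean, stdev

def flag_anomalies(baseline, live, z_threshold=3.0):
    """Flag live metric samples that deviate sharply from the baseline.

    baseline: metric samples collected before the deployment
    live:     samples observed after the deployment
    Returns the indices of live samples whose z-score exceeds the threshold.
    """
    mu, sigma = mean(baseline), stdev(baseline)
    if sigma == 0:
        sigma = 1e-9  # guard against division by zero on flat baselines
    return [i for i, x in enumerate(live) if abs(x - mu) / sigma > z_threshold]

# Error-rate samples: steady before the deploy, spiking after it.
before = [0.010, 0.012, 0.011, 0.009, 0.010, 0.011, 0.010, 0.012]
after = [0.011, 0.010, 0.045, 0.052]
print(flag_anomalies(before, after))  # → [2, 3]
```

Production-grade systems would use rolling windows, seasonality-aware models, and per-service baselines, but the principle is the same: learn "normal" from history and alert on departures from it.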
The importance of continuous verification is increasingly recognized in the industry as systems grow more complex. Small failures can hide in the labyrinth of microservices, containers, and APIs.
Mukka recounts how continuous verification, enhanced with AI, gives his team an edge in maintaining high uptime. "We've tied our testing systems into our monitoring stack, so if a new build causes even a slight uptick in error rates, the machine learning models flag it immediately," he says.
"This lets us initiate rollbacks or fixes within minutes. In the past, it might have taken hours or days for a human to notice these subtle issues—now our AI not only notices but also helps diagnose the cause."
His experience is backed by emerging tools in the DevOps world. Modern progressive-delivery platforms, for instance, leverage ML to decide when to roll back changes based on live metrics, significantly reducing the risk of a bad deployment.
By embracing continuous verification, Mukka ensures that quality assurance is not a one-time gate but a continuous guardian. This proactive strategy is essential when growing system complexity could soon make manual oversight impossible.
Continuous verification, especially when augmented by AI, offers a way to manage this complexity and maintain trust in the infrastructure around the clock.
Integrating machine learning into test automation has profoundly changed Mukka's testing workflows. One major impact has been on the speed and scope of testing.
Traditional automated tests had to be scripted manually and often checked only the scenarios the writers anticipated. With ML, Mukka's team can generate and prioritize test cases dynamically.
"Machine learning brought a paradigm shift for us in testing," Mukka notes. "We use ML models to analyze past outages and user reports, which helps us generate new test cases targeting high-risk areas."
"It's not just running more tests—it's running smarter tests. We've seen test coverage and depth improve without proportional increases in manual effort."
In practice, his team built systems that learn from each test run, identifying which parts of the infrastructure are most failure-prone and adjusting the test focus accordingly. This intelligence means the team catches issues that previously went unnoticed.
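A risk-based prioritization of this kind can be illustrated with a simple heuristic. This sketch is an assumption about the general approach, not the team's actual model: it ranks tests by historical failure rate, weighting recent failures more heavily so the suite focuses first on currently failure-prone areas. All names and weights are hypothetical.

```python
def prioritize_tests(history, recent_weight=2.0, window=10):
    """Rank test cases by historical failure rate, weighting recent runs more.

    history: {test_name: [True if the run passed, False if it failed, ...]}
             (oldest run first). Failures inside the last `window` runs
             count `recent_weight` times.
    Returns test names ordered from riskiest to safest.
    """
    def risk(results):
        score = 0.0
        for i, passed in enumerate(results):
            if not passed:
                recent = i >= len(results) - window
                score += recent_weight if recent else 1.0
        return score / max(len(results), 1)

    return sorted(history, key=lambda t: risk(history[t]), reverse=True)

history = {
    "test_login":    [True, True, False, False],   # failing lately: high risk
    "test_billing":  [False, True, True, True],    # old failure: lower risk
    "test_homepage": [True, True, True, True],     # never fails: lowest risk
}
print(prioritize_tests(history, window=2))
# → ['test_login', 'test_billing', 'test_homepage']
```

A real system would also fold in code-change proximity and coverage data, but even this simple scoring shows how "running smarter tests" differs from simply running more of them.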
The result is a faster feedback loop to developers and fewer surprises in production. Industry reports show that 39% of teams have experienced efficiency gains in test automation thanks to AI assistance.
Mukka's initiatives mirror these gains: by letting ML handle the tedious pattern recognition, his engineers concentrate on creative problem-solving and edge cases. The use of AI/ML has also improved product quality and reliability for Mukka's organization.
An AI-enhanced testing regime can run continuously and adapt to changes, which translates into more stable releases. "Since adopting ML in our quality process, our key quality metrics have all trended upward," Mukka observes.
"We're catching roughly 30% more issues before code hits production and our test suites run significantly faster thanks to intelligent pruning of redundant tests. The net effect is faster delivery of features with confidence that we haven't broken anything critical."
These outcomes align with broader industry findings. The World Quality Report 2023–24 found that 75% of organizations now invest in AI to optimize QA, with 65% citing higher productivity as the main benefit.
Similarly, a majority of companies using generative AI in testing report faster automation and improved efficiency. Mukka's on-the-ground results provide a concrete example: his team accelerated their deployment cycle while simultaneously reducing post-release incidents.
By transforming their testing with machine learning, they have achieved a level of software quality and speed that would have been very hard to reach with manual methods alone. This demonstrates the real-world impact of AI-driven innovation in infrastructure quality.
While the benefits of AI in testing are clear, Mukka is candid that the journey was not without challenges. Introducing AI/ML into a legacy testing process required overcoming skepticism, technical hurdles, and initial failures.
One major challenge was ensuring the AI models had enough high-quality data to learn from. "In the beginning, our ML models were only as good as the data we fed them," Mukka recalls.
"We had to invest time in gathering and cleaning years of test results and production incident logs. There were moments when the AI gave us false alarms or missed obvious bugs because it was learning."
"We realized we needed to continuously train and fine-tune the models. It taught us that AI in testing isn't a fire-and-forget solution—it's like raising a child, requiring patience and good guidance."
His experience underscores the importance of data preparation. Industry experts agree that training AI models on high-quality, representative data is critical; otherwise, the predictions and results can be unreliable.
Mukka's team addressed this by curating datasets and periodically retraining their ML algorithms as systems evolved. Another challenge Mukka faced was the human factor—getting his QA engineers and developers comfortable with AI assistance.
Change can be daunting, and there were concerns about trust and job relevance. "At first, some team members were wary that an AI might replace their role or lead us astray with opaque decisions," he notes.
"We tackled this by making the AI's suggestions transparent and involving the team in interpreting the results. We treated the AI as an assistant, not an oracle."
"Over time, as everyone saw the AI catching tricky bugs or saving hours of repetitive work, trust grew and the skepticism turned into enthusiasm." Additionally, Mukka emphasizes training and upskilling.
"We encouraged our QA folks to learn the basics of how the ML models work. This demystified the technology and empowered them to improve our AI tools."
Such upskilling is now widely recommended in the industry; reports highlight that organizations need to invest in developing AI expertise in QA teams to leverage these tools fully. By navigating these technical and cultural challenges—from data quality to team education—Mukka successfully integrated AI into the workflow.
His experience provides a roadmap for others: start with small pilot projects, involve the team early, and incrementally build trust in the AI by demonstrating tangible wins.
To illustrate the power of AI in infrastructure quality, Mukka shared a striking example from his work. His team developed an AI-driven monitoring and test system to safeguard their platform's reliability.
"One of our biggest wins was an internally developed anomaly detection tool that watches our infrastructure 24/7," Mukka says. "We trained a machine learning model on our application logs and metrics. It learned what 'normal' looks like across our services—things like typical memory usage, response times, and database queries. Not long after we deployed it, the model caught a subtle memory leak in a new microservice within an hour of rollout."
"No traditional test had flagged it. Thanks to the alert, we were able to roll back and fix the issue before any customers were affected."
This incident proved the value of AI-driven continuous verification in a real-world scenario. The ML system effectively acted as an ever-vigilant tester in production, finding a needle-in-a-haystack problem that could have grown into a major outage.
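A leak of the kind described above tends to show up as a sustained upward trend in a service's memory metric. As a hedged illustration of the principle (not the team's actual detector), the sketch below fits a least-squares line to periodic memory samples and flags growth above a configurable rate. The function name, sampling interval, and limit are assumptions.

```python
def memory_leak_suspected(samples, interval_s=60, mb_per_hour_limit=50.0):
    """Heuristic leak check: fit a least-squares line to memory samples
    (in MB, taken every `interval_s` seconds) and flag a sustained climb.

    Returns (suspected?, estimated growth in MB per hour).
    """
    n = len(samples)
    xs = range(n)
    x_mean = (n - 1) / 2
    y_mean = sum(samples) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, samples))
    den = sum((x - x_mean) ** 2 for x in xs)
    slope_per_sample = num / den              # MB gained per sample
    mb_per_hour = slope_per_sample * 3600 / interval_s
    return mb_per_hour > mb_per_hour_limit, round(mb_per_hour, 1)

# Memory climbing ~2 MB per minute, i.e. ~120 MB/hour: well above the limit.
leaking = [500 + 2 * i for i in range(30)]
print(memory_leak_suspected(leaking))  # → (True, 120.0)
```

Trend-based checks like this catch slow degradations that a single pass/fail test never would, which is exactly why they belong in a continuous verification loop rather than a pre-release gate.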
The outcome was improved confidence in deployments and a faster remediation cycle, as the team didn't have to wait for user reports or scheduled manual checks to discover critical issues. Mukka's success story is reminiscent of approaches used by leading tech companies to maintain quality at scale.
For example, Facebook (Meta) deployed an intelligent testing tool, Sapienz, which "automatically designs, runs, and reports the results of tens of thousands of test cases every day" on their mobile app, catching issues within minutes of code being written.
Similarly, many organizations are adopting anomaly detection and self-healing systems in their operations. Mukka points out that these examples inspired his work.
"Seeing companies like Netflix and Facebook use AI to push quality and reliability gave us confidence. Our context at Okta was different, but the principle was the same—use smart algorithms to handle scale and complexity."
"Now, whenever we roll out a major change, our AI monitors are effectively doing a mini 'health check' every few seconds. It's like having an extra team member who never sleeps."
The broader industry is moving in this direction as well: Gartner predicts that "by 2026, over 60% of enterprises will operationalize AI-driven monitoring tools" (Gartner, "Innovation Insight for AIOps," 2023). In Mukka's case, the AI-driven solution has become a cornerstone of their infrastructure quality strategy.
It demonstrates significant impact: faster detection of problems, automatic prevention of incidents, and ultimately higher trust from the business and users that the system will behave as expected.
By combining AI-driven testing with continuous verification, Mukka has markedly enhanced both the reliability of his company's systems and the efficiency of the engineering process. One clear benefit is a reduction in production issues and downtime.
With tests running continuously and AI models predicting points of failure, problems are identified and resolved much earlier in the development cycle. "Our failure rate after releases went down dramatically," Mukka reports. "In the past, we might have discovered an issue hours or days after a deployment. Now, our pipeline's continuous tests and verifications often catch those issues immediately."
"It means we fix things before they escalate—customers never see most of the glitches anymore." Early detection of defects is indeed a hallmark benefit of continuous testing in DevOps.
By shifting quality checks to happen throughout development and after deployment, his team reduces the risk of a small bug snowballing into a major outage. This has improved service reliability metrics like uptime and error rates.
Mukka's approach aligns with industry best practices: continuous testing and verification are known to enhance overall software quality and safety by ensuring each code change is vetted in real conditions. In other words, quality isn't an afterthought—it's built and maintained continuously, which for an identity and security platform like Okta, is essential.
Efficiency gains have come hand-in-hand with reliability. Automating tests with AI and running them continuously means engineers get faster feedback, enabling quicker iterations and releases.
Mukka notes that their deployment frequency increased once the new practices settled in. "We were able to speed up our release cycles because the team spends less time in lengthy manual test phases," he says. "AI helps prioritize the most important tests, and continuous verification gives us the confidence to deploy more often. When an issue does arise, it's usually caught and fixed in the same cycle."
"This agility was hard to imagine in the old days of big, infrequent releases." The faster time-to-market is a direct result of streamlining quality checks—a benefit widely reported by DevOps adopters.
Mukka also emphasizes that efficiency isn't just about speed, but smarter resource use. Engineers are freed from tedious regression testing and can focus on designing better features and improving the tests themselves.
Additionally, by automatically pruning redundant tests and using AI to target risky areas, they avoid wasting computing resources on running thousands of unnecessary test cases. The combination of these factors—fewer production issues, quicker delivery, and optimized testing workload—shows how AI-driven continuous verification can achieve the often elusive goal of "better quality and faster delivery."
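One common way to prune redundant tests, offered here as an illustrative assumption rather than the team's actual method, is a greedy set-cover over coverage data: keep the smallest set of tests that still exercises every covered code unit, and treat the rest as candidates for removal. Names and data are hypothetical.

```python
def prune_redundant_tests(coverage):
    """Greedy set-cover: keep the smallest set of tests that still covers
    every code unit any test covers; the remaining tests are redundant.

    coverage: {test_name: set of covered code units}
    """
    remaining = set().union(*coverage.values())
    kept = []
    while remaining:
        # Pick the test covering the most still-uncovered units.
        best = max(coverage, key=lambda t: len(coverage[t] & remaining))
        if not coverage[best] & remaining:
            break
        kept.append(best)
        remaining -= coverage[best]
    return kept

coverage = {
    "test_a": {"auth", "session"},
    "test_b": {"auth"},               # subsumed by test_a: prunable
    "test_c": {"billing", "session"},
}
print(prune_redundant_tests(coverage))  # → ['test_a', 'test_c']
```

In practice teams rarely delete the pruned tests outright; they demote them to a slower, less frequent suite so full coverage is still exercised periodically.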
It's a win-win that Mukka and his team have demonstrated, serving as a model for modern quality engineering.
Looking ahead, Mukka envisions an exciting future for infrastructure quality engineering, heavily influenced by AI and automation. He believes we are only at the early stages of what AI/ML will do for quality assurance.
"In the next few years, I see AI becoming an even more integral part of the toolset for quality engineers," Mukka predicts. "We'll have intelligent systems that not only detect issues but also fix some of them automatically. Imagine a deployment pipeline where an AI finds a misconfiguration and immediately applies a safe correction—that's where we're headed. I also expect testing will become more autonomous."
"We're starting to experiment with AI agents that can generate test scenarios on the fly and even simulate user behavior in complex workflows without being explicitly told what to do." This vision extends the current trends to a more autonomous QA paradigm, where AI acts like a co-pilot to engineers.
Industry trends support this direction: respondents to a recent Gartner survey expect that generative AI and other advanced tools will significantly influence testing in the next three years.
The concept of "agentic AI"—autonomous AI systems that make decisions and take actions—is gaining traction, with analysts predicting that such AI could handle at least 15% of routine work decisions by 2028. In the context of quality engineering, that could mean AI-driven testing agents handling a substantial portion of day-to-day validation tasks.
Mukka is quick to add that human expertise will remain vital, even as AI's role expands. He foresees the role of QA professionals evolving rather than disappearing.
"Engineers will focus more on defining quality goals, interpreting AI findings, and handling the creative and complex aspects of quality that AI can't easily grasp," he says. "The mundane stuff—running countless test variations, monitoring every little metric—that can be offloaded to AI."
"But the insight to understand a novel failure, or to design a clever test strategy, that will always need human intuition." He encourages quality engineers to embrace AI as a collaborator.
The future he paints is one of human-machine collaboration: AI doing the heavy lifting and humans providing direction and critical thinking. This outlook is optimistic and empowering.
It aligns with the broader industry sentiment that AI will amplify human capabilities in software development rather than replace them. As AI technology matures, Mukka expects continuous verification to become smarter and more self-sufficient, possibly evolving into self-adaptive systems that automatically adjust quality checks as a system changes.
Ultimately, his vision of infrastructure quality is one where AI-driven tools ensure software reliability in the background, enabling development teams to move at high velocity without sacrificing stability. It's a future where quality is "baked in" by intelligent automation, guided by the strategic oversight of engineers—a future Mukka is actively helping to create.
Given his experience, Mukka often advises other engineers and organizations on how to successfully adopt AI in their testing and infrastructure quality practices. A key piece of advice he offers is to start small and focused.
"Don't try to overhaul everything in one go," Mukka advises. "Pick one or two areas where AI can make an immediate impact—maybe it's an AI tool to generate test data, or a machine learning model to prioritize your regression tests."
"Implement it, learn from the results, and iterate. Early quick wins will build the confidence and justification for expanding AI's role." This incremental approach is echoed by industry best practices: experts recommend integrating AI into specific testing scenarios first, rather than attempting a wholesale replacement of existing processes.
By starting with a manageable pilot project, teams can work out kinks in the technology and build trust in the outcomes. Mukka's own journey began with such pilots, which helped demonstrate the value of AI to his stakeholders.
He also emphasizes setting clear success criteria (for example, reducing test execution time by X% or catching certain categories of bugs) so that the benefits of the AI experiment can be objectively measured. Another crucial piece of guidance from Mukka is to invest in people and processes around the AI tools.
"AI in testing isn't magic—your team needs to understand it, and your processes might need adjustment," he notes. "Upskill your testers so they are comfortable with data and basic AI concepts. Encourage collaboration between developers, QA, and data scientists if you have them. And establish feedback loops: monitor your AI tools regularly and tune them continuously."
"If an AI recommendation is wrong, treat it as a learning opportunity to improve the model." Maintaining human oversight and continuously refining the AI models ensures that the technology remains effective and aligned with quality goals.
Mukka also stresses the importance of keeping test assets and data well-organized—since AI tools often rely on parsing test cases, logs, and results, having them in machine-readable formats makes a big difference in outcomes. Finally, he tells teams not to lose sight of the fundamental testing principles.
"AI can augment your testing, but it won't replace good test design and analysis. Use AI to handle scale and complexity, but always validate its findings with your expertise," he says.
This balanced approach ensures that AI is used as a powerful tool within a robust quality engineering strategy. With these practical pieces of advice, Mukka's experience becomes a playbook for others aiming to bring AI-driven innovation to their own infrastructure quality and testing practices.
Mukka's journey illustrates the profound impact that AI-driven innovation can have on infrastructure quality and testing. From the introduction of machine learning to catch hidden issues, to the establishment of continuous verification as a guardrail for software in production, Mukka has shown how traditional QA can evolve into a smarter, faster, and more proactive practice.
Under his leadership, AI and ML moved from buzzwords to tangible results: higher reliability, faster releases, and a culture of continuous improvement in quality engineering. His vision for the future—where autonomous testing agents work alongside human engineers—offers a glimpse of how the next generation of software quality assurance might look.
It's an optimistic vision in which quality is assured not by slowing down change, but by innovating the way we verify and validate at every step. As organizations everywhere grapple with the demands of complex systems and rapid delivery, Mukka's story serves as both inspiration and blueprint.
The takeaway is clear: embracing AI in infrastructure quality is not just a theoretical idea, but a practical path to robust systems and efficient engineering. And with pioneers like Mukka leading the way, the software industry is poised to make quality assurance smarter and more resilient than ever before.