Testing AI: What Are You Missing During the POC Phase?


New technologies bring new challenges. Although artificial intelligence (AI) has been around for a long time, it is still a new phenomenon in software development and testing. A key AI feature is its ability to learn and evolve, so the test results you get today may not be relevant tomorrow. Such changing conditions require a specific approach to testing AI software, one that may differ from testing other products.

So how do you organize QA for AI and stay up to date on the system's state? This post answers that question and offers useful tips for QA teams tasked with testing AI.


AI was invented by humans, which means it is fallible. If you want to minimize errors, it is essential to conduct thorough testing at every development stage. Start by creating a testing strategy as if you were doing it for a regular software product: study the requirements, the available tools, and the expected output. Before you begin testing, understand the product's nature.

Is your testing object purely AI-based software, or does it merely consume data produced by smart algorithms? If the latter is the case, you don't need anything special: use the same testing methods you would apply to any other piece of software. The main task is to make sure the program behaves correctly when using the AI's output.
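One common way to keep such tests deterministic is to stub the AI component so the rest of the program can be checked against fixed outputs. A minimal sketch, assuming a hypothetical `format_recommendation` function and a model with a `predict` method (neither comes from the article):

```python
from unittest.mock import Mock

def format_recommendation(model) -> str:
    """Hypothetical app code that consumes a model's prediction."""
    label, score = model.predict("user-123")
    if score < 0.5:
        return "No confident recommendation"
    return f"Recommended: {label} ({score:.0%} confidence)"

# Stub the AI so the test exercises only the app's handling of its output.
fake_model = Mock()
fake_model.predict.return_value = ("running shoes", 0.87)
assert format_recommendation(fake_model) == "Recommended: running shoes (87% confidence)"

fake_model.predict.return_value = ("socks", 0.12)
assert format_recommendation(fake_model) == "No confident recommendation"
```

With the model mocked out, the test stays stable even as the real AI keeps learning and changing its answers.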

If the application under test can gain new knowledge and make decisions, you are dealing with a machine learning system. In this case, the expected result will not be static, so you need a different testing scheme.

First of all, you have to get a complete list of the datasets used to train the model:

  • A training dataset contains the examples the AI system learns from.
  • A validation dataset is used to tune the model's architecture and hyperparameters.
  • A test dataset helps estimate the model's performance on unseen data.

Understanding how the above datasets interact for training an AI model will help you create an efficient technique for testing AI software.
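The three datasets above are typically carved out of one labeled pool. A minimal sketch of such a split, with hypothetical 70/15/15 proportions (a common choice, not a fixed rule):

```python
import random

def split_dataset(examples, train=0.7, val=0.15, seed=42):
    """Shuffle labeled examples and split them into train/validation/test sets."""
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    shuffled = examples[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * train)
    n_val = int(n * val)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])

data = [(f"example-{i}", i % 2) for i in range(100)]
train_set, val_set, test_set = split_dataset(data)
assert len(train_set) == 70 and len(val_set) == 15 and len(test_set) == 15
```

Keeping the test set untouched until the end is what makes its performance estimate honest: the model never sees those examples during training or tuning.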


Testing AI applications is a challenge in itself. Still, there are common pitfalls that testers encounter most often.


Unlike conventional software code, AI cannot follow a strict algorithm. Imagine writing code for a self-driving car: there is no way to cover every possible scenario with if-else, case, and switch statements. Yet you can teach the car to make decisions depending on the current situation.

Let’s assume a car “sees” a stone on the road. What should it do? Stop, go around, or keep driving? The correct option depends on many factors: traffic intensity, stone size, vehicle speed, etc. AI software testers can evaluate whether the decision falls within acceptable boundaries. To set those boundaries, limit the input data to only the essential information.
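In test code, "acceptable boundaries" means asserting that the decision belongs to a set of valid options rather than equaling one exact answer. A sketch with a deliberately simplified, hypothetical decision stub (real self-driving logic is vastly more complex):

```python
def obstacle_decision(stone_size_cm: float, speed_kmh: float,
                      oncoming_traffic: bool) -> str:
    """Hypothetical, highly simplified decision stub for illustration only."""
    if stone_size_cm < 10:
        return "keep_driving"
    if oncoming_traffic or speed_kmh > 80:
        return "stop"
    return "go_around"

# Instead of asserting one exact output, assert the decision stays
# within the boundaries acceptable for the scenario.
acceptable = {"stop", "go_around"}  # large stone: anything but driving over it
decision = obstacle_decision(stone_size_cm=40, speed_kmh=60, oncoming_traffic=False)
assert decision in acceptable
```

This style of assertion tolerates the system changing its mind as it learns, as long as it never leaves the safe set.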


AI systems can be biased if they make mistakes or draw the wrong conclusions during machine learning. Some time ago, CNN published an article claiming that “facial recognition systems show rampant racial bias.” The story was as follows: some office buildings installed a facial recognition system to restrict access, but when the system started working, it turned out that it only allowed white people in. This is because the AI was trained only on images of white people. To avoid such situations, QA for artificial intelligence should use varied datasets and checking modes.
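One concrete check QA can automate is comparing the model's accuracy across demographic groups and failing the build if the gap exceeds a threshold. A minimal sketch, assuming a hypothetical evaluation log of (group, predicted, actual) records and an illustrative 10-point threshold:

```python
def per_group_accuracy(records):
    """Compute accuracy per demographic group from (group, predicted, actual) tuples."""
    totals, correct = {}, {}
    for group, predicted, actual in records:
        totals[group] = totals.get(group, 0) + 1
        correct[group] = correct.get(group, 0) + (predicted == actual)
    return {g: correct[g] / totals[g] for g in totals}

# Hypothetical access-control evaluation log.
log = [("group_a", "grant", "grant")] * 90 + [("group_a", "deny", "grant")] * 10 \
    + [("group_b", "grant", "grant")] * 85 + [("group_b", "deny", "grant")] * 15

acc = per_group_accuracy(log)
# Bias gate: the accuracy gap between groups must stay within 10 points.
assert max(acc.values()) - min(acc.values()) <= 0.10, f"bias gap too large: {acc}"
```

A gate like this only catches bias for the groups represented in the evaluation data, which is another reason the datasets themselves must be varied.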


AI can face a lack of data in an ever-changing environment. Imagine you need to test a system that recognizes smartphone models. Firstly, there are lots of such models now; secondly, new ones are constantly appearing. To keep the AI system up to date, you need to update its knowledge base continually, which is a very time-consuming process. To make things easier, QA engineers can use special tools for testing AI — for example, Keras’s ImageDataGenerator, which preprocesses a set of pictures and prepares the data for pattern recognition.
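The core idea behind such tools is data augmentation: generating extra training variants from the images you already have. A dependency-free sketch of the simplest augmentation, horizontal flipping (ImageDataGenerator itself lives in Keras and also offers rotations, zooms, and shifts; images here are plain 2D lists of pixel values):

```python
def horizontal_flip(image):
    """Mirror an image (a 2D list of pixel values) left to right."""
    return [list(reversed(row)) for row in image]

def augment(images):
    """Yield each image plus a flipped copy, doubling the training data.
    Real augmentation pipelines apply many more random transforms."""
    for img in images:
        yield img
        yield horizontal_flip(img)

phone_photo = [[1, 2, 3],
               [4, 5, 6]]
augmented = list(augment([phone_photo]))
assert augmented[1] == [[3, 2, 1], [6, 5, 4]]
```

Augmentation stretches a scarce dataset, but it cannot invent a brand-new phone model; newly released devices still require fresh labeled data.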


When an AI system goes into production, it does not stop learning. The bad news is that you cannot control the sources from which it takes information. The process is similar to parenting: you can teach your children to make the right choices, but you no longer control them after they leave home. This is what happened with Microsoft’s Tay bot. It learned not just to produce speech but to hold conversations with people. Soon after it was given a Twitter account, it announced that Hitler was a hero and began to promote racism and genocide. To minimize the likelihood of such failures, you need to simulate as many situations as possible during the testing phase and continue to monitor the system after release.
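Post-release monitoring usually starts with an automated safety gate in front of the model's outputs. A bare-bones sketch with a hypothetical denylist (a production monitor would combine denylists with a trained toxicity classifier and human review):

```python
BLOCKED_TERMS = {"hitler", "genocide"}  # illustrative denylist, not exhaustive

def safe_to_publish(text: str) -> bool:
    """Return True if the model's output passes the content gate."""
    lowered = text.lower()
    return not any(term in lowered for term in BLOCKED_TERMS)

assert safe_to_publish("Hello! Nice weather today.")
assert not safe_to_publish("Hitler was a hero")  # the Tay failure mode
```

A gate like this does not stop the model from learning bad behavior; it only stops the worst outputs from reaching users while the team retrains or rolls back.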


AI systems use a massive data pool while learning, and they often have access to sensitive information, which risks violating confidentiality. Federated learning addresses this problem: it trains the algorithm across many decentralized servers holding local data samples, without exchanging those samples. The QA task is to make sure the system correctly interprets and applies the received data without breaking privacy boundaries.
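The privacy guarantee rests on what crosses the network: clients send only aggregate updates, never raw samples. A toy sketch of federated averaging, reduced here to estimating a global mean (real federated learning averages model weights, and this simplification is mine, not the article's):

```python
def local_update(samples):
    """Each client summarizes its own data; raw samples never leave the device."""
    return sum(samples) / len(samples), len(samples)

def federated_average(client_samples):
    """The server combines only the clients' summaries, weighted by sample count."""
    updates = [local_update(s) for s in client_samples]
    total = sum(n for _, n in updates)
    return sum(mean * n for mean, n in updates) / total

# Three clients with private local datasets.
clients = [[1.0, 3.0], [5.0], [2.0, 4.0, 6.0]]
global_estimate = federated_average(clients)
assert abs(global_estimate - 3.5) < 1e-9  # matches the mean of all six samples
```

A QA check here is exactly the assertion above: the federated result must match what centralized training on the pooled data would produce, even though no client ever shared its samples.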


Building an AI system is a long and expensive process. Before a company starts investing in a project, it should make sure the game is worth the candle. Proof of concept (PoC) and minimum viable product (MVP) are two techniques that help evaluate the project value.

A PoC aims to confirm the project is technically viable and economically beneficial for the company. It involves collaboration between developers, testers, and business analysts. In a good scenario, the model’s accuracy at this stage reaches 80–85%.

For a QA engineer, testing at the PoC stage means checking the most technically complex part or feature. If you prove it works, there is a good chance of implementing the project as a whole.
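In practice, the PoC check often boils down to an accuracy gate on the hardest feature. A minimal sketch, using the lower bound of the 80–85% range mentioned above as a hypothetical pass threshold:

```python
def poc_passes(predictions, labels, threshold=0.80):
    """Gate a PoC on the accuracy of its hardest feature.
    The 0.80 threshold mirrors the 80-85% range cited above; pick yours per project."""
    correct = sum(p == a for p, a in zip(predictions, labels))
    return correct / len(labels) >= threshold

# Hypothetical classifier output vs. ground truth for 100 examples.
labels = ["cat", "dog", "cat", "dog", "cat"] * 20
predictions = labels[:17] + ["dog"] + labels[18:]  # one mistake out of 100
assert poc_passes(predictions, labels)
```

Failing this gate early is the whole point of a PoC: it is far cheaper to learn the hardest feature cannot reach the bar before the full product is funded.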

The next step is launching an MVP. It is a prototype with a minimum set of functions that can cover customers’ basic needs. If it is in demand and makes a profit, consider it a strong reason to develop a full-fledged product.

To effectively conduct PoC, MVP, and subsequent AI testing phases, QA engineers need a number of special skills. Here are the five essentials for AI automation testers:

1. Expertise in AI frameworks and tools. The most demanded ones are Testim and Appvance. You can use them to run AI-generated tests for your ever-changing product.

2. Machine Learning basics. It is vital to understand the algorithms by which the program is trained to anticipate its future decisions.

3. Cloud computing. Apps are increasingly migrating to the cloud, and AI is no exception. Testers should be equally comfortable in server and serverless environments to ensure software stability.

4. Data science fundamentals. AI uses large amounts of data to learn and evolve. Data science helps testers understand data analysis methods and how to extract valuable information.

5. Visual testing. It compares the visible output against a baseline image. Visual tests can be run on individual components or a whole app by integrating with automated testing frameworks like Selenium, Cypress, and WebDriverIO.
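At its core, a visual test measures how much the current screenshot deviates from the baseline and fails above a tolerance. A dependency-free sketch where images are equal-sized 2D lists of pixel values (real tools work on rendered screenshots and support per-pixel tolerance and ignore regions):

```python
def pixel_diff_ratio(baseline, current):
    """Return the fraction of pixels that differ between two same-sized images."""
    total = diffs = 0
    for row_b, row_c in zip(baseline, current):
        for pb, pc in zip(row_b, row_c):
            total += 1
            diffs += pb != pc
    return diffs / total

baseline = [[0, 0, 0], [0, 0, 0]]
current  = [[0, 0, 255], [0, 0, 0]]  # one pixel changed out of six
assert pixel_diff_ratio(baseline, current) == 1 / 6
```

A tolerance threshold (rather than exact equality) keeps the test from failing on harmless rendering noise like anti-aliasing.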


In testing AI software, the main thing is to understand how it was trained. QA experts should examine all the datasets used in order to predict the system’s most likely responses in given situations. Before presenting the product to the world, it is crucial to validate its performance during the PoC and MVP phases. If a non-AI alternative would deliver the same results, it is best to opt for it, since developing and testing an AI system is a time-consuming and expensive process.


