AI-assisted testing
1. AI-assisted testing
Welcome back! In this video, we'll explore a critical but often overlooked aspect of production software: testing and security.
2. Continuing with Atlas
We're continuing to integrate the Atlas toolbox into our tourism use case at Wayfarer Labs. Before adding Atlas to our stack, we must ensure the codebase is well tested, secure, and resilient to future changes. Traditional testing techniques still apply, but AI can significantly accelerate and improve this process—especially when working with an unfamiliar codebase.
3. Why AI-assisted testing?
AI-assisted testing helps uncover untested paths, edge cases, and potential vulnerabilities that are easy to miss during manual review. A good starting point is to ask an AI model to assess the current testing maturity of the codebase.
4. Assessing testing maturity
Instead of a vague assessment, we use a prompt that asks for a simple rubric. This gives us structured, actionable feedback. Pause the video and take a moment to read the prompt on screen if you like. Here, we ask the model to evaluate critical paths, edge cases, regression protection, and CI automation on a scale of zero to five.
5. Testing maturity results
We can see that the model reports low overall testing maturity. Critical paths and regression protection score one out of five, edge cases score two out of five, and CI automation scores zero out of five. The model also tells us that Atlas is protected only by basic unit tests and lacks integration and regression tests.
6. Establishing a baseline
Before modifying the codebase, we establish a baseline using coverage tools. We can ask the AI to suggest the exact commands needed to measure test coverage. Our baseline command, pytest with coverage flags, helps detect untested paths. We can also share the command output with the model.
7. Analyzing coverage results
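The coverage run whose output we share might look like the following. This is a sketch, assuming pytest and the pytest-cov plugin are installed; `atlas` stands in for the actual package name.

```shell
# Run the full suite and measure statement coverage for the package.
# --cov-report=term-missing also lists the line numbers tests never hit,
# which is exactly the "untested paths" signal we want for a baseline.
pytest --cov=atlas --cov-report=term-missing

# Optionally, write an HTML report for easier browsing of uncovered lines.
pytest --cov=atlas --cov-report=html
```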
The model analyzes the output and reports that, while a few areas have reasonable coverage, many critical modules have very low or even zero coverage. That's the case for "daily pipeline" and "loaders", central modules in the codebase.
8. Testing strategies
There are multiple testing strategies we could combine: exploratory testing to understand system behavior, functional testing to verify expected outputs, regression testing to protect existing behavior, and automated testing to ensure fast, repeatable feedback on every change.
9. Building a test suite
A test suite is a structured collection of tests, data, tools, and rules that ensure system behavior remains consistent over time. We can use AI to help us build it incrementally. Here, we ask the model to generate unit and integration tests for the codebase. Because the model may generate many tests, we can also ask it to list them, along with a brief description of the behavior each test covers. That will make them easier to review.
10. Verifying coverage improvement
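For illustration, generated tests in this vein might take the following shape. The loader and its behavior here are hypothetical stand-ins, not Atlas's actual code; a real generated test would import the actual modules (e.g. `from atlas.loaders import ...`). The sketch inlines a tiny loader so it runs standalone:

```python
# Hypothetical stand-in for an Atlas loader; in the real codebase this
# would be imported rather than defined in the test file.
def load_records(raw_rows):
    """Parse raw CSV-like rows into dicts, skipping blank lines."""
    records = []
    for row in raw_rows:
        if not row.strip():
            continue  # edge case: blank lines are ignored, not errors
        name, visitors = row.split(",")
        records.append({"name": name.strip(), "visitors": int(visitors)})
    return records


# Unit test: one behavior per test, with a descriptive name.
def test_load_records_parses_rows():
    rows = ["Lisbon, 120", "Porto, 80"]
    assert load_records(rows) == [
        {"name": "Lisbon", "visitors": 120},
        {"name": "Porto", "visitors": 80},
    ]


# Edge-case test: blank lines should be skipped silently.
def test_load_records_skips_blank_lines():
    assert load_records(["", "Faro, 15", "  "]) == [
        {"name": "Faro", "visitors": 15}
    ]
```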
We can see how the model effectively generates all tests and the list we requested. After adding and reviewing the tests, we verify coverage by running the pytest command suggested earlier. The coverage numbers have skyrocketed!
11. Runtime protections
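A runtime protection of this kind might look like the sketch below: a schema check at a pipeline boundary that fails fast with a clear message and falls back to a safe default for an optional field. The record schema and field names are hypothetical, not Atlas's actual data model.

```python
# Hypothetical schema guard at a pipeline boundary.
REQUIRED_FIELDS = {"name": str, "visitors": int}
SAFE_DEFAULTS = {"region": "unknown"}  # optional field with a safe default


def validate_record(record: dict) -> dict:
    """Validate one input record; raise a clear error on bad data."""
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in record:
            raise ValueError(f"missing required field '{field}' in record {record!r}")
        if not isinstance(record[field], expected_type):
            raise TypeError(
                f"field '{field}' must be {expected_type.__name__}, "
                f"got {type(record[field]).__name__}"
            )
    # Apply safe defaults without mutating the caller's dict.
    return {**SAFE_DEFAULTS, **record}
```

Calling `validate_record({"name": "Lisbon", "visitors": 120})` fills in the safe default, while a record with a missing or mistyped field raises immediately with a message that names the offending field.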
Another issue can surface when we inject our custom data into Atlas as input. To mitigate this, we can ask the model to add runtime protections: input and schema checks, clear error messages, safe defaults, and guardrails at pipeline boundaries. There are lots of possibilities!
12. CI automation
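A minimal CI script along these lines might look like the following. This is a sketch assuming a runner with Python available; `atlas` and `requirements.txt` are hypothetical names, and the 80% threshold matches the coverage level described here.

```shell
#!/usr/bin/env sh
# Minimal CI sketch: install dependencies, run the full suite, and fail
# the build if coverage drops below the ~80% level we reached.
set -e  # stop on the first failing command

pip install -r requirements.txt pytest pytest-cov
pytest --cov=atlas --cov-report=term-missing --cov-fail-under=80
```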
Finally, since all these tests must run automatically, we can prompt the model to generate a minimal CI script that runs the full test suite we just generated. When we run this CI locally, we can see that AI has helped us increase coverage to around eighty percent.
13. Let's practice!
Now it's your turn to practice!