CodeSpeak can improve test coverage in your project
Today we release CodeSpeak 0.3.2. Please find the full release notes at the end of this post.
The key feature in this release is automated test coverage improvement. TL;DR: you can run codespeak coverage and CodeSpeak will run your tests, measure coverage, and add tests to bring it as high as possible.
Why test coverage matters
It wouldn't be much of an exaggeration to say that AI code generation is only as good as the test suite that verifies its changes. The power of coding agents lies in more than generating correct code from scratch (which they sometimes manage); the far more impressive thing they do is find and correct their own mistakes. And the better the test suite, the more bugs it can catch, and therefore the better the results that AI code generators can deliver.
While CodeSpeak is not a chat-based tool, it of course uses the best agentic code-generation technology under the hood, and therefore benefits from good tests as much as any other agentic coding tool.
What is code coverage
What can we measure about the quality of a test suite? Structural metrics like the number of tests aren't very informative in most cases. One important aspect that can be captured by a metric is code coverage, i.e. what percentage of the code the suite actually tests. Most of the time this is measured as the percentage of all lines of code that have been run by the test suite. Granted, some lines are not executable: comments, some declarations, etc. These are usually excluded from the calculation.
For example:
```python
if temperature > ALARM_THRESHOLD:
    indicator_color = RED
else:
    indicator_color = GREEN
```

A good test suite will run both code paths: with `temperature > ALARM_THRESHOLD` and with `temperature <= ALARM_THRESHOLD`, and get 100% line coverage on this code. If we forget to test, for example, the case of `temperature <= ALARM_THRESHOLD`, the coverage will be 66.6% (2 out of 3 executable lines; `else` itself is not executable), and this is how we know that the suite can be improved.
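To make that calculation concrete, here is a minimal, self-contained sketch of how line coverage can be computed with Python's built-in `sys.settrace` (real tools like coverage.py do this far more robustly); the threshold value and the function wrapper are illustrative additions around the snippet above:

```python
import sys

ALARM_THRESHOLD = 90
RED, GREEN = "RED", "GREEN"

def set_indicator(temperature):
    if temperature > ALARM_THRESHOLD:
        indicator_color = RED
    else:
        indicator_color = GREEN
    return indicator_color

hit = set()

def tracer(frame, event, arg):
    # Record every line executed inside set_indicator.
    if event == "line" and frame.f_code is set_indicator.__code__:
        hit.add(frame.f_lineno)
    return tracer

sys.settrace(tracer)
set_indicator(100)  # exercises only the temperature > ALARM_THRESHOLD path
sys.settrace(None)

first = set_indicator.__code__.co_firstlineno
# The three executable lines counted above: the `if` and both assignments.
executable = {first + 1, first + 2, first + 4}
percent = 100 * len(hit & executable) / len(executable)
print(f"line coverage: {percent:.1f}%")  # 2 of 3 lines hit -> 66.7%
```

Running the missing case (`set_indicator(50)`) as well would add the `else` branch's assignment to `hit` and bring the figure to 100%.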
The codespeak coverage command described below finds such gaps in test suites and adds missing tests until it reaches the desired coverage level (which may actually be lower than 100%, depending on the nature of your codebase).
How to use codespeak coverage
To illustrate the usage of codespeak coverage, we'll use our clone of microsoft/MarkItDown, the anything-to-markdown converter (⭐️ 84.9K on GitHub); see a previous blog post on using Mixed mode.
Project setup
Install CodeSpeak first (if you already have it, run uv tool upgrade codespeak-cli).
Prerequisites
Install uv
CodeSpeak uses uv as its Python package manager.
```shell
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Restart your terminal (or run `source ~/.bashrc` / `source ~/.zshrc`), then verify:
```shell
uv --version
```

Get an Anthropic API key
CodeSpeak is BYOK (Bring Your Own Key). Get an API key at platform.claude.com/settings/keys.
You can provide the key in two ways:
- Paste it when CodeSpeak prompts you (this creates an `.env.local` file in your project directory)
- Set the environment variable: `export ANTHROPIC_API_KEY=<your-key>`
Install CodeSpeak
```shell
uv tool install codespeak-cli
```

Verify the installation:
```shell
codespeak --version
```

Log in
```shell
codespeak login
```

Log in with Google or email/password.
Now, let's clone CodeSpeak's fork of the MarkItDown repository. This repository already has a CodeSpeak project initialized.
```shell
git clone https://github.com/codespeak-dev/markitdown markitdown-codespeak && cd markitdown-codespeak
```

Initialise the environment and install dependencies:

```shell
uv venv --python=3.12 .venv
source .venv/bin/activate
uv pip install hatch
```

Improve coverage
Now, let's make CodeSpeak bring Python test coverage to 100%:
```shell
codespeak coverage --target 100 --max-iterations 5
```

The build will fail with the following message:
```
A placeholder for the test runner command was added to codespeak.json for spec
'packages/markitdown/src/markitdown/converters/eml_converter.cs.md'.
Please fill it in with the actual command, or run 'codespeak coverage --auto-configure --spec <spec>'
to auto-detect it. Use {tests_report_file} placeholder for the test results output in pytest-json-report format
and {tests_coverage_report_file} placeholder for the test coverage results in pytest-cov JSON format.
```
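As a sketch of what the coverage side of that contract looks like: pytest-cov's JSON output is the coverage.py JSON report format, whose `totals` section carries the overall percentage. A minimal reader (the `stub` dict here is hand-made for illustration, not real MarkItDown output):

```python
import json

# Hand-made stub in the shape of a coverage.py JSON report; a real one is
# written by pytest-cov's --cov-report=json:<path> option.
stub = json.loads(
    '{"totals": {"covered_lines": 84, "num_statements": 100, "percent_covered": 84.0}}'
)

def percent_covered(report: dict) -> float:
    # coverage.py stores the overall figure under the "totals" key.
    return report["totals"]["percent_covered"]

print(percent_covered(stub))  # -> 84.0
```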
Wait, what has just happened? In order to run tests with coverage, CodeSpeak needs to know how to do it. CodeSpeak found no registered command that runs relevant tests for the managed Python code. So, it added a placeholder in codespeak.json which now needs to be filled. To help you, CodeSpeak can automatically detect this command and put it in the file for you. Let's try that!
```shell
codespeak coverage --auto-configure
```

CodeSpeak analyzed the project and came up with a meaningful command, which it put in codespeak.json:
```
Auto-configured test runner for spec 'packages/markitdown/src/markitdown/converters/eml_converter.cs.md': cd packages/markitdown && hatch run pytest tests/ --json-report
--json-report-file={tests_report_file} --cov=src/markitdown --cov-branch --cov-report=json:{tests_coverage_report_file} --tb=short
```
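We haven't shown codespeak.json's schema in this post, so the key names below are purely illustrative; only the command string is the one auto-configure produced. The resulting entry might look roughly like:

```json
{
  "specs": {
    "packages/markitdown/src/markitdown/converters/eml_converter.cs.md": {
      "test_runner": "cd packages/markitdown && hatch run pytest tests/ --json-report --json-report-file={tests_report_file} --cov=src/markitdown --cov-branch --cov-report=json:{tests_coverage_report_file} --tb=short"
    }
  }
}
```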
So now, we're good to go and run our original command!
```shell
codespeak coverage --target 100 --max-iterations 5
```

CodeSpeak will now execute up to 5 iterations trying to bring test coverage for the managed Python code to 100%. During the initial test run, it will show you the command it uses. It will also report the test coverage the project had initially and after each iteration:
```
╭──────────────────────────────── CodeSpeak Progress ────────────────────────────────╮
│ ✓ Improving coverage (2m 24s)
│ ╰─ ✓ Analyzing testable Python files (6.6s)
│    ╰─ ✓ _eml_converter.py (6.6s)
│ ╰─ ✓ Running and validating tests (initial run) (1m 44s)
│      Using command: cd packages/markitdown && hatch run pytest tests/ --json-report
│      --json-report-file=/Users/ks/projects/codespeak-blog-post-2026-03-01/markitdown-codespeak/.codespeak/ignored/tests_report.json --cov=src/markitdown --cov-branch
│      --cov-report=json:/Users/ks/projects/codespeak-blog-post-2026-03-01/markitdown-codespeak/.codespeak/ignored/coverage_report.json --tb=short
╰─────────────────────────────────── Alpha Preview ──────────────────────────────────╯
```
Initial state: ran 227 tests, observed 6 test failures. Coverage: 84%
Note there are 6 pre-existing test failures: some tests try to download test data from the original GitHub repo, which the fork does not include. During test coverage improvement, all pre-existing test failures are ignored. For this project, you can disable the failing tests by adding `GITHUB_ACTIONS=1` to the test runner command in codespeak.json.
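For instance, the auto-configured command from the previous step could be edited to prefix the variable; this is a sketch of that one-line change, with the rest of the command unchanged:

```shell
cd packages/markitdown && GITHUB_ACTIONS=1 hatch run pytest tests/ --json-report \
  --json-report-file={tests_report_file} --cov=src/markitdown --cov-branch \
  --cov-report=json:{tests_coverage_report_file} --tb=short
```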
Verify the results
After the build completes, let's look at the full output:
```
╭──────────────────────────────── CodeSpeak Progress ────────────────────────────────╮
│ ✓ Improving coverage (8m 44s)
│ ╰─ ✓ Analyzing testable Python files (6.6s)
│    ╰─ ✓ _eml_converter.py (6.6s)
│ ╰─ ✓ Running and validating tests (initial run) (1m 57s)
│ ╰─ ✓ Improving coverage (6m 7s)
│    ╰─ ✓ Collect context & plan work (43.9s)
│    ╰─ ✓ Create test file for EML converter missing coverage (26.5s)
│       ╰─ ✓ Write test for _get_body_and_attachments covering nested message with payload as Message (0.0s)
│       ╰─ ✓ Write test for _human_readable_size covering KB, MB, and GB ranges (0.0s)
│       ╰─ ✓ Write test for EmlConverter.accepts covering rejection case (0.0s)
│    ╰─ ✓ Run validate_tests to check coverage (4m 45s)
│ ╰─ ✓ Running and validating tests (after iteration 1) (2m 5s)
│ ╰─ ✓ Running and validating tests (after iteration 2) (1m 55s)
╰─────────────────────────────────── Alpha Preview ──────────────────────────────────╯
```
Initial state: ran 227 tests, observed 6 test failures. Coverage: 84%
Iteration 1: ran 231 tests, observed 8 test failures. Coverage: 88%
Iteration 2: ran 231 tests, observed 6 test failures. Coverage: 100%
Reached target coverage.
Done!
CodeSpeak added 4 tests and achieved 100% coverage 🎉 Now, this test suite can catch more bugs and support better agentic code generation.
The Road Ahead
This early version of codespeak coverage is the first step on our journey of perfecting test suites with CodeSpeak. Generating reliable code is crucial for our mission, and we'll keep improving the toolchain to add more capabilities in this area.
A few things we are planning to do in the future:
- support more languages (the current version only supports Python),
- branch coverage and other more sophisticated metrics,
- mutation testing,
- better CI/in-cloud support for test improvements.
Full Changelog since 0.3.1
New
- Added `codespeak coverage` command to automatically improve test coverage for Python code, including auto-detection of your project's test runner configuration.
- `codespeak takeover` no longer requires specs to be pre-configured.
- Further improved build cancellation speed when using the MCP server integration.
Bug fixes
- Fixed "prompt is too long" errors that could occur in large mixed mode projects.
- Fixed the current Python environment leaking into child processes, which could cause dependency conflicts during builds.
- Improved error reporting when external API calls fail during a build.
- Cleaned up build progress output to reduce visual clutter.
See Also
- First step in modularity: Spec dependencies and Managed files
  New features: Managed files, Spec dependencies/imports
- First glimpse of `codespeak takeover`: Transition from Code to Specs in Real Projects
  New features: Extract a spec from existing code, improvements to Mixed Mode and error handling