CodeSpeak Quick Start: Work in an Existing Project
Let's use CodeSpeak to add a feature to an existing project. We call this mixed mode: only part of the codebase is controlled by CodeSpeak.
Prerequisites
Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
Now restart your terminal or run source ~/.bashrc (or source ~/.zshrc, depending on which shell you use).
Make sure uv is available:
uv --version
Get an Anthropic API key
CodeSpeak uses BYOK (Bring Your Own Key), so you need your own Anthropic API key.
Configure the ANTHROPIC_API_KEY variable:
- either just paste your key when CodeSpeak asks you to (this will create an .env.local file in your project dir),
- or export it in your shell:
export ANTHROPIC_API_KEY=...
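If you want to check which of the two options is in effect before running a build, a small script can look for the key. This is a minimal sketch, not part of CodeSpeak: the function name `find_api_key` is hypothetical, and it assumes that `.env.local` holds plain `KEY=value` lines and that the environment variable takes precedence over the file.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_api_key(project_dir: str = ".") -> str | None:
    """Return ANTHROPIC_API_KEY from the environment, else from .env.local."""
    key = os.environ.get("ANTHROPIC_API_KEY")
    if key:
        return key

    env_file = Path(project_dir) / ".env.local"
    if not env_file.is_file():
        return None

    for line in env_file.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        # Assumed file format: KEY=value lines, with '#' starting a comment.
        if line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == "ANTHROPIC_API_KEY":
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    print("key found" if find_api_key() else "key missing")
```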
Install CodeSpeak
To install CodeSpeak with uv:
uv tool install codespeak-cli
Log in with Google or email/password:
codespeak login
Clone the repo
We'll add EML support to MarkItDown, Microsoft's document-to-markdown converter.
git clone git@github.com:microsoft/markitdown.git
cd markitdown
Set up the project
Following the MarkItDown README, let's set up a venv to make sure the project works before we change it.
uv venv --python=3.12 .venv
source .venv/bin/activate
You can verify that the tests pass with:
pushd packages/markitdown
uv pip install hatch
hatch test
popd
This should produce output like:
================================== test session starts ===================================
<...>
collected 196 items
tests/test_cli_misc.py .. [ 1%]
tests/test_cli_vectors.py .................................................. [ 26%]
<...>
======================= 194 passed, 2 skipped in 94.08s (0:01:34) ========================
You can also verify markitdown itself works by converting one of the existing test files:
uv pip install -e 'packages/markitdown[all]'
markitdown packages/markitdown/tests/test_files/test_with_comment.docx
Initialise CodeSpeak in mixed mode
codespeak init --mixed
# Initialized CodeSpeak project in mixed mode
This creates a codespeak.json at the repo root. Mixed mode means CodeSpeak manages only the files you specify; the rest of the codebase stays untouched.
Optionally, create an AGENTS.md file to help CodeSpeak's agents navigate the project faster:
A virtual environment is pre-configured at the project root (`.venv/`). Hatch is installed there.
# Running Tests
From `packages/markitdown/`, run `GITHUB_ACTIONS=1 hatch test`. Skipping remote URL testing is necessary for any new work.
The full test suite takes several minutes.
# Adding Tests
The primary testing mechanism is the **test vector framework**:
1. Add test fixture files to `tests/test_files/`
2. Add `FileTestVector` entries to `tests/_test_vectors.py`
The parametrized tests in `test_module_vectors.py` will automatically exercise your converter through all standard code paths.
Configure and add a spec
To add our new feature, let's create packages/markitdown/src/markitdown/converters/eml_converter.cs.md, right next to the existing converters:
# EmlConverter
Converts RFC 5322 email files (.eml) to Markdown using Python's built-in `email` module.
## Accepts
`.eml` extension or `message/rfc822` MIME type.
## Output Structure
1. **Headers section**: From, To, Cc, Subject, Date as `**Key:** value` pairs
2. **Body**: plain text preferred; if only HTML, convert to markdown
3. **Attachments section** (if any): list with filename, MIME type, human-readable size
## Parsing Requirements
- Decode RFC 2047 encoded headers (e.g., `=?UTF-8?B?...?=`)
- Decode body content (base64, quoted-printable)
- Handle multipart: walk parts, prefer `text/plain` over `text/html`
- For `message/rfc822` parts: recursively format as quoted nested message
- Extract attachment metadata without decoding attachment content
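The spec above only states requirements; CodeSpeak generates the actual implementation. For intuition, here is a rough sketch of how those requirements could map onto Python's standard `email` module. It is not the code CodeSpeak produces. The function names (`human_size`, `estimate_size`, `format_message`, `summarize_eml`), the `## Attachments` heading, and the 1024-based size units are illustrative assumptions, and HTML-only bodies are returned as raw HTML rather than converted to Markdown.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def human_size(num_bytes: int) -> str:
    """Format a byte count using 1024-based units (an assumption)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            break
        size /= 1024
    else:
        unit = "GB"
    return f"{num_bytes} B" if unit == "B" else f"{size:.1f} {unit}"


def estimate_size(part: EmailMessage) -> int:
    """Estimate the attachment size from the raw payload, without decoding it."""
    payload = part.get_payload(decode=False)
    if not isinstance(payload, str):
        return 0
    encoding = str(part.get("Content-Transfer-Encoding", "")).strip().lower()
    if encoding == "base64":
        data = "".join(payload.split())
        padding = len(data) - len(data.rstrip("="))
        return len(data) * 3 // 4 - padding
    return len(payload)


def format_message(msg: EmailMessage) -> str:
    lines = []
    # With policy.default, header values arrive already RFC 2047-decoded.
    for key in ("From", "To", "Cc", "Subject", "Date"):
        if msg[key] is not None:
            lines.append(f"**{key}:** {msg[key]}")

    # Prefer text/plain, fall back to text/html (left unconverted here).
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        # get_content() undoes base64 / quoted-printable and the charset.
        lines += ["", body.get_content().strip()]

    attachments = []
    for part in msg.iter_attachments():
        if part.get_content_type() == "message/rfc822":
            # Nested message: format recursively and quote every line.
            nested = format_message(part.get_content())
            lines += [""] + ["> " + ln for ln in nested.splitlines()]
        else:
            name = part.get_filename() or "(unnamed)"
            size = human_size(estimate_size(part))
            attachments.append(f"- {name} ({part.get_content_type()}, {size})")

    if attachments:
        lines += ["", "## Attachments", ""] + attachments
    return "\n".join(lines)


def summarize_eml(raw: bytes) -> str:
    return format_message(message_from_bytes(raw, policy=policy.default))
```

Parsing with `policy.default` is the main design choice in this sketch: it makes the parser return `EmailMessage` objects, which provide `get_body()`, `iter_attachments()`, and automatic header decoding.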
Register this spec in codespeak.json:
"specs": [
"packages/markitdown/src/markitdown/converters/eml_converter.cs.md"
]
In mixed mode, CodeSpeak won't touch existing project files by default; it only creates new ones. But our new converter needs to be wired into MarkItDown's plugin system: imported in __init__.py, registered in _markitdown.py, and covered by new entries in tests/_test_vectors.py. We explicitly allow this by adding the following files to whitelisted_files in codespeak.json:
"whitelisted_files": [
"packages/markitdown/src/markitdown/converters/__init__.py",
"packages/markitdown/src/markitdown/_markitdown.py",
"packages/markitdown/tests/_test_vectors.py"
]
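A typo in one of these paths is easy to miss, so it can help to check the configuration before building. The following is a minimal sketch, not a CodeSpeak feature: the function name `missing_paths` is hypothetical, and it assumes that `specs` and `whitelisted_files` are top-level keys of `codespeak.json` holding paths relative to the repo root.

```python
import json
from pathlib import Path


def missing_paths(repo_root: str = ".") -> list[str]:
    """Return the paths listed in codespeak.json that are not existing files."""
    root = Path(repo_root)
    config = json.loads((root / "codespeak.json").read_text(encoding="utf-8"))
    # Assumed layout: both keys sit at the top level of the JSON object.
    listed = config.get("specs", []) + config.get("whitelisted_files", [])
    return [path for path in listed if not (root / path).is_file()]


if __name__ == "__main__":
    if Path("codespeak.json").is_file():
        for path in missing_paths():
            print(f"missing: {path}")
```

Run it from the repo root; no output means every listed spec and whitelisted file exists.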
Build
Complex mixed-mode projects work best with Claude Opus 4.6. Set the model with an environment variable and start the build:
CODESPEAK_ANTHROPIC_STANDARD_MODEL=claude-opus-4-6 codespeak build
On the first run, you'll be prompted to log in and add your API key. After that, CodeSpeak will execute the build. This can take some time:
Connecting to build.codespeak.dev:50053...
Remote build started (ID: 079e2794-b84f-41cf-8cbb-77c692b845d5)
╭──────────────────────────── CodeSpeak Progress ────────────────────────────╮
│ ✓ Process specification (0.3s)                                             │
│ ✓ Collect project information (0.1s)                                       │
│ ✓ Implement specification (1m 33s)                                         │
│ ╰─ ✓ Collect context & plan work (1m 33s)                                  │
│ ✓ Generate and run tests in mixed mode (15m 46s)                           │
│ ╰─ ✓ Run existing tests to ensure they pass (1m 12s)                       │
│ ╰─ ✓ Create test EML files for different scenarios (1m 0s)                 │
│ ╰─ ✓ Write focused unit tests for core EML functionality (6m 12s)          │
│ ╰─ ✓ ...                                                                   │
│ ╰─ ✓ Identify and fix issues in the nested message handling (2m 18s)       │
│ ✓ Finalize mixed mode run (0.1s)                                           │
╰────────────────────────────────────────────────────────────────────────────╯
Processing spec 1/1: packages/markitdown/src/markitdown/converters/eml_converter.cs.md
App built successfully.
Inspect the results
Now you can inspect the newly generated files:
$ git status
Changes not staged for commit:
modified: packages/markitdown/src/markitdown/_markitdown.py
modified: packages/markitdown/src/markitdown/converters/__init__.py
modified: packages/markitdown/tests/_test_vectors.py
Untracked files:
packages/markitdown/src/markitdown/converters/_eml_converter.py
packages/markitdown/tests/test_files/test_email.eml
packages/markitdown/tests/test_files/test_email_html_only.eml
packages/markitdown/tests/test_files/test_email_nested.eml
CodeSpeak created _eml_converter.py, wired it into the three whitelisted files, and generated sample .eml fixtures.
Run tests
pushd packages/markitdown
GITHUB_ACTIONS=1 hatch test
popd
platform linux -- Python 3.14.2, pytest-9.0.2
collected 229 items
tests/test_cli_misc.py .. [ 0%]
tests/test_cli_vectors.py .......................sssssssssssssss.. [ 27%]
<...>
tests/test_pdf_tables.py ............... [100%]
192 passed, 37 skipped in 47.65s
Try it out
CodeSpeak generated test .eml files during the build. Try the new converter on one:
markitdown packages/markitdown/tests/test_files/test_email.eml
See Also
- CodeSpeak in Mixed Projects: Add a Feature to Django Oscar
Walk through adding a dashboard report to Django Oscar using CodeSpeak mixed mode.
- CodeSpeak Quick Start: Build a Project From Scratch
Walk through creating a CodeSpeak project from scratch with Python and uv.