Work in an Existing Project

โš ๏ธ CodeSpeak is in Alpha Preview: many things are rough around the edges. Please use at your own risk and report any issues to our Discord. Thank you!

In mixed mode, CodeSpeak manages only the files you specify; the rest of the codebase stays untouched. This tutorial walks through adding EML support to Microsoft MarkItDown, a document-to-markdown converter.

Prerequisites: Complete the Installation steps first.

Clone the repo

Start by cloning the MarkItDown repository. The SSH URL below requires a GitHub SSH key; https://github.com/microsoft/markitdown.git works without one.

git clone git@github.com:microsoft/markitdown.git
cd markitdown

Set up the project

Following the MarkItDown README, set up a virtual environment to confirm the project works before changing anything.

uv venv --python=3.12 .venv
source .venv/bin/activate

Verify that the tests pass:

pushd packages/markitdown
uv pip install hatch
hatch test
popd

This should produce output similar to:

================================== test session starts ===================================
<...>
collected 196 items

tests/test_cli_misc.py ..                                                                               [  1%]
tests/test_cli_vectors.py ..................................................                            [ 26%]
<...>

======================= 194 passed, 2 skipped in 94.08s (0:01:34) ========================

You can also verify markitdown itself works by converting one of the existing test files:

uv pip install -e 'packages/markitdown[all]'
markitdown packages/markitdown/tests/test_files/test_with_comment.docx

Initialize CodeSpeak

codespeak init

This creates a codespeak.json at the repo root. CodeSpeak manages only the files you specify; the rest of the codebase stays untouched.

Optionally, create an AGENTS.md file to help CodeSpeak's agents navigate the project faster:

A virtual environment is pre-configured at the project root (`.venv/`). Hatch is installed there.

# Running Tests

From `packages/markitdown/`, run `GITHUB_ACTIONS=1 hatch test`. Setting `GITHUB_ACTIONS=1` skips the remote URL tests, which is necessary for any new work.

The full test suite takes several minutes.

# Adding Tests

The primary testing mechanism is the **test vector framework**:

1. Add test fixture files to `tests/test_files/`
2. Add `FileTestVector` entries to `tests/_test_vectors.py`

The parametrized tests in `test_module_vectors.py` will automatically exercise your converter through all standard code paths.

Create a spec

To add the new feature, create packages/markitdown/src/markitdown/converters/eml_converter.cs.md, right next to the existing converters:

# EmlConverter

Converts RFC 5322 email files (.eml) to Markdown using Python's built-in `email` module.

## Accepts

`.eml` extension or `message/rfc822` MIME type.

## Output Structure

1. **Headers section**: From, To, Cc, Subject, Date as `**Key:** value` pairs
2. **Body**: plain text preferred; if only HTML, convert to markdown
3. **Attachments section** (if any): list with filename, MIME type, human-readable size

## Parsing Requirements

- Decode RFC 2047 encoded headers (e.g., `=?UTF-8?B?...?=`)
- Decode body content (base64, quoted-printable)
- Handle multipart: walk parts, prefer `text/plain` over `text/html`
- For `message/rfc822` parts: recursively format as quoted nested message
- Extract attachment metadata without decoding attachment content
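
To make the spec concrete, here is a minimal sketch of the behavior it describes, using only Python's standard `email` package. It is not the code CodeSpeak generates. The function names (`eml_to_markdown`, `human_size`, `encoded_size`), the `## Attachments` heading, and the size estimate taken from the encoded payload are all assumptions made for illustration. HTML-to-Markdown conversion and nested `message/rfc822` parts are left out.

```python
from email import message_from_bytes, policy

HEADER_KEYS = ("From", "To", "Cc", "Subject", "Date")


def human_size(num_bytes):
    """Format a byte count as a short human-readable string."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB"):
        if size < 1024:
            return f"{size:.0f} {unit}" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GB"


def encoded_size(part):
    """Estimate an attachment's size from its still-encoded payload."""
    raw = part.get_payload()  # no decode=True, so the content stays encoded
    if not isinstance(raw, str):
        return 0  # multipart or nested message: not handled in this sketch
    if str(part.get("Content-Transfer-Encoding", "")).lower() == "base64":
        compact = "".join(raw.split())
        # Every 4 base64 characters carry 3 bytes; '=' marks padding.
        return len(compact) * 3 // 4 - compact.count("=")
    return len(raw)


def eml_to_markdown(raw_bytes):
    """Render headers, preferred body, and an attachment list (hypothetical helper)."""
    # policy.default yields EmailMessage objects and decodes RFC 2047 headers.
    msg = message_from_bytes(raw_bytes, policy=policy.default)

    lines = [f"**{key}:** {msg[key]}" for key in HEADER_KEYS if msg[key] is not None]

    # Prefer text/plain and fall back to text/html.
    body = msg.get_body(preferencelist=("plain", "html"))
    if body is not None:
        # get_content() undoes base64 / quoted-printable and the charset.
        # An HTML body is passed through unchanged in this sketch.
        lines += ["", body.get_content().strip()]

    attachments = [
        f"- {part.get_filename() or 'unnamed'} ({part.get_content_type()}, "
        f"{human_size(encoded_size(part))})"
        for part in msg.iter_attachments()
    ]
    if attachments:
        lines += ["", "## Attachments", ""] + attachments
    return "\n".join(lines)
```

Calling `eml_to_markdown(Path("message.eml").read_bytes())` on any `.eml` file gives a rough preview of the output structure the spec asks for, which can help when tightening the spec wording.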

Configure codespeak.json

Register this spec in codespeak.json:

"specs": [
  "packages/markitdown/src/markitdown/converters/eml_converter.cs.md"
]

By default, CodeSpeak won't touch existing project files; it only creates new ones. But our new converter needs to be wired into MarkItDown's plugin system: imported in __init__.py and registered in _markitdown.py. Its test vectors also need to be added to tests/_test_vectors.py. We explicitly allow these changes by adding the following files to whitelisted_files in codespeak.json:

"whitelisted_files": [
  "packages/markitdown/src/markitdown/converters/__init__.py",
  "packages/markitdown/src/markitdown/_markitdown.py",
  "packages/markitdown/tests/_test_vectors.py"
]
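
A typo in either list is easy to miss, so a quick check before building can help. The sketch below is a hypothetical helper, not part of CodeSpeak. It assumes `specs` and `whitelisted_files` are top-level arrays of repo-relative paths in codespeak.json, which matches the fragments above but is not confirmed by them.

```python
import json
from pathlib import Path


def missing_paths(config: dict, repo_root: Path) -> list[str]:
    """Return the configured paths that are not files under repo_root."""
    # Assumption: both keys are top-level lists of repo-relative paths.
    listed = config.get("specs", []) + config.get("whitelisted_files", [])
    return [p for p in listed if not (repo_root / p).is_file()]


def check_codespeak_config(repo_root: Path) -> list[str]:
    """Load codespeak.json from repo_root and report missing paths."""
    config = json.loads((repo_root / "codespeak.json").read_text())
    return missing_paths(config, repo_root)
```

Running `check_codespeak_config(Path("."))` from the repo root should return an empty list once the spec file exists and the three whitelisted paths are spelled correctly.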

Build

Complex mixed-mode projects work best with Claude Opus 4.6. Set the model with an environment variable and start the build:

CODESPEAK_ANTHROPIC_STANDARD_MODEL=claude-opus-4-6 codespeak build

The build takes a few minutes. When it completes, you should see:

Processing spec 1/1: packages/markitdown/src/markitdown/converters/eml_converter.cs.md
App built successfully.

Inspect the results

$ git status

Changes not staged for commit:
        modified:   packages/markitdown/src/markitdown/_markitdown.py
        modified:   packages/markitdown/src/markitdown/converters/__init__.py
        modified:   packages/markitdown/tests/_test_vectors.py

Untracked files:
        packages/markitdown/src/markitdown/converters/_eml_converter.py
        packages/markitdown/tests/test_files/test_email.eml
        packages/markitdown/tests/test_files/test_email_html_only.eml
        packages/markitdown/tests/test_files/test_email_nested.eml

CodeSpeak created _eml_converter.py, wired it into the three whitelisted files, and generated sample .eml fixtures.

Run tests

pushd packages/markitdown
GITHUB_ACTIONS=1 hatch test
popd

The run should end with a summary similar to:

192 passed, 37 skipped in 47.65s

Try it out

CodeSpeak generated test .eml files during the build. Try the new converter on one:

markitdown packages/markitdown/tests/test_files/test_email.eml

Next steps