|release|By Dmitry Savvinov|

First glimpse of codespeak takeover: Transition from Code to Specs in Real Projects

⚠️ CodeSpeak is in Alpha Preview: many things are rough around the edges. Please use at your own risk and report any issues to our Discord. Thank you!

Today we release CodeSpeak 0.3.1. Please find the full release notes at the end of this post.

One of the core promises of CodeSpeak is "Maintain specs, not code". Application code often contains a lot of tedious details that are obvious to humans. LLMs make these details obvious to machines as well, so that humans can focus on the essence (the business logic, the architecture, the tricky parts of the system, etc).

An important part of this vision is migrating parts of existing projects from the "old world" of code to the "new world" of specs. Maintenance is a lot easier when all you need to do is edit a concise human-readable text instead of much longer code.

For this scenario, CodeSpeak 0.3.1 introduces takeover. In this blog post, we'll look at how it works: we'll convert some code to a spec, then fix a real issue by editing the spec and rebuilding the project so that CodeSpeak updates the code. We'll use a real project, microsoft/MarkItDown (a document-to-markdown converter), and fix issue #1468 from its GitHub repo:

Issue #1468

Set up the project

Install CodeSpeak first (if you already have it, run uv tool upgrade codespeak-cli).

Prerequisites

Install uv

curl -LsSf https://astral.sh/uv/install.sh | sh

Now, restart your terminal or run source ~/.bashrc (or source ~/.zshrc, depending on which shell you use).

Make sure uv is available:

uv --version

Get an Anthropic API key

CodeSpeak uses BYOK (Bring Your Own Key). Please get an API key at:

Configure the ANTHROPIC_API_KEY variable:

  • either just 📋 paste your key when CodeSpeak asks you to (this will create an .env.local file in your project dir),
  • or export ANTHROPIC_API_KEY=...

Install CodeSpeak

To install CodeSpeak with uv:

uv tool install codespeak-cli

Log in with Google or email/password:

codespeak login

Now, let's clone MarkItDown and initialise CodeSpeak in mixed mode:

git clone git@github.com:microsoft/markitdown.git
cd markitdown
codespeak init --mixed

Run takeover

The source file responsible for Outlook MSG conversion is _outlook_msg_converter.py. Let's extract a spec from it with codespeak takeover:

codespeak takeover packages/markitdown/src/markitdown/converters/_outlook_msg_converter.py

CodeSpeak reads the source and extracts a spec:

╭─────────────────────────── CodeSpeak Progress ───────────────────────────╮
│ ✓ Extract specification (43.4s)                                          │
│ ╰─ ✓ Collect context & plan work (43.2s)                                 │
╰────────────────────────── 🚧 Alpha Preview 🚧 ────────────────────────────╯

This creates _outlook_msg_converter.cs.md next to the original file and registers it in codespeak.json. (See this commit in our clone of the repo.)

Review the generated spec

Here's a fragment of the generated spec:

# Outlook MSG Converter Specification

The Outlook MSG converter transforms Microsoft Outlook `.msg` email
files into structured markdown documents by parsing the OLE file format
and extracting email metadata and content.

## Conversion Process

### Email Metadata Extraction

Extracts standard email headers from specific OLE streams:
- **From**: Stream `__substg1.0_0C1F001F`
- **To**: Stream `__substg1.0_0E04001F`
- **Subject**: Stream `__substg1.0_0037001F`

### Email Body Content

Extracts message body from stream `__substg1.0_1000001F`.

...

The spec captures what the converter does — which streams it reads, how it validates files, how it formats output — without reproducing the Python code itself.
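For readers curious about what the spec abstracts away, here is a minimal sketch of reading the header streams listed in the fragment. It is not the MarkItDown source: `read_text_stream` and `extract_headers` are hypothetical helpers, `msg` is assumed to behave like an `olefile.OleFileIO` object (`exists` and `openstream`), and streams whose names end in `001F` are assumed to hold UTF-16LE text. `FakeMsg` exists only so the example runs without a real `.msg` file.

```python
import io
from typing import Dict, Optional

# Stream names copied from the spec fragment.
HEADER_STREAMS = {
    "From": "__substg1.0_0C1F001F",
    "To": "__substg1.0_0E04001F",
    "Subject": "__substg1.0_0037001F",
}


def read_text_stream(msg, stream_name: str) -> Optional[str]:
    """Return the decoded text of one stream, or None if it is absent.

    Assumption: `msg` exposes exists()/openstream() like olefile.OleFileIO.
    """
    if not msg.exists(stream_name):
        return None
    raw = msg.openstream(stream_name).read()
    # Assumption: *001F streams store UTF-16LE strings.
    return raw.decode("utf-16-le", errors="replace").rstrip("\x00").strip()


def extract_headers(msg) -> Dict[str, Optional[str]]:
    """Read every known header stream into a label -> value mapping."""
    return {label: read_text_stream(msg, name)
            for label, name in HEADER_STREAMS.items()}


class FakeMsg:
    """In-memory stand-in for an OLE file, used only in this example."""

    def __init__(self, streams):
        self._streams = streams

    def exists(self, name):
        return name in self._streams

    def openstream(self, name):
        return io.BytesIO(self._streams[name])


msg = FakeMsg({
    "__substg1.0_0C1F001F": "brizhou@gmail.com".encode("utf-16-le"),
    "__substg1.0_0037001F": "Test for TIF files".encode("utf-16-le"),
})
headers = extract_headers(msg)
```

In this sketch a missing stream simply yields `None`, so adding a header is a one-line change to the mapping, which mirrors how the spec lists one stream per field.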

Let's fix an issue!

Issue #1468 reports that Cc, Bcc, Date, and attachments are missing from the output. Instead of editing the code, let's add them to the spec:

 Extracts standard email headers from specific OLE streams:
 - **From**: Stream `__substg1.0_0C1F001F`
 - **To**: Stream `__substg1.0_0E04001F`
+- **Cc**: Stream `__substg1.0_0E03001F`
+- **Bcc**: Stream `__substg1.0_0E02001F`
 - **Subject**: Stream `__substg1.0_0037001F`
+- **Date**: Stream `__substg1.0_0039001F` (client submit time, human-readable format)
+
+### Attachments
+
+Enumerates attachment sub-storages (`__attach_version1.0_#XXXXXXXX`)
+and extracts metadata for each:
+- **Filename**: Stream `__substg1.0_3707001F` (display name),
+  falling back to `__substg1.0_3704001F` (filename)
+- **Size**: Determined from the length of the attachment data
+  stream `__substg1.0_37010102`
+
+Attachment content is not decoded or included — only metadata is listed.

We also update the output format section to include the new fields:

 **From:** [sender address]
 **To:** [recipient address]
+**Cc:** [cc addresses]
+**Bcc:** [bcc addresses]
 **Subject:** [email subject]
+**Date:** [sent date/time]

 ## Content

 [email body content]
+
+## Attachments
+
+- [filename] (size in human-readable format)
+- ...

Finally, we add a test input — a real .msg file with a known attachment. This gives CodeSpeak a concrete example to test against during the build:

+## Test inputs
+
+- test_files/unicode.msg is expected to have a TIF attachment

(See commit eb7afee in our clone of the repo.)

Build

To propagate the changes to the code, let's build the project:

codespeak build
╭─ CodeSpeak Progress: Building Outlook MSG converter with OLE parsing... ─╮
│ ✓ Process specification (0.0s)                                           │
│ ✓ Collect project information (0.0s)                                     │
│ ✓ Implement specification (3m 1s)                                        │
│ ╰─ ✓ Collect context & plan work (3m 1s)                                 │
│ ✓ Generate and run tests in mixed mode (5m 15s)                          │
│ ✓ Finalize mixed mode run (0.0s)                                         │
╰──────────────────────────── 🚧 Alpha Preview 🚧 ──────────────────────────╯
App built successfully.

Inspect the results

CodeSpeak added Cc, Bcc, Date extraction and a full attachments section to the converter. Here's part of the diff:

 headers = {
     "From": self._get_stream_data(msg, "__substg1.0_0C1F001F"),
     "To": self._get_stream_data(msg, "__substg1.0_0E04001F"),
+    "Cc": self._get_stream_data(msg, "__substg1.0_0E03001F"),
+    "Bcc": self._get_stream_data(msg, "__substg1.0_0E02001F"),
     "Subject": self._get_stream_data(msg, "__substg1.0_0037001F"),
+    "Date": self._get_stream_data(msg, "__substg1.0_0039001F"),
 }

It also generated _get_attachments, _get_attachment_size, and _format_file_size helper methods.
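The generated helpers are not reproduced in this post. As a rough illustration of the behaviour the spec describes, the sketch below enumerates attachment sub-storages and formats their sizes. Everything here is an assumption for illustration: `list_attachments` and `format_file_size` are hypothetical stand-ins rather than the generated methods, `msg` is assumed to offer `listdir`, `exists`, `openstream`, and `get_size` like `olefile.OleFileIO`, and the 1024-based units with one decimal place are a guess at "human-readable".

```python
import io
from typing import List, Tuple

ATTACH_PREFIX = "__attach_version1.0_#"
# Display name first, then the filename fallback, as in the spec.
NAME_STREAMS = ("__substg1.0_3707001F", "__substg1.0_3704001F")
DATA_STREAM = "__substg1.0_37010102"


def format_file_size(num_bytes: int) -> str:
    """Format a byte count with 1024-based units (an assumed convention)."""
    units = ["B", "KB", "MB", "GB", "TB"]
    size = float(num_bytes)
    index = 0
    while size >= 1024 and index < len(units) - 1:
        size /= 1024
        index += 1
    return f"{num_bytes} B" if index == 0 else f"{size:.1f} {units[index]}"


def list_attachments(msg) -> List[Tuple[str, str]]:
    """Return (filename, human-readable size) for each attachment storage.

    Assumption: `msg` behaves like olefile.OleFileIO and name streams
    hold UTF-16LE text.
    """
    storages = sorted(
        {path[0] for path in msg.listdir() if path[0].startswith(ATTACH_PREFIX)}
    )
    results = []
    for storage in storages:
        name = "unknown"
        for stream in NAME_STREAMS:
            path = f"{storage}/{stream}"
            if msg.exists(path):
                raw = msg.openstream(path).read()
                name = raw.decode("utf-16-le", errors="replace").rstrip("\x00")
                break
        data_path = f"{storage}/{DATA_STREAM}"
        size = msg.get_size(data_path) if msg.exists(data_path) else 0
        results.append((name, format_file_size(size)))
    return results


class FakeMsg:
    """In-memory stand-in for an OLE file, used only in this example."""

    def __init__(self, streams):
        self._streams = streams  # maps "storage/stream" -> bytes

    def listdir(self):
        return [key.split("/") for key in self._streams]

    def exists(self, path):
        return path in self._streams

    def openstream(self, path):
        return io.BytesIO(self._streams[path])

    def get_size(self, path):
        return len(self._streams[path])


fake = FakeMsg({
    "__substg1.0_0037001F": "Hello".encode("utf-16-le"),
    "__attach_version1.0_#00000000/__substg1.0_3704001F": "scan.tif".encode("utf-16-le"),
    "__attach_version1.0_#00000000/__substg1.0_37010102": b"\x00" * 1536,
})
attachments = list_attachments(fake)  # [("scan.tif", "1.5 KB")]
```

Note that only the length of the data stream is read; the attachment bytes are never decoded, matching the spec's "only metadata is listed".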

Because we added the test input to the spec, CodeSpeak also generated test vectors that verify against the real .msg file — checking that Cc, Subject, attachment filenames, and human-readable sizes all appear in the output:

FileTestVector(
    filename="unicode.msg",
    must_include=[
        "**From:** brizhou@gmail.com",
        "**Cc:** Brian Zhou",
        "**Subject:** Test for TIF files",
        "## Attachments",
        ".tif",
        "(946.9 KB)",
    ],
),
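To make the meaning of `must_include` concrete, here is a minimal sketch of how such a vector could be checked against converter output. `SimpleTestVector` and `missing_fragments` are hypothetical names invented for this example, not MarkItDown's or CodeSpeak's test harness, and the sample output is made up.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SimpleTestVector:
    """Hypothetical, simplified stand-in for a file test vector."""

    filename: str
    must_include: List[str] = field(default_factory=list)


def missing_fragments(vector: SimpleTestVector, markdown: str) -> List[str]:
    """Return the expected fragments that do not appear in the output."""
    return [frag for frag in vector.must_include if frag not in markdown]


vector = SimpleTestVector(
    filename="unicode.msg",
    must_include=["**Cc:** Brian Zhou", "## Attachments", ".tif"],
)

# Made-up converter output, only to exercise the check.
sample_output = "**Cc:** Brian Zhou\n\n## Content\n\nHello\n"
missing = missing_fragments(vector, sample_output)
```

A test would then assert that the returned list is empty. Here the made-up output lacks the attachments section, so the check would report `## Attachments` and `.tif` as missing.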

All of this came from a spec edit, with no manual Python coding (see commit a163a38). A spec change of +23/-3 lines produced a code change of +221/-25 lines, roughly 10x!

Done!

Tests are passing. This issue is fixed 🎉

From here on, you can keep editing the spec and rebuilding. The converter is now fully managed by CodeSpeak. The next time you need to change a part of the project that CodeSpeak has not yet taken over, you can just run codespeak takeover and focus on the logic, not the code.

The Road Ahead

We've just started with takeover: the current version gives you a feel for what it can be, but it's only a tiny first step and needs a lot of improvement.

The aspects we are planning to improve:

  • making sure that the spec doesn't miss anything important and doesn't include anything unnecessary,
  • making sure that if we delete the code, an equivalent implementation can be generated from the spec (passing all the tests, etc),
  • verifying that, when editing the spec, we can generate adequate changes in the code (spec diff -> code diff),
  • ideally, also maintaining good test coverage to make sure everything keeps working over time.

Full Changelog since 0.2.3

New

  • Added codespeak takeover command to bring existing source files under spec management (the subject of this post).
  • Added --skip-tests support in mixed mode, so you can skip the test phase when iterating on a spec.
  • The whitelist in mixed mode now supports glob patterns like src/**/*.py.
  • The build progress panel now shows a summary of the current change in its title.
  • CodeSpeak now notifies you when a newer version is available.
  • CodeSpeak now checks your API key balance before starting a build, so you don't wait minutes only to hit a billing error.
  • If codespeak.json has errors, CodeSpeak now reports what's wrong instead of failing with a confusing message.
  • If your project uses a stack CodeSpeak doesn't support yet, you now get a clear warning upfront.
  • Error tracebacks are now syntax-highlighted, so you see the real cause directly.

Bug fixes

  • Fixed change requests in multi-spec mixed mode projects: they now require --spec to specify which spec the change applies to.
  • Fixed duplicated progress output when processing multiple specs.
  • Fixed crashes on very large specs.
  • Fixed empty lines being stripped from process output.
  • Fixed git index.lock contention, improving stability when IDEs or other git clients access the repo during a build.
  • Fixed several git index handling edge cases.
  • Fixed a confusing error message when ripgrep is not installed.
  • Fixed keystrokes being echoed to the terminal during builds.
  • Improved build cancellation speed.