Diction - Creative Programming

The power of large language models is, of course, their ability to generate text. Text can be converted to speech. Why not make an app to practice spelling? In the Netherlands there’s the “Groot Dictee der Nederlandse Taal”, an annual dictation contest held since 1990. This post is about making a web app with Claude Code that uses Claude to generate the sentences of the dictation and a text-to-speech (TTS) service to have them spoken aloud. Afterwards, you can check what you wrote against the actual text.

Background

The “Groot Dictee der Nederlandse Taal”, usually written by a renowned Dutch or Belgian author, used to be a television event taking place in the Senate room of our Parliament; quite a happening! Nowadays it is a radio event with the possibility to attend in person.

Do you remember having dictation in high school, where your teacher would read aloud a sentence and you had to write the words down on paper, hand in the paper and hope for an A+ (or a 10)?

Well, the “Groot Dictee der Nederlandse Taal” is exactly that, but for fun. One of the Belgian contestants (yes, in Belgium about 60% of the population speaks Dutch) usually wins! I don’t know why, but they’re good at dictations. That makes a good reason to practice, although I must admit I haven’t tested myself with the “Groot Dictee” (Grand Dictation) for years, and I usually ended up with forty-something mistakes anyway. They don’t make it easy on you!

Enter Diction, a dictation generator webapp that can read out loud, made with AI.

Tech stack

In my global CLAUDE.md file I’ve stated that I want Claude Code to always write modular code. For this webapp I wanted a simple Express webserver with only a few routes and NO TypeScript, just good old JavaScript. After I wrote an extensive prompt describing my idea, Claude Code started building, and in one shot we had a good webapp that did all I asked for; I only had to put two API keys in a .env file, et voilà!

Data flow

The end user of the app wants to hear spoken sentences to write along with (the core idea of a dictation). A spoken sentence should be repeatable. When you are done writing, you want to check your writing against the dictation text. The initial user input consists of the topics for the dictation text and the preferred number of sentences.

Diction will, when requested to generate a new dictation, make an API request to Claude using the Anthropic SDK with an extensive prompt requesting long and difficult sentences. The response is stored in a temporary directory. Each sentence is then sent to a text-to-speech service via API as well, which will respond with an MP3 file. These are stored next to the dictation text.
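A minimal sketch of this flow could look like the code below. It is not the code Claude Code generated: the function names, the file names and the naive sentence splitter are all assumptions, and the two API calls are passed in as plain functions (`generateText` and `synthesize`) so the sketch does not depend on any particular SDK.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

// Naive splitter (an assumption): break after ".", "!" or "?" followed by whitespace.
function splitSentences(text) {
  return text
    .split(/(?<=[.!?])\s+/)
    .map((s) => s.trim())
    .filter((s) => s.length > 0);
}

// generateText(topics, count) -> Promise<string>  hypothetical wrapper around the Claude request
// synthesize(sentence)        -> Promise<Buffer>  hypothetical wrapper around the TTS request
async function createDictation({ topics, count, generateText, synthesize }) {
  // Everything for one dictation lives in its own temporary directory.
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "diction-"));

  // Step 1: generate the dictation text and store it.
  const text = await generateText(topics, count);
  await fs.writeFile(path.join(dir, "dictation.txt"), text, "utf8");

  // Step 2: turn each sentence into an MP3 stored next to the text.
  const sentences = splitSentences(text);
  const audioFiles = [];
  for (const [i, sentence] of sentences.entries()) {
    const file = path.join(dir, `sentence-${i + 1}.mp3`);
    await fs.writeFile(file, await synthesize(sentence));
    audioFiles.push(file);
  }

  return { dir, sentences, audioFiles };
}
```

Passing the two services in as functions keeps the pipeline easy to try out with fake implementations before spending any API credits.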

Diction webapp screenshot

When generation is done, the webapp shows a native browser audio player for each sentence and a button to reveal the sentence text.
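As an illustration, the markup for one sentence could be rendered along these lines. This is a sketch under assumptions: the `/audio/...` URL scheme and the function names are made up, and a `<details>` element stands in for the reveal button because it needs no client-side script.

```javascript
// Escape text so a sentence can be placed safely inside HTML.
function escapeHtml(text) {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Render one sentence: a native audio player plus the text, hidden until requested.
// The /audio/sentence-N.mp3 path is hypothetical.
function renderSentence(index, sentence) {
  return [
    `<section class="sentence">`,
    `  <audio controls src="/audio/sentence-${index + 1}.mp3"></audio>`,
    `  <details>`,
    `    <summary>Show sentence ${index + 1}</summary>`,
    `    <p>${escapeHtml(sentence)}</p>`,
    `  </details>`,
    `</section>`,
  ].join("\n");
}
```

The native `<audio controls>` element also covers the requirement that a sentence is repeatable, since the listener can replay it as often as needed.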

Sticking to the principle of keep it simple, stupid, I gave Claude only two additional prompts: to check the code for security flaws (which it found and fixed) and to add some minor sugar to the layout. Why? Because the app just works!

Tuning

Most of the time on this project went into refining the prompt that asks Claude to generate the sentences, as I noticed the following:

Claude was so eager to generate nice, difficult Dutch sentences that it made up unique, basically correct Dutch contractions, but these were too uncommon to be part of a dictation.
I increased the sentence length from 10-20 words to a more challenging 30-45 words (I will probably make this configurable in a next iteration).
I guided Claude by adding three example sentences from former Dutch dictations, but I had to state explicitly that these were factually correct Dutch sentences and that Claude’s output should really be that too.
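To make these tuning points concrete, here is a sketch of a prompt builder that takes them as parameters. The wording of the prompt is invented for illustration and is not the prompt used in Diction; `buildPrompt` and its option names are hypothetical as well.

```javascript
// Build the instruction text for the model from a few tunable options.
// Defaults mirror the 30-45 word range mentioned above.
function buildPrompt({ topics, sentenceCount, minWords = 30, maxWords = 45, examples = [] }) {
  const lines = [
    `Write a Dutch dictation of ${sentenceCount} sentences about: ${topics.join(", ")}.`,
    `Each sentence must contain between ${minWords} and ${maxWords} words.`,
    "Use difficult but existing words; do not invent uncommon contractions.",
  ];

  if (examples.length > 0) {
    // Spell out that the examples are correct Dutch and the output must be too.
    lines.push(
      "The following sentences come from earlier dictations. " +
        "They are correct Dutch, and your output must be correct Dutch too:"
    );
    for (const example of examples) {
      lines.push(`- ${example}`);
    }
  }

  lines.push("Return one sentence per line, without numbering.");
  return lines.join("\n");
}

// Usage: three sentences about two topics, default length range, no examples.
const prompt = buildPrompt({ topics: ["het weer", "fietsen"], sentenceCount: 3 });
```

Keeping the word range and examples as arguments would also make the planned move to user configuration a small change.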

Being cheap, I ran out of free ElevenLabs credits in one night, and I was not willing to pay a monthly fee for a hobby project. And I was only testing! But thanks to my global CLAUDE.md instruction to write modular code, I could quite easily plug in another TTS service and make the preference configurable. Now I can swap between services and ride free tiers for the time being.
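The swap works because every TTS service can sit behind the same small interface. A sketch of that idea follows, with placeholder providers instead of real API calls; the `TTS_PROVIDER` variable name, the `other` provider and `selectProvider` are assumptions, not names from the actual app.

```javascript
// Each provider exposes the same async function: text in, MP3 bytes (Buffer) out.
// Both implementations are placeholders; real ones would call the vendor's API.
const providers = new Map([
  ["elevenlabs", async (text) => Buffer.from(`elevenlabs:${text}`)],
  ["other", async (text) => Buffer.from(`other:${text}`)],
]);

// Pick the provider named in the environment, e.g. loaded from the .env file.
// TTS_PROVIDER is a hypothetical variable name.
function selectProvider(env = process.env) {
  const name = (env.TTS_PROVIDER || "elevenlabs").toLowerCase();
  const synthesize = providers.get(name);
  if (!synthesize) {
    const known = [...providers.keys()].join(", ");
    throw new Error(`Unknown TTS provider "${name}". Known providers: ${known}`);
  }
  return { name, synthesize };
}
```

The rest of the app only ever calls `synthesize(text)`, so adding a third service means adding one entry to the map.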

Example audio of a Dutch sentence generated by AI. The sentence is written by Claude, the audio is created by ElevenLabs.

But, why?

I like making things no one cares about but me. Spelling is nice. And though GPTs might be forgiving, sometimes they just output nonsense when you misspell a word. It is good to master your native language.

Next up

But what’s next? Two additions I’m about to make: multilingual support, starting with English, and extracting options from the prompt into user configuration.

And then: start practicing with the family!

Sample sentences for the prompt were taken from dictations on dictees.nl.