Towards self-improving software
The building blocks for software that automatically improves itself are already here.
Given my recent experience using Claude Code, I wanted to try a simple idea. What if I had Claude Code autonomously fix bugs as they occurred in an application? The setup would be straightforward:
- A simple Python Flask app as the target application
- An error handler to catch server-side errors and pass them to Claude Code
- Claude Code with access to the app software and running without user input
Along with a wrapper script to handle restarting the server after Claude Code finishes, this would be all that’s needed to build a self-improving web app.
First, I had Claude Code build a simple Flask app for managing to-dos. I had it include a custom error handler. The handler generates a JSON file with a prompt for Claude that includes the error details:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
import json
import traceback
import os
CRASH_FILE = os.path.join(os.path.dirname(os.path.abspath(__file__)), "crash.json")
def build_claude_prompt(traceback_str, method, path, project_dir):
return f"""A crash occurred in the todo app. Here are the details:
HTTP {method} {path}
Traceback:
{traceback_str}
Instructions:
1. Read the relevant source files to understand the bug
2. Fix the bug that caused this crash
3. Run the tests with: .venv/bin/python -m pytest tests/ -v
4. Commit the fix with: git add -A && git commit -m "fix: <describe what you fixed>"
5. Write a short, friendly message for the user explaining what you fixed to {project_dir}/fix_message.json
Format: message
Keep it under 1-2 sentences, written from AI first-person perspective.
Important:
- The project is at: {project_dir}
- Only fix the bug that caused this specific crash
- Do not change any other functionality
- Make sure tests pass before committing
- Do NOT restart the server — the run script handles that
"""
def shutdown_server_after_delay(delay=2):
"""Shut down the Flask server after a short delay to allow the response to be sent."""
import threading
def _shutdown():
import time
time.sleep(delay)
os._exit(1)
threading.Thread(target=_shutdown, daemon=True).start()
def handle_crash(error, method, path):
tb_str = traceback.format_exception(type(error), error, error.__traceback__)
tb_text = "".join(tb_str)
project_dir = os.path.dirname(os.path.abspath(__file__))
prompt = build_claude_prompt(tb_text, method, path, project_dir)
print(f"CRASH DETECTED: {type(error).__name__}: {error}")
print(f"Route: {method} {path}")
print(f"Server will shut down. Claude will investigate the fix...")
with open(CRASH_FILE, "w") as f:
json.dump({"prompt": prompt}, f)
shutdown_server_after_delay()
Next, I used a wrapper script to run the app in a loop, launching Claude Code when it crashes:
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
#!/bin/bash
# Self-improving server runner
# Runs the Flask app in a loop. When it crashes (exit code 1),
# reads crash.json, invokes claude to fix the bug, then restarts.
cd "$(dirname "$0")"
source .venv/bin/activate
CRASH_FILE="crash.json"
while true; do
echo "Starting Todo App on http://localhost:5001"
python app.py
EXIT_CODE=$?
if [ $EXIT_CODE -eq 0 ]; then
echo "Server exited cleanly."
break
fi
if [ ! -f "$CRASH_FILE" ]; then
echo "Server crashed but no crash.json found. Restarting..."
sleep 1
continue
fi
PROMPT=$(python -c "import json; print(json.load(open('$CRASH_FILE'))['prompt'])")
rm -f "$CRASH_FILE"
echo "Invoking Claude to fix the bug..."
claude --dangerously-skip-permissions -p "$PROMPT"
echo "Claude finished. Restarting server..."
sleep 1
done
To test the setup, I introduced 3 major bugs in the todo app:
- Marking a to-do as done tries to update the wrong column in the database
- Editing a to-do triggers a divide-by-zero error from an unused calculation involving string length
- Deleting a to-do makes an invalid database call
Now for the test. Here’s a recording of me using the app to trigger each of the three bugs:
For each bug that’s triggered, the server exits after generating a prompt for Claude that includes the crash details and instructions to fix the issue. Once Claude is done, the server is restarted by the wrapper script. Claude’s fix message then appears when the user is redirected away from the error page.
Here’s a snippet of the logs for the first crash:
1
2
3
4
5
6
7
8
sqlite3.OperationalError: no such column: is_completed
CRASH DETECTED: OperationalError: no such column: is_completed
Route: POST /toggle/4
Server will shut down. Claude will investigate the fix...
Invoking Claude to fix the bug...
Fixed. The toggle route was using `is_completed` but the table schema defines the column as `completed`. Changed line 77 in `app.py` to use the correct column name.
Claude finished. Restarting server...
Starting Todo App on http://localhost:5001
Flexible autonomy
In this example, I had Claude Code immediately fix the issue (it even committed the changes to the app’s git repo, but I didn’t show that here). But there are various degrees of autonomy one can implement with self-improving software. I could have instructed Claude Code to fix the issue in a new branch and generate a pull request in GitHub, allowing me to review and approve it before it was deployed to the running app.
Beyond bugs
This proof-of-concept was focused on autonomously fixing bugs, but there are other ways to implement self-improving software. For example, as I wrote about in my previous blog post, I built a personal finance app from scratch and have been using it regularly for weeks now. I recently had Claude Code add PostHog analytics to the app, which continuously gathers metrics about how I use the app. With PostHog’s MCP server, I can have the AI periodically look through the metrics and identify possible usability enhancements based on how I specifically use the app. For me, this is a more exciting avenue to explore. Imagine software that continuously adapts to how you use it, reducing friction over time so that it becomes easier and more intuitive.