Every company is an AI company, and yet every problem is a people problem.

One of the lessons I learnt previously, running a fairly large team (500+), is that every problem, in the end, is a people problem. It's not the technology. It's not the process.

-> Maybe the ability to solve people problems becomes even more important in this AI world?

(The other thing I learnt is that revenue solves most problems, temporarily.)

# Dec 20, 2025

An interesting pattern when working with AI: AI likes clarity and clear naming.

If your codebase is well organized and consistently named, it's much easier for the AI to follow those patterns.

A similar thing happens in effective organizations. If your ways of working are well organized and consistently named, it's much easier and faster for humans to get work done.

-> And to bring that thought all the way around, if your organization is well organized in this way, it will be much easier to leverage AI in it.

-> Introducing AI can help force clarity and simplicity in your organization?

AI wants clarity.

# Dec 20, 2025

One of my longer Claude 4.5 prompts:

create a spec at .claude/features/ for a statementStrategy model. (id, statement_id, written_by_sputnik_or_human, human_writer_user_id, approved_by_user_id, is_current_version (boolean), response_category (admit, deny, nonadmission), response_concise_title, response, response_reasoning). This feature lets the LLM create a strategy for each statement in a lawsuitEvent. We want individual creation (create for this statement), and then a Job to create all within this lawsuitEvent. Look in app/Ai and in Actions and in Livewire for patterns. For the UI, on the statement detail page, an empty strategy box is shown underneath the statement and its footnotes. 'STRATEGY' subtle title, on the right 3 dots with a flux dropdown: Generate, Write. This opens up a Strategy Flyout, that we will work out the details later, where a user can write or generate the strategy. (Also later, in /studio there will be a section where users can run the Job to generate strategy for all statements in a lawsuitEvent). We keep multiple versions, and a field is_current_version (boolean) is used to know what version to display. In the flyout later users can go back to a previous version. Users can also write themselves. A user can also approve a specific strategy, then that is set as approved, until we un-approve it. So that a lawyer can review and approve the strategy. Go write a spec, and ask me questions. In the UI in the full card, we show 'strategy .. sputnik suggests/written by username; concise title (button to approve, response_category); response; reasoning with a dropdown to expand full reasoning (hidden by default). Show little layouts of the various card versions in the doc. Ask me clarifying questions.
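
For reference, here's the data shape those fields describe, as a minimal, non-authoritative sketch in a Python dataclass. The field names come straight from the prompt; everything else (types, the enum) is my assumption, and the real implementation would of course be a Laravel model plus migration.

```python
# Hypothetical sketch of the statementStrategy shape described in the prompt above.
# The actual implementation would be a Laravel Eloquent model + migration.
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ResponseCategory(str, Enum):
    ADMIT = "admit"
    DENY = "deny"
    NONADMISSION = "nonadmission"


@dataclass
class StatementStrategy:
    id: int
    statement_id: int                    # the statement this strategy responds to
    written_by_sputnik_or_human: str     # "sputnik" (the LLM) or "human"
    human_writer_user_id: Optional[int]  # set when a human wrote it
    approved_by_user_id: Optional[int]   # set when a lawyer approves this version
    is_current_version: bool             # which version the UI displays
    response_category: ResponseCategory  # admit / deny / nonadmission
    response_concise_title: str
    response: str
    response_reasoning: str
```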

# Dec 16, 2025

This conversation on Paul Krugman's Substack (Krugman is a Nobel Prize-winning economist) is just really interesting:

But most of inference is not you and I. This is a mistake we make all the time. People think that we are the story. With respect to inference, most of the global inference from consumers—from you and I and others—could be satisfied by a single data center in northern Virginia. That’s how small a fraction of the total load we are with respect to inference worldwide.

So 60%, let’s say, is training. We’re maybe 5 or 6% of the total workload of data centers. That bit in the middle, a huge chunk of that, is software itself—is coding, which turns out to be a huge profligate use of tokens.

# Dec 6, 2025

From a recent discussion by Paul Krugman: powered land companies.

Basically companies scooping up land that comes with a power connection...

We see the emergence of these companies called powered land companies, which are kind of analogous to what went on in the days leading up to LA taking over the Owens Valley’s water supply, where you show up with numbered companies and you buy up locations and no one knows exactly what you’re doing, and it’s all in anticipation of eventually one day someone wanting that and you say, “haha, I’m already here and I’ve already got the rights to access to power here and so if you want to build a data center, away you go,” and we’ve seen there’s a whole host of these so-called powered land companies that have no interest in building data centers.

# Dec 6, 2025

James Wallace: we can now do evals on agent software engineering processes. (link)

  1. Come up with a software engineering task

  2. Set up 3 different engineering processes

  3. Run the task with agents (let's say 10 times each)

  4. Compare output

For the first time in human history we can run real experiments on software engineering processes: the same project, implemented by the same team of agents, where only the process differs, to see which software engineering techniques actually work 🤯
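
Here's a minimal sketch of what that experiment loop could look like. Everything in it is hypothetical scaffolding: `run_agent` and `score_output` are placeholders for your agent runner and your eval, and the task and process descriptions are made up.

```python
# Hypothetical sketch of an experiment over engineering processes:
# same task, same agents, only the process differs.
import statistics

TASK = "Add rate limiting to the /api/orders endpoint"

PROCESSES = {
    "spec-first": "Write a spec, get it reviewed, then implement.",
    "tdd": "Write failing tests first, then implement until green.",
    "freeform": "Just implement the task directly.",
}

RUNS_PER_PROCESS = 10  # step 3: run each process several times


def run_agent(task: str, process_instructions: str) -> str:
    """Placeholder: run a coding agent on the task under the given process."""
    raise NotImplementedError


def score_output(repo_state: str) -> float:
    """Placeholder: score the result (tests pass, lint, review rubric, ...)."""
    raise NotImplementedError


def main() -> None:
    # step 4: compare outputs across processes
    for name, instructions in PROCESSES.items():
        scores = [score_output(run_agent(TASK, instructions)) for _ in range(RUNS_PER_PROCESS)]
        print(f"{name}: mean={statistics.mean(scores):.2f} stdev={statistics.stdev(scores):.2f}")


if __name__ == "__main__":
    main()
```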

Related: how do we do evals of the Claude instructions that we build (the instruction scaffolding around our agents)? I know at least for Laravel Boost, they set up a bunch of test projects and did manual evals.

# Aug 30, 2025

Writing and reading specs was always too hard for humans. I've written long technical specs. Nobody reads them. Nobody likes to read them. And they are out of date immediately.

LLMs do love to write and read, though.

# Aug 26, 2025

With today's models and coding agents, workflow > model

As in, a strong workflow beats a stronger model.

# Aug 26, 2025

There is no way Anthropic isn't doubling down on Claude Code for non-coders. It's by far the most effective agent right now, their model works really well with it, and it's just screaming to be adapted for other use cases.

# Aug 26, 2025

Which frameworks have an official MCP server?

Laravel. Laravel Boost just came out, and it is very good. (link)

  1. You install it.

It picks up the versions of all the Laravel ecosystem tools and packages you use from your composer.json file (Composer is the PHP package manager).

  3. Then it provides an MCP server that gives your agent (any of them) access to documentation search (search instructions included), for the correct versions of all the packages you use.

That's just incredibly useful. Which other frameworks have MCP servers for their docs?
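
As a rough illustration of step 2 (not how Boost actually does it; Boost is a PHP package and does much more): composer.json holds version constraints, while the exact resolved versions live in composer.lock, and reading them is straightforward.

```python
# Hypothetical sketch of step 2: reading installed package versions
# from a Laravel project's composer.lock (the exact resolved versions).
import json
from pathlib import Path


def installed_versions(project_root: str) -> dict[str, str]:
    lock = json.loads(Path(project_root, "composer.lock").read_text())
    return {pkg["name"]: pkg["version"] for pkg in lock.get("packages", [])}


if __name__ == "__main__":
    for name, version in installed_versions(".").items():
        if name.startswith(("laravel/", "livewire/")):
            print(f"{name}: {version}")
```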

# Aug 26, 2025

The pattern I'm at right now with Claude Code for larger features is: first write a spec together, then have it go through the spec and implement.

  1. Specs are easy to read ("keep it concise"), easy to edit, and it's easy to see what they missed.

  2. I ask it to put them in /claude/specs/

  3. If there are database fields to be created, I do think that through myself and tell it ("create a model Post with fields title, published (boolean) and text")

Plus, you have the spec to reference if you want to build on this in the future.

It works incredibly well.

# Aug 26, 2025

Gradient text is a fun trick; I just added some to my upcoming Model Context Experience site. Claude was, as usual, very helpful.

# Aug 13, 2025

Claude Code system prompt

The Claude Code system prompt here is (as usual) fascinating.

There are a lot of instructions on things it got wrong out of the box, like these:

  1. "Do what has been asked; nothing more, nothing less."

  2. "NEVER create files unless they're absolutely necessary for achieving your goal."

  3. "ALWAYS prefer editing an existing file to creating a new one."

But mostly, it's surprising how few instructions there really are.

40% of the prompt are tool use instructions.

They're pretty standard, so let's skip those. There are some interesting bits like

  1. "ALWAYS prefer editing an existing file to creating a new one".

20% are development workflows.

There's a lot of stuff about Git workflows and testing.

  1. "VERY IMPORTANT: run the lint and typecheck commands if they were provided"

Another 25% are behavioral instructions. Clearly developed while using it. Things like

  1. "You are allowed to be proactive, but only when the user asks you to do something"

Then there are some I found particularly interesting:

  1. "You MUST answer concisely with fewer than 4 lines of text"

  2. "One word answers are best" <- HA!

  3. "Avoid introductions, conclusions, and explanations"

  4. "DO NOT ADD ***ANY*** COMMENTS unless asked"

  5. "You should NOT answer with unnecessary preamble or postamble"

  6. "Do not add additional code explanation summary unless requested"

And finally, a ton of examples.

What I'm thinking:

  1. Reading system prompts is a great way to develop intuition on how to better use these tools.

  2. It's also a great way to get better at writing prompts. (For example, add examples.)

  3. But mostly, I'd LOVE to learn how the evals for this are set up.

# Aug 12, 2025

Select a tech stack for your team, when LLMs are part of your team

I'm hearing from quite a few people that tech stack selection (which technology you build your new thing on) is being influenced by AI. If a specific technology (say, Python or React) is popular and widely used, LLMs have been trained on it extensively and are therefore likely to be good at it. And teams are starting to take that into account as a tech stack selection criterion.

It kind of makes sense. Selecting a tech stack is typically done based on both your needs and the existing skillset of your team (who have to use the tech and maintain it). If you think of LLM agents as a growing set of members of your team, it makes sense to take their skills into account.

I wonder when that starts bleeding over into non-engineering use cases?

# Aug 4, 2025

Gemini 2.5 Pro is really good at transcribing large and complex PDFs (it transcribes tables beautifully, etc.), but it only does the first 10 or so pages. Then it says "Due to the extensive length and dense data in the remaining 400+ pages, a complete verbatim transcription is not feasible here." Is there a better way?
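
One workaround I'd try (a sketch, not a tested recipe): split the PDF into ~10-page chunks with pypdf, transcribe each chunk separately, and stitch the results together. The `transcribe_chunk` function below is a placeholder for whichever Gemini client you use; only the splitting part is real code.

```python
# Hypothetical workaround: split a long PDF into ~10-page chunks,
# transcribe each chunk separately, then concatenate the results.
from pathlib import Path

from pypdf import PdfReader, PdfWriter  # pip install pypdf

CHUNK_SIZE = 10  # pages per request, roughly what the model handles well


def split_pdf(path: str, out_dir: str) -> list[Path]:
    reader = PdfReader(path)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    chunks = []
    for start in range(0, len(reader.pages), CHUNK_SIZE):
        writer = PdfWriter()
        for i in range(start, min(start + CHUNK_SIZE, len(reader.pages))):
            writer.add_page(reader.pages[i])
        chunk_path = out / f"chunk_{start // CHUNK_SIZE:03d}.pdf"
        with chunk_path.open("wb") as f:
            writer.write(f)
        chunks.append(chunk_path)
    return chunks


def transcribe_chunk(chunk_path: Path) -> str:
    """Placeholder: send this chunk to Gemini and return the transcription."""
    raise NotImplementedError


if __name__ == "__main__":
    transcript = "\n\n".join(transcribe_chunk(p) for p in split_pdf("report.pdf", "chunks"))
    Path("transcript.md").write_text(transcript)
```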

# Jul 24, 2025

Now that we're all figuring out how to use Claude Code, Conductor built a UI that makes it easy to run multiple Claude Codes in parallel. I can totally see multiple UIs being built on Claude Code - it's basically the best agent experience out there right now, but pretty geeky. (link)

# Jul 22, 2025

I do believe Simon coined "vibe scraping" (and he's coined quite a few AI-y words) - vibe coding something that scrapes a website for data. (link)

# Jul 22, 2025

More than the year of agents, this feels like the year of evals.

# Jul 22, 2025

SQLite feels like the perfect working-memory container for agents. Small, self-contained in one file, powerful, well understood by LLMs. LLMs can save stuff in there in between sessions, it's structured, it's powerful, you can take it with you.
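
A minimal sketch of the idea, using Python's built-in sqlite3. The schema and function names are purely illustrative, not from any particular agent framework.

```python
# Minimal sketch: a single SQLite file as persistent working memory for an agent.
# Schema and names are illustrative; the point is one small, portable file.
import sqlite3
import time

DB_PATH = "agent_memory.db"


def connect() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS memories (
               id INTEGER PRIMARY KEY,
               session TEXT NOT NULL,
               key TEXT NOT NULL,
               value TEXT NOT NULL,
               created_at REAL NOT NULL
           )"""
    )
    return conn


def remember(conn: sqlite3.Connection, session: str, key: str, value: str) -> None:
    conn.execute(
        "INSERT INTO memories (session, key, value, created_at) VALUES (?, ?, ?, ?)",
        (session, key, value, time.time()),
    )
    conn.commit()


def recall(conn: sqlite3.Connection, session: str) -> list[tuple[str, str]]:
    rows = conn.execute(
        "SELECT key, value FROM memories WHERE session = ? ORDER BY created_at",
        (session,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = connect()
    remember(conn, "refactor-task", "decision", "Use the strangler pattern for the old API")
    print(recall(conn, "refactor-task"))
```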

# Jul 18, 2025

Reinforcement Learning from Human Feedback, a free online book by Nathan Lambert, is a treasure trove of information. As an example: how are chatbots trained on personality?

# Jul 15, 2025

Simon: "My personal theory is that getting a significant productivity boost from LLM assistance and AI tools has a much steeper learning curve than most people expect." Right now, in coding, that is very true. Then again, engineers are used to investing in their own productivity. They call it DX, developer experience. And they will spend as much time as available customizing their work setup, so customizing Claude Code, as an example, is a natural fit. (link)

I wonder what other professional groups are similar in that way - used to investing lots of time in optimizing their own productivity.

# Jul 15, 2025

Living in Spain, I get a lot of official emails in Spanish, and I'm really enjoying the Gmail Gemini summaries at the top. My Spanish is good but those email threads are often a lot. Same for long emails from the kids' schools. The summaries help me be confident that I didn't miss anything. (link)

# Jul 15, 2025

Simon Willison has been saying (and showing) that this might be a great time to start blogging again, so after 10 years or so, I've revived my blog. I expect I'll write mostly about AI and climate, but we'll see. I had (of course) to build my own blogging software, which took a day or so with a little help from Claude. It's really custom to what I like. (link)

As an example of how I'm using Claude, it's roughly in the "talk to a junior engineer" style that works really well right now. It's fairly specific, but I don't have to write the boring bits myself. As an example, I just wrote this:

Create a migration that inserts the first category in the db called "Other", and adjust the livewire component for creating new posts to select the first (id=1) category as default.

# Jul 15, 2025