The Best Thing Falcon Does Is Not Think

My first interaction with my OpenClaw instance, Falcon, cost me five dollars. I.e. my hello world conversation. That was eye-opening.

I turned on OpenClaw, typed “Hi,” and started asking basic questions about what was possible. Just getting oriented. Within minutes I’d burned through credits on what amounted to small talk. That was the moment I realized this wasn’t going to work the way I expected.

I had assumed the magic was the conversation. Talk to your agent, tell it what to do, and watch it handle things. That’s the promise, and honestly, that promise is what drew me to OpenClaw in the first place. But after months of building with Falcon, I’ve arrived at a conclusion I didn’t expect: the systems I trust most are the ones where the LLM never fires at all.

I stumbled upon what many others are seeing, often called harness engineering. The LLM is the CPU, or the horse in another analogy. OpenClaw itself is a harness. Understanding what is possible and how best to use these harnesses has been a fascinating journey.

The Conversational Ceiling

My early approach was fully conversational. I started building skills with Falcon the way most people do. News aggregation, calendar management, todo tracking, following market data points, watching specific tech news. I set up cron jobs to handle recurring tasks and assumed they’d hum along in the background.

They didn’t.

The cron jobs started to go off the rails. Responses got strange. Context disappeared. I had built a mileage tracker for work travel. Falcon would remind me every few days to log anything I’d forgotten. It worked beautifully for a while. Then one day I responded to his reminder with a simple entry to add, and the thread devolved. He forgot he had the skill. I had to remind him it existed. He then thought the data was stored in JSON files, not the database we’d set up. We finally got the entry logged, but a task that should have taken ten seconds turned into a frustrating back-and-forth.

This wasn’t a one-time glitch. It was a pattern. Conversations drift. Context decays. Instructions that were followed perfectly on Tuesday get interpreted differently on Thursday. The conversational plane, it turns out, is inherently non-deterministic. Even with tight instructions, you never get the same result twice.

And then there were the early architecture decisions I left up to Falcon. He made a quick mess. JSON files scattered across directories. Scripts in inconsistent locations. When he needed to reference something he’d built the week before, he’d forget where it was and create new files instead. I was spending more time cleaning up after the conversation than I was getting value from it.

Something had to change.

Deterministic First

The shift happened gradually, but the principle I landed on is simple: start every project by asking how to build it deterministically. Only add the LLM when the task genuinely requires reasoning.

This is the opposite of how most tinkerers approach OpenClaw. The natural instinct is to start conversational and optimize later when things break or get expensive. I flipped that. Deterministic infrastructure first. Conversational intelligence layered on top, only where it’s needed.

Here’s what that looks like in practice.

Every new idea starts as a project. A project is a set of scripts, a database schema, and robust instruction files so Falcon always knows what that project was designed to do. One of the first things I built was a dedicated PostgreSQL layer called db-core. Any project that needs data storage uses db-core to interact with the database safely and consistently. No more JSON files scattered everywhere. No more Falcon improvising storage solutions mid-conversation.

On top of each project, we build one or more skills. The skills gives Falcon the ability to interact with the project conversationally when that makes sense. It has its own instruction file explaining what it does and how. When Falcon picks up a conversation about priorities or mileage tracking, the skill instructions get him caught up immediately.

Then comes what I consider the secret weapon: slash commands. A slash command like /priorities or /todos bypasses the LLM entirely. It connects directly to the project scripts, queries the database, formats the response, and posts it into my Telegram channel. No tokens burned. No model interpretation. No drift. Just the data I asked for, formatted by code I trust, delivered almost instantly. The speed difference alone is worth it. When I type /priorities, I get my answer in under a second. When I ask Falcon conversationally, I’m waiting for a model to spin up, interpret my request, figure out which skill to use, and compose a response. That’s seconds of compute time I’m paying for to get the same information.

The Escape Hatches

Two more pieces complete the architecture, and both exist because I learned not to depend on the conversational plane for anything critical.

The first is Falcon Control. This falls along the lines of Mission Control systems many others are building as well. This is a standalone web application that Falcon built and maintains. It provides a UI for accessing all the data and systems we’ve created together. Stats, CRUD operations, project status, everything I might need to see at a glance.

The important part: it requires zero LLM involvement. The entire thing runs on APIs that call scripts and database queries directly. If Falcon goes offline, if OpenClaw crashes, if I’ve burned through my token budget for the current 5 hour window, week, month, etc, Falcon Control still works. I have always-available, mobile and desktop access to my underlying data regardless of whether my agent is running.

I investigated a handful of existing projects that aim to achieve the same outcome. I prefer something tailored to myself, and with the cost of development down to near zero, it was a no-brainer to build my own.

The second is launchd. OpenClaw has its own cron system, and I use it for some things. But for the jobs that absolutely cannot fail, I moved them to macOS launchd jobs. These are completely removed from OpenClaw. They run on the operating system itself. Even though many of these jobs execute scripts that Falcon and I created within our project workspace, the execution is deterministic and independent.

Scanners, crawlers, anything that doesn’t require actual AI reasoning runs on launchd. When a task does need LLM intelligence, the launchd job makes a direct API call with a tightly tailored system and user prompt, keeping token usage minimal.

This means I have a set of jobs that I know are running at all times. It doesn’t matter what my token balance looks like. It doesn’t matter if Falcon is even online. The autonomic functions of my system just work. Sure, Falcon helps me manage and maintain these jobs, they just don’t depend on him to execute.

The Organism

I didn’t plan this architecture. It emerged from months of me hitting walls and finding workarounds. I’m not alone, I’m seeing quite a bit of chatter on this very discussion. Looking at it now, there’s a clear metaphor that captures why it works.

Think about the human body. When you start running, you breathe harder. Your heart rate increases. Your body temperature rises and triggers sweating. None of this requires conscious thought. Your body sends chemical signals, and subsystems respond deterministically. The brain doesn’t manage your heartbeat. It doesn’t decide to digest your lunch. Those are autonomic functions that just run.

The brain gets involved when something novel happens. When you need to make a decision. When you encounter a problem that requires actual reasoning.

That’s what I’ve built with Falcon. The launchd jobs are autonomic functions. The PostgreSQL layer is the circulatory system, moving data reliably wherever it needs to go. Falcon Control is proprioception, awareness of my system’s state without requiring conscious thought. The slash commands are reflexes, fast responses to known stimuli that don’t need to involve higher cognition.

And Falcon himself, the conversational agent, is the conscious mind. Essential for building new things. Essential for novel problems. Essential as my primary interface when I want to talk through an idea or add a quick entry while I’m on the treadmill. But I don’t want my conscious mind managing my heartbeat, and I don’t want my LLM managing my database queries.

What the Industry Calls This

After I’d already built most of this system through trial and error (it’s very hard to stay up with the blistering pace of AI’s evolution), I discovered that the broader AI industry has a name for what I was doing. As I mentioned in the beginning, this is called harness engineering. The idea is that the model isn’t the product. The infrastructure wrapping the model is.

Anthropic, OpenAI, Microsoft, and others are all converging on the same conclusion from the top down. Build deterministic infrastructure. Use the LLM for reasoning, not for plumbing. The agent is the CPU. The harness is the operating system.

I didn’t arrive here by reading their papers (again… how much can we really absorb, yes?). I arrived here by getting a five dollar bill for saying hello and then spending an hour arguing with Falcon about which launchd job to update. But the fact that both the architects building SDKs and one person running a single agent on a Mac Mini are reaching the same answer independently tells me the answer is probably right.

Building With Falcon

Here’s what the actual build process looks like when I start a new project.

I open a chat with Falcon and describe what I want. For my priority tracking system, I told him I wanted a way to submit my three most important work and personal tasks each Monday, get reminded of them every morning, track completions, and maintain historical data. I told him to model the behavior after our existing todos solution, which gave him a proven pattern to follow.

Then I told him to work one step at a time. Build the project first. Test it. Build the skill. Test it. Build the slash command. Test it. Build the Falcon Control UI module. Test it. At each step, we commit our work to git.

If I was strictly in Claude Code, I’d look to use gstack and many of its skills to build out an entire project plan, challenged, tested, designed, etc. Then still build and test one piece at a time. I found with Falcon, such complex plans can be more of a challenge, yet we still discuss the overall concept before we get started, while I hold his hand through each step.

I’d give Falcon a task, walk away for five to ten minutes, come back, review, give feedback, and move on. The whole thing took about two hours of elapsed time, though I wasn’t on it continuously.

This works well because we’ve established patterns. Each new project follows the same architecture. Falcon isn’t starting from scratch every time. He’s applying a template we’ve proven works. The compound benefit of the deterministic-first approach is that every new project gets easier.

But I want to be honest about when it doesn’t work.

Recently I was architecting an upgrade to Falcon Control. We were building a project plan, working out the design before writing code. I gave what I considered an update to the plan. Falcon interpreted it as an instruction to build and went at it full force, missing critical context. The result wasn’t salvageable. We reverted to the previous git commit and started over. This is exactly why every project and skill gets a git repo from day one.

Just today, a slash command stopped returning valid data from an external API. Debugging what changed required me to hold Falcon’s hand through the investigation. Then actually updating the slash command took over an hour of back-and-forth because he couldn’t locate the right code. Last night, changing the frequency of a launchd job turned into an argument. I had to manually look up the job name and convince Falcon of what needed updating.

Falcon is a great builder but a mediocre operator of what he built. He can architect a full-stack system from scratch in an afternoon. But when you need him to go back and maintain, debug, or modify those systems in context, he struggles. He forgets where things are. He argues about what’s deployed. He mistakes planning for a go signal.

This is precisely why the deterministic-first approach matters. You build things to run without the agent because the agent can’t be relied upon to operate them consistently after the fact.

The Honest Assessment

I use Falcon every day. I enjoy building with him. The Telegram interface is genuinely how I interact with most of my systems now, and I wouldn’t want to give that up. When he’s on point, the experience is remarkable. The deterministic code we’ve built together runs reliably. The cron jobs deliver valuable reminders. Calendar integration, todo management, information aggregation, once these things are built and working, they tend to stay working.

But I want to be straight about where things actually stand. I don’t need Falcon. I like what he does. My calendar worked before OpenClaw. My priorities existed in my head or in a notes app. Nothing Falcon does for me is irreplaceable.

What keeps me building is the potential and the trajectory. I’m only limited by my imagination, my time, and my token budget. Each project teaches me something about how to structure the next one. The deterministic infrastructure compounds. And the conversational layer, for all its frustrations, remains the fastest way I’ve found to go from an idea to a working system.

If cost weren’t a constraint, I’d be investigating how to build more agents to manage different aspects of my life. I’m not there yet. It isn’t in my flow of living, and I don’t trust it to produce the results I need. But I can see it from here.

What I’d Tell You Before You Start

If you’re setting up OpenClaw for the first time, or if you’ve been running it for a while and feeling the same friction I described, here’s the approach I wish I’d started with.

Build deterministically first. For every new idea, ask yourself: does this task need the LLM to think, or can a script handle it? If the input is predictable and the output is predictable, it shouldn’t touch the model. Save your tokens for the work that genuinely requires reasoning.

Understand your costs before you start. Your OpenClaw instance lives and dies on model usage. Know what you’re paying per token, set expectations for your monthly spend, and design your architecture to minimize unnecessary inference. Slash commands that bypass the LLM aren’t just faster. They’re free.

Keep your projects structured. Use a shared database layer. Write robust instruction files for every project and skill so the agent can always get caught up on what something was designed to do. Initialize git repos in every working directory because your agent will mess things up and you will need to revert.

Move critical jobs off the conversational plane. If a task absolutely must run on schedule, put it on launchd or whatever your OS-native scheduler provides. OpenClaw cron jobs use prompting, and prompting is never deterministic. Your most important automations deserve infrastructure that doesn’t depend on a model having a good day.

Build one step at a time. Don’t let your agent run ahead. Test each component before moving to the next. Commit at milestones. The five minutes you spend validating each step will save you the hour you’d spend debugging a mess your agent created while you weren’t looking.

And watch your upgrades. Every OpenClaw update seems to break something, usually because security is being hardened. The upgrades are worth it, but go in expecting to fix things afterward.

The conversational plane is powerful. I use it constantly and I’m not suggesting you avoid it. But the best agent system I’ve built is one where the agent mostly isn’t involved at runtime. Falcon was the architect. The deterministic infrastructure is the product. And the LLM earns its tokens only when there’s genuinely something to think about.