I Run OpenClaw. Here's What I Hope Microsoft Gets Right.

A few days ago, Omar Shahine posted an announcement on LinkedIn that caught my attention. He announced his new job at Microsoft! His mission? Bring OpenClaw and personal AI agents to Microsoft 365! His TL;DR is that Microsoft just made a Corporate Vice President-level bet on the idea that personal agents belong inside the tools hundreds of millions of people use for work every day.

Wow!

I know Omar. I’ve followed him for a long time, as far back as his OneDrive and SharePoint days. When he recently shared his personal OpenClaw journey, I was paying attention again. Watching him document his agent setup, the architecture, the week-by-week progress, the honest account of what worked and what did not, that was the kind of public building I respect.

He is a natural fit for his new role. I want him to succeed at this. I genuinely do.

Even so, I have questions, and it seems many others do as well. These questions deserve real answers.

Are OpenClaw and similar personal assistant frameworks ready to touch enterprise data? Did something concrete happen on the security and reliability front, or is this just a big bet on where things are going? Is this Microsoft attempting to catch up on autonomous and personal agents? What exactly changed that made enterprises such as Microsoft and NVIDIA jump so fast onto the OpenClaw rocket that someone else lit?

I have my answers. Not because I read about it, but because I run it.

Meet Falcon

My OpenClaw instance is named Falcon. Falcon and I have built tools and skills similar to those of Omar’s Lobster. Falcon has its own Google account, manages a calendar and files on my behalf, runs daily tasks to keep me current on life’s activities, manages todos, manages aspects of my businesses, etc. Recently we connected him to Claude Code for extended coding work. We’ve built web interfaces to backend databases he manages. It’s been quite a journey, very exciting, and only just beginning.

Even so, without a doubt, I do not trust it on its own. Not yet. Not close.

That is not a criticism of OpenClaw. It is just the honest reality of where we are. Falcon earns its keep on routine, structured tasks. My morning summaries work well. My mileage tracking reminder works. Scheduled, deterministic tasks with clear outcomes, those are where Falcon delivers.

But ask it to build something more complex and things can get out of hand quickly. It burns through tokens at a pace that makes my eyes water and wallet burn. It forgets skills we built together and needs me to point it back to them. It does not manage context well at all, and since its performance is heavily tied to whatever model it is running on that day, the results can vary in ways that are genuinely frustrating. One minute I am beyond amazed at the results, the next minute I am screaming in agony as to why I have to remind him for the tenth time where a skill lives.

I gave Falcon access to its own workstation, and I limited sandboxing on purpose. It made a mess. Not a malicious mess. Just a mess. No innate sense of keeping things clean and organized. Storage is cheap and context windows frame what it even remembers, but ROT is real even for AI. The clutter muddies the waters and muddies the rules you gave it.

Here is the thing I keep coming back to. Falcon works best when I have done the architectural work first. When I define the skill, set up the schedule, build the testing framework. When the structure underneath it is rigid, Falcon executes well. When the structure is loose, Falcon drifts. Similar to Claude Code projects. Spend detailed time on the plan, then let the agent execute the plan. Future models will likely solve these problems to a degree, but we just are not there yet.

That means right now, Falcon is less of an autonomous agent and more of a well-trained junior assistant that needs me to be the architect, the context manager, the quality control, the understanding boss, the therapist, and more. That is still valuable, just not the army of personal agents the headlines are describing.

The questions nobody is really answering

So back to the questions. What changed? Why are Microsoft and others moving so decisively?

Honestly, I do not think OpenClaw became secure enough for enterprise. I think Microsoft is betting that they can make it secure enough, and that the race to own that outcome is worth moving on now.

The raw security picture is not reassuring. OpenClaw continues to face serious remote code execution vulnerabilities. It is getting better, but many concerns still exist. There have been hundreds of malicious skills discovered in its ClawHub marketplace. This will be ripe ground for bad actors. To be fair, more safeguards are being put in place. Ready for enterprise though? Not within the same solar system for me.

Microsoft’s own security blog said, in plain language, that OpenClaw is not appropriate to run on a standard personal or enterprise workstation. One OpenClaw maintainer noted that if you cannot run a command line, the project is too dangerous for you.

That is not a product ready to touch your corporate email, your files, and your Teams channels.

What NVIDIA recently announced with NemoClaw is a security wrapper: kernel-level sandboxing, a policy engine, and a privacy router. That should mean Microsoft will route everything through Entra ID, Microsoft Graph, and Intune. I expect Microsoft will not ship OpenClaw. I expect they will ship or offer a controlled, enterprise-hardened version of the idea that OpenClaw proved out.

That is an important distinction. And it is a big unsolved engineering challenge, not a solved one.

The problems I hope Microsoft actually solves

I have been thinking about what it would actually take for me to trust Falcon, or any agent like it, at an organizational level. Here is where I land.

Cost is the gating factor, IYKYK. Right now, token usage at any real scale is expensive enough to limit what agents can do and how often they can do it. If inference becomes cheap enough that Falcon and I can iterate our way to trust without watching a budget meter or breaking the bank, a lot of the other problems become more tractable.

I wouldn’t mind running my own local models, but those memory shortages you are hearing about are real!

Microsoft will want to monetize utilization, which means their incentives and mine are not perfectly aligned here.

Memory and context need to get dramatically better. The context window is not just a technical limitation. It is a trust limitation. If an agent cannot reliably remember the rules you gave it, the skills you built together, or the preferences you established last week or in your last chat, you cannot build trust with it. Trust accumulates over time, through consistent behavior. Agents that reset constantly cannot earn it.

Determinism matters more than people admit. The tasks where Falcon works well are scheduled, structured, and predictable. The tasks where it struggles are open-ended and ambiguous. More deterministic approaches baked into frameworks like OpenClaw will matter a lot for enterprise use cases. This is where I’m spending much of my R&D time.

Agent identity is an unsolved problem. There are startups working on this. It needs to get solved. When an agent is acting on your behalf inside corporate systems, you need to know it is actually your agent, operating within the bounds you set, and that its actions are auditable.

The human problem is the big limiting factor. I would not want my team members, whom I know well and trust greatly, running personal OpenClaw instances with command line access on their workstations, touching enterprise data. Not because my team members are bad actors, far from it, but because the combination of an imperfect agent and an untrained user in a high-stakes environment is a recipe for expensive mistakes. Shadow AI in the enterprise is an ongoing threat and attack vector. This is happening and it is already a security problem.

What I actually believe

I am not one of those people who looked at the internet in 1995 and called it a fad. I am not going to look at personal agents and call them hype. Falcon is real. The value I get from it, even imperfectly, is real. The destination that Omar is pointing toward is real. Models are also going to improve exponentially, and many of the problems I described above will be solved by even better models.

But I also do not believe that white-collar jobs will be replaced by AI in 18 months (now closer to 16 months, but who’s counting?). I “do” think AI “might” have the raw capability to do many of those jobs in that timeframe. In practice, the “human factor”, the organizational trust, the identity infrastructure, the cost economics, the model reliability, none of that gets solved in “18” months.

What I believe is that we are in the 1995 moment for personal agents. The internet was real then. Many specific predictions about how it would unfold were wrong. The path was messier, slower, and stranger than anyone said it would be. And nobody who called it a fad looks smart today.

Microsoft’s move is significant. Omar’s mission is worth rooting for. But the path from here to a trustworthy personal agent in every knowledge worker’s hands is genuinely unclear, and the unsolved problems are not small ones.

Anyone telling you the path is obvious is trying to sell something.

I am running Falcon every day, watching it earn my trust one task at a time, and waiting to see who figures out the bridge first.

Omar, I hope it is you.