Yeah, spot on. I've had an agent delete some files it shouldn't have, just as I've made the same mistake myself. I think system prompts should default to using `trash` over `rm`.
For now that's just in my AGENTS.md, and it gets honored most of the time.
You can always use something like this [1], which makes sure any file removed on the command line via `rm` (or other utilities, like `git rm`) ends up in the trash instead.
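For anyone who wants the idea in their own scripts or agent tooling, here's a minimal Python sketch of "move to trash instead of deleting". It assumes a plain trash directory (defaulting to `~/.Trash` as on macOS). It doesn't implement Finder's "Put Back" or the freedesktop.org trash spec, and the helper name `move_to_trash` is made up for illustration.

```python
import shutil
from pathlib import Path

# Assumed trash location; macOS uses ~/.Trash, but any directory works here.
DEFAULT_TRASH = Path.home() / ".Trash"


def move_to_trash(path, trash_dir=DEFAULT_TRASH):
    """Move `path` into `trash_dir` instead of deleting it.

    Returns the new location so the file can be restored by moving it back.
    """
    src = Path(path)
    if not src.exists() and not src.is_symlink():
        raise FileNotFoundError(src)
    trash = Path(trash_dir)
    trash.mkdir(parents=True, exist_ok=True)

    # Never overwrite something already in the trash: append .1, .2, ...
    dest = trash / src.name
    counter = 1
    while dest.exists() or dest.is_symlink():
        dest = trash / f"{src.name}.{counter}"
        counter += 1

    shutil.move(str(src), str(dest))
    return dest
```

The point of returning the destination is that an "oops" is just a move back, which is exactly what you don't get from `rm`.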
Amazing observation, and I'm certainly guilty of it too, but it's just way too convenient not to sandbox, and some tasks outright depend on not being sandboxed.
For anything other than writing code directly in a fully contained git project, where sandboxing might work well, the agent needs access to system-wide tools, user configuration, and more.
Occasionally I tell the agent to do everything inside Docker, which also works and mostly leaves the system alone, but it adds significant overhead and slightly degrades the perceived quality and effectiveness.
I think the most important takeaways are to have reliable backup strategies, access control and security mechanisms, which is a win regardless.
Whether it's the agent or the human, mistakes happen (like an `rm -rf *` run in the wrong directory), and where they would be devastating, there should be protections other than just "hope it won't happen" or "rely on a sandbox to prevent agent error".
This is also part of the process as a whole. "What if someone tries to attack us with an insane amount of bandwidth?" is an almost reasonable thought experiment at some point. This time it was this one. Can we handle it? How much could we handle? What is a reasonable amount we could actually sustain? All somewhat interesting questions.
Or by using a proxy, yeah. Personally I would still prefer a multi-provider harness over CC when using another provider, if only for the visible reasoning, model switcher, cost estimation, and so on. So far I've only preferred CC when I needed to work with Jupyter notebooks, because it has built-in tools for them.
Yeah, I used that too on my last Mac. But the page explicitly states the benefits of this approach: it prevents the app from launching altogether, without anything running, instead of listening for the launch and then killing it. It also doesn't use a menu bar icon, which is good considering the limited space there.
Yes, certainly. I've heard of people who let an agent run on one machine, point a USB camera at the target, give the agent SSH access and something like imagesnap (a CLI webcam capture command), and then let it run autonomously. The agent can then try all sorts of things and verify the results itself without asking the user.
I think that's quite a good workflow, giving at least a basic feedback loop for work that can't be tested with just software.
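One iteration of that loop can be sketched in Python as: run a command on the target over SSH, then take a webcam snapshot for the agent to inspect. This assumes the camera is attached to the machine running the agent and that the macOS `imagesnap` CLI is installed there (`-w` is its warm-up delay in seconds; check `imagesnap -h` on your version). `run_and_capture` and the `runner` parameter are hypothetical names for illustration.

```python
import subprocess
from typing import Callable, Sequence, Tuple

# A runner takes an argv list and returns the exit code.
Runner = Callable[[Sequence[str]], int]


def _run(argv: Sequence[str]) -> int:
    # Default runner: actually execute the command.
    return subprocess.run(list(argv)).returncode


def run_and_capture(host: str, remote_cmd: str, snapshot_path: str,
                    runner: Runner = _run) -> Tuple[int, int]:
    """Act on the target over SSH, then photograph it with the local webcam.

    Returns (ssh exit code, imagesnap exit code).
    """
    ssh_status = runner(["ssh", host, remote_cmd])
    # Give the camera a 1 s warm-up so the exposure settles before the shot.
    snap_status = runner(["imagesnap", "-w", "1", snapshot_path])
    return ssh_status, snap_status
```

Passing the runner in keeps the step easy to dry-run or log before pointing an autonomous agent at real hardware.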
For extra credit on top of the webcam, you can also add a Fingerbot and a Raspberry Pi Pico with either an Ethernet port or a second USB port, so the host machine can talk to the target device as if it were a keyboard, and can also hard power it off by holding the power button if the work wedges the machine.
Sometimes you need a specific sequence of power button presses to reach certain states. E.g., a laptop may have a specific power button press that performs a warm reboot (instead of the cold boot you'd get by cutting power), which preserves the state of the previous crash in RAM. That helps you figure out where and why it crashed, instead of having to guess.
I'd say if it's only used as the build and publishing machine and development happens elsewhere, this would work without problems. 8 GB for building the iOS app and testing on a real device, or even the Simulator, would likely work. Apple's swap is also quite fast.