AI has already fundamentally changed two areas:
- software development (boosting programmer productivity severalfold)
- marketing (creating websites, illustrations, advertisements, both images and video)
If you're not yet using AI in these two areas, you're behind. This is no longer a toy or an experiment.
Thoughts on Using AI
Almost every paragraph here could be expanded into at least a separate article, but these are brief ideas from the past few months.
Time is nothing, quality is everything: it's better to use slower models that produce better results; you end up faster overall. This also means that for medium and complex tasks, the most expensive model is the most cost-effective one.
Because of aggressive cloud price-cutting, there's no point in buying your own hardware for commercial use (roughly speaking, for writing code). Personal data is still better kept away from models, especially free services: their terms almost certainly say something about training on your input. For the same reason, subscription plans beat pay-per-use.
If we ignore price, I'd aim for about 1 TB of GPU memory for running full-fledged open-source models. More isn't needed: there simply aren't many such models. Less means cost-driven compromises. For reference: a Mac Mini with 96GB RAM, enough to run models of 80B parameters and smaller (e.g., qwen3 coder next), costs around 400K rubles, which equals 25 months of an expensive $200/month cloud subscription. Against a basic $20 subscription it's about 20 years, even before electricity and other costs.
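The payback arithmetic can be checked directly. A minimal sketch, assuming an exchange rate of roughly 80 rubles per dollar (the rate the article's figures imply):

```python
# Payback period for local hardware vs. cloud subscriptions.
# Assumptions: Mac Mini at ~400,000 rubles; ~80 rubles per dollar
# (the exchange rate implied by the figures in the text).
HARDWARE_RUB = 400_000
RUB_PER_USD = 80

def payback_months(subscription_usd_per_month: float) -> float:
    """Months of subscription that add up to the hardware price."""
    return HARDWARE_RUB / (subscription_usd_per_month * RUB_PER_USD)

print(payback_months(200))      # $200/month plan -> 25.0 months
print(payback_months(20) / 12)  # $20/month plan  -> ~20.8 years
```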
Currently the best cloud model is Opus 4.6. The best local model for Mac laptops is gpt-oss:20b. There’s also GLM 4.7 Flash, but it runs too slowly on my hardware in agent mode. I also use translategemma:27b for translations.
Importantly, the Chinese aren’t far behind the Americans. And they open-source a lot. I haven’t seen Russian models in Ollama in the last six months. GigaCode doesn’t feel great, but of course it’s better than nothing.
Surprisingly, CLI agents turned out to be more convenient than VS Code plugins.
I don't use RAG yet (maybe VS Code plugins use it internally; I mean explicit configuration). Usually it's enough to explicitly specify one or two files.
It’s important to let the model verify its own results so it can fix its mistakes.
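That verify-and-fix loop can be sketched as follows. Here `generate` and `run_checks` are hypothetical stand-ins for a model call and a test/linter/compiler run; a real agent wires these to actual tools:

```python
# Minimal generate -> verify -> fix loop (a sketch with stub functions).

def generate(prompt: str) -> str:
    # Stand-in for a model call: returns broken code until the prompt
    # carries error feedback.
    return "print('ok')" if "Fix" in prompt else "print('ok'"

def run_checks(code: str) -> list[str]:
    # Stand-in for a verification step (tests, linter, compiler...).
    try:
        compile(code, "<generated>", "exec")
        return []
    except SyntaxError as e:
        return [f"SyntaxError: {e.msg}"]

def solve(task: str, max_rounds: int = 3) -> str:
    code = generate(task)
    for _ in range(max_rounds):
        errors = run_checks(code)
        if not errors:
            return code
        # Feed the model its own errors so it can correct them.
        feedback = task + "\nFix these errors:\n" + "\n".join(errors)
        code = generate(feedback)
    return code

print(solve("write a script that prints ok"))  # -> print('ok')
```

The point is the shape of the loop, not the stubs: the model's output goes through an external check, and the check's output goes back into the prompt.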
Models are good at writing prompts, including prompts for image generation.
You can ask models to respond in JSON/XML (to reuse in subsequent prompts). There’s also a current trend of creating prompts in JSON.
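A minimal sketch of that reuse pattern; the reply here is hard-coded, standing in for whatever model API you call:

```python
import json

# Ask for JSON, parse it, and feed the fields into the next prompt.
prompt = (
    "Summarize the bug report. Respond ONLY with JSON of the form "
    '{"title": str, "severity": "low"|"medium"|"high"}.'
)
reply = '{"title": "Crash on empty input", "severity": "high"}'  # model stand-in

data = json.loads(reply)  # fails fast if the model ignored the format
followup = (
    f"Write a fix plan for: {data['title']} (severity: {data['severity']})"
)
print(followup)
```

Parsing with `json.loads` is the cheap verification step: malformed output raises immediately instead of silently corrupting the next prompt.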
Prompt engineering has more or less stabilized — the rules are already known, practically nothing new is appearing. The main aspect now: provide context and the ability to verify.
A big prompt for a feature, short ones for improvements and bug fixes. For long prompts it's better to always enable planning mode; for short ones it depends on whether the model is likely to get it right on the first try.
Humans are still responsible for generated code (especially if they're programmers), so expect to manually polish roughly 20% of the code after generation.
First there were just prompts. Then commands appeared as prompt macros (something like howtos from the 90s). Then MCP appeared as a way to integrate models with the outside world. Now skills have appeared as a combination of prompts, resource files, and scripts. So CLI tools are experiencing a renaissance due to their easy integration with skills and prompts.
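As an illustration of a skill in this prompt-plus-files-plus-scripts sense: it is typically just a folder. The exact layout below is an assumption modeled on the common SKILL.md convention, not a spec:

```
my-skill/
├── SKILL.md         # the prompt: when and how the agent should use this skill
├── reference.md     # extra resource file, loaded only when needed
└── scripts/
    └── convert.py   # helper script the agent can run from the CLI
```

This is also why CLI tools integrate so easily: any script the agent can execute from a shell slots straight into a skill.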
A personal agent with a cloud LLM can be run on anything: resource requirements are minimal, only continuous operation matters. I’m planning to set something like this up soon — we’ll see how it goes.
VS Code with plugins (and/or CLI agent variants) can be used for more than programming: it's a universal tool for working with LLMs, with customization and local-file access, rather than just a browser chat.
Models are still imperfect, and workarounds keep being invented (running them several times on the same task, etc.). Sometimes you can automate; sometimes it's faster to do it manually. Ideally, a skill should contain what you'd need to tell an experienced employee who is new to the project. Keep workarounds separate from the necessary instructions.
Model fine-tuning: there’s practically no activity visible in this area. Skill-based configuration is more than sufficient and simpler.
You can run agents with local LLMs (launched via Ollama). It's important to set the context window to at least 64K tokens. This is for exotic cases, but it works quite well.
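Ollama's default context window is small, so agents need it raised explicitly. A minimal sketch of a request payload for Ollama's `/api/chat` endpoint; `num_ctx` is Ollama's real context-size option, while the model name and message are just examples:

```python
# Request payload for Ollama's /api/chat endpoint with an enlarged context.
# num_ctx is measured in tokens; the small default is far too tight for
# agent sessions that stuff files and tool output into the prompt.
payload = {
    "model": "gpt-oss:20b",  # example model from this article
    "messages": [{"role": "user", "content": "Refactor this function."}],
    "options": {"num_ctx": 65536},  # at least 64K tokens
}
```

The same payload is sent as a plain POST to `http://localhost:11434/api/chat`; interactively, the equivalent is `/set parameter num_ctx 65536` inside `ollama run`.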
Currently it makes sense to work with multiple agent sessions in parallel: one does something while you prepare another. I usually have 2 sessions, sometimes 3. More than that doesn’t really work out. There are reports online of 5 parallel sessions on a computer and another 10 in the cloud. It would be interesting to understand how they manage that.
There are also local OCR models. I tried them out of curiosity — they work, but there’s not much to use them for.
Speaking of code development: AI demands more architectural knowledge and discipline from developers than independent development did. You correct someone else's code rather than writing it the way you were taught, and you also have to write instructions on how code should be written. So the importance of practices and architecture has only grown. AI is a multiplier: of both good and bad.