GLM-5.2 as a Coding Agent with OpenCode and LiteLLM

20.06.2026

With the release of GLM-5.2, Zhipu AI has introduced an impressive open-source model specifically designed for long-running tasks. If you don’t want to read for long, you can use the Zhipu AI Chat directly.

GLM-5.2 is particularly well suited for use as a coding agent in OpenCode. The key points:

1M token context
A context of one million tokens has become standard for large models like Claude Opus 4.8 or GPT-5.5, but many open-source models still work with smaller context sizes (e.g. 200k tokens). This allows GLM-5.2 to handle larger codebases, carry out longer implementations, and not lose track despite many tool calls.
Built for coding
On Terminal-Bench 2.1, GLM-5.2 achieves a score of 81.0. This makes it the strongest open-source model, trailing Claude Opus 4.8 (85.0) by only a few points. On SWE-bench Pro it scores 62.1, even surpassing proprietary models like GPT-5.5 (58.6).
Reasoning according to effort level
Through adjustable effort levels (High or Max), the model can be tuned between speed and accuracy depending on the task. For simple tasks a low effort is sufficient (e.g. for an orchestrator), while you can use maximum effort for difficult debugging sessions.
MIT license
GLM-5.2 is available under an MIT license without regional restrictions. This is a significant advantage over many other high-performance models.

Use with OpenCode

OpenCode is an interactive CLI tool for software engineering tasks and supports various LLM backends. GLM-5.2 can be integrated seamlessly as a backend model, since it can also be provided via the Z.ai API (or locally, with appropriate hardware).

But what makes OpenCode truly productive for me are three concepts that work well with the strengths of GLM-5.2:

Skills
Skills are specialized instructions and workflows that OpenCode loads on demand. Instead of overloading the model with a huge system prompt, a skill (e.g. for API design or a security review) is only loaded into the context once the task matches it. This keeps the context lean and uses GLM-5.2’s 1M token headroom specifically for the actual work.
Agents and subagents
Agents are standalone instances with a clearly defined assignment and their own toolset. An agent starts with a fresh context each time, completes its subtask autonomously, and returns a compact summary at the end. This allows things like codebase analysis, research, or writing tests to be cleanly encapsulated without burdening the main context with intermediate results.
Orchestrators with subagents
The real leverage lies in the orchestrator pattern: a higher-level agent breaks down a complex task and delegates the parts to multiple subagents – ideally in parallel. Each subagent works in its own isolated context window and returns only a condensed result. The orchestrator combines these results and decides on the next steps.
This is where GLM-5.2’s tunable reasoning depth particularly pays off: the orchestrator can run at a low effort level, since it mainly coordinates, while compute-intensive subagents use maximum effort when needed. Combined with the large context, the orchestrator keeps track even across many tool calls and multiple subagent runs.

Cost transparency through self-hosted LiteLLM

Another aspect that is crucial for me: I run a self-hosted LiteLLM as a proxy. LiteLLM acts as a unified interface to various LLM providers while offering full cost transparency. Every API call is logged, token usage is broken down, and costs are presented in a traceable way per model, per key, and per request.

This is especially relevant for a model like GLM-5.2, which can consume significant amounts of tokens due to its long contexts and coding-agent scenarios. Through LiteLLM I always have an overview of which model consumes how many tokens and what that costs. It also allows me to flexibly switch between different models and compare costs directly.

Conclusion

GLM-5.2 is a serious step forward for open-source models in the area of coding agents. The combination of the 1M token context, strong coding performance, and MIT license makes it an attractive choice for developers who value independence and transparency. Together with OpenCode as a CLI agent and a self-hosted LiteLLM, you get a setup in which you retain full control over your infrastructure and costs without having to sacrifice performance.

More information about GLM-5.2 can be found in the official blog post; the model weights are available on HuggingFace.