GLM 5.2

This is a small update rerunning similar set of tests with new GLM 5.2. I’ve limited tasks to (TOML-1.0, YAML 1.2) x (with-specification, without-specification) x (C++17, zig) and dropped HCL as it is too easy and does not discriminate well.

As before, I’ve used Fireworks API for GLM 5.2 call. Despite having open weights with permissive licence, it’s way too slow to run locally on my setup.

cumulative accuracy

As we can see, GLM 5.2 occupies ‘slightly above Sonnet 4.6, comparable to GPT-5.5’ performance niche. This is most likely the strongest open weight model so far, but I need to test Kimi K2.7 Code

References

Prior notes in this series:

Validator Bench: Next

Specifications and tooling:

TOML 1.0 specification
YAML 1.2 specification
Benchmark source code (llama-sandbox/validation-bench)

Models:

GLM 5.2 announcement
GLM 5.2 on Fireworks
Claude Sonnet 4.6 announcement
GPT-5.5 announcement
Kimi K2.7 Code quickstart