
The breakthrough raises the performance ceiling for AI-driven UI grounding, enabling more reliable automated interaction with high-resolution interfaces and accelerating adoption of agentic localization tools.
User-interface localization has become a critical bottleneck as applications move to 4K and beyond, where tiny buttons and icons must be pinpointed with pixel-level precision. Traditional models struggle with the sheer pixel density, leading to missed elements and inaccurate clicks. H Company's Holo2 series addresses this gap by scaling model parameters and training on diverse GUI datasets, positioning the new Holo2-235B-A22B as a flagship solution for developers seeking robust visual grounding.
The core innovation lies in "agentic localization," a multi-step inference process that refines predictions iteratively. By evaluating an initial guess and then adjusting its focus in subsequent passes, the model captures subtle UI cues that single-shot approaches miss. This methodology translates into a 10-20% relative boost in accuracy, as evidenced by the jump from 70.6% to 78.5% on ScreenSpot-Pro when moving from single-step to three-step mode. Such gains set a new benchmark and demonstrate the practical value of iterative reasoning in vision-language models.
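A quick arithmetic check of the claimed relative gain, using the ScreenSpot-Pro figures above:

```python
single_step = 70.6   # ScreenSpot-Pro accuracy, single-step mode (%)
agentic = 78.5       # ScreenSpot-Pro accuracy, three-step agent mode (%)

# Relative improvement = (new - old) / old
relative_gain = (agentic - single_step) / single_step * 100
print(f"{relative_gain:.1f}% relative improvement")  # → 11.2% relative improvement
```

An 11.2% relative gain sits comfortably inside the 10-20% range quoted for the model family.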
For enterprises, the release on Hugging Face lowers the barrier to experimentation, allowing teams to integrate cutting-edge UI grounding into automation pipelines without building models from scratch. The heightened accuracy promises faster development of UI agents, reduced manual QA effort, and a more seamless user experience across platforms. As the industry embraces higher-resolution interfaces, models like Holo2-235B-A22B will likely become foundational components in the next generation of AI-powered design and automation tools.
By Ramzi De Coster
Two months after releasing our first batch of Holo2 models, H Company is back with our largest UI-localization model yet: Holo2-235B-A22B Preview. This model achieves a new state-of-the-art (SOTA) record of 78.5% on ScreenSpot-Pro and 79.0% on OSWorld-G.
The model is available on Hugging Face (Hcompany/Holo2‑235B‑A22B) as a research release focused on UI‑element localization.
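The release notes don't specify the inference interface, but grounding models of this kind are typically queried with a screenshot plus an element description and answer with click coordinates as text. The prompt template and `parse_click` helper below are illustrative sketches, not the official Holo2 format:

```python
import re

def build_grounding_prompt(instruction: str) -> str:
    # Hypothetical prompt template: ask the model for the click point
    # of the described UI element, as "(x, y)" in image pixels.
    return (
        "Locate the UI element described below and answer with its "
        f"click coordinates as (x, y).\nElement: {instruction}"
    )

def parse_click(response: str):
    # Extract the first "(x, y)" integer pair from the model's reply;
    # return None if the reply contains no coordinates.
    m = re.search(r"\((\d+)\s*,\s*(\d+)\)", response)
    return (int(m.group(1)), int(m.group(2))) if m else None

# Parsing a hypothetical model response:
print(parse_click("The button is at (1824, 973)."))  # → (1824, 973)
```

Consult the model card on Hugging Face for the actual prompt and output conventions before wiring this into a pipeline.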
High-resolution 4K interfaces are challenging for localization models: small UI elements can be difficult to pinpoint on a large display. With agentic localization, however, Holo2 can iteratively refine its predictions, improving accuracy with each step and unlocking 10-20% relative gains across all Holo2 model sizes.
Single-step accuracy: 70.6% on ScreenSpot-Pro.
Agent mode (3 steps): 78.5%, setting a new state-of-the-art on the most challenging GUI grounding benchmark.
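H Company hasn't published the internals of the agentic loop, but a common way to implement multi-step refinement on large screenshots is crop-and-re-ground: predict a click point, zoom into a window centered on it, predict again at higher effective resolution, and map the result back to full-image coordinates. A minimal sketch of that pattern, with `predict` standing in for a model call (the window sizes and step count are illustrative, not Holo2's actual parameters):

```python
def refine_localization(predict, image_size, steps=3, window=1024):
    """Iteratively refine a click prediction by zooming into crops.

    `predict(origin, size)` is a stand-in for a model call on the
    square crop at `origin` with side length `size`; it returns
    (x, y) in crop-local coordinates. The real Holo2 interface may differ.
    """
    w, h = image_size
    ox, oy, size = 0, 0, max(w, h)      # start from the full screenshot
    x, y = predict((ox, oy), size)
    for _ in range(steps - 1):
        # Center a smaller window on the current prediction, clamped
        # so the crop stays inside the image.
        size = max(window, size // 2)
        ox = max(0, min(ox + x - size // 2, w - size))
        oy = max(0, min(oy + y - size // 2, h - size))
        x, y = predict((ox, oy), size)  # re-ground within the crop
    return ox + x, oy + y               # full-image coordinates
```

Each pass gives the model a denser view of the region of interest, which is exactly where single-shot grounding on 4K screenshots tends to fail.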