The Fastest Way to Match Characters on ARM Processors?

Daniel Lemire’s blog • April 19, 2026

Key Takeaways

•SVE2 `match` replaces multiple equality checks with a single instruction
•Benchmark on Graviton4 shows 16 GB/s throughput, 25 % fewer instructions than NEON
•SVE2 works with existing NEON code, easing migration for ARM developers
•Variable‑length registers let SVE2 adapt to different chip designs without code changes
•Lack of mask‑to‑GPR transfer keeps SVE2 from matching AVX‑512’s mask efficiency

Pulse Analysis

Parsing JSON at gigabyte‑per‑second rates is a core requirement for modern APIs and data pipelines. ARM NEON, the fixed‑width SIMD instruction set that is standard on AArch64 processors, works on 16‑byte vectors and identifies structural characters with a series of equality checks and table lookups. The approach works, but it costs more instructions per byte and can become a bottleneck as data volumes grow. Scalable Vector Extension 2 (SVE2), available on recent ARM server chips, takes a different approach: variable‑length registers and predicate‑based operations that streamline data classification.
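The table‑lookup idea can be sketched in portable C. This is a scalar model of a common nibble‑lookup technique, not the code from the original article: on NEON, each table access would be a 16‑byte lookup instruction applied to a whole vector at once. The names `nibble_tables`, `build_tables`, `is_json_char`, and `count_json_chars` are hypothetical, and the character set (the six structural characters of RFC 8259 plus four whitespace characters) is an assumption, since the summary mentions eight structural symbols without listing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumed character set: RFC 8259 structural characters and whitespace. */
static const char json_chars[] = "{}[]:, \t\n\r";

typedef struct {
    uint8_t lo[16]; /* indexed by the low nibble of a byte  */
    uint8_t hi[16]; /* indexed by the high nibble of a byte */
} nibble_tables;

/* Build both tables: every distinct high nibble in the set gets one bit,
   and each low-nibble entry collects the bits of the high nibbles it
   appears with. */
static nibble_tables build_tables(const char *set) {
    nibble_tables t;
    memset(&t, 0, sizeof t);
    unsigned next_bit = 0;
    for (const char *p = set; *p != '\0'; p++) {
        uint8_t c = (uint8_t)*p;
        uint8_t h = (uint8_t)(c >> 4);
        uint8_t l = (uint8_t)(c & 0x0F);
        if (t.hi[h] == 0) {
            assert(next_bit < 8); /* only 8 distinct high nibbles fit */
            t.hi[h] = (uint8_t)(1u << next_bit++);
        }
        t.lo[l] |= t.hi[h];
    }
    return t;
}

/* A byte belongs to the set exactly when its two entries share a bit. */
static int is_json_char(const nibble_tables *t, uint8_t c) {
    return (t->lo[c & 0x0F] & t->hi[c >> 4]) != 0;
}

/* Count the matching bytes of a buffer, one byte at a time. */
static size_t count_json_chars(const nibble_tables *t,
                               const uint8_t *buf, size_t n) {
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        count += (size_t)is_json_char(t, buf[i]);
    }
    return count;
}
```

Two lookups and an AND classify any byte, whatever the size of the set, but the per‑byte cost is still several operations, which is the overhead a single set‑membership instruction would remove.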

The key innovation is the `match` instruction (and its complement `nmatch`), which tests each byte of a vector against a small set of candidate bytes, held in a second vector, with a single instruction. In practice, loading the eight JSON structural symbols and four whitespace characters into SVE2 registers allows a single `svmatch_u8` call to generate a 16‑bit predicate mask for a 16‑byte chunk. Benchmarks on an AWS Graviton4 (Neoverse V2, 2.8 GHz) show a throughput of 16 GB/s at 0.55 instructions per byte, 25 % fewer instructions than the best NEON implementation. The code remains compact, reuses existing NEON intrinsics for compatibility, and avoids complex tail‑handling logic thanks to SVE2’s predicate masks.
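A minimal sketch of how `svmatch_u8` might be used follows. It is not the benchmarked code from the original article: the function name `count_matches`, the table `json_set`, and the choice of characters are assumptions. The sketch counts matching bytes instead of extracting the predicate as an integer mask. It uses the SVE2 path only when the compiler targets SVE2 (for example with `-march=armv8-a+sve2`) and otherwise falls back to a scalar loop that returns the same count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_FEATURE_SVE2)
#include <arm_sve.h>
#endif

/* Assumed set, padded to 16 bytes by repeating the space character so
   that the padding can never match a byte outside the set. */
static const uint8_t json_set[16] = {
    '{', '}', '[', ']', ':', ',', ' ', '\t', '\n', '\r',
    ' ', ' ', ' ', ' ', ' ', ' '
};

/* Count the bytes of buf that belong to json_set. */
static size_t count_matches(const uint8_t *buf, size_t n) {
    assert(buf != NULL || n == 0);
    size_t count = 0;
#if defined(__ARM_FEATURE_SVE2)
    /* Replicate the 16-byte set into every 128-bit segment of a vector. */
    const svuint8_t set = svld1rq_u8(svptrue_b8(), json_set);
    for (size_t i = 0; i < n; i += svcntb()) {
        /* Predicate over the bytes that remain; this covers the tail. */
        const svbool_t pg = svwhilelt_b8_u64((uint64_t)i, (uint64_t)n);
        const svuint8_t v = svld1_u8(pg, buf + i);
        /* One predicate bit per active byte, set when it equals an entry. */
        const svbool_t m = svmatch_u8(pg, v, set);
        count += svcntp_b8(pg, m);
    }
#else
    /* Portable fallback with the same result for targets without SVE2. */
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < sizeof json_set; j++) {
            if (buf[i] == json_set[j]) {
                count++;
                break;
            }
        }
    }
#endif
    return count;
}
```

The `svwhilelt_b8_u64` predicate is what removes the separate tail loop: lanes past the end of the buffer are simply inactive, and the same loop runs unchanged on hardware with wider vectors.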

For developers and cloud providers, these gains mean fewer CPU cycles per parsed document, which reduces infrastructure costs and allows higher request rates on the same hardware. Although Apple’s silicon still lacks SVE2 support, major vendors (Amazon, Google, Microsoft, and NVIDIA) have already deployed SVE2‑capable CPUs, making the instruction set a strategic asset for future‑proofing ARM‑based services. As JSON continues to dominate data interchange, SVE2‑based classification could become a best practice for high‑performance parsing libraries.
