A world-first in model compression
Post-quantisation pruning applied to an interleaved MoE architecture.
Selode's proprietary VRAP post-quantisation pruning method solves intelligence collapse.
1,000+ downloads in two weeks.
Aggressive 4-bit quantisation usually breaks reasoning. Models stay fluent but lose the structured logic that makes them useful. We call it intelligence collapse.
Selode's VRAP solves it.
The breakthrough
The first method engineered for interleaved architectures
VRAP is our proprietary post-quantisation pruning method. The compressed model retains the multi-step reasoning of its uncompressed counterpart. No collapse.
What it unlocks
Frontier capability, local hardware.
Frontier-level agentic reasoning on a single 24GB consumer GPU. One card, not two.
Repository-scale coding and long-context workflows, run locally.
Entirely offline. Full data control. No cloud dependency.
The proof
Shipped to Hugging Face. The community responded.
Two weeks ago we applied VRAP to Qwen-3.6 35B-A3B and shipped the resulting 21.2GB, Apache 2.0-licensed build to Hugging Face.
Why it matters
Running the other way from hyperscale.
Most of the AI conversation happens at hyperscale. Bigger clusters, bigger budgets, bigger dependencies.
We think the more interesting frontier runs the other way: making world-class models efficient enough to live where the work actually happens.
On a developer's desktop. Inside a regulated environment. Behind an air-gap. On the edge.
This is our first public release in a pipeline focused on shrinking the hardware tax on frontier AI. More to come.
Get the model
Available now on Hugging Face. Apache 2.0 licensed.
Download on Hugging Face → huggingface.co/selode-ai/Qwen-3.6-35B-A3B-VRAP-4-bit-AWQ-21.2GB
Enterprise enquiries: enquiries@selode.ai