VRAP Architecture
[Figure: VRAP architecture diagram]

Post-quantisation pruning applied to an interleaved MoE architecture.

Model Release

Selode's proprietary VRAP post-quantisation pruning method solves intelligence collapse.

1,000+ downloads in two weeks. A world-first in model compression.

Aggressive 4-bit quantisation usually breaks reasoning. Models stay fluent but lose the structured logic that makes them useful. We call it intelligence collapse.

Selode's VRAP solves it.

The breakthrough

The first method engineered for interleaved architectures

VRAP is our proprietary post-quantisation pruning method. The compressed model retains the multi-step reasoning of its uncompressed counterpart. No collapse.

What it unlocks

Frontier capability, local hardware.

Frontier-level agentic reasoning on a single 24GB consumer GPU. One card, not two.

Repository-scale coding and long-context workflows, run locally.

Entirely offline. Full data control. No cloud dependency.

The proof

Shipped to Hugging Face. The community responded.

Two weeks ago we applied VRAP to Qwen-3.6 35B and shipped the 21.2GB build on Hugging Face under the Apache 2.0 licence.

1,000+ downloads in two weeks, and climbing
[Figure: VRAP demo output]
Reasoning output from the VRAP-compressed Qwen-3.6 35B build.
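
The hardware claim can be sanity-checked with back-of-envelope arithmetic. The sketch below uses only the figures quoted on this page (35B parameters, 4-bit weights, a 21.2GB build, a 24GB card). It assumes decimal gigabytes and counts weights only, so it ignores KV cache, activations and runtime overhead. The helper name weight_footprint_gb is ours for illustration and is not part of VRAP.

```python
# Rough memory arithmetic using only figures quoted on this page.
# Assumptions: decimal gigabytes (1 GB = 1e9 bytes), weights only.

GB = 1e9

def weight_footprint_gb(params: float, bits_per_weight: float) -> float:
    """Nominal size of the weights alone, in decimal GB."""
    return params * bits_per_weight / 8 / GB

PARAMS = 35e9      # "35B" nominal parameter count
BUILD_GB = 21.2    # size of the published build
CARD_GB = 24.0     # single consumer GPU

fp16_gb = weight_footprint_gb(PARAMS, 16)   # weights at 16-bit precision
int4_gb = weight_footprint_gb(PARAMS, 4)    # weights at a flat 4 bits each
headroom_gb = CARD_GB - BUILD_GB            # left for KV cache, activations

print(f"16-bit weights: {fp16_gb:.1f} GB")
print(f"4-bit weights:  {int4_gb:.1f} GB")
print(f"Headroom on a {CARD_GB:.0f} GB card: {headroom_gb:.1f} GB")
```

At 16-bit precision the weights alone come to about 70 GB, far beyond a 24GB card. The published 21.2GB build sits above the flat 4-bit figure of 17.5 GB. This page does not break down that gap, so the sketch does not attempt to explain it. The remaining headroom of roughly 2.8 GB is what bounds usable context length in practice.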

Why it matters

Running the other way from hyperscale.

Most of the AI conversation happens at hyperscale. Bigger clusters, bigger budgets, bigger dependencies.

We think the more interesting frontier runs the other way: making world-class models efficient enough to live where the work actually happens.

On a developer's desktop. Inside a regulated environment. Behind an air-gap. On the edge.

This is our first public release in a pipeline focused on shrinking the hardware tax on frontier AI. More to come.

Get the model

Available now on Hugging Face. Apache 2.0 licensed.

Download on Hugging Face →

huggingface.co/selode-ai/Qwen-3.6-35B-A3B-VRAP-4-bit-AWQ-21.2GB
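
For scripted downloads, a minimal sketch using the huggingface_hub Python package follows. The repository id is taken from the URL above. The helper names and the local directory ./vrap-model are our own assumptions, and the download call is left commented out because it fetches roughly 21.2GB.

```python
from urllib.parse import urlparse

MODEL_URL = "huggingface.co/selode-ai/Qwen-3.6-35B-A3B-VRAP-4-bit-AWQ-21.2GB"

def repo_id_from_url(url: str) -> str:
    """Extract the 'owner/name' repository id from a Hugging Face model URL."""
    if "://" not in url:
        url = "https://" + url          # tolerate URLs written without a scheme
    owner, name = urlparse(url).path.strip("/").split("/")[:2]
    return f"{owner}/{name}"

def download_model(url: str = MODEL_URL, local_dir: str = "./vrap-model") -> str:
    """Download every file in the repository and return the local path.

    local_dir is a hypothetical location chosen for this example.
    """
    # Imported lazily so the URL helper works without the package installed.
    from huggingface_hub import snapshot_download  # pip install huggingface_hub
    return snapshot_download(repo_id=repo_id_from_url(url), local_dir=local_dir)

# Uncomment to fetch the build (large download):
# print(download_model())
```

Once downloaded, the files can be loaded by any local runtime that supports the build's format. Which runtimes those are is not stated on this page.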

Enterprise enquiries: enquiries@selode.ai