Intellect-2 Release: The First 32B Model Trained Through Globally Distributed RL

Philpax | 201 points | 27 days ago | www.primeintellect.ai

throwanem|27 days ago

There's a name and a logo. "Hubris" feels slightly beggared. https://en.m.wikipedia.org/wiki/The_Metamorphosis_of_Prime_I...

Extropy_|26 days ago

This looks like a startup company. Why shouldn't it have a name and logo?

Philpax|26 days ago

Their point is that the name and logo are clearly drawing from the Metamorphosis of Prime Intellect, with all the potential baggage that comes with it. It's an interesting choice.

throwanem|26 days ago

The novel was the first popular codifier of the concepts of strongly superhuman ASI and hard-takeoff singularity, literally the work that introduced these ideas to the then quasi-New Atheist hangers-on among the kuro5hin crowd who became the initial core of what would develop into the follower base for singularitarianism. It was quite well written for that purpose, with enough sex and action to paper over the slow parts, and a real grasp of what it feels like when time contracts and dilates at once in those dolly-zoom moments where the universe is different forever and nothing outwardly changes. Combined with the seductive appeal and literally universal scope of the ideas that power its plot, it is no wonder the novel should have left so strong an impression on a few.

Someone intentionally invoking that history is interesting indeed. Someone doing it by accident might be more so. But I already gave that choice the name I judge it deserves.

bcoates|26 days ago

Maybe torment nexus was taken

refulgentis|27 days ago

I guess I'm bearish?

It's not that they trained a new model; they took an existing model and RL'd it a bit?

The scores are very close to QwQ-32B's, and at the end they note:

"Overall, as QwQ-32B was already extensively trained with RL, it was difficult to obtain huge amounts of generalized improvement on benchmarks beyond our improvements on the training dataset. To see stronger improvements, it is likely that better base models such as the now available Qwen3, or higher quality datasets and RL environments are needed."

fabmilo|27 days ago

The interesting delta here is that this proves we can distribute the training and still get a functioning model. The compute you can scale across this way is far bigger than any single datacenter.
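
For intuition, the general shape is something like the toy sketch below: workers sample rollouts against the current policy and send update estimates to a coordinator that averages and applies them. Everything here (the evolution-strategies-style estimator, the fake reward, the synchronous loop, the names) is made up to illustrate the pattern; it is not Prime Intellect's actual protocol.

    # Toy sketch of distributed RL: workers estimate policy updates from
    # rollouts, a coordinator averages and applies them to shared weights.
    import numpy as np

    rng = np.random.default_rng(0)

    def worker_update(params, n_rollouts=32, lr=0.05):
        """One worker: perturb the policy, score rollouts, estimate an update."""
        grad = np.zeros_like(params)
        for _ in range(n_rollouts):
            noise = rng.normal(size=params.shape)          # perturbed policy
            reward = -np.sum((params + noise - 1.0) ** 2)  # stand-in reward
            grad += reward * noise                         # ES-style gradient estimate
        return lr * grad / n_rollouts

    def coordinator(num_workers=8, steps=200):
        params = np.zeros(16)  # shared policy weights
        for _ in range(steps):
            # In a real system these arrive asynchronously over the network.
            deltas = [worker_update(params) for _ in range(num_workers)]
            params += np.mean(deltas, axis=0)              # aggregate and apply
        return params

    print(coordinator().round(2))  # drifts toward the optimum at 1.0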

comex|26 days ago

But does that mean much when the training that produced the original model was not distributed?

refulgentis|26 days ago

The RL, not the training. No?

itchyjunk|26 days ago

RL is still training, just like pretraining is still training and SFT is also training. That's how I look at it: model weights are being updated in all cases.
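
For concreteness, a toy sketch of that view: a supervised (pretraining/SFT-style) update and an RL-style (REINFORCE) update both reduce to "compute a loss, backprop, step the optimizer" on the same weights. The tiny model, data, and reward below are invented for illustration and say nothing about how Intellect-2 itself was trained.

    # Pretraining/SFT and RL-style updates touch the same weights
    # through the same optimizer loop; only the loss differs.
    import torch
    import torch.nn as nn
    from torch.distributions import Categorical

    model = nn.Linear(8, 4)                    # stand-in for an LLM
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    def step(loss):
        opt.zero_grad()
        loss.backward()
        opt.step()

    x = torch.randn(16, 8)
    targets = torch.randint(0, 4, (16,))

    # Supervised update (pretraining / SFT): cross-entropy against targets.
    step(nn.functional.cross_entropy(model(x), targets))

    # RL-style update (REINFORCE): weight sampled actions' log-probs by reward.
    dist = Categorical(logits=model(x))
    actions = dist.sample()
    rewards = (actions == targets).float()     # toy reward signal
    step(-(dist.log_prob(actions) * rewards).mean())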

refulgentis|26 days ago

Simplifying it down to "adjusting any weights is training, ipso facto this is meaningful" obscures more than it illuminates (as they noted, the RL didn't get them very far at all).

christianqchung|27 days ago

Third-party fine-tunes of open-weight LLMs tend to do well on a handful of benchmarks but sit at parity or lower on others compared to the original model. There are some exceptions, like Nvidia's Nemotron series, but the differences are generally so small as to be imperceptible. DeepSeek released fine-tunes of several Qwen and Llama models alongside R1, and while they were better in select (mostly math and coding) domains, fine-tuning introduced problems and they never overtook the original models in real usage.

cess11|26 days ago

Seems that's mostly a byproduct of working on the core business idea, GPU arbitrage.