xAI cluster is now the most powerful AI training system in the world — but questions remain over storage capacity, power usage and why it’s actually called Colossus
Elon Musk says the new Gen-AI training cluster was built in just 122 days
We recently got a glimpse of what $1 billion worth of AI GPUs looks like when Elon Musk shared a brief video tour of Cortex, X’s AI training supercomputer currently under construction at Tesla’s Giga Texas plant.
More recently, Musk took to his social media platform to announce that Colossus, a new 100k H100 training cluster, is now up and running.
Musk claims that Colossus is “the most powerful AI training system in the world” and that it was built “from start to finish” in just 122 days. That’s quite an achievement. Servers for the xAI cluster were reportedly provided by Dell and Supermicro, with the cost of the project estimated at between $3 billion and $4 billion.
“This weekend, the @xAI team brought our Colossus 100k H100 training cluster online. From start to finish, it was done in 122 days. Colossus is the most powerful AI training system in the world. Moreover, it will double in size to 200k (50k H200s) in a few months. Excellent…” (September 2, 2024)
Where does Colossus get its name?
Tom’s Hardware notes, “Although all of these clusters are formally operational and even training AI models, it is entirely unclear how many are actually online today. First, it takes some time to debug and optimize the settings of those superclusters. Second, X needs to ensure that they get enough power, and while Elon Musk’s company has been using 14 diesel generators to power its Memphis supercomputer, they were still not enough to feed all 100,000 H100 GPUs.”
The Colossus system is set to double in capacity, with plans to add another 100,000 GPUs: 50,000 H100 units and 50,000 of Nvidia’s next-gen H200 chips. The supercluster will primarily be used to train xAI’s Grok-3, the company’s latest, most advanced AI model. We’ve yet to see any mention of storage for the new system, but it will need to be huge.
The naming of the new supercomputer has raised more than a few eyebrows, however, with people noting that it shares its name with a 1970 sci-fi movie (based on a 1966 novel by D.F. Jones) about a supercomputer that becomes sentient after being given control of the US nuclear arsenal. Things, predictably, go horribly wrong for humanity.
Both the novel and film explore timely themes of AI autonomy, the dangers of relinquishing control to machines, and the ethical implications of artificial intelligence. It’s possible that Musk wasn’t aware of this when the name was chosen for his new AI training system, and it might have been selected purely to emphasize the sheer scale of the supercluster. Then again, with Musk’s track record, it wouldn’t be surprising if the reference was entirely intentional - he knows exactly what he’s doing.
Wayne Williams is a freelancer writing news for TechRadar Pro. He has been writing about computers, technology, and the web for 30 years. In that time he wrote for most of the UK’s PC magazines, and launched, edited and published a number of them too.