
Training of a Trillion Parameter AI Model Begins

Aurora Exascale Supercomputer for the training of a trillion parameter scientific AI model. Credit: Michael Linden / Flickr / CC BY-NC 2.0

Argonne National Laboratory (ANL), a top-tier research facility in the United States, has embarked on a new initiative to develop a cutting-edge AI model. This new project, named AuroraGPT, aims to be the go-to computational powerhouse for scientists.

ANL is feeding an immense amount of scientific knowledge into the development of this generative AI model. The training ground for AuroraGPT is none other than ANL’s powerful Aurora supercomputer, which, powered by Intel’s Ponte Vecchio GPUs, delivers over half an exaflop of performance.

The collaboration extends beyond Intel and ANL themselves. The two are working hand-in-hand with various laboratories across the United States and around the world, forging a path to turn scientific AI into a tangible reality that will benefit researchers globally, as reported by HPCwire.

Ogi Brkic, Intel’s Vice President and General Manager for Data Center and HPC Solutions, explained in a press briefing, “It combines all the text, codes, specific scientific results, papers, into the model that science can use to speed up research.”

ScienceGPT or AuroraGPT will have a chatbot interface

Brkic labeled the model ScienceGPT, suggesting it will feature a chatbot interface that allows researchers to pose questions and receive answers.

The potential applications of chatbots in scientific research are vast, spanning fields such as biology, cancer research, and climate change.

Training a model with intricate data is a time-consuming process demanding substantial computing resources. Currently, ANL and Intel are in the initial phases of hardware testing before initiating full-scale training for the model.

Although its functionality mirrors that of ChatGPT, it remains uncertain whether the generative model will be capable of handling multiple modalities, such as generating images and videos.

In addition to this, inference will play a significant role as scientists interact with the chatbot and continually input more information into the model, says HPCwire.

Training of AuroraGPT begins

The training of AuroraGPT has recently commenced and is expected to span several months. Currently, the training is confined to 256 nodes but will subsequently expand to encompass all ten thousand nodes of the Aurora supercomputer.

OpenAI has not disclosed the duration of GPT-4’s training, which was carried out on Nvidia GPUs. In May, Google revealed that it was training its large language model Gemini, likely on its own TPUs.

The primary hurdle in training extensive language models lies in memory requirements. Typically, training necessitates breaking the model and its training state into smaller segments distributed across numerous GPUs.
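To see why a single device cannot hold such a model, a rough back-of-envelope estimate helps. The figures below are illustrative assumptions, not Argonne’s actual configuration: mixed-precision weights and gradients plus Adam-style optimizer states come to roughly 16 bytes per parameter, and the per-GPU memory is a hypothetical 64 GB.

```python
# Back-of-envelope memory estimate for a 1-trillion-parameter model.
# Assumptions (illustrative only): FP16 weights and gradients plus
# FP32 Adam optimizer states, ~16 bytes of training state per parameter;
# 64 GB of memory per GPU is a hypothetical figure.
PARAMS = 1_000_000_000_000          # 1 trillion parameters
BYTES_PER_PARAM = 16                # 2 (weights) + 2 (grads) + 12 (optimizer)
GPU_MEMORY_GB = 64                  # hypothetical per-GPU memory

total_gb = PARAMS * BYTES_PER_PARAM / 1e9
min_gpus = -(-total_gb // GPU_MEMORY_GB)   # ceiling division

print(f"Training state: ~{total_gb:,.0f} GB")
print(f"GPUs needed just to hold it: {min_gpus:,.0f}")
```

Under these assumptions the training state alone runs to thousands of gigabytes, which is why the state must be partitioned across hundreds of devices before training can even begin.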

However, AuroraGPT overcomes this challenge through the Megatron-DeepSpeed framework, which combines Nvidia’s Megatron-LM with Microsoft’s DeepSpeed library to ensure parallel training occurs efficiently, as reported by HPCwire.
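The core idea behind Megatron-style model parallelism is to shard individual weight matrices across devices, with each device computing a partial result. The NumPy sketch below illustrates a column-parallel matrix multiply with simulated "devices"; it demonstrates the concept only and is not DeepSpeed’s actual API.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))        # activations: batch x hidden
w = rng.standard_normal((8, 6))        # full weight matrix

# Column-parallel split: each simulated "device" holds a slice of the
# columns of w, computes its partial output, and the partial outputs
# are concatenated to reconstruct the full result.
n_devices = 3
shards = np.array_split(w, n_devices, axis=1)
partials = [x @ shard for shard in shards]
y_parallel = np.concatenate(partials, axis=1)

# The sharded computation matches the single-device result.
assert np.allclose(y_parallel, x @ w)
```

Because each device stores only its own slice of the weights, the per-device memory footprint shrinks roughly in proportion to the number of devices, at the cost of communication to gather the partial outputs.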

Intel and ANL are currently experimenting with the training of a one trillion parameter model using a sequence of sixty-four Aurora nodes.
