Argonne National Laboratory (ANL), a top-tier research facility in the United States, has embarked on a new initiative to develop a cutting-edge AI model. This new project, named AuroraGPT, aims to be the go-to computational powerhouse for scientists.
ANL is feeding an immense amount of scientific knowledge into the development of this generative AI model. The training ground for AuroraGPT is ANL's powerful Aurora supercomputer, which delivers more than half an exaflop of performance thanks to Intel's Ponte Vecchio GPUs at the heart of its computational capabilities.
The collaboration extends beyond Intel and ANL alone: the two are working hand-in-hand with laboratories across the United States and around the world to turn scientific AI into a tangible reality that will benefit researchers globally, as reported by HPCwire.
Ogi Brkic, Intel's Vice President and General Manager for Data Center and HPC Solutions, explained in a press briefing: "It combines all the text, codes, specific scientific results, papers, into the model that science can use to speed up research."
ScienceGPT or AuroraGPT will have a chatbot interface
Brkic referred to the model as ScienceGPT, suggesting it will feature a chatbot interface that allows researchers to pose questions and receive answers.
The potential applications of chatbots in scientific research are vast, spanning fields such as biology, cancer research, and climate change.
This news should not go unnoticed: @HPE_Cray @Intel @argonne_lcf have developed an LLM trained with one trillion parameters!!!! for scientific research #AuroraGPT #AuroraGenAI pic.twitter.com/QSa96bKtR3
— AdrIAno Galano (@adriano_galano) May 23, 2023
Training a model with intricate data is a time-consuming process demanding substantial computing resources. Currently, ANL and Intel are in the initial phases of hardware testing before initiating full-scale training for the model.
Although its functionality mirrors ChatGPT's, it remains uncertain whether the generative model will support multiple modalities, such as generating images and videos.
In addition, inference will play a significant role as scientists interact with the chatbot and continually feed more information into the model, says HPCwire.
Training of AuroraGPT begins
The training of AuroraGPT has recently commenced and is expected to span several months. Currently, the training is confined to 256 nodes but will subsequently expand to encompass all ten thousand nodes of the Aurora supercomputer.
OpenAI has not disclosed how long it took to train GPT-4, which was trained on Nvidia GPUs. In May, Google revealed it was training its large language model Gemini, likely on its own TPUs.
The primary hurdle in training large language models is memory: the weights and optimizer state of a model this size far exceed the memory of any single GPU, so training must be partitioned into smaller pieces spread across numerous GPUs.
However, AuroraGPT addresses this challenge with Megatron-DeepSpeed, a framework that combines Nvidia's Megatron-LM parallelism techniques with Microsoft's DeepSpeed library to keep parallel training efficient, as reported by HPCwire.
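The core idea behind Megatron-style tensor parallelism can be sketched in a few lines: a weight matrix too large for one device is split column-wise across devices, each device computes its own slice of the matrix multiply, and the partial outputs are concatenated. The sketch below uses NumPy arrays as stand-ins for GPUs; it illustrates the math only, not the Megatron-DeepSpeed API:

```python
import numpy as np

# Tensor-parallel sketch: shard a weight matrix column-wise across
# "devices" (plain NumPy arrays here) and recombine the partial outputs.

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))   # a batch of activations
W = rng.standard_normal((8, 16))  # the full weight matrix

# Reference: the unsharded computation.
full = x @ W

# Split W's 16 columns across 4 "devices", 4 columns each.
shards = np.split(W, 4, axis=1)

# Each device performs its local matmul on its shard...
partials = [x @ shard for shard in shards]

# ...and an all-gather-style concatenation reassembles the output.
combined = np.concatenate(partials, axis=1)

print(np.allclose(full, combined))
```

Because each output column depends only on its own column of W, the sharded result matches the unsharded one; the real engineering work in frameworks like Megatron-DeepSpeed lies in scheduling these shards and their communication efficiently across thousands of GPUs.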
Intel and ANL are currently experimenting with training a one-trillion-parameter model on a set of sixty-four Aurora nodes.
Today at #ISC23, @codenative announced that Intel has completed the physical delivery of more than 10,000 blades to @argonne for the Aurora supercomputer! The 2 ExaFLOP supercomputer will house more than 20,000 CPUs and more than 60,000 GPUs. https://t.co/rZMuy2SOTb pic.twitter.com/LORSeJOcgn
— Intel Graphics (@IntelGraphics) May 22, 2023