Amazon's New AI Chip Trainium2
Amazon Web Services (AWS) unveiled a new AI chip, Trainium2, at its re:Invent conference this week. The chip is designed to improve price performance and energy efficiency for machine learning training and generative AI applications. AWS says it delivers up to four times faster training than its predecessor, and that it can be deployed in EC2 UltraClusters of up to 100,000 chips to train foundation models and the large language models behind generative AI applications.
AWS has traditionally relied on Nvidia processors to power its EC2 instances, but as it pushes the limits of cloud computing it has been investing in custom chips designed specifically for machine learning workloads. It previously introduced AWS Inferentia, a chip for fast inference; Trainium2, announced this week, is the second generation of its custom chip for model training.
As AI becomes more sophisticated, generative tasks such as chatbots and software code generation demand ever larger and more complex models. Training these models consumes significant compute resources, typically supplied by Nvidia GPUs, but supply constraints and rising prices for that hardware have pushed AWS toward Trainium2.
AWS said its latest training chip will provide "measurable improvements in both performance and cost" for customers running machine learning workloads on EC2 instances. Alongside it, the company announced Graviton4, a general-purpose CPU that offers up to 30% better compute performance, 50% more cores, and 75% greater memory bandwidth than its predecessor, Graviton3.
Amazon says Trainium2 runs the same software stack as the first-generation Trainium, making it compatible with popular machine learning frameworks including PyTorch and TensorFlow, and it will ship with supporting libraries and developer tools. The chips will be sold in bundles of 16 as EC2 Trn2 instances, and can scale to 100,000 chips across next-generation UltraClusters connected by petabit-scale networking, for peak performance of 65 exaflops.
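To give a concrete sense of that framework support, the sketch below shows a single PyTorch training step as it might run on a Trainium-backed Trn instance through the PyTorch/XLA integration that the AWS Neuron SDK builds on. The model, data shapes, and hyperparameters are illustrative placeholders, not details from AWS's announcement.

```python
# A minimal sketch of one PyTorch training step on a Trainium NeuronCore,
# via the PyTorch/XLA integration used by the AWS Neuron SDK.
# Model, shapes, and hyperparameters are illustrative placeholders.
import torch
import torch.nn as nn
import torch_xla.core.xla_model as xm

device = xm.xla_device()  # resolves to a NeuronCore on Trn instances

model = nn.Sequential(
    nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10)
).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for a real data loader.
x = torch.randn(32, 128, device=device)
y = torch.randint(0, 10, (32,), device=device)

for step in range(10):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # barrier=True flushes the lazily built XLA graph so the step executes.
    xm.optimizer_step(optimizer, barrier=True)
```

On multi-chip Trn2 instances this loop would typically be wrapped in torch_xla's distributed launcher so the 16 chips in an instance train in parallel, but the single-device version is enough to show that existing PyTorch code carries over largely unchanged. For scale, the headline 65 exaflops across 100,000 chips works out to roughly 650 teraflops per chip.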
AWS's decision to invest in its own machine learning processors signals its confidence as a competitor in the emerging market for cloud-based AI services and applications, where it faces its long-standing rival Microsoft as well as major players like Google and Alibaba.
Microsoft recently unveiled its Maia accelerator, and Google has long offered its TPUs; like Trainium2, both are custom chips built to accelerate machine learning workloads in their owners' clouds. These offerings reflect ongoing competition among the big cloud providers for a share of a machine learning market worth nearly $70 billion.