Kleidi Technology Enables Next-Generation Apps to Run Large Language Models on Arm CPUs
Arm today announced that it is integrating Arm Kleidi technology with PyTorch and ExecuTorch to enable next-generation apps to run large language models (LLMs) on Arm CPUs.
Kleidi combines cutting-edge developer-enablement technology with core resources to foster technical collaboration across the ML stack. In the cloud, Kleidi has already enhanced PyTorch through the Arm Compute Library (ACL).
Building on this work, Arm has established a blueprint for optimizing AI on Arm everywhere, partnering directly with frameworks such as PyTorch and TensorFlow and integrating the Arm Kleidi libraries, a set of essential Arm-optimized kernels, directly into these key frameworks.
Arm’s demo chatbot, built on Meta’s Llama 3 large language model and running on AWS Graviton processors, is the first to demonstrate real-time chat responses using mainline PyTorch. In benchmarks measured on AWS Graviton, integrating Kleidi technology into the open-source PyTorch codebase delivered a 2.5x improvement in time-to-first-token.
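Time-to-first-token (TTFT) measures how long a user waits before the first response token appears, covering prompt processing plus the first decode step. A minimal sketch of how this metric is commonly measured (the model and prompt below are illustrative stand-ins of ours, not the Llama 3 setup from Arm’s benchmark):

```python
# Hedged sketch: measuring time-to-first-token (TTFT) for a causal LM.
# "gpt2" is an illustrative stand-in; Arm's demo used Meta Llama 3.
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

inputs = tokenizer("Explain what a CPU does.", return_tensors="pt")

start = time.perf_counter()
with torch.inference_mode():
    # Generating exactly one new token captures prompt processing plus
    # the first decode step, which is what TTFT reports.
    model.generate(**inputs, max_new_tokens=1)
print(f"TTFT: {time.perf_counter() - start:.3f} s")
```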
Arm also applied optimizations to torch.compile to efficiently leverage the Kleidi technology provided through ACL, yielding performance improvements on AWS Graviton3 ranging from 1.35x to 2x across various Hugging Face model inference workloads.
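From a developer’s point of view, torch.compile is a one-line wrapper around an existing model; on aarch64 builds of PyTorch, the compiled graph is where ACL- and Kleidi-backed kernels can be picked up. A minimal sketch, with the model choice and prompt as our own illustrative assumptions:

```python
# Hedged sketch: wrapping a Hugging Face model in torch.compile.
# Model and generation settings are illustrative, not from Arm's benchmarks.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "gpt2"  # any Hugging Face causal LM follows the same pattern
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id).eval()

# torch.compile traces the model and lowers it through the default
# Inductor backend; matmul-heavy ops are where ACL/Kleidi kernels
# apply on Arm CPUs.
compiled_model = torch.compile(model)

inputs = tokenizer("Arm CPUs can run LLM inference", return_tensors="pt")
with torch.inference_mode():
    outputs = compiled_model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```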
Cloud use cases illustrate the types of performance acceleration that are possible as Arm works to democratize ML workloads. Arm continues to invest to ensure developers’ AI apps can perform at their best on our technology from cloud to edge, including by making new features backwards compatible so developers can take advantage of them immediately.
“Just four months after its launch, Kleidi is already accelerating development and delivering major AI performance gains on Arm CPUs,” said Alex Spinelli, vice president of developer technologies at Arm. “Arm’s close collaboration with the PyTorch community is a great example of how this technology can dramatically reduce the effort required for developers to take advantage of efficient AI.”
■ Expanding support for server-side AI development
Generative AI is fueling a wave of AI innovation, with new versions of language models being released at an unprecedented pace. Arm is working closely with all the key parts of the ML stack, including cloud service providers such as AWS and Google and the rapidly growing community of ML ISVs such as Databricks, to help developers stay ahead.
“Arm and Google are committed to making AI more accessible and agile for developers, and Kleidi has made great progress in co-optimizing hardware and software for AI needs,” said Nirav Mehta, senior director of product management at Google Cloud.

“With Arm-based AWS Graviton processors supported in ML runtime clusters, enterprises can accelerate a broad range of ML libraries while taking advantage of cloud service provider cost savings,” said Lin Yuan, software engineer at Databricks.
Because it’s important for developers to be able to apply the resources Arm provides to real-world use cases, Arm is creating demo software stacks, along with learning paths, to show developers how to build AI workloads on Arm CPUs.
By the end of 2024, ML operationalization (MLOps) and retrieval-augmented generation (RAG) will be added to these use cases, with more to follow in 2025, Arm said.
■ Continued efforts to improve performance at the edge
Building on Kleidi’s momentum at the edge, Arm plans to integrate KleidiAI into ExecuTorch, PyTorch’s new on-device inference runtime. The integration is expected to be completed by October 2024 and to significantly improve performance on edge devices across apps currently in production testing or shipping on ExecuTorch.
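For readers unfamiliar with ExecuTorch, the developer flow this integration targets is exporting a PyTorch model to a .pte program that the on-device runtime loads. A rough sketch based on the ExecuTorch export API as documented around this release (module paths and method names are assumptions that may differ between versions):

```python
# Hedged sketch: exporting a toy PyTorch model to an ExecuTorch .pte file.
# API surface (executorch.exir.to_edge, to_executorch) reflects the
# documentation at the time and may change between releases.
import torch
from executorch.exir import to_edge


class TinyModel(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = torch.nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.linear(x))


model = TinyModel().eval()
example_inputs = (torch.randn(1, 16),)

# Capture the model as an exported program, lower it to the Edge dialect,
# then serialize an ExecuTorch program for the on-device runtime.
exported = torch.export.export(model, example_inputs)
et_program = to_edge(exported).to_executorch()

with open("tiny_model.pte", "wb") as f:
    f.write(et_program.buffer)
```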
This joins several KleidiAI integrations Arm has previously announced, including Google’s XNNPACK and MediaPipe, and Tencent’s Hunyuan LLM. You can see more details on the impact on real-world workloads in our chatbot demo.
As Kleidi integrations continue to land in PyTorch and ExecuTorch releases, alongside all other major AI frameworks, developers can immediately run AI workloads on Arm across a range of devices, from cloud data centers to edge devices.
Arm will continue to actively contribute improvements to the PyTorch community and plans to focus on further improving performance by providing quantization optimizations for various integer formats.
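As a concrete example of what integer-format quantization looks like in stock PyTorch today, here is a minimal sketch using dynamic int8 quantization on a toy model (the model is ours, and Arm’s planned optimizations may target different formats and APIs):

```python
# Hedged sketch: dynamic int8 quantization of Linear layers in PyTorch.
# The toy model is illustrative; int8 matmuls are the kind of operation
# Kleidi kernels target on Arm CPUs.
import torch
import torch.nn as nn


class TinyMLP(nn.Module):
    def __init__(self):
        super().__init__()
        self.fc1 = nn.Linear(128, 256)
        self.fc2 = nn.Linear(256, 10)

    def forward(self, x):
        return self.fc2(torch.relu(self.fc1(x)))


model = TinyMLP().eval()

# quantize_dynamic swaps nn.Linear weights to int8 and quantizes
# activations on the fly at inference time.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

with torch.inference_mode():
    print(qmodel(torch.randn(1, 128)).shape)
```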