Arm
AI & ML interests
Resources, tools and content from Arm and our partner ecosystem that enable you to deploy your workloads quickly, efficiently and securely.
Arm’s AI development resources ensure you can deploy at pace, achieving the best performance on Arm by default. Our aim is to make your AI development easier by ensuring integration with all major operating systems and AI frameworks, so you can deploy AI on Arm at scale.
Below are some key resources and content from Arm, including our software libraries and tools, that enable you to optimize for Arm architectures and deliver significant performance uplift for models running on Arm-based devices, from traditional ML and computer vision workloads to small and large language models.
Arm and Meta: Llama 3.2
Accelerated cloud to edge AI performance
The availability of smaller LLMs that enable fundamental text-based generative AI workloads, such as Llama 3.2 1B and 3B, is critical to enabling AI inference at scale. Running the new Llama 3.2 3B LLM on Arm-powered mobile devices through the Arm CPU-optimized kernels leads to a 5x improvement in prompt processing and a 3x improvement in token generation, achieving 19.92 tokens per second in the generation phase. This means lower latency when processing AI workloads on the device and a far faster overall user experience. And the more AI that is processed at the edge, the more power is saved by not sending data to and from the cloud, leading to energy and cost savings.
Alongside running small models at the edge, we are also able to run larger models, such as Llama 3.2 11B and 90B, in the cloud. The 11B and 90B models are a great fit for CPU-based inference workloads in the cloud that generate text and images, as our data on Arm Neoverse V2 shows. When we run the 11B image and text model on the Arm-based AWS Graviton4, we achieve 29.3 tokens per second in the generation phase. Considering that human reading speed is around 5 tokens per second, this far outpaces it.
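For a concrete sense of how such throughput figures are gathered, here is a minimal sketch (not Arm’s benchmarking harness) using llama.cpp’s Python bindings. The model filename and thread count are placeholders; a 4-bit (Q4_0) GGUF build of Llama 3.2 is assumed, since llama.cpp’s Q4_0 path is where the Arm CPU-optimized kernels apply.

```python
# Hypothetical sketch: measuring tokens/s with llama-cpp-python
# (pip install llama-cpp-python). The model path is a placeholder.
import time
from llama_cpp import Llama

llm = Llama(
    model_path="llama-3.2-3b-instruct-q4_0.gguf",  # placeholder local file
    n_ctx=2048,
    n_threads=8,  # tune to the core count of your Arm device
    verbose=False,
)

prompt = "Explain, in two sentences, why on-device AI reduces latency."
start = time.perf_counter()
out = llm(prompt, max_tokens=128)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(out["choices"][0]["text"])
# Wall-clock rate over prompt processing plus generation; llama.cpp's own
# timings separate the two phases if you need the generation-only figure.
print(f"{generated / elapsed:.1f} tokens/s overall")
```

Useful Resources on Llama 3.2 on Arm: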
- Accelerating and Scaling AI Inference Everywhere with New Llama 3.2 LLMs on Arm
- Meta Llama 3.2 Blog
- Meta Llama 3.2 1B/3B Partner Guide
- How Arm and Meta are Transforming AI Software Development
- Arm AI Software Page
Arm Kleidi: Unleashing Mass-Market AI Performance on Arm
Arm Kleidi is a targeted software suite that expedites optimizations for any framework, enabling acceleration for billions of AI workloads across Arm-based devices everywhere. Application developers achieve top performance by default, with no additional work or investment in new skills or tools required; the sketch after the resource list below illustrates the idea.
Useful Resources on Arm Kleidi:
- KleidiAI integration with Google's MediaPipe framework
- Arm KleidiAI for optimizing any AI framework: GitLab repo and blog
- Arm KleidiCV for optimizing any computer vision framework: GitLab repo and blog
- Arm Compute Library for all AI software
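To illustrate the “performance by default” point, here is a minimal sketch with no Kleidi-specific calls in sight: the application code is identical on every platform, and on Arm an optimized framework build (for example, KleidiAI inside XNNPACK, or the Arm Compute Library behind oneDNN in aarch64 PyTorch builds) supplies the fast kernels underneath.

```python
# Minimal sketch of the "performance by default" idea behind Arm Kleidi.
# No Arm-specific API is called; framework-level integrations dispatch the
# matmuls below to optimized kernels automatically on Arm builds.
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Linear(4096, 4096),
    torch.nn.ReLU(),
    torch.nn.Linear(4096, 4096),
).eval()

x = torch.randn(32, 4096)

with torch.inference_mode():
    model(x)  # warm-up
    start = time.perf_counter()
    for _ in range(10):
        model(x)
    elapsed = time.perf_counter() - start

print(f"avg forward pass: {elapsed / 10 * 1e3:.1f} ms")
```

The design point is that the speedup arrives through the framework’s backend, not through changes to the model code above.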
Running LLMs on Mobile
Our foundation of pervasiveness, flexible performance and energy efficiency means that Arm CPUs are already the hardware of choice for a variety of AI workloads. Alongside Arm-based servers excelling at LLM workloads, the Arm Kleidi software suite and optimizations to our software libraries, combined with the open-source llama.cpp project, enable generative AI to run efficiently on mobile devices.
Our work includes a virtual assistant demo that initially ran Meta’s Llama 2 7B LLM on mobile via a chat-based application and has since expanded to include the Llama 3 model and Phi-3 3.8B. You can learn more about the technical implementation of the demos here.
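As an illustrative sketch (the demo itself is a mobile application, not this script), the same chat-style interaction can be reproduced with llama.cpp’s Python bindings; the GGUF filename below is a placeholder for any llama.cpp-compatible model.

```python
# Hypothetical chat sketch with llama-cpp-python; the model file is a
# placeholder for any llama.cpp-compatible GGUF (Llama, Phi-3, ...).
from llama_cpp import Llama

llm = Llama(
    model_path="phi-3-mini-4k-instruct-q4_0.gguf",  # placeholder local file
    n_ctx=4096,
    verbose=False,
)

reply = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "You are a concise virtual assistant."},
        {"role": "user", "content": "Suggest three uses for on-device LLMs."},
    ],
    max_tokens=200,
)
print(reply["choices"][0]["message"]["content"])
```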
These advancements build on community contributions, including Arm’s optimizations to the open-source llama.cpp project, and are also highlighted in our Learning Paths below.
AI on Arm in the Cloud
Arm Neoverse platforms give our infrastructure partners access to leading performance, efficiency and unparalleled flexibility to innovate in pursuit of the optimal solutions for emerging AI workloads. The flexibility of the Neoverse platform enables our innovative hardware partners to closely integrate additional compute acceleration into their designs, creating a new generation of built-for-AI custom data center silicon.
Read the latest on AI-on-Neoverse:
- Accelerate Your GenAI, AI and ML Workloads on Arm CPUs
- Accelerating Popular Hugging Face Models using Arm Neoverse
- Small Language Models: Efficient Arm Computing Enables a Custom AI Future
- Best-in-class LLM Performance on Arm Neoverse V1 based AWS Graviton3 CPUs
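As a minimal example of what running a popular Hugging Face model on Arm looks like in practice, the sketch below classifies text entirely on CPU. Nothing in the code is Arm-specific, which is the point: the installed framework’s CPU backend supplies the optimized kernels on Neoverse-based servers.

```python
# Minimal sketch: an off-the-shelf Hugging Face model on an Arm-based
# server (pip install transformers torch). Runs entirely on CPU.
from transformers import pipeline

classifier = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english",
    device=-1,  # CPU
)

print(classifier("Arm servers handle this workload entirely on CPU."))
```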
Arm Learning Paths
Tutorials designed to help you develop quality Arm software faster.
- LLMs on Android with KleidiAI, MediaPipe and XNNPACK
- Run a Large Language Model (LLM) chatbot on Arm servers
- Deploy an NLP model using PyTorch on an Arm-based device
- Run a local LLM chatbot on a Raspberry Pi 5
- Accelerate Natural Language Processing (NLP) models from Hugging Face on Arm servers
Contribute to our Learning Paths: suggest a new Learning Path or create one yourself with support from the Arm community.
Note: The data collated here is sourced from Arm and third parties. While Arm uses reasonable efforts to keep this information accurate, Arm does not warrant (express or implied) or provide any guarantee of data correctness due to the ever-evolving AI and software landscape. Any links to third party sites and resources are provided for ease and convenience. Your use of such third-party sites and resources is subject to the third party’s terms of use, and use is at your own risk.