- AI
- Mobile
Deploying Llama 3.2 on mobile devices with ExecuTorch
January 21 — 2025

Running AI models directly on a mobile device remains a significant challenge. ExecuTorch, PyTorch's on-device inference runtime, is a major step forward for mobile AI: it lets sophisticated models run natively on iOS and Android devices. Our team tested several practical applications of Meta's Llama 3.2 model, from text generation to translation and structured data extraction.
Installation and configuration
The iOS implementation proved relatively straightforward when following the official guidelines. Android was more challenging and required a pre-compiled library (see the sketch below). A critical configuration step is the maximum sequence length, in tokens, that Llama 3.2 can process or generate on the device; with ExecuTorch, this limit is fixed when the model is exported to its .pte format.
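For reference, here is a minimal Kotlin sketch of the Android side, assuming the LlamaModule and LlamaCallback classes bundled with ExecuTorch's Llama demo app; exact package, class, and method names vary between releases, and the file paths and temperature below are placeholder values:

```kotlin
import org.pytorch.executorch.LlamaModule
import org.pytorch.executorch.LlamaCallback

// Streams generated tokens from an exported Llama 3.2 .pte file.
class LlamaRunner : LlamaCallback {
    // Placeholder paths: the model and tokenizer must be copied to
    // device storage beforehand (e.g. via adb push or app assets).
    private val module = LlamaModule(
        "/data/local/tmp/llama/llama3_2.pte",
        "/data/local/tmp/llama/tokenizer.model",
        0.8f // sampling temperature, example value
    )

    fun run(prompt: String) {
        module.load()                 // load weights once before generating
        module.generate(prompt, this) // blocks; call from a background thread
    }

    // Called once per decoded token as generation progresses.
    override fun onResult(result: String) = print(result)

    // Reports tokens-per-second once generation completes.
    override fun onStats(tps: Float) = println("\n$tps tok/s")
}
```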
Performance and practical considerations
Our tests revealed interesting results:
- QAT+LoRA variants (quantization-aware training with LoRA adapters) significantly outperform their BF16 counterparts
- The 3B model produces better results than the 1B model but requires 12 GB of RAM
- Response times range from 5 to 30 seconds depending on prompt complexity (see the timing sketch after this list)
- Power consumption requires special attention during sustained generation
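To reproduce latency figures like these, a generation call can be wrapped in a simple timer. A minimal sketch, reusing the hypothetical LlamaRunner from the earlier example:

```kotlin
import kotlin.system.measureTimeMillis

// Measures end-to-end latency of a single prompt. Since generation
// blocks until completion, run this off the main thread in a real app.
fun benchmark(runner: LlamaRunner, prompt: String) {
    val elapsedMs = measureTimeMillis { runner.run(prompt) }
    println("End-to-end latency: ${"%.1f".format(elapsedMs / 1000.0)} s")
}
```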
Implications for mobile development
This technology opens up new possibilities for mobile applications that need AI capabilities, particularly where privacy or offline operation is essential. However, current hardware constraints still limit its widespread adoption.