Deploying Llama 3.2 on mobile devices with ExecuTorch

January 21, 2025

Running artificial intelligence models directly on a mobile device is a significant challenge. ExecuTorch represents a major step forward in mobile AI, enabling sophisticated models to run entirely on-device on iOS and Android. Our team tested practical applications of Meta's Llama 3.2 model, from text generation to translation and structured data extraction.

Installation and configuration

The iOS implementation proved relatively straightforward when following the official guidelines. Android presented more challenges and required the use of a pre-compiled library. A critical configuration step is specifying the maximum text length (in tokens) that Llama 3.2 can analyze or generate on the device.
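The token limit matters because the attention KV cache the runtime pre-allocates grows linearly with it. A back-of-the-envelope sketch of that relationship (the layer count, KV-head count, and head dimension below are the published Llama 3.2 1B values, but treat them as assumptions here):

```python
def kv_cache_bytes(n_layers: int, n_kv_heads: int, head_dim: int,
                   max_seq_len: int, bytes_per_elem: int = 2) -> int:
    """Memory pre-allocated for the attention KV cache.

    Two tensors (K and V) per layer, each of shape
    (n_kv_heads, max_seq_len, head_dim), stored at bytes_per_elem
    (2 for fp16/bf16, 1 for int8).
    """
    return 2 * n_layers * n_kv_heads * max_seq_len * head_dim * bytes_per_elem

# Assumed Llama 3.2 1B config: 16 layers, 8 KV heads (GQA), head_dim 64.
cache = kv_cache_bytes(n_layers=16, n_kv_heads=8, head_dim=64,
                       max_seq_len=2048)
print(f"{cache / 2**20:.0f} MiB")  # 64 MiB at fp16 for a 2048-token window
```

Doubling the window to 4096 tokens doubles the cache to 128 MiB, which is why capping the sequence length is a deliberate trade-off on memory-constrained phones.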

Performance and practical considerations

Our tests produced the following results:

  • QAT+LoRA models significantly outperform their BF16 counterparts

  • The 3B model offers better performance but requires 12GB of RAM

  • Response times vary from 5 to 30 seconds depending on complexity

  • Power consumption requires special attention

Implications for mobile development

This technology opens new possibilities for mobile applications that need AI capabilities, particularly in contexts where privacy or the absence of internet connectivity is crucial. However, current hardware constraints still limit its widespread adoption.
