mudler/LocalAI – GitClassic

📄 README.md

:bulb: Get help - ❓FAQ 💭Discussions :speechballoon: Discord :book: Documentation website
> 💻 Quickstart 🖼️ Models 🚀 Roadmap 🛫 Examples Try on

LocalAI is the free, Open Source OpenAI alternative. LocalAI act as a drop-in replacement REST API that's compatible with OpenAI (Elevenlabs, Anthropic... ) API specifications for local AI inferencing. It allows you to run LLMs, generate images, audio (and not only) locally or on-prem with consumer grade hardware, supporting multiple model families. Does not require GPU. It is created and maintained by Ettore Di Giacinto.

Local Stack Family

Liking LocalAI? LocalAI is part of an integrated suite of AI infrastructure tools, you might also like:

LocalAGI - AI agent orchestration platform with OpenAI Responses API compatibility and advanced agentic capabilities

LocalRecall - MCP/REST API knowledge base system providing persistent memory and storage for AI agents

🆕 Cogito - Go library for building intelligent, co-operative agentic software and LLM-powered workflows, focusing on improving results for small, open source language models that scales to any LLM. Powers LocalAGI and LocalAI MCP/Agentic capabilities

🆕 Wiz - Terminal-based AI agent accessible via Ctrl+Space keybinding. Portable, local-LLM friendly shell assistant with TUI/CLI modes, tool execution with approval, MCP protocol support, and multi-shell compatibility (zsh, bash, fish)

🆕 SkillServer - Simple, centralized skills database for AI agents via MCP. Manages skills as Markdown files with MCP server integration, web UI for editing, Git synchronization, and full-text search capabilities

Screenshots / Video

Youtube video

Screenshots

Talk Interface Generate Audio

Models Overview Generate Images

Chat Interface Home

Login Swarm

💻 Quickstart

macOS Download:

Note: the DMGs are not signed by Apple as quarantined. See https://github.com/mudler/LocalAI/issues/6268 for a workaround, fix is tracked here: https://github.com/mudler/LocalAI/issues/6244

Containers (Docker, podman, ...)

💡 Docker Run vs Docker Start
> - docker run creates and starts a new container. If a container with the same name already exists, this command will fail.
- docker start starts an existing container that was previously created with docker run.
> If you've already run LocalAI before and want to start it again, use: docker start -i local-ai

CPU only image:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest

NVIDIA GPU Images:

# CUDA 13.0 docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-13 # CUDA 12.0 docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-gpu-nvidia-cuda-12 # NVIDIA Jetson (L4T) ARM64 # CUDA 12 (for Nvidia AGX Orin and similar platforms) docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64 # CUDA 13 (for Nvidia DGX Spark) docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-nvidia-l4t-arm64-cuda-13

AMD GPU Images (ROCm):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-gpu-hipblas

Intel GPU Images (oneAPI):

docker run -ti --name local-ai -p 8080:8080 --device=/dev/dri/card1 --device=/dev/dri/renderD128 localai/localai:latest-gpu-intel

Vulkan GPU Images:

docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-gpu-vulkan

AIO Images (pre-downloaded models):

# CPU version docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-cpu # NVIDIA CUDA 13 version docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-13 # NVIDIA CUDA 12 version docker run -ti --name local-ai -p 8080:8080 --gpus all localai/localai:latest-aio-gpu-nvidia-cuda-12 # Intel GPU version docker run -ti --name local-ai -p 8080:8080 localai/localai:latest-aio-gpu-intel # AMD GPU version docker run -ti --name local-ai -p 8080:8080 --device=/dev/kfd --device=/dev/dri --group-add=video localai/localai:latest-aio-gpu-hipblas

For more information about the AIO images and pre-downloaded models, see Container Documentation.

To load models:

# From the model gallery (see available models with `local-ai models list`, in the WebUI from the model tab, or visiting https://models.localai.io) local-ai run llama-3.2-1b-instruct:q4_k_m # Start LocalAI with the phi-2 model directly from huggingface local-ai run huggingface://TheBloke/phi-2-GGUF/phi-2.Q8_0.gguf # Install and run a model from the Ollama OCI registry local-ai run ollama://gemma:2b # Run a model from a configuration file local-ai run https://gist.githubusercontent.com/.../phi-2.yaml # Install and run a model from a standard OCI registry (e.g., Docker Hub) local-ai run oci://localai/phi-2:latest

⚡ Automatic Backend Detection: When you install models from the gallery or YAML files, LocalAI automatically detects your system's GPU capabilities (NVIDIA, AMD, Intel) and downloads the appropriate backend. For advanced configuration options, see GPU Acceleration.

For more information, see 💻 Getting started, if you are interested in our roadmap items and future enhancements, you can see the Issues labeled as Roadmap here

📰 Latest project news

February 2026: Realtime API for audio-to-audio with tool calling, ACE-Step 1.5 support

January 2026: LocalAI 3.10.0 - Major release with Anthropic API support, Open Responses API for stateful agents, video & image generation suite (LTX-2), unified GPU backends, tool streaming & XML parsing, system-aware backend gallery, crash fixes for AVX-only CPUs and AMD VRAM reporting, request tracing, and new backends: Moonshine (ultra-fast transcription), Pocket-TTS (lightweight TTS). Vulkan arm64 builds now available. Release notes.

December 2025: Dynamic Memory Resource reclaimer, Automatic fitting of models to multiple GPUS(llama.cpp), Added Vibevoice backend

November 2025: Major improvements to the UX. Among these: Import models via URL and Multiple chats and history

October 2025: 🔌 Model Context Protocol (MCP) support added for agentic capabilities with external tools

September 2025: New Launcher application for MacOS and Linux, extended support to many backends for Mac and Nvidia L4T devices. Models: Added MLX-Audio, WAN 2.2. WebUI improvements and Python-based backends now ships portable python environments.

August 2025: MLX, MLX-VLM, Diffusers and llama.cpp are now supported on Mac M1/M2/M3+ chips ( with development suffix in the gallery ): https://github.com/mudler/LocalAI/pull/6049 https://github.com/mudler/LocalAI/pull/6119 https://github.com/mudler/LocalAI/pull/6121 https://github.com/mudler/LocalAI/pull/6060

July/August 2025: 🔍 Object Detection added to the API featuring rf-detr

July 2025: All backends migrated outside of the main binary. LocalAI is now more lightweight, small, and automatically downloads the required backend to run the model. Read the release notes

June 2025: Backend management has been added. Attention: extras images are going to be deprecated from the next release! Read the backend management PR.

May 2025: Audio input and Reranking in llama.cpp backend, Realtime API, Support to Gemma, SmollVLM, and more multimodal models (available in the gallery).

May 2025: Important: image name changes See release

Apr 2025: Rebrand, WebUI enhancements

Apr 2025: LocalAGI and LocalRecall join the LocalAI family stack.

Apr 2025: WebUI overhaul, AIO images updates

Feb 2025: Backend cleanup, Breaking changes, new backends (kokoro, OutelTTS, faster-whisper), Nvidia L4T images

Jan 2025: LocalAI model release: https://huggingface.co/mudler/LocalAI-functioncall-phi-4-v0.3, SANA support in diffusers: https://github.com/mudler/LocalAI/pull/4603

Dec 2024: stablediffusion.cpp backend (ggml) added ( https://github.com/mudler/LocalAI/pull/4289 )

Nov 2024: Bark.cpp backend added ( https://github.com/mudler/LocalAI/pull/4287 )

Nov 2024: Voice activity detection models (VAD) added to the API: https://github.com/mudler/LocalAI/pull/4204

Oct 2024: examples moved to LocalAI-examples

Aug 2024: 🆕 FLUX-1, P2P Explorer

July 2024: 🔥🔥 🆕 P2P Dashboard, LocalAI Federated mode and AI Swarms: https://github.com/mudler/LocalAI/pull/2723. P2P Global community pools: https://github.com/mudler/LocalAI/issues/3113

May 2024: 🔥🔥 Decentralized P2P llama.cpp: https://github.com/mudler/LocalAI/pull/2343 (peer2peer llama.cpp!) 👉 Docs https://localai.io/features/distribute/

May 2024: 🔥🔥 Distributed inferencing: https://github.com/mudler/LocalAI/pull/2324

April 2024: Reranker API: https://github.com/mudler/LocalAI/pull/2121

Roadmap items: List of issues

🚀 Features

🧩 Backend Gallery: Install/remove backends on the fly, powered by OCI images — fully customizable and API-driven.

📖 Text generation with GPTs (llama.cpp, transformers, vllm ... :book: and more)

🗣 Text to Audio

🔈 Audio to Text

🎨 Image generation

🔥 OpenAI-alike tools API

⚡ Realtime API (Speech-to-speech)

🧠 Embeddings generation for vector databases

✍️ Constrained grammars

🖼️ Download Models directly from Huggingface

🥽 Vision API

🔍 Object Detection

📈 Reranker API

🆕🖧 P2P Inferencing

🆕🔌 Model Context Protocol (MCP) - Agentic capabilities with external tools and LocalAGI's Agentic capabilities

🔊 Voice activity detection (Silero-VAD support)

🌍 Integrated WebUI!

🧩 Supported Backends & Acceleration

LocalAI supports a comprehensive range of AI backends with multiple acceleration options:

Text Generation & Language Models

Backend Description Acceleration Support
llama.cpp LLM inference in C/C++ CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU
vLLM Fast LLM inference with PagedAttention CUDA 12/13, ROCm, Intel
transformers HuggingFace transformers framework CUDA 12/13, ROCm, Intel, CPU
MLX Apple Silicon LLM inference Metal (M1/M2/M3+)
MLX-VLM Apple Silicon Vision-Language Models Metal (M1/M2/M3+)

Audio & Speech Processing

Backend Description Acceleration Support
whisper.cpp OpenAI Whisper in C/C++ CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU
faster-whisper Fast Whisper with CTranslate2 CUDA 12/13, ROCm, Intel, CPU
moonshine Ultra-fast transcription engine for low-end devices CUDA 12/13, Metal, CPU
coqui Advanced TTS with 1100+ languages CUDA 12/13, ROCm, Intel, CPU
kokoro Lightweight TTS model CUDA 12/13, ROCm, Intel, CPU
chatterbox Production-grade TTS CUDA 12/13, CPU
piper Fast neural TTS system CPU
kitten-tts Kitten TTS models CPU
silero-vad Voice Activity Detection CPU
neutts Text-to-speech with voice cloning CUDA 12/13, ROCm, CPU
vibevoice Real-time TTS with voice cloning CUDA 12/13, ROCm, Intel, CPU
pocket-tts Lightweight CPU-based TTS CUDA 12/13, ROCm, Intel, CPU
qwen-tts High-quality TTS with custom voice, voice design, and voice cloning CUDA 12/13, ROCm, Intel, CPU
ace-step Music generation from text descriptions, lyrics, or audio samples CUDA 12/13, ROCm, Intel, Metal, CPU

Image & Video Generation

Backend Description Acceleration Support
stablediffusion.cpp Stable Diffusion in C/C++ CUDA 12/13, Intel SYCL, Vulkan, CPU
diffusers HuggingFace diffusion models CUDA 12/13, ROCm, Intel, Metal, CPU

Specialized AI Tasks

Backend Description Acceleration Support
rfdetr Real-time object detection CUDA 12/13, Intel, CPU
rerankers Document reranking API CUDA 12/13, ROCm, Intel, CPU
local-store Vector database CPU
huggingface HuggingFace API integration API-based

Hardware Acceleration Matrix

Acceleration Type Supported Backends Hardware Support
NVIDIA CUDA 12 All CUDA-compatible backends Nvidia hardware
NVIDIA CUDA 13 All CUDA-compatible backends Nvidia hardware
AMD ROCm llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, neutts, vibevoice, pocket-tts, qwen-tts, ace-step AMD Graphics
Intel oneAPI llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, coqui, kokoro, vibevoice, pocket-tts, qwen-tts, ace-step Intel Arc, Intel iGPUs
Apple Metal llama.cpp, whisper, diffusers, MLX, MLX-VLM, moonshine, ace-step Apple M1/M2/M3+
Vulkan llama.cpp, whisper, stablediffusion Cross-platform GPUs
NVIDIA Jetson (CUDA 12) llama.cpp, whisper, stablediffusion, diffusers, rfdetr, ace-step ARM64 embedded AI (AGX Orin, etc.)
NVIDIA Jetson (CUDA 13) llama.cpp, whisper, stablediffusion, diffusers, rfdetr ARM64 embedded AI (DGX Spark)
CPU Optimized All backends AVX/AVX2/AVX512, quantization support

🔗 Community and integrations

Build and deploy custom containers:
https://github.com/sozercan/aikit

WebUIs:
https://github.com/Jirubizu/localai-admin

https://github.com/go-skynet/LocalAI-frontend

QA-Pilot(An interactive chat project that leverages LocalAI LLMs for rapid understanding and navigation of GitHub code repository) https://github.com/reid41/QA-Pilot

Agentic Libraries:
https://github.com/mudler/cogito

MCPs:
https://github.com/mudler/MCPs

OS Assistant:

https://github.com/mudler/Keygeist - Keygeist is an AI-powered keyboard operator that listens for key combinations and responds with AI-generated text typed directly into your Linux box.

Model galleries
https://github.com/go-skynet/model-gallery

Voice:
https://github.com/richiejp/VoxInput

Other:
Helm chart https://github.com/go-skynet/helm-charts

VSCode extension https://github.com/badgooooor/localai-vscode-plugin

Langchain: https://python.langchain.com/docs/integrations/providers/localai/

Terminal utility https://github.com/djcopley/ShellOracle

Local Smart assistant https://github.com/mudler/LocalAGI

Home Assistant https://github.com/drndos/hass-openai-custom-conversation / https://github.com/valentinfrlch/ha-llmvision / https://github.com/loryanstrant/HA-LocalAI-Monitor

Discord bot https://github.com/mudler/LocalAGI/tree/main/examples/discord

Slack bot https://github.com/mudler/LocalAGI/tree/main/examples/slack

Shell-Pilot(Interact with LLM using LocalAI models via pure shell scripts on your Linux or MacOS system) https://github.com/reid41/shell-pilot

Telegram bot https://github.com/mudler/LocalAI/tree/master/examples/telegram-bot

Another Telegram Bot https://github.com/JackBekket/Hellper

Auto-documentation https://github.com/JackBekket/Reflexia

Github bot which answer on issues, with code and documentation as context https://github.com/JackBekket/GitHelper

Github Actions: https://github.com/marketplace/actions/start-localai

Examples: https://github.com/mudler/LocalAI/tree/master/examples/

🔗 Resources

LLM finetuning guide

How to build locally

How to install in Kubernetes

Projects integrating LocalAI

How tos section (curated by our community)

:book: 🎥 Media, Blogs, Social

🆕 LocalAI Autonomous Dev Team Blog Post

Run Visual studio code with LocalAI (SUSE)

🆕 Run LocalAI on Jetson Nano Devkit

Run LocalAI on AWS EKS with Pulumi

Run LocalAI on AWS

Create a slackbot for teams and OSS projects that answer to documentation

LocalAI meets k8sgpt

Question Answering on Documents locally with LangChain, LocalAI, Chroma, and GPT4All

Tutorial to use k8sgpt with LocalAI

🤖 Autonomous Development Team

LocalAI is now helped being maintained (for small tasks!) by a full team of autonomous AI agents led by an AI Scrum Master! This experiment demonstrates how open source projects can leverage AI agents for sustainable, long-term maintenance.

📊 Live Reports: Automatically generated reports

📋 Project Board: Agent task tracking

📝 Blog Post: Learn about the autonomous dev team experiment

Citation

If you utilize this repository, data in a downstream project, please consider citing it with:

@misc{localai, author = {Ettore Di Giacinto}, title = {LocalAI: The free, Open source OpenAI alternative}, year = {2023}, publisher = {GitHub}, journal = {GitHub repository}, howpublished = {\url{https://github.com/go-skynet/LocalAI}},

❤️ Sponsors

Do you find LocalAI useful?

Support the project by becoming a backer or sponsor. Your logo will show up here with a link to your website.

A huge thank you to our generous sponsors who support this project covering CI expenses, and our Sponsor list:

Individual sponsors

A special thanks to individual sponsors that contributed to the project, a full list is in Github and buymeacoffee, a special shout out goes to drikster80 for being generous. Thank you everyone!

🌟 Star history

📖 License

LocalAI is a community-driven project created by Ettore Di Giacinto.

MIT - Author Ettore Di Giacinto

🙇 Acknowledgements

LocalAI couldn't have been built without the help of great software already available from the community. Thank you!

llama.cpp

https://github.com/tatsu-lab/stanford_alpaca

https://github.com/cornelk/llama-go for the initial ideas

https://github.com/antimatter15/alpaca.cpp

https://github.com/EdVince/Stable-Diffusion-NCNN

https://github.com/ggerganov/whisper.cpp

https://github.com/rhasspy/piper

🤗 Contributors

This is a community project, a special thanks to our contributors! 🤗

Backend	Description	Acceleration Support
llama.cpp	LLM inference in C/C++	CUDA 12/13, ROCm, Intel SYCL, Vulkan, Metal, CPU
vLLM	Fast LLM inference with PagedAttention	CUDA 12/13, ROCm, Intel
transformers	HuggingFace transformers framework	CUDA 12/13, ROCm, Intel, CPU
MLX	Apple Silicon LLM inference	Metal (M1/M2/M3+)
MLX-VLM	Apple Silicon Vision-Language Models	Metal (M1/M2/M3+)

Backend	Description	Acceleration Support
whisper.cpp	OpenAI Whisper in C/C++	CUDA 12/13, ROCm, Intel SYCL, Vulkan, CPU
faster-whisper	Fast Whisper with CTranslate2	CUDA 12/13, ROCm, Intel, CPU
moonshine	Ultra-fast transcription engine for low-end devices	CUDA 12/13, Metal, CPU
coqui	Advanced TTS with 1100+ languages	CUDA 12/13, ROCm, Intel, CPU
kokoro	Lightweight TTS model	CUDA 12/13, ROCm, Intel, CPU
chatterbox	Production-grade TTS	CUDA 12/13, CPU
piper	Fast neural TTS system	CPU
kitten-tts	Kitten TTS models	CPU
silero-vad	Voice Activity Detection	CPU
neutts	Text-to-speech with voice cloning	CUDA 12/13, ROCm, CPU
vibevoice	Real-time TTS with voice cloning	CUDA 12/13, ROCm, Intel, CPU
pocket-tts	Lightweight CPU-based TTS	CUDA 12/13, ROCm, Intel, CPU
qwen-tts	High-quality TTS with custom voice, voice design, and voice cloning	CUDA 12/13, ROCm, Intel, CPU
ace-step	Music generation from text descriptions, lyrics, or audio samples	CUDA 12/13, ROCm, Intel, Metal, CPU

Backend	Description	Acceleration Support
stablediffusion.cpp	Stable Diffusion in C/C++	CUDA 12/13, Intel SYCL, Vulkan, CPU
diffusers	HuggingFace diffusion models	CUDA 12/13, ROCm, Intel, Metal, CPU

Backend	Description	Acceleration Support
rfdetr	Real-time object detection	CUDA 12/13, Intel, CPU
rerankers	Document reranking API	CUDA 12/13, ROCm, Intel, CPU
local-store	Vector database	CPU
huggingface	HuggingFace API integration	API-based

Acceleration Type	Supported Backends	Hardware Support
NVIDIA CUDA 12	All CUDA-compatible backends	Nvidia hardware
NVIDIA CUDA 13	All CUDA-compatible backends	Nvidia hardware
AMD ROCm	llama.cpp, whisper, vllm, transformers, diffusers, rerankers, coqui, kokoro, neutts, vibevoice, pocket-tts, qwen-tts, ace-step	AMD Graphics
Intel oneAPI	llama.cpp, whisper, stablediffusion, vllm, transformers, diffusers, rfdetr, rerankers, coqui, kokoro, vibevoice, pocket-tts, qwen-tts, ace-step	Intel Arc, Intel iGPUs
Apple Metal	llama.cpp, whisper, diffusers, MLX, MLX-VLM, moonshine, ace-step	Apple M1/M2/M3+
Vulkan	llama.cpp, whisper, stablediffusion	Cross-platform GPUs
NVIDIA Jetson (CUDA 12)	llama.cpp, whisper, stablediffusion, diffusers, rfdetr, ace-step	ARM64 embedded AI (AGX Orin, etc.)
NVIDIA Jetson (CUDA 13)	llama.cpp, whisper, stablediffusion, diffusers, rfdetr	ARM64 embedded AI (DGX Spark)
CPU Optimized	All backends	AVX/AVX2/AVX512, quantization support