The latest builds of llama-server now automatically migrate your locally cached models from llama.cpp's own cache directory to the Hugging Face hub cache layout. If you run local AI models and updated recently, your files may already have moved without your asking.
On launch, the server prints a migration warning and relocates everything previously downloaded with the -hf flag from ~/.cache/llama.cpp/ to Hugging Face's standard hub cache path. This is a side effect of Hugging Face's deeper integration with ggml, the tensor library that powers llama.cpp's model inference (the process of actually running a model to generate output). The migration is described as "one-time," but it runs automatically with no opt-out flag. A startup warning is not a confirmation prompt, and this kind of unrequested file-shuffling can break scripts, symlinks, and storage setups that point at the old location.
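One way to find out what might break is to look for symlinks that still point into the old directory. The following is a minimal sketch in Python, not part of llama.cpp: the helper name links_into and the directory you choose to scan are assumptions for illustration, and the old cache path is the one named above.

```python
import os
from pathlib import Path


def links_into(search_root, old_cache):
    """Yield (link, target) for symlinks under search_root that point inside old_cache.

    Hypothetical helper for illustration; it only reads the filesystem.
    """
    old_cache = Path(os.path.abspath(old_cache))
    # followlinks=False: report symlinked directories but do not descend into them.
    for dirpath, dirnames, filenames in os.walk(search_root, followlinks=False):
        for name in dirnames + filenames:
            link = Path(dirpath) / name
            if not link.is_symlink():
                continue
            raw = os.readlink(link)
            # Relative link targets are interpreted relative to the link's directory.
            target = Path(os.path.abspath(os.path.join(dirpath, raw)))
            if target == old_cache or old_cache in target.parents:
                yield link, target
```

Calling it with a project directory of your own and Path.home() / ".cache" / "llama.cpp" lists each link that will dangle once the files move. It does not catch hard-coded paths inside scripts or config files, which still need a text search.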
For anyone running llama-server on machines with carefully managed disk space - say, a dedicated SSD partition for models - this matters. If your old cache sat on one drive and Hugging Face's default cache (~/.cache/huggingface/hub unless overridden) resolves to another, you could end up with duplicated multi-gigabyte model files or, worse, a full boot drive. The fix is straightforward (set HF_HOME, whose hub/ subdirectory holds the hub cache, or symlink the directories), but you have to know the migration happened first.
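Before updating, it can help to check where the hub cache would resolve and whether that location shares a filesystem with the old one. The sketch below follows the precedence documented for the huggingface_hub Python library (HF_HUB_CACHE, then HF_HOME/hub, then XDG_CACHE_HOME, then ~/.cache). That llama-server resolves the path the same way is an assumption here, and the function names are hypothetical.

```python
import os
import shutil
from pathlib import Path


def resolve_hf_hub_cache(env=None, home=None):
    """Return the hub cache path using huggingface_hub's documented precedence.

    Assumption: llama-server follows the same rules; verify against your build.
    """
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)
    if env.get("HF_HUB_CACHE"):
        return Path(env["HF_HUB_CACHE"]).expanduser()
    if env.get("HF_HOME"):
        return Path(env["HF_HOME"]).expanduser() / "hub"
    xdg = env.get("XDG_CACHE_HOME")
    base = Path(xdg) if xdg else home / ".cache"
    return base / "huggingface" / "hub"


def nearest_existing(path):
    """Walk up to the closest ancestor that exists, so stat() works on missing dirs."""
    path = Path(path).absolute()
    while not path.exists():
        path = path.parent
    return path


def same_filesystem(a, b):
    """True if both paths (or their nearest existing ancestors) share a device."""
    return os.stat(nearest_existing(a)).st_dev == os.stat(nearest_existing(b)).st_dev


if __name__ == "__main__":
    old_cache = Path.home() / ".cache" / "llama.cpp"  # path named in the article
    new_cache = resolve_hf_hub_cache()
    print("old:", old_cache)
    print("new:", new_cache)
    print("same filesystem:", same_filesystem(old_cache, new_cache))
    free_gib = shutil.disk_usage(nearest_existing(new_cache)).free / 2**30
    print(f"free at destination: {free_gib:.1f} GiB")
```

If the two paths report different filesystems, the move is effectively a copy onto the destination drive, so compare the free space figure with the size of your old cache before letting the migration run.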
This consolidation makes long-term sense. Having one cache directory for models regardless of which tool downloaded them reduces duplication and simplifies cleanup. But shipping it as an automatic migration with no confirmation prompt is a rough edge. If you maintain local LLM infrastructure, pin your llama.cpp build version until you have verified your storage paths will survive the switch.