Ollama is my preferred choice, but here I want to gather the alternatives I’ve found.
mlx-lm
Repo: https://github.com/ml-explore/mlx-examples/blob/main/llms/mlx_lm/README.md. It's part of MLX:
MLX is an array framework for machine learning research on Apple silicon, brought to you by Apple machine learning research.
pip install mlx-lm
mlx_lm.generate --model mlx-community/Llama-3.2-1B-Instruct-4bit --prompt "What is cold fusion" --temp 0.0 --max-tokens 512
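The README also documents a Python API for the same flow. A minimal sketch, mirroring its load/generate example (details such as the chat-template step may drift across versions):

from mlx_lm import load, generate

# Download (if needed) and load the 4-bit model plus its tokenizer
model, tokenizer = load("mlx-community/Llama-3.2-1B-Instruct-4bit")

# Wrap the question in the model's chat template before generating
messages = [{"role": "user", "content": "What is cold fusion"}]
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

# verbose=True streams the tokens and prints generation stats
response = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)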
There is also a server, similar to the one we get with Ollama.
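A sketch of starting and querying it, assuming the defaults described in the mlx-lm README (an OpenAI-style API on localhost:8080):

mlx_lm.server --model mlx-community/Llama-3.2-1B-Instruct-4bit

import json
import urllib.request

# Assumes mlx_lm.server is running on its default host/port (localhost:8080)
# and exposes the OpenAI-compatible /v1/chat/completions route
req = urllib.request.Request(
    "http://localhost:8080/v1/chat/completions",
    data=json.dumps({
        "messages": [{"role": "user", "content": "What is cold fusion"}],
        "max_tokens": 512,
        "temperature": 0.0,
    }).encode(),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    reply = json.load(resp)
print(reply["choices"][0]["message"]["content"])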
The smallest Llama running on an iPhone: https://x.com/awnihannun/status/1839330067039887622