A complete guide to deploying local LLM inference on Apple M5 Max with 128 GB unified memory, covering model selection, benchmarking, persistent services, and agentic coding workflows.
LLM, Apple Silicon, llama.cpp, Infrastructure, Benchmarks
Technical articles on engineering, research, and leadership.
A comprehensive reference guide to large language model terminology, covering architecture, quantization, fine-tuning, inference, and the naming conventions you need to navigate the local LLM ecosystem.