Nine lessons from three rewrites of the audit trail system in TeamPulse, a multi-tenant compliance SaaS: why audit writes must fail open, how URL-derived metadata eliminates per-route boilerplate, why you need two audit tables instead of one, and how to map NIST compliance standards to specific lines of code.
Blog
Technical articles on engineering, research, and leadership.
Two battle-tested workflows for managing a heterogeneous server fleet over Tailscale SSH: a structured system update procedure with per-package reporting, and a comprehensive health investigation that audits services, storage, containers, GRUB, and security across every node.
A structured, read-only workflow for analyzing Google Cloud Platform costs across an entire billing account. Covers BigQuery billing export queries, Cloud Logging and Monitoring cost analysis, idle resource detection, commitment coverage, anomaly detection, and cost forecasting.
A systematic, non-destructive workflow for diagnosing why a Kubernetes service is misbehaving. Covers pod-level diagnostics, networking investigation, node pressure analysis, centralized log correlation, and structured root cause synthesis.
Understanding --cache-ram in llama.cpp: Prompt Caching, Eviction Errors, and Apple Silicon
How llama.cpp's prompt caching mechanism works, why the default 8 GiB allocation caused KV cache eviction errors on my server, and what the parameter actually controls, which most guides get wrong.
Local Large Language Model Inference on Apple Silicon for Agentic Software Engineering Workflows
An empirical evaluation of local LLM inference on Apple M5 Max with 128 GB unified memory. This study benchmarks eight models across prompt processing and token generation workloads, demonstrates that Mixture-of-Experts architectures achieve 3x to 6x higher throughput than dense models on memory-bandwidth-bound hardware, and presents a production deployment architecture for agentic coding workflows.
A comprehensive reference guide to large language model terminology, covering architecture, quantization, fine-tuning, inference, and the naming conventions you need to navigate the local LLM ecosystem.