If you've been waiting to fine-tune Google's latest Gemma 4 model on your Mac, someone just made that a whole lot easier.
Developer Matt built a Gemma 4 multimodal fine-tuner specifically for Apple Silicon, and it's now available on GitHub for anyone to fork and tinker with.
The project started as something entirely different. About six months ago, he was trying to fine-tune Whisper locally on an M2 Ultra Mac Studio with 15,000 hours of audio data sitting in Google Cloud Storage. Since there was no way to fit all that audio locally, he built a streaming system to pull data from GCS during training.
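To make the streaming idea concrete, here is a minimal, hypothetical sketch of the general pattern. It is not code from Matt's repository. It fetches one shard at a time and passes examples through a bounded shuffle buffer, so the full dataset never has to sit on local disk or in RAM. The function name `stream_shards`, the `fetch_shard` callable, and `buffer_size` are illustrative assumptions. With the real `google-cloud-storage` client, `fetch_shard` could wrap `client.bucket(...).blob(name).download_as_bytes()` and split the result into examples.

```python
import random
from typing import Callable, Iterable, Iterator, List


def stream_shards(
    shard_names: Iterable[str],
    fetch_shard: Callable[[str], List[bytes]],
    buffer_size: int = 256,
    seed: int = 0,
) -> Iterator[bytes]:
    """Hypothetical streaming loader: one remote shard at a time, bounded memory.

    At most one shard plus `buffer_size` examples are held in memory.
    """
    rng = random.Random(seed)
    buffer: List[bytes] = []
    for name in shard_names:
        for example in fetch_shard(name):  # one (network) fetch per shard
            buffer.append(example)
            if len(buffer) >= buffer_size:
                # Shuffle buffer: swap a random element to the end and emit it.
                i = rng.randrange(len(buffer))
                buffer[i], buffer[-1] = buffer[-1], buffer[i]
                yield buffer.pop()
    # All shards consumed: emit the remaining examples in random order.
    rng.shuffle(buffer)
    yield from buffer


# Usage with an in-memory stand-in for a GCS bucket (3 shards x 4 examples).
fake_bucket = {f"shard-{k}": [f"{k}-{j}".encode() for j in range(4)] for k in range(3)}
examples = list(stream_shards(sorted(fake_bucket), fake_bucket.__getitem__, buffer_size=5))
```

The shuffle buffer trades some randomness for a fixed memory ceiling: a larger `buffer_size` mixes examples across more shards but holds more of them in memory at once.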
Then Gemma 3n dropped, so he added support for that. Then he shelved the whole thing. When Gemma 4 launched a few days ago, he dusted it off, split the Gemma-specific parts out of the Whisper fine-tuning code, and added Gemma 4 support.
The tool exists for a very specific reason: you can't really do audio fine-tuning with MLX right now. Apple's MLX framework is great for a lot of things, but this particular workflow isn't one of them. So instead of waiting around, Matt built the tooling himself.
One honest caveat from the developer: it's very easy to run out of memory when fine-tuning on longer sequences. Even with 64GB of unified memory on his Mac Studio, out-of-memory (OOM) errors are a constant companion. If your machine has less unified memory, expect to keep your sequence lengths short or get creative with your batching.
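One common way to "get creative with your batching" is token-budget batching: sort examples by length and group them so that each padded batch (batch size × longest sequence in the batch) stays under a fixed token budget. The sketch below is a hypothetical illustration, not part of Matt's project, and `max_tokens` and `max_len` are assumed tuning knobs. Padded token count is only a rough proxy for activation memory, so the right budget for a given Mac has to be found empirically.

```python
from typing import List, Sequence


def token_budget_batches(
    lengths: Sequence[int], max_tokens: int, max_len: int
) -> List[List[int]]:
    """Hypothetical helper: group example indices so each padded batch fits a token budget.

    Padded cost of a batch = len(batch) * longest (truncated) sequence in it.
    Sequences longer than `max_len` are assumed to be truncated to `max_len`.
    """
    if max_len > max_tokens:
        raise ValueError("max_len must not exceed max_tokens")
    # Sorting by length keeps similar lengths together, which minimizes padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches: List[List[int]] = []
    current: List[int] = []
    current_max = 0
    for i in order:
        n = min(lengths[i], max_len)
        new_max = max(current_max, n)
        # Start a new batch if adding this example would exceed the budget.
        if current and new_max * (len(current) + 1) > max_tokens:
            batches.append(current)
            current, new_max = [], n
        current.append(i)
        current_max = new_max
    if current:
        batches.append(current)
    return batches


lengths = [10, 200, 50, 30, 400]
batches = token_budget_batches(lengths, max_tokens=400, max_len=300)
```

Short examples end up packed together while long ones get small batches of their own, which keeps peak memory roughly constant instead of spiking whenever a long sample lands in a large batch.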
The project hit Hacker News and picked up solid traction, pulling in 174 points and sparking discussion in the comments. That kind of reception makes sense given the timing. Gemma 4 is generating a lot of interest, and practical tools for working with it locally are still scarce.
For anyone who builds with multimodal AI and prefers to keep training runs local rather than renting cloud GPUs, this is worth bookmarking. It's not a polished product. It's a side quest that turned into something genuinely useful, and it's open for the community to improve.
The broader signal here is encouraging: the gap between "big lab releases a model" and "you can fine-tune it on your own hardware" keeps shrinking. Tools like this are what make local AI development practical, not just theoretical.