























Tags: AI, LLMs, Benchmarking, Jankmarking, Evals, local models
Local LLM performance frontier: Ollama with M5 Pro

I put together some evals for local benchmarking. I'm mostly interested in approximately measuring performance versus speed because I don't like waiting while my macbook imitiates a jet turbine in noise and temperature and want to get good enough answers locally. For anything bigger the cloud is still the move. The problem is my benchmarks are --- to use a human em-dash and quote the kids these days "hella jank".
my janky benchmark questions:
These aren't very good. They're half AI generated. The summarization one is a single sentence and is actually backwards asking for a summary longer than the input sentence... (it used to be longer but I think it got lost at some point when I copy and pasted it. Never made it into version control... )
And yet, these are directionally correct. The better models cluster together scorewise.
For serious work, make sure your bench marks aren't jankmarks.
此内容由惯性聚合(RSS阅读器)自动聚合整理,仅供阅读参考。 原文来自 — 版权归原作者所有。