Llama3 running locally on iPhone 15 Pro (imgur.com)
77 points by yaszko 10 days ago | 54 comments





Is this news? I've got a nearly year-old app that supports over two dozen local LLMs, with support for using them with Siri and Shortcuts. I added support for Llama 3 8B the day after it came out, and also Eric Hartford's new Llama 3 8B based Dolphin model. All models in it are quantized with OmniQuant. On iOS, the 7B and 8B models are 3-bit quantized and the smaller models are 4-bit quantized. In the macOS version all models are 4-bit OmniQuant quantized. 3-bit OmniQuant quantization is quite comparable in perplexity to the 4-bit RTN quantization that all the llama.cpp based apps use.

https://privatellm.app/

https://apps.apple.com/app/private-llm-local-ai-chatbot/id64...
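
For a rough sense of why 3-bit vs 4-bit matters on an 8GB phone, here's the back-of-the-envelope arithmetic (my numbers only; this ignores embeddings, quantization scales and the KV cache):

    # Approximate weight footprint of an 8B-parameter model at different bit widths.
    PARAMS = 8e9

    for bits in (16, 4, 3):
        gib = PARAMS * bits / 8 / 2**30   # bits -> bytes -> GiB
        print(f"{bits:>2}-bit weights: ~{gib:.1f} GiB")

    # Roughly 14.9 GiB at fp16, 3.7 GiB at 4-bit and 2.8 GiB at 3-bit, which is
    # why 3-bit is what lets an 8B model sit alongside iOS in 8 GB of RAM.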


Nice. What is battery life like under heavy use? I was reading a thread on the llama.cpp repo earlier where they were discussing whether it was possible (or attractive) to add neural engine support in some form.

With the bigger 7B and 8B models, battery life goes from over a day to a few hours on my iPhone 15 Pro.

The 8B model nominally works on 6GB phones, but it's quite slow on them. OTOH, it's very usable on iPhone 15 Pro/Pro Max devices and even better on M1/M2 iPads.

Every framework, whether llama.cpp, MLX, or mlc-llm (which I use), only uses the GPU. Using the ANE, and perhaps the undocumented AMX coprocessor, for efficient decoder-only transformer inference is still an open problem. I've made some early progress on quantised inference using the ANE, but there are still a lot of issues to be solved before it's even demo ready, let alone a shipping product.
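
For context on what "using the ANE" even means mechanically: the Neural Engine is only reachable through Core ML, so a model has to be converted and then asked to load on CPU+ANE compute units. A minimal coremltools sketch with a toy stand-in model (not a real decoder block, and it doesn't address the KV-cache or quantisation problems above):

    # Sketch only: convert a toy PyTorch module and request CPU + Neural Engine.
    import torch
    import coremltools as ct

    class TinyMLP(torch.nn.Module):   # stand-in; a real decoder layer is the hard part
        def __init__(self):
            super().__init__()
            self.fc = torch.nn.Linear(64, 64)
        def forward(self, x):
            return torch.relu(self.fc(x))

    traced = torch.jit.trace(TinyMLP().eval(), torch.randn(1, 64))

    mlmodel = ct.convert(
        traced,
        inputs=[ct.TensorType(shape=(1, 64))],
        convert_to="mlprogram",
        compute_units=ct.ComputeUnit.CPU_AND_NE,  # ask Core ML to schedule on the ANE
    )
    mlmodel.save("tiny.mlpackage")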


Super interesting, thank you!

I wonder if Apple will bump up the amount of RAM in iPhones due to AI. It seems like most LLMs require a large amount of memory.

They've been stingy on increasing RAM compared to Android phones.


So the trend with Apple has been that the SoC from the current-generation Pro and Pro Max devices becomes the SoC for the next generation of baseline devices. For instance, the iPhone 14 Pro Max and iPhone 15 have the same SoC (A16 Bionic). And this trend holds all the way back to the iPhone 12.

It's almost certain that the iPhone 16 will ship with 8GB of RAM. What remains to be seen is whether the iPhone 16 Pro and Pro Max will ship with 16GB of RAM (like the high-end M1/M2 iPad Pros with >= 1TB SSDs).


So I plotted iPhone RAM across generations and I fail to see a trend that could lead to doubling the current RAM to 16GB.

12 Pro Max had 6GB RAM

15 Pro Max has 8GB RAM

16 most likely will have between 8GB and 12GB RAM.

https://i.imgur.com/N7OPjMK.png


It's one of the most infuriating things about Apple.

The RAM tax is so absurd it's bordering on criminal, but it also just seems stupid: if they hadn't put 8GB of RAM in the smallest MacBook Air M2, their whole lineup would be more than capable of running quality local LLMs, and those machines would double as gaming devices thanks to their excellent chipset essentially giving them 16GB of VRAM. But no, not when 25% of them have low RAM, i.e. no new OS LLM updates.

Also, we can't have gaming because half their newly sold devices have shit RAM, so they've also kind of already ditched the "gaming" push they just got started on a year ago, all because they want to push products with RAM levels from 10 years ago. Bizarre!

They must be betting on local AI as a "pro" feature only.


8GB RAM models don't exist to be used; they exist to be e-waste that gets you to the checkout page where you click 16GB instead.

8GB for a premium device in 2024 is a hard ask, completely agree. But I hold absolutely zero hard feelings toward Apple for not catering to gamers as a demographic.

Most importantly, though, we are talking about iPhones here. I can’t say I’ve ever thought to myself “gosh, I wish my phone had more RAM!” in…over a decade?


> But I hold absolutely zero hard feelings toward Apple for not catering to gamers as a demographic

Honestly I'm glad they don't. The PC is the last open platform out there and the last thing I'd want to see is Apple encroaching on it with their walled gardens and carbonite-encased computers.


...so, you haven't used Android in over a decade?

Last time I cared how much RAM any phone had, iOS or Android, I was working at Augmentra on the ViewRanger app, and we were still supporting older devices with only 256 MB.

That was… *checks CV*… I left in April 2015.

I think RAM is like roads: usage expands to fill available infrastructure/storage.

That an iPhone today has as much RAM as the still-functioning Mid-2013 MacBook Air sitting in a drawer behind me is surprising when compared to the 250-fold growth from my Commodore 64 to my (default) Performa 5200… but it doesn't seem to have actually harmed anything I care about.


I was basically always slowed down by RAM on Android, probably because I switch between lots of very badly coded apps... so even on desktop I've grown to see RAM as "insurance against badly written code", as in "I'll still be able to run that memory-leaky crapware and get what I need done", or "I'll just spin up a VM for that crap that only runs on that other OS"...

Swimming in badly written SPAs and Cordova/whatever hybrid apps is seriously helped by e.g. 12GB of RAM on a mobile :)


> we are talking about iPhones here

I can't wait until Groq or someone else release tiny mobile inference engines specifically for phones and the like.

There are already tiny LLMs for this. They're bad, because there isn't enough information in them to be coherent.

> They've been stingy on increasing RAM

... in any of their products.

FTFY


Zero chance the marketing department will let them give up the extra $400 or whatever they get to charge for the bare minimum storage and RAM upgrades on all their devices.

I think it's silly to think the marketing department gets to control the pricing, but it is definitely very true that the "starting at <great price>" figure is very powerful for them. Even beyond Apple, it warps and distorts pricing across the entire laptop field, because people who don't understand how inadequate the entry-level model is will compare that price to an entry-level model from Lenovo, or Dell, etc. and draw conclusions. Even on HN I've seen people use the "starting at" price of Macs as a way of "proving" that "the Apple tax isn't much."

So yes, there is tremendous marketing value in that low starting price, although I think it's nearing the end of its usefulness now that even fan sites are starting to call out the inadequacy.


I don't think they are inadequate; these devices are perfect for most of my family. They do some calls, messages, a couple of pictures here and there, basic word processing and web browsing, but not much more.

I had a MacBook with 8GB RAM and a 256GB disk as my daily driver for work until last year, running Docker and my fat IDE without too many issues. It's a similar story with my phone: I bought the bigger storage version because I thought I'd need it, but after 3 years of using it I'm still not close to using even 128GB.


Interesting. What is browser usage like for most of your family? I.e., how many tabs do they tend to keep open at a time?

This is quite impressive to be honest.

The chat is answering at a speed of one word per few/several seconds. But still, this is a nice feat.

Example recording for the curious: https://www.youtube.com/watch?v=nZEvUj-QTrI


I'm so spoiled by the quality of modern closed-source models, I laughed out loud when the beginning of its answer just said [duplicate].

On my S24 Ultra, I am seeing it generate several words a second.

Related, with GitHub link:

"Next level: QLoRA fine-tuning 4-bit Llama 3 8B on iPhone 15 pro.

Incoming (Q)LoRA MLX Swift example by David Koski: https://github.com/ml-explore/mlx-swift-examples/pull/46 works with lots of models (Mistral, Gemma, Phi-2, etc)"

https://twitter.com/awnihannun/status/1782436898285527229



Wonder why it has a 2-star rating.

On the most recent iPhone Pro I have a query running (about ~15 minutes so far) and the results are really good, just really slow. But I imagine the performance is worse on an older device.

You can run it on Android too. Not very fast though.

Link?



I wonder why it hasn't been published on Google Play?


For 3 seconds I was hoping it might support 13 mini, but 4GB of RAM is not enough :(

My current and previous MacBooks have had 16GB and I've been fine with it, but given local models I think I'm going to have to go with whatever the maximum RAM available is for the next one. It runs 13B models quite well with Ollama, but I tried `mixtral-8x7b` and saw 0.25 tokens/second; I suppose I should be amazed that it ran at all.
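
That 0.25 tokens/second is roughly what you'd expect from the memory math alone (my estimates, assuming 4-bit weights and ignoring the KV cache and other overhead):

    # Approximate 4-bit weight footprints vs. a 16 GB machine.
    models = {
        "13B dense": 13e9,
        "Mixtral 8x7B (~47B total params)": 47e9,
    }
    for name, params in models.items():
        gib = params * 4 / 8 / 2**30
        print(f"{name}: ~{gib:.0f} GiB of weights")

    # ~6 GiB fits comfortably in 16 GB; ~22 GiB does not. Even though Mixtral only
    # activates ~13B parameters per token, all the expert weights have to stay
    # resident, so on a 16 GB machine it ends up swapping and crawls.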

Similarly, I am for the first time going to care about how much RAM is in my next iPhone. My iPhone 13's 4GB is, as in your case, inadequate.


I recently upgraded from my M1 Air specifically because I had purchased it with 8GB -- silly me. Now I have 24GB, and if the Air line had more available I would have sprung for 32GB, or even 64GB. But I'm not paying for a faster processor just to get more memory :-/

I got an 8GB M1 from work, and I've been frankly astonished by what even this machine can do. Yes, it'll run the 4-bit Llama 3 quants - not especially fast, mind, but not unusably slow either. The problem is that you can't do a huge amount else.

No luck on 14 Pro Max either.


Seems like it is the 3-bit quantized version? (Judging by the file size.)

And does anyone know how many tokens per second it can run?


Which app is this?

Does anything similar exist for Android?


On Android you can simply run vanilla llama.cpp inside a terminal, or indeed any stack that you would run on a Linux desktop that doesn't involve a native GUI.

Yep, Termux is a good way to do this. llama.cpp has an Android example as well; I forked it at GitHub.com/iakashpaul/portal and you can try it with any supported GGUF Q4/Q8 models.

There's an app called Private AI that will let you run models locally on Android. It has a few smaller models available for free to try it out, but the larger models like Llama 3 (or the option to use your own downloaded models) require a $10 unlock purchase.

You can either modify the Android example inside llama.cpp or my fork of it at GitHub.com/iakashpaul/portal

Increase the ctx to more than 100 and point it at any Q4 GGUF of a 7B model.
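
For the "just run llama.cpp in a terminal" route a few comments up, here's a minimal sketch using the llama-cpp-python bindings instead of the native Android example (assuming the package builds under Termux; the model path is just a placeholder):

    # Load any Q4 GGUF of a 7B/8B model and generate a short completion.
    from llama_cpp import Llama

    llm = Llama(
        model_path="models/llama-3-8b-instruct.Q4_K_M.gguf",  # placeholder path
        n_ctx=2048,     # context window comfortably above the ~100 mentioned above
        n_threads=4,    # stick to the big cores; little cores mostly slow things down
    )

    out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
    print(out["choices"][0]["text"])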


It's fascinating. I hope to see more powerful large models on various mobile devices in the future.

I’m sure my iPhone 14 Pro with 6GB RAM can handle the 4GB weights, no?

Says device is unsupported :(


This should be attempted with phi-3 when the weights are released tomorrow.

Anybody else running into issues downloading models?

New Apple Pro AI model with 32GB RAM!!

what is with the BS answer lol



