I suddenly needed to replace my maxed-out M1 Max MacBook Pro with a new laptop a couple of months ago. The base M2 Air with 8GB was the only one that would ship to me quickly. I was reluctant because I figured it would barely be functional with so little memory: before I got rid of the Pro, I loaded up my usual workload and checked total memory usage, and it was easily more than 8GB.
But I've been doing my usual (Firefox with many tabs, VS Code with a bunch of tabs and a couple windows, a Next.js dev server or two, multiple terminals, sometimes even more apps open) and I haven't noticed even a slight struggle to handle the load, no frame drops or anything.
Of course, it only supports 1 external monitor instead of the 2 I was using with the Pro, and I'm sure it would struggle with heavier VM/Docker workloads. But it handles medium demands just fine.
If I'm reading your comment correctly, you've got it backwards. Most people (around 60% in 2017, down from 80% in the 2000s) believe in a JFK assassination conspiracy; a majority has never believed that Oswald acted alone. So the real question is: how do most people come to believe in a conspiracy when the evidence overwhelmingly points to Oswald's sole guilt?
The answer: misdirection and lying about the evidence, compounded over decades. Oswald didn't shoot from a "great distance": the two shots that hit were fired from 59 and 88 yards, whereas he had to shoot at distances of 200 to 500 yards when he qualified as a sharpshooter in the Marines.
I almost stopped watching because the middle episodes (of the first season) were just wheel-spinning. But the last 3 episodes surprised me; they redeemed the season in a big way.
A victim of the medium. Prestige TV almost always has too little writing to cover too many episodes. At least with episodic TV you could skip the filler episodes, but when long-form linear TV drips out plot revelations every ten minutes you have to sit through the whole thing.
Are there any other projects/libraries that can run Llama models on Apple Silicon GPU? This is the first one I've seen.
Comparing it to llama.cpp on my M1 Max 32GB, it seems at least as fast just by eyeballing it. Not sure if the inference speed numbers can be compared directly.
vicuna-7b-v0 on Chrome Canary with the disable-robustness flag: encoding: 74.4460 tokens/sec, decoding: 18.0679 tokens/sec (i.e. about 55 ms per decoded token)
llama.cpp:
$ ./main -m models/7B/ggml-model-q4_0-ggjt.bin -t 8 --ignore-eos
=> 45 ms per token
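For what it's worth, the two tools report in different units, so converting puts them on the same scale. A quick back-of-the-envelope conversion in Python (a rough sanity check, not a controlled benchmark):

    # ms per token is just the reciprocal of tokens/sec.
    decode_tps = 18.0679                        # WebGPU decode rate from above
    print(f"{1000 / decode_tps:.1f} ms/token")  # ~55.3 ms/token vs llama.cpp's ~45

Same ballpark, but the prompts and settings differ between the two runs, so take it loosely.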
tl;dr: you pre-compute an embedding for each chunk of your database, so that at query time you can quickly look up which chunk is most similar to the user's query, then prepend that chunk to the query before handing it to GPT, so GPT has the relevant context to give an answer.
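A minimal sketch of that pipeline (the model name, chunks, and query here are made-up placeholders; any embedding API works the same way):

    # Toy retrieval-augmented prompt, using sentence-transformers for embeddings.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")

    # Offline: embed every chunk of the database once.
    chunks = ["Refunds are issued within 30 days.",
              "Shipping takes 5-7 business days."]
    chunk_vecs = model.encode(chunks, normalize_embeddings=True)

    # Online: embed the user's query the same way.
    query = "How long do refunds take?"
    query_vec = model.encode([query], normalize_embeddings=True)[0]

    # Cosine similarity (a dot product, since the vectors are normalized)
    # picks the most relevant chunk.
    best = chunks[int(np.argmax(chunk_vecs @ query_vec))]

    # Prepend that chunk so GPT answers with the right context.
    prompt = f"Context: {best}\n\nQuestion: {query}"

In practice you'd store the chunk vectors in a vector database and retrieve the top-k chunks rather than just one, but the core trick is exactly this lookup.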
Basically, there's an important plot thread that wraps up midway through season 2, and then the quality of the show immediately drops off a cliff. I'd say all the red and yellow episodes are very skippable for casual viewing (technically there are new elements in those episodes that get referenced later, but I'd argue they're very minor, and the boring sitcom/soap-opera shenanigans will just keep you from getting to the better stuff). Lynch came back for the season 2 finale, which is one of the best episodes, so watch that, followed by the movie Fire Walk With Me, and then season 3.
Season 3 is pure gold and a much more cohesive vision than the first two (all 18 episodes co-written by Frost and Lynch, and directed by Lynch), so it's very much worth navigating the earlier unevenness.