Teaching is a terrible example. Teaching is actually more efficient when it's decentralised, because teachers can adapt to the local environment and to changes. With centralisation you get a bad feedback loop.
Training doesn't work like that. Just because a model has been exposed to text in its training data doesn't mean the model will "remember" the details of that text.
Llama 3 was trained on 15 trillion tokens, but I can download a version of that model that's just 4GB in size.
No matter how "big" your model is, there is still scope for techniques like RAG if you want it to return answers grounded in actual text, as opposed to often-correct hallucinations spun up from the giant matrices of numbers in the model weights.
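To make the grounding point concrete, here's a toy sketch of the RAG idea: retrieve the most relevant chunk of real text first, then build the prompt around it. The word-overlap scoring and the sample chunks are just illustrative; real systems use embedding similarity and a vector store.

```python
# Toy RAG sketch: retrieve a relevant text chunk, then ground the
# prompt in it. Scoring is naive word overlap, purely for illustration.

def score(query, chunk):
    # Count how many query words appear in the chunk.
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c)

def retrieve(query, chunks, k=1):
    # Return the k highest-scoring chunks for this query.
    return sorted(chunks, key=lambda ch: score(query, ch), reverse=True)[:k]

# Hypothetical document chunks standing in for a real corpus.
chunks = [
    "Llama 3 was trained on roughly 15 trillion tokens.",
    "RAG grounds answers in retrieved source text.",
    "GPT-4 was released in 2023.",
]

top = retrieve("what was Llama 3 trained on", chunks)
# The model is then asked to answer using only the retrieved context,
# rather than whatever is baked into its weights.
prompt = f"Answer using only this context:\n{top[0]}\n\nQuestion: ..."
print(top[0])
```

The point is that the answer now comes from text you can point at, not from the weights.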
GPT-2 was launched in 2019, followed by GPT-3 in 2020 and GPT-4 in 2023. RAG is necessary to bridge the informational gaps between those long LLM release cycles.
I was thinking the same thing, and I believe there are trade-offs in both methods. If you count and group, you just have to pick your own hex-code buckets. So orange and gold are their top two colors in the example, and on our end we would just have to decide what range of hex values correlates to orange, what range correlates to gold, and what range applies to everything else. With deep learning these ranges are effectively learned, so it's more computational and feels like overkill, but I can see the benefits.
If I were at work I would probably just choose my buckets and group by the hex counts, though; that's a lot less computation and you get consistent results. If I were having fun I would fit a deep net.
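Something like this is what I had in mind for the count-and-group route: convert each hex color to a hue and bucket by hand-picked hue ranges. The bucket boundaries below are arbitrary picks on my part, which is exactly the trade-off being described.

```python
# Count-and-group sketch: map hex colors to hue, then to named buckets.
# The hue ranges for "orange" and "gold" are illustrative guesses.
import colorsys
from collections import Counter

def hue_bucket(hex_color):
    # Parse "#rrggbb" into floats in [0, 1].
    r, g, b = (int(hex_color[i:i + 2], 16) / 255 for i in (1, 3, 5))
    h, _, _ = colorsys.rgb_to_hsv(r, g, b)
    deg = h * 360
    if 20 <= deg < 45:
        return "orange"
    if 45 <= deg < 65:
        return "gold"
    return "other"

# Hypothetical pixel sample; in practice this would come from an image.
pixels = ["#ff8800", "#ffd700", "#ff9933", "#3366ff"]
counts = Counter(hue_bucket(p) for p in pixels)
print(counts.most_common())  # orange leads, then gold
```

Cheap, deterministic, and the "learning" is just you deciding where the range boundaries go.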
You should repeat to avoid coupling code that is contextually different. You want to avoid too many abstractions. There's no black-and-white rule; you'll get a feel for when to repeat and when not to.
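A small made-up example of what "contextually different" means here: two checks that happen to be identical today. Merging them into one shared helper would couple a signup rule to a marketing rule that can change independently.

```python
# Two rules that look the same today but belong to different contexts.
# Keeping them separate lets each one change without touching the other.

def validate_signup_age(age):
    # Legal requirement: signups must be adults.
    return age >= 18

def validate_discount_age(age):
    # Marketing rule: happens to be 18 today, but could become 16 or 21.
    return age >= 18
```

If these were collapsed into one `is_valid_age`, changing the discount threshold would silently change signup behavior too. That's the coupling the repetition avoids.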