I honestly couldn't tell you. It's been gobsmacking to me for so long, and I'm so darn curious... I assumed they _certainly_ would have figured it out after setting up a year-long sprint.
riffing out loud:
In 2021 I would talk about "products, not papers", because the gap seemed to be that OpenAI had been able to iterate on product feedback starting 18 months earlier. I don't think that's the explanation anymore: Google, of all companies, should have had enough feedback from Bard to improve Gemini.
The only explanation I have left is that it genuinely was a horrible idea for Sundar to come swinging in, in a rush, in December/Jan, to kneecap Brain (who owned the real grunt work of LLM development) and crown the always-distant, always-academic DeepMind.
Like, in retrospect, it seems obviously stupid. The first thing you do to prepare for this existential cavalry battle is swap out the people with experience riding horses day to day.
And that would also explain why we're still seeing the same generally bad performance so much later: we're looking at people getting their first opportunity to train at scale for chat. And maybe Bard and Gemini were completely separate groups, so the Gemini team couldn't really leverage Bard's feedback. (Classic Google: one thing is deprecated, the other isn't ready yet.)
It really makes me wonder about some of the numbers they'd publish in papers and how cherry-picked they were. It was nigh-impossible to replicate the results, even with a boatload of curiosity and the gumption to try anything. I mean, I didn't systematically run a full eval, but... it never, ever, ever worked even close to as consistently as the papers would make you think.
Last thought: I'm kinda shocked they got Gemini out at all; the stuff it was saying in September was horribly off-topic and laughable about 20% of the time.