It says: This constitution is written for our mainline, general-access Claude models. We have some models built for specialized uses that don’t fully fit this constitution; as we continue to develop products for specialized use cases, we will continue to evaluate how to best ensure our models meet the core objectives outlined in this constitution.
I wonder what those specialized use cases are and why they need a different set of values.
I guess the simplest answer is that they mean small FIM and tools models, but who knows?
Indeed! I'm showing my age, but I do remember using this with Puppet and it was one of my inspirations :D (no commits in nearly 13 years, ouch) https://github.com/devstructure/blueprint
Yes! I always thought that was a very clever project, and was sad when it ceased development. Very excited to try this out, and glad to have stayed on Debian all these years.
I'm really interested to see what happens during the World Cup. I won't be surprised if it somehow ends up being an even bigger scandal than Qatar '22. Even if we set immigration and politics aside, heat is going to be an even bigger issue than everyone is anticipating.
Yes, mission accomplished indeed. I am sure there are some good promotions being had by the Chinese and Russian intelligence operatives who successfully convinced enough of the US population to give up on everything that made our country a world leader and choose self-destruction instead. Leftist degrowthers have nothing on reactionary destructionists.
The length of tasks (measured by how long they take human professionals) that generalist frontier model agents can complete autonomously with 50% reliability has been doubling approximately every 7 months for the last 6 years...
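For a rough sense of scale, here is a back-of-the-envelope sketch of what that quote implies, assuming the 7-month doubling held steadily across the full 6 years. The helper name `growth_factor` is made up for illustration and is not something from METR.

```python
# Total growth implied by a fixed doubling time.
# The 7-month and 6-year figures come from the quote above;
# growth_factor is a hypothetical helper, not part of any METR tooling.

def growth_factor(elapsed_months: float, doubling_months: float) -> float:
    """Return the multiplicative growth after elapsed_months of steady doubling."""
    return 2 ** (elapsed_months / doubling_months)

elapsed = 6 * 12                      # 6 years expressed in months
doublings = elapsed / 7               # a little over 10 doublings
factor = growth_factor(elapsed, 7)    # on the order of a thousandfold increase

print(f"{doublings:.1f} doublings, roughly {factor:,.0f}x longer tasks")
```

So a steady 7-month doubling over 6 years works out to a bit more than ten doublings, i.e. task lengths on the order of a thousand times longer than at the start of the period.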
Yeah--I wanted a short way to gesture at the subsequent "tasks that are fast for someone but not for you are interesting," and did not mean it as a gotcha on METR, but I should've taken a second longer and pasted what they said rather than doing the "presumably a human competent at the task" handwave that I did.
I agree. After all, benchmarks don't mean much, but I guess they are fine as long as they keep measuring the same thing every time.
Also, the context matters. In my case, I see a huge difference between the gains at work vs those at home on a personal project where I don't have to worry about corporate policies, security, correctness, standards, etc. I can let the LLM fly and not worry about losing my job in record time.
Exactly.
With the Intel-Nvidia partnership signed this September, I expect to see some high-performance single-board computers being released very soon.
I don't think the ATX form factor will survive another 30 years.
I had a Xolo Tegra Note 7 tablet (marketed in the US as EVGA Tegra Note 7) around 2013. I preordered it, as far as I remember. It had a Tegra 4 SoC with a quad-core Cortex-A15 CPU and a 72-core GeForce GPU. Nvidia claimed it was the fastest SoC for mobile devices at the time.
To this day, it's the best mobile/Android device I ever owned. I don't know if it was the fastest, but it certainly was the best performing one I ever had. UI interactions were smooth, apps were fast on it, the screen was bright, touch was perfect, and it still had long enough battery life. The device felt very thin and light, but sturdy at the same time. It had a pleasant matte finish and a magnetic cover that lasted as long as the device did. It spoiled the feel of later tablets for me.
It had only 1 GB of RAM. We have much more powerful SoCs today. But nothing ever felt that smooth (not counting the iPhone). I don't know why that was. Perhaps Android was light enough for it back then. Or it may have had a very good selection and integration of subcomponents. I was very disappointed when Nvidia discontinued the Tegra SoC family and tablets.
I'd argue their current CPUs aren't to be discounted either. Much as people love to crown Apple's M-series chips as the poster child of what Arm can do, Nvidia's Grace CPUs too trade blows with the best of the best.
It leaves one to wonder what could be if they had any appetite for devices more in the consumer realm of things.