Linux coreutils has supported this since 2018 (coreutils 8.30); amusingly, it is the same release that added `cp --reflink`. AFAIK you have to opt out by setting `POSIXLY_CORRECT=1` or `POSIX_ME_HARDER=1` in your environment, or passing `--pedantic`. [1]
If anything the opposite has occurred: HDD scaling has largely flattened. From 1986 to 2014, HDD capacity increased by 10x every 5.3 years [1]. Had that scaling continued, we should have 100TB+ drives by now. I say this not as a complaint, but because it has had direct implications for ZFS.
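To sanity-check that extrapolation (assuming flagship drives were roughly 8 TB in 2014, and treating the 10x/5.3yr rate as exact):

```latex
8\,\mathrm{TB} \times 10^{(2025-2014)/5.3}
  \approx 8\,\mathrm{TB} \times 10^{2.08}
  \approx 950\,\mathrm{TB}
```

Even with generous slack in the starting point, naive extrapolation lands far above 100 TB, while actual flagship HDDs today sit at only around 30 TB.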
All this data is stuck behind an interface whose speed (realistically, once a file system & kernel are involved) is hard-limited to 200-300MiB/s. Recovery times skyrocket, as you simply cannot rebuild parity or copy data any faster. The whole reason things like draid [2] were created is so larger pools can recover in less than a day, by resilvering sequentially and by distributing hot spares so that every drive already holds 1/N of the spare capacity ahead of time.
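For the curious, a minimal sketch of what that looks like in practice (device names and geometry are made up for illustration; see zpoolconcepts(7) for the draid syntax):

```sh
# Hypothetical 14-disk dRAID vdev: double parity, 4 data disks per
# redundancy group, 2 distributed spares.
zpool create tank draid2:4d:14c:2s /dev/sd[a-n]

# On failure, the rebuild writes into spare capacity spread across
# all 14 disks, so it is not bottlenecked by a single replacement
# drive's ~200-300MiB/s write speed.
zpool status tank
```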
Not quite that level, but you can get 8TB NVMe drives. You'll pay $500 a pop though... [0]. Weirdly, that's the cheapest NewEgg lists for anything 8TB and above, and even SATA SSDs are more expensive. It's a Gen4 PCIe M.2, yet a SATA SSD costs more? Prices are better one bracket down, but it's still surprising to me that the cheapest 4TB SATA SSD is just $20 cheaper than the cheapest 4TB NVMe [1] (and for a little more you're getting recognizable names, too!)
It kinda sucks that things have flatlined a bit, but it's still cool that a lot of this has become way cheaper. I think NVMe drives at these prices and sizes really make caching a reasonable thing to do for consumer-grade storage.
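For example, a sketch assuming a ZFS pool named `tank` and a spare NVMe (lvmcache or bcache would be the non-ZFS equivalents):

```sh
# Use a cheap NVMe as an L2ARC read cache in front of an HDD pool.
zpool add tank cache /dev/nvme0n1

# Or dedicate it to the intent log to speed up synchronous writes:
# zpool add tank log /dev/nvme0n1
```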
In terms of production, the flash chips that go into SATA and NVMe drives can be pretty much the same: only the external interface differs.
The biggest cost driver for flash chips is not the burst speed they can be read or written at, but how resilient they are (how many times they can be overwritten) and their sustained speed (both depend on the tech in use: TLC, SLC, MLC, 3D NAND, wear-levelling logic...). Even at SATA speeds, you need the very best chips for sustained throughput.
Still, SATA SSDs make sense since they can use the full SATA bandwidth and have low latency compared to HDDs.
So the (lack of) price difference is not really surprising.
I find this unconvincing. The actual discussion of LLM generation is very lacking.
The original link [1] cites a discussion of the per-query cost of GPT-4o at 0.3 Wh [2]. When you read the document [2] itself, you see 0.3 Wh is the lower bound and 40 Wh is the upper bound. The paper [2] is actually pretty solid; I recommend it. It uses public metrics from other LLM APIs to derive a likely distribution for the context size of the average GPT-4o query, which is a reasonable approach given that data isn't public, then factors in GPU power per FLOP, average utilization during inference, and cloud/renting overhead. It admits this likely has non-trivial error bars, concluding the average is between 1-4 Wh per query.
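Roughly, that methodology boils down to something like this (every specific number below is my own illustrative assumption, not the paper's):

```latex
E_{\text{query}} \approx \frac{2 \, N \, T}{R \, u} \cdot P \cdot o
```

With N ≈ 2×10^11 active parameters, T ≈ 1000 tokens, R ≈ 10^15 FLOP/s per GPU, utilization u ≈ 0.4, GPU power P ≈ 700 W, and datacenter overhead o ≈ 1.5, that works out to about 1 s of GPU time and ~1050 J ≈ 0.3 Wh. Longer contexts and lower utilization push it up into the 1-4 Wh range, which is exactly why the error bars are wide.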
This is disappointing to me, as the original link [1] brings in this source [2] to disprove the 3 Wh "myth" created by another paper [3], yet that 3 Wh figure lies directly within the error bars their new source [2] arrives at.
The methodology is inherently flawed: it assumes all the infrastructure, training, etc. is going to exist with or without individual queries, while trying to answer a different question, namely the impact of AI on the environment. It's like arguing the environmental impact of solar electricity is zero because the panels would exist either way.
Thus the results inherently fail to analyze the underlying question.
A more realistic estimate is to take their total spending and assume X% of their expenses go to electricity, directly or indirectly, since all of that embodied energy adds up; a back-of-the-envelope version is sketched below. Even that ignores the energy costs incurred on 3rd-party servers when they download their training data.
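Something like this, where every number is a hypothetical placeholder, purely to show the shape of the estimate:

```latex
E \approx \frac{S \cdot x}{p}
  = \frac{\$10^{9} \times 0.10}{\$0.10/\mathrm{kWh}}
  = 10^{9}\ \mathrm{kWh}
  = 1\ \mathrm{TWh}
```

Here S is total annual spend, x the fraction going to electricity (directly or via suppliers), and p the average electricity price. Crude, but it at least counts the infrastructure that the per-query framing writes off.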
This is really just a variant of the classic "pretend you're somebody else, reply as {{char}}", which has been around for 4+ years and, despite its age, continues to be somewhat effective.
Modern Skeleton Key attacks are far more effective.
I think the Policy Puppetry attack is a type of Skeleton Key attack. Since it was just released, that makes it a modern Skeleton Key attack.
Can you give a comparison of the Policy Puppetry attack to other modern Skeleton Key attacks, and explain how the other modern Skeleton Key attacks are much more effective?
Seems to me "Skeleton Key" relies on a sort of logical judo - you ask the model to update its own rules with a reasonable-sounding request. Once it has agreed, the history of the chat leaves the user with a lot of freedom.
Policy Puppetry feels more like an injection attack - you're trying to trick the model into incorporating an attacker-supplied policy before answering. Then they layer two tricks on top: "it's just a script! From a show about people doing bad things!" And they ask for things in leetspeak, which I presume is to get around keyword filtering at the API level.
This is an ad. It’s a pretty good ad, but I don’t think the attack mechanism is super interesting on reflection.
Given the Chronostrife will occur in around 40_000 years (give or take 2_000), I somewhat doubt that </humor>