
IMHO there are some serious problems here: the approach won't apply to many situations, what it measures is not really "waste" in the way claimed, and it will probably result in greater spend.

> Memory waste: request - actual usage [0]

Memory "requests" are hints to the kube-scheduler for placement, not a target for expected usage.

> # Memory over-provisioning: limit > 2x request [1]

Memory limits are for enforcement, typically deciding when to call the OOM killer.

Neither placement hints nor OOM-kill limits should have anything to do with normal operating parameters.

> The memory request is mainly used during (Kubernetes) Pod scheduling. On a node that uses cgroups v2, the container runtime might use the memory request as a hint to set memory.min and memory.low [2]
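
To make that concrete, here is a rough sketch of where those two knobs actually land on a cgroup v2 node (the slice path is a placeholder and varies by runtime; this is an illustration, not what the script does):

    # Illustration only: a container with requests.memory=1Gi and limits.memory=2Gi
    POD_SLICE=/sys/fs/cgroup/kubepods.slice/kubepods-podXXXXXXXX.slice  # placeholder path
    cat "$POD_SLICE/memory.max"   # hard cap from the limit; this is what the OOM killer enforces
    cat "$POD_SLICE/memory.min"   # soft protection the runtime *may* derive from the request
    cat "$POD_SLICE/memory.low"   # likewise a hint, not an expected-usage target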

By choosing to label the delta between these two as "waste" you will absolutely suffer from Goodhart's law: you will teach your dev team not just to request memory, but to allocate it and never free it, so that they fit inside this invalid metric's assumptions.

It is going to work against the more reasonable goals: having developers set their limits as low as possible without negative effects, protecting the node and pod from memory leaks, and still gaining the advantages of over-provisioning, which is where the big gains are to be made.

[0] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[1] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[2] https://kubernetes.io/docs/concepts/configuration/manage-res...



You are technically right that requests are scheduling hints, but in a cluster-autoscaler world, requests = bill.

If I request 8GB for a pod that uses 1GB, the autoscaler spins up nodes to accommodate that 8GB reservation. That 7GB gap is capacity the company is paying for but cannot use for other workloads.

Valid point on Goodhart's law, though the goal shouldn't be to fill the RAM, but rather to lower the request to match the working set so we can bin-pack tighter.
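
For what it's worth, a sketch of what "match the working set" could look like in practice, assuming you have Prometheus scraping cAdvisor metrics (the endpoint and pod name are placeholders): size the request against a high percentile over a couple of weeks, not a point-in-time read.

    # Sketch: 99th percentile of working set over 14 days for a given pod
    PROM=http://prometheus:9090   # placeholder endpoint
    QUERY='quantile_over_time(0.99, container_memory_working_set_bytes{pod="my-pod"}[14d])'
    curl -sG "$PROM/api/v1/query" --data-urlencode "query=$QUERY"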


This script does nothing to solve that, and actually exacerbates the problem of people over-speccing.

It makes assumptions about pricing [0], and if you do need a peak of 8GB it would push you toward allocating and holding that 8GB at all times, because it is just reading a current snapshot from /proc/$pid/status:VmSize [1] and says you are wasting memory based on "request - actual usage (MiB)" [2].

What if once a week you run a reconciliation that needs that 8GB? What if you only need 8GB once every 10 seconds? The script won't see that; so to be defensive you can't release that memory, or you will be 'wasting' resources despite that peak need.
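
To be concrete about the snapshot problem: /proc already exposes high-water marks right next to the current values, and a point-in-time read only ever sees the latter (pid and numbers below are made up):

    pid=12345   # placeholder
    grep -E 'VmPeak|VmSize|VmHWM|VmRSS' /proc/$pid/status
    # VmPeak:  8388608 kB   <- peak virtual size (the weekly 8GB reconcile)
    # VmSize:  1048576 kB   <- what a one-off snapshot scores you on
    # VmHWM:   8388608 kB   <- high-water mark of resident memory
    # VmRSS:   1048576 kB   <- resident right now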

What if your program only uses 1GB, but you are working on a lot of parquet files, and with less RAM you start to hit EBS IOPS limits, or you don't finish the nightly DW run because you have to hit disk instead of working from the buffer cache with headroom, etc.?

This is how bad metrics wreck corporate cultures, and the ones in this case encourage overspending. If I use all that RAM I will never hit the "top_offender" list [3], even if I cause 100 extra nodes to be launched.

Without context, and far more sophisticated analytics, "request - actual usage (MiB)" is meaningless and trivial to game.

What incentive does this metric create, except making sure that your pods keep request ~= RES 24x7x365 ~= OOM-kill limit / 2, so they avoid appearing in the "top_offender" list?

Once your skip's skip's skip sees that some consultant labeled you a "top_offender" despite your transient memory needs, how do you work that through? How do you "prove" your case against a team that is gaming the metric?

Also, as a developer you don't have control over the cluster's placement decisions, nor typically over the choice of machine types. So blaming the platform user for the platform team's inappropriate choice of instance types, and shutting down chances of collaboration in the process, isn't a very productive path either.

Minimizing cloud spend is a very challenging problem, which typically depends on collaboration more than anything else.

The point is that these scripts are not providing a valid metric, and they present that metric in a hostile way. They could be changed to help a discovery process, but they absolutely will not in their current form.

[0] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[1] https://github.com/google/cadvisor/blob/master/cmd/internal/...
[2] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....
[3] https://github.com/WozzHQ/wozz/blob/main/scripts/wozz-audit....


Really fair critique regarding the snapshot approach. You're right that optimizing limits based on a single point in time is dangerous for bursty workloads, like the need-8GB-for-10-seconds scenario.

The intent of this script isn't to replace long-term metric analysis like Prometheus/Datadog trends, but to act as a smoke test for gross over-provisioning: the dev who requested 16GB for a sidecar that has flatlined at 100MB for weeks.
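
Something like this is the spirit of the check, even done by hand (requires metrics-server; the pod name is a placeholder):

    kubectl top pod my-sidecar   # e.g. 100Mi right now
    kubectl get pod my-sidecar -o jsonpath='{.spec.containers[0].resources.requests.memory}'   # e.g. 16Gi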

You make a great point about the hostile framing of the word waste. I definitely don't want to encourage OOM risks. I'll update the readme to clarify that this delta represents potential capacity to investigate rather than guaranteed waste.

Appreciate the detailed breakdown on the safety buffer nuances.



