positisop's comments

github.com/NVIDIA/aistore

At the $1 billion valuation from the previous round, achieving a successful exit requires a buyer with deep pockets. Right now, Nvidia is probably a suitable buyer for MinIO, which might explain all the recent moves from them. Dell, Broadcom, NetApp, etc. are not going to buy them.


Raising $100M at a $1B valuation and then trying for an exit is a bitch!


If it is not an Apache/CNCF/Linux Foundation project, it can be a rug pull aimed at using open source only to get people in the door. They were never open for commits, and now they have abandoned open source altogether.


Longhorn is a poorly implemented distributed storage layer. You are better off with Ceph.


I have not used Longhorn, but we are currently in the process of migrating off of Ceph after an extremely painful relationship with it. Ceph has fundamental design flaws (like the way it handles subtree pinning) that, IMO, make more modern distributed filesystems very appealing. SeaweedFS is also cool, and for high-performance use cases, Weka is expensive but good.


That sounds more like a CephFS issue than a Ceph issue.

(a lot of us distrust distributed 'POSIX-like' filesystems for good reasons)


Are there any distributed POSIX filesystems which don’t suck? I think part of the issue is that a POSIX-compliant filesystem just doesn’t scale, and you are just seeing that?


I think Lustre works fairly well. At the very least, it's used in a lot of HPC centers to handle large filesystems that get hammered by lots of nodes concurrently. It's open source, so nominally free, although getting a support contract from a specialized consulting firm might be pricey.


https://www.reddit.com/r/AMD_Stock/comments/1nd078i/scaleup_...

You're going to have to open the link and then go to the third image. I thought it was interesting that OCI pegs Lustre at 8Gb/s and their high-performance FS at much higher than that... 20-80 Gb/s.


That's 8Gb/s per TB of storage. The bandwidth is going to scale up as you add OSTs and OSSs. The OCI FS maxes at 80Gb/s per mount target.
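
To make the scaling concrete (a back-of-the-envelope sketch using the per-TB figure from that slide; the capacities below are made up):

    # Illustrative only: throughput quoted per TB scales with provisioned
    # capacity, since more capacity means more OSTs/OSSs serving in parallel.
    def aggregate_bandwidth_gbps(capacity_tb, per_tb_gbps=8):
        return capacity_tb * per_tb_gbps

    print(aggregate_bandwidth_gbps(10))    # 80 Gb/s
    print(aggregate_bandwidth_gbps(100))   # 800 Gb/s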


Basically, we are building this at Archil (https://archil.com). The reason these things are generally super expensive is that they’re incredibly hard to build.


Weka seems to Just Work in our tests so far, even under pretty extreme load with hundreds of mounts on different machines, lots of small files, etc. Unfortunately it's ungodly expensive.


I've heard Ceph is expensive to run. But maybe that's not true?


Ceph overheads aren't that large for a small cluster, but they grow as you add more hosts, drives, and storage. Probably the main gotcha is that you're (ideally) writing your data three times on different machines, which is going to lead to a large overhead compared with local storage.

Most resource requirements for Ceph assume you're going for a decently sized cluster, not something homelab sized.
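
To put a rough number on that overhead (a back-of-the-envelope sketch, not official sizing guidance; the erasure-coding comparison is my addition, since EC is the usual way to cut the 3x cost):

    # Back-of-the-envelope usable capacity for a replicated vs. an
    # erasure-coded Ceph pool. Numbers are illustrative, not a sizing guide.
    def usable_replicated_tb(raw_tb, size=3):
        # size=3: every object is stored on three different hosts
        return raw_tb / size

    def usable_ec_tb(raw_tb, k=4, m=2):
        # erasure coding k+m: k data chunks plus m coding chunks per object
        return raw_tb * k / (k + m)

    raw = 100  # TB of raw disk across the cluster
    print(usable_replicated_tb(raw))  # ~33 TB usable with 3x replication
    print(usable_ec_tb(raw))          # ~67 TB usable with EC 4+2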


I'm only just wading in, after years of intent. I don't feel like Ceph is particularly demanding. It does want a decent amount of RAM: 1GB each for the monitor, manager, and metadata daemons, up to 16GB total for larger clusters, according to the docs. But then each disk's OSD defaults to 4GB, which can add up fast! And some setups can use more. 10GbE is recommended, and more is better here, but that seems not unique to Ceph: syncing storage will want bandwidth. https://docs.ceph.com/en/octopus/start/hardware-recommendati...
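
Putting those defaults together for a small box (my own back-of-the-envelope, not from the docs; the daemon counts are assumptions):

    # Rough RAM budget using the defaults mentioned above: ~1GB each for
    # mon/mgr/mds and an osd_memory_target of ~4GB per OSD. Illustrative only;
    # real usage varies with workload and cache pressure.
    def ceph_ram_estimate_gb(num_osds, mons=1, mgrs=1, mds=1, osd_target_gb=4):
        return (mons + mgrs + mds) * 1 + num_osds * osd_target_gb

    print(ceph_ram_estimate_gb(num_osds=4))            # 19 GB: single node, 4 OSDs
    print(ceph_ram_estimate_gb(num_osds=12, mons=3))   # 53 GB: small 3-mon cluster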


This 2023 post says (https://www.redhat.com/en/blog/ceph-cluster-single-machine):

> All you need is a machine, virtual or physical, with two CPU cores, 4GB RAM, and at least two or three disks (plus one disk for the operating system).


For me it was the RAM for the OSDs: 1GB per 1TB, but ideally more for SSDs...


It’s going to do a good job of saturating your LAN maintaining quorum on the data.


I thought it was a Windows version. Wait, it is a Windows version. /s


NFS is its own spec that is somewhat compliant with POSIX, and arguably FUSE is POSIX and can be used to implement a POSIX-compliant filesystem.


Please do not make decisions based on this article. It is a poorly written blog post with typos and a lack of technical depth. The blog puts Goofys in the same bucket as JuiceFS and Alluxio. A local NVMe populated via a high-throughput object store will give you the best performance. The blog does not go into the system architecture involved that prohibits static models from being pre-populated, or into the variations in the "FUSE" choices. I can see why AI startups need large amounts of money when the depth of engineering is this shallow.
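
To be concrete about the local-NVMe approach: it's basically a bulk prefetch at startup. A minimal sketch with boto3 (bucket, prefix, and paths are made up; a real setup would parallelize the downloads, or use something like s5cmd, to actually saturate the NVMe):

    # Minimal sketch: pre-populate a local NVMe path with model files from an
    # object store at startup, so inference reads hit local disk instead of a
    # FUSE mount. Bucket, prefix, and destination are hypothetical.
    import os
    import boto3

    def prefetch_to_nvme(bucket="my-model-bucket", prefix="llama-70b/",
                         dest="/mnt/nvme/models"):
        s3 = boto3.client("s3")
        paginator = s3.get_paginator("list_objects_v2")
        for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
            for obj in page.get("Contents", []):
                key = obj["Key"]
                local_path = os.path.join(dest, key)
                os.makedirs(os.path.dirname(local_path), exist_ok=True)
                s3.download_file(bucket, key, local_path)

    prefetch_to_nvme()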


Glad you wrote it, the title took me down the same path for a few seconds :-D


I think CS grads often skip the part about how something actually works and are happy with abstractions.


Google Cloud Storage had it for eons before S3. GCS comes across as a much better thought-out and built product.


S3 is probably the largest object store in the world. The fact that they can upgrade a system like that to add a feature as complex as read-after-write consistency, with no downtime, working across 200+ exabytes of data, is really impressive to me.


I really do respect the engineering efforts.

But object stores are embarrassingly parallel, so if such a migration should be possible anywhere without downtime, it is with object stores.


Where would you make the cut that takes advantage of object store parallelism?

That is, at what layer of the stack do you start migrating some stuff to the new strongly consistent system on the live service?

You can't really do it on a per-bucket basis, since existing buckets already have data in the old system.

You can't do it at the key-prefix level for the same reason.

Can't do both systems in parallel and try the new one and fall back to the old one if the key isn't in it, because that opens up violations of the consistency rules you're trying to add.

Seems trickier than one might think.


Obviously it depends on how they delivered read-after-write.

Likely they don't have to physically move object data; rather, the layer that writes and reads coordinates based on some versioning guarantees, e.g. in database land MVCC is a prominent paradigm. They'd need a distributed transactional KV store that tells every reader what the latest version of the object is and where to read from.

An object write is only acknowledged as finished once the data is written and the KV store is updated with the new version.

They could do this bucket by bucket in parallel, since buckets are isolated from each other.
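
Very roughly, something like this (a toy sketch of the idea; all names are made up, and this is not how S3 actually implements it):

    # Toy model of read-after-write via a versioned metadata KV store: a write
    # is only acknowledged after the data is durable AND the KV entry points at
    # the new version; readers always consult the KV first.
    import uuid

    class ToyObjectStore:
        def __init__(self):
            self.kv = {}     # (bucket, key) -> version id  (the "transactional KV store")
            self.blobs = {}  # (bucket, key, version) -> bytes (the data layer)

        def put(self, bucket, key, data):
            version = uuid.uuid4().hex
            self.blobs[(bucket, key, version)] = data  # 1. write the object data
            self.kv[(bucket, key)] = version           # 2. commit the version pointer
            return version                             # only now is the PUT acked

        def get(self, bucket, key):
            version = self.kv.get((bucket, key))       # readers see the committed version
            if version is None:
                raise KeyError(key)
            return self.blobs[(bucket, key, version)]

    store = ToyObjectStore()
    store.put("photos", "cat.jpg", b"new bytes")
    print(store.get("photos", "cat.jpg"))  # read-after-write: returns the new bytes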


Sure, but whose (compatible) API is GCS using again? Also keep in mind that S3 is creeping up on 20 years old, so retrofitting a change like that is incredible.


Not just 20 years old - an almost flawless 20 years at massive scale.


It's funny that pinnacles of human engineering like this exist where the general public has no idea they even exist, though they (most likely) use them every single day.


I find Red Dead Redemption 2 more impressive. I don’t know why. It sounds stupid, but S3 on the surface has the simplest API, and it’s just not impressive to me when compared to something like that.

I’m curious which one is actually more impressive in general.


Simple to use from the external interface, yes, but the backend is wildly impressive.

Some previous discussion https://news.ycombinator.com/item?id=36900147


AWS has said that the largest S3 buckets are spread over 1 million hard drives. That is quite impressive.


Red Dead Redemption 2 is likely on over 74 million hard drives.


I think you misunderstood. They're not saying S3 uses a million hard drives, they're saying that there exist some large single buckets that use a million hard drives just for that one bucket/customer!


Actually, data from more than one customer would be stored on those million drives. But data from one customer is spread over 1 million drives to get the needed IOPS from spinning hard drives.


There's likely over a trillion active SQLite databases in use right now.


> S3 on the surface has the simplest api and it’s just not impressive [...]

Reminded of the following comment from not too long ago.

https://news.ycombinator.com/item?id=43363055


That's the strangest comparison I have seen. What axis are you really comparing here? Better graphics? Sound?


Complexity and sheer intelligence and capability required to build either.


And what is the basis for your claim? You are not impressed by the complexity, intelligence, and capability it takes for AWS to build and manage 1-2 zettabytes of storage near flawlessly?


I'm more impressed by Red Dead Redemption 2 or Baldur's Gate 3.

There is no “basis” other than my gut feeling. Unless you can get quantified metrics to compare, that’s all we’ve got. For example, if you had lines of code for both, or average IQ, both would lead toward the “basis” that neither you nor I have.


GCS's metadata layer was originally implemented with Megastore (the precursor to Spanner). That was seamlessly migrated to Spanner (in roughly small-to-large "region" order), as Spanner's scaling ability improved over the years. GCS was responsible for finding (and helping to knock out) quite a few scaling plateaus in Spanner.


> GCS comes across as a much better thought-out and built product

I've worked with AWS and GCS for a while, and I have the opposite opinion. GCS is what you get if you let engineers dictate to customers how they are allowed to do work, and then give them shit interfaces, poor documentation, and more complexity than the value it adds.

There's "I engineered the ultimate engineery thing", and then there's "I made something people actually like using".


Maybe. But Google does have a reputation that makes selecting them for infrastructure a risky endeavor.


From my POV Amazon designs its services from a "trust nothing, prepare for the worst case" perspective. Eventual consistency included. Sometimes that's useful and most of the time it's a PITA.

