More

slizard · on June 13, 2022

Hard no. Amazon EFA can barely come close to a dated HPC interconnect from the lower part of the top500 (when it comes to running code that does use the network, e.g. molecular dynamics or CFD), Azure does offer Cray XC or CS (https://azure.microsoft.com/en-us/solutions/high-performance...) which can/will be set up as proper HPC machines with fast interconnects, but I doubt these can be readily rented in the 100s of PFlops size.

Check these talks from the recent ISC EXACOMM workshop if you want to see why HPC machines and HPC computing are an entirely different league compared to traditional data center computing: https://www.youtube.com/watch?v=9PPGvqvWW8s&list=WL&index=9&... https://www.youtube.com/watch?v=q4LkF33YMJ4&list=WL&index=7

slizard · on Jan 26, 2021

In fact, _better_ than Chrome for most websites (exceptions perhaps google's). I used to struggle with Chrome/Chromium hang and chew on CPU cores, crawl to a halt, crash, or just simply run out of memory or a regular basis. Switched to Firefox ~67-69 and such issues are completely gone. I still have significant CPU usage from background tabs occasionally even in Firefox, but the rest of the issues are all nearly inexistent (even though I have on average 30+ tabs open).

eCa · on Jan 26, 2021

My tabs have gotten out of control (they are in the hundreds (due to reasons)), but Firefox handles it quite nice. It only gets restarted after updates, so they are long-lived as well.

Chrome seem to struggle when there are tens of tabs.

foepys · on Jan 26, 2021

I personally know somebody who is currently running a Firefox with 3,000+ (yes, you read that right) tabs as part of their day to day workflow. Firefox just unloads tabs that weren't used for long and cause high memory pressure which is very neat feature for tab addicts.

Kliment · on Jan 26, 2021

This describes me. I currently have 3467 tabs. Firefox is the only browser that can handle this.

gpvos · on Jan 27, 2021

6400+ currently. Firefox does not break a sweat.

bnkamalesh · on Jan 27, 2021

Forget Firefox, my brain can't handle more than 10 tabs 0_0

KamBha · on Jan 27, 2021

I daily drive FF and even develop using it. Everyone else in the company use Chrome. Recently one of our clients complained about a performance issue, I tried to replicate it on my fast PC, but could not replicate. The people who could replicate it were on Chrome, but when they followed the replication steps in FF, the problem disappeared.

Google web applications still not great, but still usable I find. Facebook performance isn't great either.

toyg · on Jan 27, 2021

> Google web applications still not great

Sadly Google will typically "accidentally sabotage" performance of their key apps on Firefox.

slizard · on March 18, 2020

Without HBM or more memory channels, the top SKUs will be rather hard to feed considering the (claimed) at least ~3-5x increase in instructions/s/socket while only increasing memory bandwidth by ~20%.

rbanffy · on March 19, 2020

Maybe they mitigated that by increasing cache sizes, but there is no information available about that yet. I would expect that, since they support 4 threads per core and that may drive up cache usage (even though one thread may be able to be useful while another is waiting for a cache miss to clear).

slizard · on March 15, 2020

> I do agree it's difficult to formulate how to ask for donations (money or cpu-time), you need to seem optimistic otherwise nobody would do it but research takes a huge amount of money and cpu-time and calendar time so there surely is a gap here between expectations and reality..

And there is a tricky balance between over-promising and under-delivering vs being realistic and lacking enough "flashyness" to get funded/thrown money at.

slizard · on March 14, 2020

My colleagues run numerous projects that transfer knowledge between computational research, including molecular simulations, and wetlab research as a routine. Computational and wetlab experiments are reguarly designed to complement each-other. Examples of numerous bio-phyiscs/bio-chemistry research done around me (ranging from research on ion channel dynamics [1] to drug permeability through the skin barrier [2,3], or cryo-em structure refinement just to name a few) strongly contradicts your experience as well as your final statement -- and than we've not even mentioned the drug binding affinity computations which is these days routinely done using molecular dynamics by both academia a pharma industry.

[1] https://doi.org/10.1016/j.celrep.2018.03.10 [2] https://doi.org/10.1016/j.jsb.2018.04.005 [3] https://doi.org/10.1016/j.jconrel.2018.05.026

slizard · on March 14, 2020

It would be nice if NVIDIA donated compute time on one of their large in-house machines the DGX SuperPOD [1] or the DGX SAturn V [2] (#20 and #67 on the last Top500, resp.) for COVID-19 research. Running simulations in a data-center is far more efficient.

[1] https://www.top500.org/system/179691 [2] https://www.top500.org/system/178928

dahart · on March 14, 2020

Sure, agreed. But Folding@Home is setup for home, and both of those machines put together add up to the equivalent of maybe 1k-2k gaming rigs. Putting aside looking for the maximum possible efficiency, crowd sourcing has the potential to scale to many orders of magnitude larger than what can be done in a data center.

slizard · on March 14, 2020

> But Folding@Home is setup for home,

Nothing prevents running the fah client on nodes of a compute cluster -- in fact my colleagues did that (while running a local F@H server), though that was a number of years ago just because they wanted take advantage of the distributed computing facilities provided by the client-server setup and built-in algorithms.

> crowd sourcing has the potential to scale to many orders of magnitude larger than what can be done in a data center.

Potential it does have, but I am skeptical of the "many orders of magnitude" claim ever having a chance to materialize. I'd love to see a cost / benefit analysis on the effective amount of useful work contributed vs the cost of the same in a data center.

dahart · on March 14, 2020

> I am skeptical of the "many orders of magnitude" claim ever having a chance to materialize.

The many orders of magnitude has already materialized https://en.wikipedia.org/wiki/SETI@home#Statistics

“On September 26, 2001, SETI@home had performed a total of 1021 floating point operations. It was acknowledged by the 2008 edition of the Guinness World Records as the largest computation in history.[22] With over 145,000 active computers in the system (1.4 million total) in 233 countries, as of 23 June 2013, SETI@home had the ability to compute over 668 teraFLOPS.[23] For comparison, the Tianhe-2 computer, which as of 23 June 2013 was the world's fastest supercomputer, was able to compute 33.86 petaFLOPS (approximately 50 times greater).”

XMPPwocky · on March 14, 2020

"On September 26, 2001, SETI@home had performed a total of 1021 floating point operations."

Just to clarify, I think HN formatting ate a caret here (now dang can see in the dark!) and it's supposed to be "10 to the 21"; either that or floating point math is much harder than I remember.

pvg · on March 14, 2020

No caret was in the original so none was eaten - it's just a superscript.

rzzzt · on March 14, 2020

Indeed, that would mean a staggering 1.618×10⁻⁵ FLOPS.

Retric · on March 14, 2020

This is 2013 with most of those computers likely even older than 2013 and likely single core vs 3,120,000 brand new cores on the supercomputer. So, in terms of “raw” flops it’s just a question of more and newer hardware on the supercomputer not really better architecture.

dahart · on March 14, 2020

Good point! I guess handwavy-napkin-math that looks like roughly ~2x more flops/core for the Tianhe-2, which is unsurprising for the latest supercomputer vs rando home computer. Maybe even surprising the number isn’t higher...

It’s probably worth noting that Seti@home has partitioned the problem space so that it’s “embarrassingly parallel”, as they say. I think raw flops are a good metric in this case, where raw flops is a very bad metric for some problems.

slizard · on March 15, 2020

We're talking about Folding@Home not any (or all) distributed computing project(s). I've not seen recent stats on the size of the FAH network, so not sure how does it compare to large supercomputer resources, but plots available through an easy google search show that in the past it was never even larger than the biggest machines on the TOP500, let alone multi-oom larger.

(Also, AFAIK garbage flops also count. As far as I recall talkin to a researcher working with FAH a qhile ago, a large-ish fraction of results returned were not usable due to data/files being broken.)

dahart · on March 15, 2020

> We’re talking about Folding@Home not any (or all) distributed computing project(s).

Oh, that wasn’t really clear to me. Anyway, there is a close relationship between the SETI project and the Folding project, so this doesn’t seem like a weird stretch to me to compare them. Whatever SETI@home has achieved is evidence for what Folding@home might achieve.

> plots available through an easy google search show that in the past it was never even larger than the biggest machines on the TOP500

But that has little bearing on either what’s possible in the future, nor why they aren’t scheduling compute time on Summit, right?

Maybe the best answer to your original question is posted in the FAQ section on folding at home.org: see “Why not just use a supercomputer” https://foldingathome.org/support/faq/project-details/

> Also AFAIK garbage flops also count.

That’s always true in these peak performance measurements, both for distributed projects and for supercomputers too. Also true for CPU & GPU peak flops specs.

We can’t actually know how much ‘useful’ work is being done in any case, that depends on all kinds of things like what kind of problem is being solved, how data-parallel the problem is, how well the problem is even understood, what algorithms are being used, what bottlenecks there are on data & IO, whether the implementers used python or CUDA.

I just don’t see any reason why Folding@Home would or should refrain from distributed crowd computing just because supercomputers exist. They can both happen. We don’t need to try to come up with efficiency numbers or compare utility/flop in order to see that Folding@Home is producing some useful research results, right?

Maybe it’s worth pointing out that compute time on TOP500 supercomputers is not free, and Folding@Home is not a job you can run for 1 day or 1 week and be done. Nvidia or IBM could donate some time, but it won’t finish the project, so it might make no sense to donate supercomputer time, just like it makes no sense for Folding@Home to seek out or purchase supercomputer time.

slizard · on March 17, 2020

> But that has little bearing on either what’s possible in the future,

There is certainly a chance that 10-100x more flops will appear on the fah network, but as I said I just don't believe it will happen.

> nor why they aren’t scheduling compute time on Summit, right?

They are. The researchers who get to use the FAH network certainly do have access to traditional supercomputing resources too. Some types of problems require strong scaling (and reliable resources) which requires HPC iron.

> > Also AFAIK garbage flops also count.

> That’s always true in these peak performance measurements, both for distributed projects and for supercomputers too. Also true for CPU & GPU peak flops specs.

I think you're misunderstanding me: I literally meant that the data files returned by the FAH contributors may often be garbage due to data corruption (OC'd cards, no ECC, overheating hardware, poor storage etc.). Not sure to what extent is this still the case today.

> I just don’t see any reason why Folding@Home would or should refrain from distributed crowd computing just because supercomputers exist. They can both happen. We don’t need to try to come up with efficiency numbers or compare utility/flop in order to see that Folding@Home is producing some useful research results, right?

In principle you're right. There are some questions around it, though. Briefly: access is a privilege of few; oversight?; inefficient use of hardware (low Flops/W), just to name a few.

> Maybe it’s worth pointing out that compute time on TOP500 supercomputers is not free,

No, it is not, but access is granted by grant agencies with at least some transparency and oversight as well as scientific review; also, those machine are far more efficient (flops/w).

> and Folding@Home is not a job you can run for 1 day or 1 week and be done.

No, Folding@Home is not the "job"; a "job" at least in HPC sense is a molecular dynamics simulation (or many); a set of such jobs is what is typically required for a project to be completed result of which would end up in a publication. The computational work corresponding to such a project/paper, depending on the size of the machine, can in fact be run in 1 week, and on a large machine even in 1 day.

laegooose · on March 14, 2020

are they comparing SETI@home in 2001 vs supercomputer in 2013? Looks unfair

slizard · on March 15, 2020

On a second thought, projects like the Exscalate4CoV [1] might have a chance to contribute even short-term as they do have the explciit goal to also focus on "Identify virtually and quickly the drugs available, or at an advanced stage of development, potentially effective"

[1] hhttps://www.cineca.it/en/news/exscalate4cov-hpc-platform-win...

wallacoloo · on March 14, 2020

No, they aren’t. Read it again.

In 2001 SETI@home had more flops than the best supercomputer of the time. In 2013, SETI@home had fewer flops than the best supercomputer of the time.

koheripbal · on March 14, 2020

These projects are cool and do contribute to our understanding, but they are not part of the critical path of fighting this pandemic.

The vaccine(s) are already in phase one clinical trial testing, as are some important anti-viral treatments which are in phase three clinical trial testing.

There is no potential output from this folding project that can accelerate those timelines.

slizard · on March 14, 2020

Sure, I'm aware that this is fundamental research (likely not on actual protein folding, but haven't looked at the details), which will probably not have short-term outcomes, not on the timescales required for a 1st gen vaccine (perhaps not even 2nd gen).

That does however not take away from my main point, if anything it is in support of what I was trying to emphasize: such computational research is better suited for a compute cluster.

koheripbal · on March 14, 2020

I guess my point was that corporations devoting large sets of their compute clusters to partake in activity that has no potential to help the current pandemic, isn't going to make much business sense.

slizard · on March 14, 2020

I agree.

In fact, donating in other ways, e.g. giving free programming, HPC etc. education to more researchers can have a higher impact on the long run.

slizard · on March 14, 2020

[Disclaimer: How do I know this stuff? I'm a core developer of GROMACS, one of the major molecular dynamics simulation FOSS community codes. Molecular dynamic is the algorithm/method behind the F@H simulations. I work as a researcher in high performance computing and have (co-)developed many of the parallel algorithms and GPU acceleration in GROMACS.]

20-200x is simply not true. Typically, such numbers are a result of comparing unoptimized CPU code to moderately or well-optimized GPU code which is often misleading. (Such differences are however perfectly reasonable when comparing hardware-accelerated workloads like ML/DL). If you compare actually well-optimized codes, you'll see more like ~4-5x difference in performance for FLOP/instrction-bound code as it is the case for well-optimized molecular dynamics.

Case in point, I recently pointed out the huge difference in CPU performance of two of the top molecular simulation codes, one of which is 8-10x faster on CPUs than the other, solving the same problem [2].

F@H relies on GROMACS as a CPU engine [1] which happens to be the same code as I quoted above as the fast one. The trouble is that F@H has not updated their CPU engine for many years and distribute CPU binaries which lack crucial SIMD optimizations to allow making use of AVX2/AVX512 on modern x86 CPUs as well as the years of algorithmic improvement and code optimization we made. These two factors combined lead to _significantly_ lower F@H CPU performance compared to what they had we're they using a recent GROMACS engine.

Consequently, due to the combination of an inherent performance advantage of GPUs and the severely outdated CPU engine, it is indeed not worth wasting energy with running F@H on CPUs.

[1] https://en.wikipedia.org/wiki/List_of_Folding@home_cores#GRO...

[2] https://twitter.com/twilard/status/1235142089156984832?s=20

Edit 1: adjusted wording to reflect that the performance difference between running outdated GROMACS version and subotimal SIMD optimizations on modern hw can have a range of performance difference, depending on hardware and inputs. Edit 2: fixed typo + formatting.

swinglock · on March 14, 2020

It seems irresponsible to waste donated cycles like that. Does anyone else do it better?

slizard · on March 14, 2020

> Does anyone else do it better?

To that question, assuming by "anyone" here you are asking about other donate-your-cycles-distributed-computing-projects: I am not too familiar with how well-optimized the codes of different @home projects are.

Taking a few steps back, perhaps the efficiency of these codes is the lesser issue and to be honest, in some (many?) cases other forms of donation/contribution may further more scientific progress than simply crunching numbers on one's home PC.

swinglock · on March 14, 2020

Yes, I'm referring to @home projects.

Totally, but if the work done @home is useful, donating compute time makes economical sense I think.

If I'm willing to donate $10 I can either donate money and it may be used to buy $10 worth of compute, with should cover all costs including the hardware and administration.

Or I can donate $10 worth of pure electricity and the other marginals I cover for no or a very small extra cost, since I already own the hardware for other purposes which it's temporarily not used for.

In the latter case the value of my $10 is higher, I theorize. Again, given that the @home project is truly useful.

slizard · on March 15, 2020

Unfortunately typically there is no way to directly donate funds (especially not $10, even if there are 10000 people who'd do that on a monthly basis) to some research (group) of one's choosing. Therefore the choice is whether to donate to an @home project and trust that what they do is meaningful and the donated resources are used in a responsible manner.

In that respect, the responsibility of whether to ask for and how to make good use of donations lies solely on the teams that receive the donation. Without oversight it would however be foolish of them to be overly critical on their own shortcomings as there is a great benefit to having these cheap FLOPS (and good PR) that F@H brings.

I was about to suggest that it would be great to set up a merit-based funding scheme somewhat akin to the governamental funding agencies, but one run independently by the "council of the people". I'm however uncertain how effective could such an organization be at awarding the funding in a responsible and effective manner.

slizard · on March 14, 2020

Can you please elaborate? Are you referring to the F@H CPU "cores" being outdated and inefficient?

swinglock · on March 14, 2020

Yes, if it's so. When donating money or compute time, I want my donation to be used efficiently and not be wasted by high overheads.

slizard · on March 14, 2020

> When donating money or compute time, I want my donation to be used efficiently and not be wasted by high overheads.

Fair point. I think this is something you should bring up with the authors of Folding@Home. I do not work on that project.

My personal view on this is that there is always a cost/benefit balance that one has to strike which is often tricky especially given the considerably constrained resources as it is typically the case in academic computational/simulation tool development.

It is however, as you point out, a great responsibility of the researchers and developers of codes to make sure that choices made and action taken (or not) do not lead to disproportionate waste of resources donated by volunteers or awarded through a grant by research funding agencies.

I do not know the detailed reasons why F@H chose to not update CPU "core" since FahCore_a7 (AFAIK based one GROMACS code from 2014), but it is likely related to the aforementioned cost/benefit analysis done in their team. One of the motivations could have been (just hypothesizing) that the software engineering efforts estimated to be required to update FahCore for CPUs (which generate a very small fraction of the "points") would have taken away resources from the GPU FahCore.

slizard · on March 27, 2018

Hahha, good one. Just that by the time that one dev implements everything, you'll be overtaken by the competition -- at least that's what everyone's fear is, and it's partly warranted.

slizard · on March 27, 2018

Well, those who were willing to pay the ~$150k for the DGX-1 (Volta-16Gb upgraded IIRC), won't necessarily find it too much -- and NVIDIA is after quick money before the markets start slipping down the hyper curve.

You also do get quite some meat compared to the DGX-1: the equivalent of 2x DGX-1 in terms of GPUs and NICs with 4x HBM2 size, plus the NVlink fabric. Plus a more SSD storage, Xeon Platinums to round it off.

Oh, and 350 lbs, let's not forget. :D

Expensive, yes! Will it sell? I'd be very surprised if it didn't!

slizard · on March 13, 2018

Wild guess / conspiracy theory: Intel, afraid of the damage to their image just made worse by diminished performance advantage compared to AMD )due to Meltdown), fearing long-term market loss, quickly found ways to tackle the issue by, instead of pedaling to regain trust, damaging a competitor's image. It seems like a reasonable long game to support and perhaps steer the disclosure of AMD vulnerabilities that CTS-labs had been investigating. Or maybe is was Intel investigating themselves, had some cards up their sleeves, but needed some other entity to do the public disclosure.

Other theories discussed here seem less far-fetched than the above, but in any case, it does smell funny.

HelloNurse · on March 13, 2018

My guess is that some researcher found something and decided to maximize profits, scraping the bottom of the barrel of quasi-vulnerabilities, creatively exaggerating, and bringing in the lawyers, the financiers and the PR weasels needed to throw a scary web site and a misleading "white paper" at AMD. We'll see what CTS works on next.