Hacker News | owlbite's comments

I think calling VLIW "an abandoned design" is somewhat of an exaggeration; such architectures are pretty common for embedded audio processing.


Worth adding on that note:

From JAX to VLIW: Tracing a Computation Through the TPU Compiler Stack, https://patricktoulme.substack.com/p/from-jax-to-vliw-tracin...

Google’s Training Chips Revealed: TPUv2 and TPUv3, HotChips 2020, https://hc32.hotchips.org/assets/program/conference/day2/Hot...

Ten Lessons From Three Generations Shaped Google’s TPUv4i, ISCA 2021, https://gwern.net/doc/ai/scaling/hardware/2021-jouppi.pdf


Thanks, that JAX writeup was interesting.


Sure. I did mention DSPs. But how many people write code for DSPs?


x86-64 SSE and AVX are also SIMD


SIMD and VLIW are somewhat similar but very different in the end.


True.

The ISA in this Anthropic machine is actually both, VLIW and SIMD, and both are relevant to the problem.


This book provides a high-level overview of many methods without (on a quick skim) really hinting at the practical usage. Basically this reads as an encyclopedia to me, whereas Nocedal and Wright is more of an introductory graduate course going into significantly more detail on a smaller selection of algorithms (generally those that are more commonly used).

Picking on what I'd consider one of the major workhorse methods of continuous constrained optimization, Interior Point Methods get a 2-3 page super high-level summary in this book. Nocedal and Wright give an entire chapter on the topic (~25 pages) (which of course still is probably insufficient detail to implement anything like a competitive solver).


It's a bit like the old Numerical Recipes book in that regard.

(but better)


But it can be even worse than that. It's "we assassinated the phone", "algorithm says vehicle has suspicious travel history and must die". There's no real thinking human in the loop for some of this stuff, just some model decided the metadata has a high probability of being associated with an opponent of some flavor and then everyone in the vicinity is blown to bits as computer said kill.


Very true, but a lot of stuff builds on a few core optimized libraries like BLAS/LAPACK, and picking up a build of those targeted at a modern microarchitecture can give you 10x or more compared to a non-targeted build.

That said, most of those packages will just read the hardware capability from the OS and dispatch an appropriate codepath anyway. You maybe save some code footprint by restricting the number of codepaths it needs to compile.
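As a rough illustration of the dispatch idea above, here is a minimal Python sketch. Everything in it is hypothetical: the CODEPATHS table, select_kernel, and the hard-coded flag set are made up for the example, and real BLAS libraries do this in native code with separately compiled kernels.

```python
from typing import Callable, List, Sequence, Set, Tuple

Kernel = Callable[[Sequence[float], Sequence[float]], float]


def _dot(x: Sequence[float], y: Sequence[float]) -> float:
    """Reference dot product; stands in for every tuned variant below."""
    return sum(a * b for a, b in zip(x, y))


# Hypothetical codepaths, most specialised first. In a real library each
# entry would be a separately compiled kernel; here they share one body.
CODEPATHS: List[Tuple[str, Kernel]] = [
    ("avx512f", _dot),
    ("avx2", _dot),
    ("sse2", _dot),
]


def select_kernel(cpu_flags: Set[str]) -> Tuple[str, Kernel]:
    """Return the first codepath whose required feature flag is present."""
    for flag, kernel in CODEPATHS:
        if flag in cpu_flags:
            return flag, kernel
    return "generic", _dot  # portable fallback when no flag matches


# The flag set would normally be queried from the OS/CPU at startup;
# it is hard-coded here (an assumption) so the sketch runs anywhere.
name, kernel = select_kernel({"sse2", "avx2"})
result = kernel([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
```

Restricting the build to one target amounts to shrinking the CODEPATHS table to a single entry, which is where the code footprint saving would come from.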


They just label such people as Applied Mathematicians, or worse: Physicists and Engineers; and then get back to sensible business such as algebraic geometry, complex analysis and group theory.


Introduction to PhD study: "How hard can it be, I'm sure I could write that in a week"


I thought GPLv3 adoption by GCC was what really lit the flames on moving to llvm by commercial entities?


you only need to worry about GPLv3 if you are modifying gcc in source and building it and distributing that. Just running gcc does not create a GPLv3 infection. And glibc et al are library licensed so they don't infect what you build either, most especially if you are not modifying its source and rebuilding it.


And what we've seen from e.g. Apple is that "make a private fork and only distribute binaries" is exactly what they wanted the whole time.


> you only need to worry about GPLv3 if you are modifying gcc in source and building it and distributing that.

That's the context here. If you build a new compiler based on GCC, GPL applies to you. If you build a new compiler based on LLVM it doesn't.


the context here doesn't actually specify whether we are talking about companies using llvm sources to create proprietary compilers (or maybe integrated with a proprietary IDE) or using llvm to quickly bootstrap and craft a compiler for a new processor, new language, etc., where they will distribute the source to the compiler anyway

but such a compiler or IDE would not GPLv3 infect its users' target sources and binaries.


The main problem with GPLv3 specifically from the perspective of various commercial vendors is the patent clause.


Still some companies try hard to avoid GPLv3, see Apple, who either provide old GPLv2 licensed software or invest in BSD/MIT replacements.


You might know this history better than me.


What I suspect he really means is that FORTRAN lays out its arrays column-major, whilst C chose row-major. Historically most math software was written in the former, including the de facto standard BLAS and LAPACK APIs used for most linear algebra. Mix-and-matching memory layouts is a recipe for confusion and bugs, so "mathematicians" (which I'll read as people writing a lot of non-ML matrix-related code) tend to prefer to stick with column-major.

Of course things have moved on since then and a lot of software these days is written in languages that inherited their array ordering from C, leading to much fun and confusion.

The other gotcha with a lot of these APIs is of course 0 vs 1-based array numbering.
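To make the two layouts and the indexing gotcha concrete, here is a small Python sketch. The function names are made up for illustration; the offset formulas are the standard ones for a dense matrix with no padding.

```python
def row_major_offset(i: int, j: int, n_rows: int, n_cols: int) -> int:
    """C-style layout: elements of one row are adjacent (0-based i, j)."""
    return i * n_cols + j


def col_major_offset(i: int, j: int, n_rows: int, n_cols: int) -> int:
    """Fortran-style layout: elements of one column are adjacent (0-based i, j)."""
    return j * n_rows + i


def col_major_offset_1based(i1: int, j1: int, n_rows: int, n_cols: int) -> int:
    """Same column-major layout, addressed with Fortran's 1-based indices."""
    return (j1 - 1) * n_rows + (i1 - 1)


# The 2x3 matrix [[1, 2, 3], [4, 5, 6]] flattened both ways.
n_rows, n_cols = 2, 3
matrix = [[1, 2, 3], [4, 5, 6]]
row_major = [matrix[i][j] for i in range(n_rows) for j in range(n_cols)]
col_major = [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]

# The same logical element (second row, third column) in each convention.
from_c = row_major[row_major_offset(1, 2, n_rows, n_cols)]
from_fortran = col_major[col_major_offset_1based(2, 3, n_rows, n_cols)]
```

Passing a row_major buffer to code that assumes col_major_offset does not crash; it silently reads the wrong elements, which is why mixing the two is such a reliable source of bugs.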


> is written in languages that inherited their array ordering from C

It’s not just C. Modern GPU hardware only supports row major memory layout for 2D and 3D textures (ignoring specialized layouts like swizzling and block compression but none of them are column major either). Modern image and video codecs only support row major layout for bitmaps.


The MKL blas/lapack implementation also provides the “cblas” interface (I’m sure most blas implementations do, I’m just familiar with MKL; BLIS seems quite willing to provide additional interfaces, so I bet they provide it as well) which explicitly accepts arguments for row or column ordering.

Internally the matrix is tiled out anyway (for gemm at least) so column vs row ordering is probably a little less important nowadays (which isn’t to say it never matters).


Oh yes, from an actual implementation POV you can just apply some transpose and ordering transforms to convert from row major to column major or vice-versa. cblas is pretty universal though I don't think any LAPACK C API ever gained as wide support for non column-major usage (and actually has some routines where you can't just pull transpose tricks for the transformation).

Certain layouts have performance advantages for certain operations on certain microarchitectures due to data access patterns (especially for level 2 BLAS), but that's largely irrelevant to historical discussion of the API's evolution.
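The transpose trick mentioned above can be sketched in a few lines of Python. gemm_col_major is a naive stand-in written for this example, not the real BLAS signature (no alpha/beta, leading dimensions, or transpose flags). The key fact is that a row-major buffer for A is byte-for-byte a column-major buffer for A^T, so a row-major C = A*B can be had from a column-major routine by computing C^T = B^T * A^T, i.e. by swapping the operands.

```python
from typing import List, Sequence


def gemm_col_major(m: int, n: int, k: int,
                   a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Naive C = A * B with A (m x k), B (k x n), C (m x n), all column-major."""
    c = [0.0] * (m * n)
    for j in range(n):
        for i in range(m):
            c[i + j * m] = sum(a[i + p * m] * b[p + j * k] for p in range(k))
    return c


def matmul_row_major(m: int, n: int, k: int,
                     a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Row-major C = A * B using only the column-major routine.

    The row-major buffers of A and B are reinterpreted as column-major
    A^T and B^T, and the operands are swapped so the routine produces
    column-major C^T, which is exactly row-major C.
    """
    return gemm_col_major(n, m, k, b, a)


# A is 2x2, B is 2x3, both stored row-major.
a_row = [1.0, 2.0,
         3.0, 4.0]
b_row = [5.0, 6.0, 7.0,
         8.0, 9.0, 10.0]
c_row = matmul_row_major(2, 3, 2, a_row, b_row)
```

No data is moved or transposed here, only the argument order and dimensions change. That works for a plain matrix product; as noted above, it does not carry over to every LAPACK routine.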


Much better to burn the area for multiple smaller units; it's a bit more area for frontend handling, but worth it for the flexibility (see Apple's M-series chips vs Intel AVX*).


Yes and no. I think neon is undersized for today at 128bit registers -- if you're working with doubles for example, that's only two values per register, which is pretty anemic. Things like shuffles and other tricky bitops benefit from wider widths as well (see my other reply)


Agreed that 128 bit is undersized, but 512 feels pretty good for the time being. We're unlikely to see further size increases since going to 1024 would require doubling the cache line, register file, and ram bandwidth, while just adding an extra fma port is far less hardware.
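The arithmetic behind the widths being discussed, as a quick Python sketch. The 64-byte cache line is an assumption (it is typical for x86 parts, but not universal), and the helper names are made up.

```python
CACHE_LINE_BYTES = 64  # assumption: typical x86 cache line size


def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of a given size that fit in one vector register."""
    return register_bits // element_bits


def cache_lines_per_register(register_bits: int,
                             line_bytes: int = CACHE_LINE_BYTES) -> float:
    """How many cache lines a single full-width load of the register covers."""
    return (register_bits // 8) / line_bytes


doubles_128 = lanes(128, 64)    # doubles per 128-bit register
doubles_512 = lanes(512, 64)    # doubles per 512-bit register
bytes_128 = lanes(128, 8)       # int8 lanes per 128-bit register

lines_512 = cache_lines_per_register(512)    # one register == one cache line
lines_1024 = cache_lines_per_register(1024)  # one register == two cache lines
```

Under that assumption a 512-bit register matches one cache line exactly, so a 1024-bit register would straddle two lines on every full-width load unless the line size doubled as well.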


totally - especially given how bandwidth constrained CPUs still are, going wider than 512 doesn't make much sense. 512 itself was a stretch for quite a long time (and all the negative press on the original implementations was a consequence of being not-quite-ready for primetime), but for current hardware I think it's perfect.

But 128bit is just ancient. If you're going to go to significant trouble to rewrite your code in SIMD, you want to at least get a decent perf return on investment!


128 bit is already really nice for things like Int8 comparison (e.g. lots of string operations and Swiss Dict key search)
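For anyone unfamiliar with the key search pattern, here is a pure-Python emulation of roughly what it looks like: compare 16 one-byte tags against a needle in one step, then collapse the result to a 16-bit mask. On real hardware this is a byte-wise compare plus a movemask on a single 128-bit register; the function names and the sample tag values below are made up for illustration.

```python
from typing import List

GROUP_SIZE = 16  # sixteen 8-bit lanes fill one 128-bit register


def match_bytes(group: bytes, needle: int) -> int:
    """Emulate a 16-lane byte compare followed by a movemask.

    Bit `lane` of the result is set when group[lane] == needle.
    """
    if len(group) != GROUP_SIZE:
        raise ValueError("group must contain exactly 16 bytes")
    mask = 0
    for lane, value in enumerate(group):
        if value == needle:
            mask |= 1 << lane
    return mask


def matching_lanes(mask: int) -> List[int]:
    """Expand the bitmask into the list of candidate slot indices."""
    return [lane for lane in range(GROUP_SIZE) if (mask >> lane) & 1]


# Hypothetical group of tags: lanes 0 and 2 hold the tag we are probing for.
tags = bytes([0x12, 0x34, 0x12] + [0x80] * 13)
mask = match_bytes(tags, 0x12)
candidates = matching_lanes(mask)
```

The loop in match_bytes is what the hardware does in a single instruction, which is why 128 bits is already a comfortable width for this kind of workload.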


In the UK specifically the radical reform (read destruction) of council housing by the Thatcher government had a large impact on the housing market in the 1980s.

