Blaze: A High Performance C++ Math Library (bitbucket.org/blaze-lib)
85 points by optimalsolver 14 days ago | 60 comments



It seems like every large project these days has coalesced around Eigen. What are some of the advantages that Blaze has over Eigen?


I'm surprised people think this. There is also the widely used Armadillo linear algebra library. In my opinion it has a much nicer syntax.

https://arma.sourceforge.net/


How's the performance?

EDIT: also, being on SourceForge is kind of a hindrance to discovery these days. I wonder why they chose to be on there instead of GitHub?


It's slower, but maybe the target audience is different? Armadillo prioritizes MATLAB-like syntax. I use Armadillo as a stepping stone between MATLAB prototypes and a hand-rolled C++ solution, and in many scenarios it can get you a long way down the road.


On this exact sequence, is there an LLM of choice that is really performant in this translation task? To Armadillo, Eigen, Blaze or even NumPy?

I have had very little success with most of the open self-hosted ones, even with my 4xA40 setup, as they either don't know the c++ libraries or generate very good-looking numpy stuff, full of horrors, simple and very very subtle bugs...

Looking for the same thing from any linear algebra library or language to CUDA BTW (yes, calls to cu-blas/solver/sparse/tlass/dnn are OK). I haven't found one model able to write CUDA code properly: not even kernels themselves, but at least chaining library calls.

Probably doesn't exist (invoking Cunningham's Law).


Linear algebra routines seem like one of the worst possible use cases for current LLMs.

Large amounts of repetitive yet meaningfully detailed code. Algorithms that can be (and often are) implemented using different conventions or orders of operations. Edge cases out the wazoo.

A solid start seems like it would be using LLMs to write extensive test suites which you can use to verify these new implementations.
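To make the "test suite first" idea concrete, here is a minimal sketch of a reference-comparison check in plain C++. Everything in it is made up for illustration: `dot_reference` is the slow, easy-to-audit version, `dot_unrolled` stands in for whatever new implementation you are trying to verify, and the tolerance scaling is just one reasonable assumption.

```cpp
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Straightforward reference: easy to audit, not tuned.
double dot_reference(const std::vector<double>& a, const std::vector<double>& b) {
    double s = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) s += a[i] * b[i];
    return s;
}

// Stand-in for the implementation under test: 4-way unrolled accumulation.
double dot_unrolled(const std::vector<double>& a, const std::vector<double>& b) {
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= a.size(); i += 4) {
        s0 += a[i] * b[i];
        s1 += a[i + 1] * b[i + 1];
        s2 += a[i + 2] * b[i + 2];
        s3 += a[i + 3] * b[i + 3];
    }
    for (; i < a.size(); ++i) s0 += a[i] * b[i];  // leftover tail
    return (s0 + s1) + (s2 + s3);
}

// Compare both versions on random inputs of every length up to max_len,
// which covers the edge cases: empty, shorter than the unroll width,
// and lengths that are not a multiple of it.
bool agrees_with_reference(unsigned seed, std::size_t max_len, double tol) {
    std::mt19937 rng(seed);
    std::uniform_real_distribution<double> dist(-1.0, 1.0);
    for (std::size_t n = 0; n <= max_len; ++n) {
        std::vector<double> a(n), b(n);
        for (std::size_t i = 0; i < n; ++i) {
            a[i] = dist(rng);
            b[i] = dist(rng);
        }
        const double ref = dot_reference(a, b);
        const double got = dot_unrolled(a, b);
        // A different summation order changes rounding, so allow a
        // tolerance that grows with the problem size.
        if (std::fabs(ref - got) > tol * (1.0 + static_cast<double>(n))) return false;
    }
    return true;
}
```

Calling `agrees_with_reference(42u, 64, 1e-12)` sweeps lengths 0 through 64. The exact-equality trap is the main design point: the two versions are both correct but will usually not be bit-identical.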


Yet for me all this C++/CUDA code is a lot of boilerplate to express dense and supposedly very tired concepts. I thought LLMs were supposed to help with the boilerplate. But yeah I guess it won't work.

And yes, it's nice to build unit test and benchmark harnesses. But those were never really such time-wasters for me.


Tough to say something as blanket as "it's slower"... there are lots of operations in any linear algebra library. It's not a direct comparison with other C++ linear algebra libraries, but hard to say Armadillo is slow based on benchmarks like this:

https://conradsanderson.id.au/pdfs/sanderson_curtin_armadill...


According to the provided benchmarks [1], it seems to be quite a bit faster.

[1] https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks


These benchmarks look to be ~8 years old, and don't really agree with benchmarks done by other sources (https://romanpoya.medium.com/a-look-at-the-performance-of-ex..., https://eigen.tuxfamily.org/index.php?title=Benchmark)

In general I would be skeptical about any benchmark that claims to beat MKL significantly on standard operations


Beating MKL for <100x100 is pretty doable. The BLAS framework has a decent amount of inherent overhead, so just exposing a better API (e.g. one that specifies the array types and sizes well) makes it pretty easy to improve things. For big sizes though, MKL is incredibly good.
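A rough sketch of what "an API that specifies the types and sizes" can look like, next to a runtime-sized, BLAS-style call. The `Matrix<T, Rows, Cols>` type and both function names are hypothetical, not Blaze's, Eigen's or MKL's API, and the snippet makes no performance claim by itself; it only shows where the compiler gets the extra information.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Runtime-sized interface: dimensions are plain arguments, so they have to
// be checked at run time and the result needs a heap allocation.
std::vector<double> matmul_dynamic(const std::vector<double>& a,
                                   const std::vector<double>& b,
                                   std::size_t m, std::size_t k, std::size_t n) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<double> c(m * n, 0.0);
    for (std::size_t i = 0; i < m; ++i)
        for (std::size_t p = 0; p < k; ++p)
            for (std::size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
    return c;
}

// Hypothetical fixed-size matrix: element type and shape are part of the type.
template <typename T, std::size_t Rows, std::size_t Cols>
struct Matrix {
    std::array<T, Rows * Cols> data{};  // zero-initialized, no heap allocation
    T& operator()(std::size_t r, std::size_t c) { return data[r * Cols + c]; }
    const T& operator()(std::size_t r, std::size_t c) const { return data[r * Cols + c]; }
};

// Shapes are template parameters: a mismatched inner dimension is a compile
// error, and the loop bounds are constants the compiler is free to unroll.
template <typename T, std::size_t M, std::size_t K, std::size_t N>
Matrix<T, M, N> matmul_fixed(const Matrix<T, M, K>& a, const Matrix<T, K, N>& b) {
    Matrix<T, M, N> c;
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t p = 0; p < K; ++p)
            for (std::size_t j = 0; j < N; ++j)
                c(i, j) += a(i, p) * b(p, j);
    return c;
}
```

With `Matrix<double, 2, 3>` and `Matrix<double, 3, 4>` the call `matmul_fixed(a, b)` returns a `Matrix<double, 2, 4>`; passing two `Matrix<double, 2, 3>` values simply does not compile.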


If you are talking about non-small matrix multiplication in MKL, it is now open source as part of oneDNN. It has literally the same code as MKL (you can see this by inspecting constants or doing high-precision benchmarks).

For small matmul there is libxsmm. It may take tremendous effort to make something faster than oneDNN and libxsmm, as the JIT-based approach of https://github.com/oneapi-src/oneDNN/blob/main/src/gpu/jit/g... is very flexible: if someone finds a better sequence, oneDNN can reuse it without a major change of design.

But MKL is not limited to matmul, as I understand it...


Compile times for one.

Eigen uses C++ templates to do most things, which explodes compile times.


AFAIK Blaze is also somewhat heavy on templates, but maybe it uses more modern metaprogramming techniques.


Compile times and binary sizes :(


Aaaand debug times. And profiling. I'd forgotten the joys of debugging/tracing heavily templated code before I jumped back into Eigen. Not that MKL was easier to debug but nowadays most of oneapi is open-source, at least the parts I use?

Or cuBLAS. In practice, if I'm going through the trouble to rewrite math in C++, I'd rather just make GPU kernels.


I mean, that only works for a small subset of workloads where the data movement patterns fit, the bandwidth is more important than the latency, etc.

The reality is that almost all workloads aren't anywhere near saturating the AVX instruction max bandwidth on a CPU since Haswell.


> almost all workloads aren't anywhere near saturating the AVX instruction max bandwidth on a CPU since Haswell

That’s true, but GPUs aren’t only good at FLOPs; their memory bandwidth is also an order of magnitude higher than that of system memory.

In my previous computer, the numbers were 484 GB/second for the 1080 Ti, and 50 GB/second for DDR4 system memory. In my current one, they are 672 GB/second for the 4070 Ti Super, and 74 GB/second for DDR5 system memory.
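Working the quoted figures through: 484 / 50 is roughly 9.7x and 672 / 74 is roughly 9.1x, so "an order of magnitude" holds for both machines. A tiny sketch of that arithmetic, with hypothetical helper names and an idealized streaming model (no latency, no caches, no PCIe transfer):

```cpp
// Ratio of GPU memory bandwidth to system memory bandwidth (both in GB/s).
constexpr double bandwidth_ratio(double gpu_gb_per_s, double cpu_gb_per_s) {
    return gpu_gb_per_s / cpu_gb_per_s;
}

// Idealized time in milliseconds to stream `gigabytes` of data once at the
// given bandwidth. This is an assumption-heavy lower bound, not a benchmark.
constexpr double stream_time_ms(double gigabytes, double gb_per_s) {
    return gigabytes / gb_per_s * 1000.0;
}
```

Under that model, one pass over 1 GB takes about 20 ms at 50 GB/second and about 2 ms at 484 GB/second, which only matters once the data is already sitting in GPU memory.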


I'm by no means an expert in the topic, but to share my take anyway: It seems to me like there are just diminishing returns in SIMD approaches. If you're going to organize your data well for SIMD use then it's not a far reach to make it work well on a GPU, which will keep getting more cores.

I imagine we'll get to a point where CPUs are actually just pretty dumb drivers for issuing GPU commands.


As someone who worked on CUDA 15 years ago - it’s amazing to me that someone on the internet posted this statement.

Did GPUs win?


I don't think that there's a "win" here. It's just sort of which way you tilt your head, how much space do you have to cram a ton of cores connected to a really wide memory bus and how close can you get the storage while keeping everything from catching on fire, no? ("just sort of" is going to have to skip leg day because of the herculean lift it just did)

It's a fairly fractal pattern in distributed computing. Move the high-throughput heavy computation bits away from the low-latency responsive bits ("low latency" here is relative to the total computation). Use an event loop for the reactive bits. Eventually someone will invert the event loop to use coroutines so everything looks synchronous (Go, anyone? Python's gevent?).

After that, it seems to me that the only real question is whether it takes too long or costs too much to move the data to the storage location the heavy computation hardware uses. There's really not much of a conceptual difference between Airflow driving Snowflake and C++ running on a CPU driving CUDA kernels. It takes a certain scale to make going from an OLTP database to an OLAP database worth it, just like it takes a certain scale to make a GPU worth it over SIMD instructions on the local processor.


Yes and no. The compute density and memory bandwidth are unmatched. But the programming model is markedly worse, even for something like CUDA: you inherently have to think about parallelism, how to organize data, write your kernels in a special language, deal with wacky toolchains, and still get to deal with the CPU and operating system.

There is great power in the convenience of "with open('foo') as f:". Most workloads are still stitching together I/O bound APIs, not doing memory-bound or CPU-bound compute.


CUDA was always harder to program, even if you could get better perf.

It took a long time to find something that really took advantage of it, but we did eventually. CUDA enabled deep learning, which enabled LLMs. That's history.

What surprised me about the statement was that it implied that the model of Python driving optimized GPU kernels was broader than deep learning.

That was the original vision of CUDA: most of the computational work being done by massively parallel cores.


Win what? This person said they were inexperienced. SIMD is extremely valuable and the situations where it works well are not rare at all.


Not really, no.

GPUs are still very limited, even compared to the SIMD instruction set. You couldn't make a CUDAjson the same way the simdjson library is built, for example, because the GPU doesn't handle SIMD branching in a way that accommodates it.

Second, again, the latency issue. GPUs are only good if you have a pipeline of data to constantly feed it, so that the PCIe transfer latency issue is minimal.


With PCIe 4 and 5 the latency issues are not as much a problem as they were, what with latency masking, gpudirect/storage-direct, busy-loop kernels (and hopefully soon scheduling libraries to make them easier to use) :-) and if you're really into real-time, computing time on NVIDIA GPUs has excellent jitter/stability and they are used in the very tight control loop of adaptive-optics (1ms-loop with mechanical actuators to drive).

The penalty for branching has come down in recent years. It's still heavy, but if you're OK with a bit of wasted compute, you can do some 'speculative' execution: run both branches in different warps and use only one result...
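The "compute both sides, keep one" idea can be sketched at lane granularity in plain C++. To be clear about what this is: it is per-element predication on the CPU, a cousin of the warp-level trick described above, not CUDA code, and the lane width and both branch bodies are made up for illustration.

```cpp
#include <array>
#include <cstddef>

constexpr std::size_t kLanes = 8;  // illustrative width, not tied to any hardware
using Lanes = std::array<float, kLanes>;

// Divergent version: each lane picks its own branch.
Lanes branchy(const Lanes& x) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        if (x[i] < 0.0f) {
            out[i] = -x[i] * 2.0f;
        } else {
            out[i] = x[i] + 1.0f;
        }
    }
    return out;
}

// Predicated version: every lane evaluates BOTH sides, then a mask selects
// the result. Half of the arithmetic is wasted, but all lanes follow the
// same instruction stream.
Lanes predicated(const Lanes& x) {
    Lanes out{};
    for (std::size_t i = 0; i < kLanes; ++i) {
        const float if_negative = -x[i] * 2.0f;
        const float otherwise = x[i] + 1.0f;
        const bool mask = x[i] < 0.0f;
        out[i] = mask ? if_negative : otherwise;
    }
    return out;
}
```

Both functions return the same values for any input. The trade only pays off when the branch bodies are cheap relative to the cost of divergence, and it is only valid when the unused side has no side effects.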

But yes, you're still using an accelerator.


Depends on whether you measure workloads as "jobs" or "flops". If "flops", I would hazard that the bulk of computing on the planet right now is happening on GPUs.


Is Eigen still alive? There's been no release in 3 years, and no news about it: https://gitlab.com/libeigen/eigen/-/issues/2699


The master branch is active and people use Eigen today. The Discord has maintainers that are still active. Not sure how it could be considered "dead"?


The rise of frontend developers over the last 5 years has taught people that everything must be new.

That a math library of all things could be complete is several orders of thinking beyond their ability. I'm sure the gut reaction is to downvote this for the embarrassing criticism, but in all seriousness, this is the right answer.


I realize asking for a new 4.0 release is fair (and the GitLab issue does have a highly upvoted request for a release).

But you can't just call things "dead" for no reason, it's in poor taste. It's feature-complete, not dead!


Sure, code can be “feature complete”, but the reality is the rest of the world changes, so there will be more and more friction for your users over time. For example, someone in the issue mentions they need to use mainline to use Eigen with CUDA now.


Mathematics is a priori. It's beyond the world changing. You might be surprised to learn we still use Euclid's geometry despite it being thousands of years old.

What you're actually saying is you expect open source maintainers to add arbitrary functionality for free.


> Mathematics is a priori

Sure, but the discussion here is about a software library not the math concepts


Software programs are equivalent to mathematical proofs. [1]

Short of a bug in the implementation, there has yet to be a valid explanation for why mathematics libraries need to be continuously maintained. If I published an NPM library called left-add, which adds the left parameter to the right parameter (read: addition) how long, exactly, should I expect to maintain this for others?

The only explanation so far is that scumbags expect open source library maintainers to slave away indefinitely. The further we steer into the weeds of ignorant explanations, the more I'm inclined to believe this really is the underlying rationale.

1: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_correspon...


There are many reasons why a library requires continuous maintenance even when it's "feature-complete". Off the top of my head:

1. Bug fixes

2. Security issues

3. Optimization

4. Compatibility/adapting to landscape changes

People pointing out flaws in a library aren't "scumbags that expect open source library maintainers to slave away indefinitely".

No one is forcing the maintainer to "slave away", they can step down any time and say I'm not up for this role anymore. Those interested will fork the library and carry the torch.

No need to be so defensive and insult others just for giving feedback.


I think you’re constructing a strawman, arguing for general software libraries. We're talking specifically about math libraries.

Regardless of the strawman, the person(s) that authored the code don’t owe you anything. They don’t have to step down, make an announcement, or merge your changes just because you can’t read or comprehend the license text that says very clearly in all capital letters the software is warrantied for no purpose whatsoever, implied or otherwise.

If one had a patch and was eager to see it upstreamed quickly, it seems like you’re arguing the maintenance status actually doesn’t matter, since "[t]hose interested will fork the library and carry the torch" if the patch isn’t merged expediently.

But if you're confident the interested will fork and carry the torch, why do you think you're entitled to force the author(s), who are giving away software warrantied for no purpose, to step down? That's genuinely deranged, and my insults appear to be accurate descriptions rather than ad hominem attacks, since no coherent explanation has been provided as to why the four reasons given somehow supersede the authors' chosen license.


I don’t think I’m saying that at all. There are plenty of little libraries out there written in C89 in 1994 that still work perfectly well today. But they don’t claim to use the latest compiler or hardware features to make the compiled binary fast, nor do they come with expectations about how easy or hard it is to integrate. The code simply exists and has not been touched in 30 years. Use at your own peril.

If you have a math library that is relying on hardware and compilers to make it fast you should acknowledge that the software and hardware ecosystem in which you exist is constantly changing even if the math is not.


> THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

This is a pretty bold and loud acknowledgement.

What more could you really ask for when even lawyers think this is sufficient.


> What more could you really ask for

Some signal that the project is being maintained? If it’s not that’s fine but don’t go radio silent and get pissy when people ask if a project is dead…

This is not a legal or moral issue; it’s just about being considerate of others as well. You, the maintainer, made the choice to maintain this project in public and foster a userbase. This is not a one-way relationship. People spend their time making patches and integrating your software. You are under no obligation to maintain it of course, but don’t be a dick.


The reason open source maintainers get pissy is because idiots selectively ignore entire paragraphs of the license that explicitly state the project isn't maintained and you shouldn't imply it is under any circumstances. The author is being extremely considerate. The problem is fools have no respect for the author or the chosen license. They'd rather do the opposite of what the author's license says. The only reason we're having this discussion is because there are enough fools that think they might be on to something.

The implication is the mistake, not the author for not being explicit enough.


The only one being foolish here is you, with needless pedantry. Yes, the legal contract says that the authors don't owe anyone anything, but there is also a social contract at play here that you are apparently not understanding.


I don't recall there ever being a social contract.

Further, what makes you assume everyone is on the same page about what that social contract is? Have you even considered the possibility that there might be differences of opinion on a social contract which are incompatible? It's why the best course of action is to follow the license rather than delusional fantasies.

The idea there's a social contract is sophistry. Plain and simple.


What? You mean I don't need to refactor and break API every 6 months?


I mean, it's not like linear algebra has changed that much in 4 years?


Randomized linear algebra and under-solving (mixed precision or fp32 instead of fp64) seem to be taking off more than in the past, mostly on GPU though (use of tensor cores, expensive fp64, memory bandwidth limits).

And I wish Eigen had a larger spectrum of 'solvers' you can choose from, depending on what you want. But in general I agree with you, except there's always a cycle to eke out somewhere, right?
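One common form of mixed precision is iterative refinement: do the expensive solve in fp32, compute the residual in fp64, and solve again for a correction. Treating that as what "under-solving" refers to is an assumption on my part. The sketch below uses hypothetical names (`solve_fp32`, `solve_refined`) and plain Gaussian elimination on a small dense system, with no GPU or tensor cores involved.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

using Dense = std::vector<std::vector<double>>;

// Solve A x = b entirely in fp32: Gaussian elimination with partial pivoting
// on an augmented matrix. Inputs are rounded to float on entry.
std::vector<double> solve_fp32(const Dense& a, const std::vector<double>& b) {
    const std::size_t n = b.size();
    assert(a.size() == n);
    std::vector<std::vector<float>> m(n, std::vector<float>(n + 1));
    for (std::size_t i = 0; i < n; ++i) {
        assert(a[i].size() == n);
        for (std::size_t j = 0; j < n; ++j) m[i][j] = static_cast<float>(a[i][j]);
        m[i][n] = static_cast<float>(b[i]);
    }
    for (std::size_t col = 0; col < n; ++col) {
        std::size_t pivot = col;
        for (std::size_t r = col + 1; r < n; ++r)
            if (std::fabs(m[r][col]) > std::fabs(m[pivot][col])) pivot = r;
        std::swap(m[col], m[pivot]);
        for (std::size_t r = col + 1; r < n; ++r) {
            const float f = m[r][col] / m[col][col];
            for (std::size_t j = col; j <= n; ++j) m[r][j] -= f * m[col][j];
        }
    }
    std::vector<float> x(n);
    for (std::size_t i = n; i-- > 0;) {  // back substitution
        float s = m[i][n];
        for (std::size_t j = i + 1; j < n; ++j) s -= m[i][j] * x[j];
        x[i] = s / m[i][i];
    }
    return std::vector<double>(x.begin(), x.end());
}

// Iterative refinement: fp32 solves, fp64 residuals and accumulation.
std::vector<double> solve_refined(const Dense& a, const std::vector<double>& b, int steps) {
    const std::size_t n = b.size();
    std::vector<double> x = solve_fp32(a, b);
    for (int s = 0; s < steps; ++s) {
        std::vector<double> r(n);
        for (std::size_t i = 0; i < n; ++i) {
            double acc = b[i];
            for (std::size_t j = 0; j < n; ++j) acc -= a[i][j] * x[j];
            r[i] = acc;  // residual b - A x, in fp64
        }
        const std::vector<double> d = solve_fp32(a, r);  // cheap correction
        for (std::size_t i = 0; i < n; ++i) x[i] += d[i];
    }
    return x;
}
```

For a well-conditioned system, a few refinement steps recover close to fp64 accuracy from fp32 solves. A real implementation would factor the matrix once and reuse the factors for every correction; this sketch re-runs the elimination each time to stay short.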


Too many people have their brain rotted from the web dev world where things are reinvented every other week.


If you want something similar, but for games:

https://github.com/EricLengyel/Terathon-Math-Library


Another good PGA library

https://github.com/jeremyong/Klein


What is the advantage over glm? The geometric algebra stuff?


Out of curiosity, when and/or how often do these high-performance math libraries get folded into game physics engines? Like would Blaze offer any sort of advantage if you were to develop a new 3D soft/hard body physics engine?


For typical game physics engines... not that much. Math libraries like Eigen or Blaze use lots of template metaprogramming techniques under the hood that can help when you're doing large batched matrix multiplications (since they can remove temporary allocations at compile time and can also fuse operations efficiently, as well as applying various SIMD optimizations), but that doesn't really help when you need lots of small operations (with mat3 / mat4 / vec3 / quat / etc.). Typically game physics engines tend to use iterative algorithms for their solvers (Gauss-Seidel, PBD, etc...) instead of batched "matrix"-oriented ones, so you'll get fewer benefits out of Eigen / Blaze compared to what you typically see in deep learning / scientific computing workloads.
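The "remove temporaries and fuse operations" part is usually done with expression templates. Here is a minimal sketch of the idea, assuming made-up `Expr`, `Sum` and `Vec` types; the real Eigen and Blaze machinery is far more elaborate and this is not their API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base so that operator+ below only matches our own expression types.
template <typename Derived>
struct Expr {
    const Derived& self() const { return static_cast<const Derived&>(*this); }
};

// Lazy node for lhs + rhs: stores references and computes elements on demand.
template <typename L, typename R>
struct Sum : Expr<Sum<L, R>> {
    const L& lhs;
    const R& rhs;
    Sum(const L& l, const R& r) : lhs(l), rhs(r) { assert(l.size() == r.size()); }
    double operator[](std::size_t i) const { return lhs[i] + rhs[i]; }
    std::size_t size() const { return lhs.size(); }
};

struct Vec : Expr<Vec> {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }
    double& operator[](std::size_t i) { return data[i]; }
    std::size_t size() const { return data.size(); }

    // Assigning from any expression runs ONE loop over the whole expression
    // tree, so a + b + c never materializes an intermediate vector.
    template <typename E>
    Vec& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data[i] = expr[i];
        return *this;
    }
};

// Builds a Sum node instead of computing anything.
template <typename L, typename R>
Sum<L, R> operator+(const Expr<L>& l, const Expr<R>& r) {
    return Sum<L, R>(l.self(), r.self());
}
```

With three `Vec` values of equal length, `out = a + b + c;` builds a `Sum<Sum<Vec, Vec>, Vec>` and evaluates it in a single pass. The nodes hold references to temporaries, so the expression must be consumed in the same statement; storing it with `auto` and using it later would dangle. The nested types are also a big part of why compile times and debug traces get painful.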

The codebases I've seen in many game physics engines seem to all roll their own minimal math libraries for this stuff, or even just use SIMD (SSE / AVX) intrinsics directly. Examples: PhysX (https://github.com/NVIDIA-Omniverse/PhysX), Box2D (https://github.com/erincatto/box2d), Bullet (https://github.com/bulletphysics/bullet3)...
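For a sense of scale, a hand-rolled math layer of that kind can be as small as the sketch below. The `Vec3` type and its helpers are hypothetical and not taken from PhysX, Box2D or Bullet, which each have their own (and typically SIMD-aware) versions.

```cpp
// Hypothetical minimal 3-vector: a plain aggregate, no templates, no heap.
struct Vec3 {
    float x, y, z;
};

constexpr Vec3 operator+(Vec3 a, Vec3 b) { return {a.x + b.x, a.y + b.y, a.z + b.z}; }
constexpr Vec3 operator-(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
constexpr Vec3 operator*(float s, Vec3 v) { return {s * v.x, s * v.y, s * v.z}; }

constexpr float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Right-handed cross product.
constexpr Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y,
            a.z * b.x - a.x * b.z,
            a.x * b.y - a.y * b.x};
}
```

Everything is passed by value and trivially inlinable, which is most of what small fixed-size operations need; there is nothing for an expression-template layer to fuse.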


I don't know if I would call a math library that uses templates so liberally "high performance". High performance also includes compile time in my opinion.


Your opinion is wrong.


I get the template hate, they take a while to wrap your head around and can create cryptic bugs. Nonetheless they can be extremely powerful and enable performance and reduced complexity by being a bit complex upfront.

Yeah. Avoiding templates almost certainly leads to losing run time performance. The compile time is a drop in the bucket.


Are there any benchmarks to show it would be noticeably faster to compile with a non-templated design?

Is this represented here for posterity? The last news item is from 15.8.2020. There are recent commits for compiler compatibility testing (Feb 2024).

What is of import here?


People can post whatever they want on HN. It's a neat library, why not post it?

Previous post (by the same submitter): https://news.ycombinator.com/item?id=34407106



