llama.cpp, the state-of-the-art local inference engine that everyone in the know uses, has a Vulkan backend.
Most GPU software in the world targets Vulkan, not CUDA, and CUDA only runs on a minority of hardware. On top of that, AMD ships a CUDA compatibility layer called HIP as part of its ROCm stack; it isn't the most optimized thing in the world, but it gets me most of the performance I'd expect from comparable Nvidia hardware.
Most software in the world (not just machine learning) is written against a cross-vendor API: OpenGL, OpenCL, Vulkan, or the DirectX family. Nvidia constantly pushing the message of "use CUDA" really means "we're weak on standards compliance and on the APIs most software is actually written in." Now that everyone has realized the emperor has no clothes, they've been backing off and slowly improving their support for those other APIs. Eventually you won't need the crutch of CUDA, and you shouldn't be writing new software in it today.
Nvidia has a bad habit of dropping things without warning once they're done with them; don't be an Nvidia victim. Buy their hardware if you like, because swapping hardware later is easy. Rewriting your way out of CUDA isn't (though it's certainly doable, especially with AMD's HIP to help you). Just don't write CUDA today, and you're golden.
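To make the HIP point concrete, here's a rough sketch of the kind of CUDA-style code you'd write against HIP instead. Treat it as an illustration, not tuned code: the kernel and the hipMalloc/hipMemcpy/hipLaunchKernelGGL calls are the standard HIP runtime API, and the idea is that the same source builds with hipcc on ROCm for AMD GPUs or compiles through to CUDA when HIP's Nvidia platform is selected.

    // Vector add written against HIP rather than CUDA directly.
    // The same source is meant to build on ROCm (AMD) or, via HIP's
    // Nvidia platform, on CUDA hardware, so no vendor lock-in in the source.
    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <vector>

    __global__ void vector_add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n, 0.0f);

        // Allocate device buffers and copy the inputs over.
        float *da, *db, *dc;
        hipMalloc((void**)&da, n * sizeof(float));
        hipMalloc((void**)&db, n * sizeof(float));
        hipMalloc((void**)&dc, n * sizeof(float));
        hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
        hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

        // Launch with 256 threads per block, enough blocks to cover n elements.
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        hipLaunchKernelGGL(vector_add, dim3(blocks), dim3(threads), 0, 0, da, db, dc, n);

        hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
        printf("c[0] = %f\n", hc[0]);  // expect 3.0

        hipFree(da); hipFree(db); hipFree(dc);
        return 0;
    }

If you already have a pile of CUDA, the hipify tools that ship with ROCm do most of this translation mechanically; the API surface maps almost one-to-one (cudaMalloc becomes hipMalloc, and so on).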