More

danielhanchen · 2025-12-31T10:15:43 1767176143

Qwen's latest Qwen-Image-2512 is currently the strongest open source model.

To run them locally, we made some GGUFs: https://huggingface.co/unsloth/Qwen-Image-2512-GGUF

danielhanchen · 2025-11-10T05:11:16 1762751476

Love the blog :) If you or folks are looking for junior ML roles on training, RL & distributed training, doors always open!

danielhanchen · 2025-10-30T08:36:56 1761813416

Super agree! Love how uv installs packages in parallel! It made installs 30 seconds from 5 minutes during `uv pip install unsloth`!

danielhanchen · 2025-10-03T07:50:05 1759477805

I made some dynamic GGUFs for the 32B MoE model! Try:

./llama.cpp/llama-cli -hf unsloth/granite-4.0-h-small-GGUF:UD-Q4_K_XL

Also a support agent finetuning notebook with granite 4: https://colab.research.google.com/github/unslothai/notebooks...

anshumankmr · 2025-10-03T08:34:30 1759480470

You guys are lightning fast. Did you folks have access to the model weights before hand or something, if you don't mind me asking?

danielhanchen · 2025-10-04T11:59:14 1759579154

Oh thanks! Yes sometimes we get early access to some models!

incomingpain · 2025-10-03T12:24:21 1759494261

As always, you're awesome. keep up the great work!

danielhanchen · 2025-10-04T11:59:20 1759579160

Thanks!

danielhanchen · 2025-10-02T22:08:01 1759442881

Made some dynamic GGUFs for those interested! https://huggingface.co/unsloth/granite-4.0-h-small-GGUF (32B Mamba Hybrid + MoE)

CMay · 2025-10-03T02:58:33 1759460313

Thanks! Any idea why I'm getting such poor performance on these new models? Whether Small or Tiny, on my 24GB 7900XTX I'm seeing like 8 tokens/s using the latest llama.cpp with vulkan. Even if it was running 4x faster than this I would be asking why I'm getting so few tokens/s when it sounds like the models are supposed to bring increased inference efficiency.

danielhanchen · 2025-10-04T11:59:54 1759579194

Oh I think its a Vulcan backend issue - someone raised it with me and said the rocm backend is much faster

danielhanchen · 2025-09-27T20:25:15 1759004715

Ye probs mis-spelt :)

danielhanchen · 2025-09-27T17:46:22 1758995182

Thank you! No worries at all! Yes! Sleep mode is super cool since this means the allocation of memory for inference can be totally decoupled away from training, which opens the door to many larger RL runs!

danielhanchen · 2025-09-27T12:02:36 1758974556

Oh hey! Thanks for the love :)

The primary goal of the release and our notebook https://colab.research.google.com/github/unslothai/notebooks... was actually to showcase how to mitigate reward hacking in reinforcement learning - for example when RL learns to cheat and output global variables instead like editing the timer to cheat on benchmarking and others. You can edit the notebook to do rl on other powerful models like Qwen, Llama etc automatically with Unsloth as well via our automatic compiler! We also made sink attention and moe inference super optimized for training - note flash attention 3 doesn't have sink backwards support so you'll have to use unsloth.

Gpt-oss tbh in our tests is a truly powerful model, especially the 120b variant - it's extremely popular in western enterprises since yes it's from openai but also because reasoning mode high and the censored nature and its reasoning capabilities are attractive. A big underutilized feature is its web search and internal intermediate tool calling which it can do as part of its reasoning chain just like o3 or gpt5.

RL yes isn't an all powerful hammer, but it can solve so many more new problems. For a financial institution, you can make automatic trading strategies via RL. For an intelligence agency, decryption via RL. For a legal startup, possibly case breakthroughs via RL, automatic drug candidates etc. And yes, big labs want to automate all tasks via massive RL for eg being able to play pokemon and all other games as one example. RL opened so many doors since you don't need any data, just one prompt like "make fast matrix multiplications kernels", and reward functions - it can allow many more interesting use cases where data is a constraint!

ripped_britches · 2025-09-27T14:15:31 1758982531

Can you elaborate on “decryption via RL”

danielhanchen · 2025-09-27T17:55:08 1758995708

Definitely not breaking any modern day standards, but from what I understand, some folks are trying it on simple ciphers or combinations of simple ciphers to first see if RL can help.

terataiijo · 2025-09-27T15:10:24 1758985824

I think you can train a model to decrypt an encrypted. My friend tried this only on like simple example tho. As long as we have the environment, we can do these things.

Der_Einzige · 2025-09-27T15:52:05 1758988325

I’m sorry but I don’t buy for a second that you can do meaningful and even close to reliable decryption with RLHF on currently known secure ciphers.

Furthermore, I’m very worried that whoever may be paying for this is barking up the wrong tree. I feel that the damage done with extremely bad decryption attempts would massively outweigh the very few times when whatever it “decrypts” is meaningfully close to what the actual text was.

I’m aware of how easy certain things in surveillance are (I.e n-gram analysis is enough to dox anyone on HN in like 10 words of text) - but even sort of decent decryption of SHA-256 would be a literally front page of the world achievement.

vlovich123 · 2025-09-27T17:18:52 1758993532

If you’re going to be rude and arrogant, then the level of knowledge you exhibit has to match. SHA-256 decryption would be a front of the world achievement because it would be redefining foundational mathematics since it’s not an encryption algorithm. The words you’d be looking for are either a collision of SHA-256 or breaking encryption algorithms like AES, RSA, ECC etc.

sha-256 is used in the construction of certain encryption algorithms as a primitive but by itself never encrypts anything. If it did you’ve also got middleout compression invented since you could encrypt arbitrary length input into 256 bits of output.

danielhanchen · 2025-09-27T17:58:34 1758995914

Oh yes if RL breaks SHA-256 that'll be revolutionary - but definitely not that - some folks are for now investigating basic combinations of old school ciphers for now - security applications with RL are most likely for now related to automatically finding attack surfaces and creating defensive layers - I probably should have re-worded "decryption for RL" to just "security for RL" sorry!

danielhanchen · 2025-09-04T17:05:42 1757005542

Made a collection of quants for those interested! https://huggingface.co/unsloth/embeddinggemma-300m-GGUF Will try making some notebooks in the next few days :)

seshakiran · 2025-09-04T20:02:14 1757016134

That was quick indeed :)

danielhanchen · 2025-09-04T21:44:42 1757022282

danielhanchen · 2025-08-28T17:45:09 1756403109

Hey HN! Just sharing some work we did to make gpt-oss finetuning use O(N) and not O(N^2) VRAM via Flex Attention + some bug fixes :)