Hacker News | danielhanchen's comments

Qwen's latest Qwen-Image-2512 is currently the strongest open-source image model.

To run it locally, we made some GGUFs: https://huggingface.co/unsloth/Qwen-Image-2512-GGUF


Love the blog :) If you or folks are looking for junior ML roles on training, RL & distributed training, doors always open!


Super agree! Love how uv installs packages in parallel! It brought `uv pip install unsloth` down from 5 minutes to about 30 seconds!


I made some dynamic GGUFs for the 32B MoE model! Try:

./llama.cpp/llama-cli -hf unsloth/granite-4.0-h-small-GGUF:UD-Q4_K_XL

Also a support agent finetuning notebook with granite 4: https://colab.research.google.com/github/unslothai/notebooks...


You guys are lightning fast. Did you folks have access to the model weights beforehand or something, if you don't mind me asking?


Oh thanks! Yes sometimes we get early access to some models!


As always, you're awesome. Keep up the great work!


Thanks!


Made some dynamic GGUFs for those interested! https://huggingface.co/unsloth/granite-4.0-h-small-GGUF (32B Mamba Hybrid + MoE)


Thanks! Any idea why I'm getting such poor performance on these new models? Whether Small or Tiny, on my 24GB 7900XTX I'm seeing like 8 tokens/s using the latest llama.cpp with Vulkan. Even if it was running 4x faster than this, I would be asking why I'm getting so few tokens/s when it sounds like the models are supposed to bring increased inference efficiency.


Oh I think it's a Vulcan backend issue - someone raised it with me and said the ROCm backend is much faster


Ye probs mis-spelt :)


Thank you! No worries at all! Yes! Sleep mode is super cool since it means the memory allocation for inference can be totally decoupled from training, which opens the door to many larger RL runs!
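
For those curious what that decoupling looks like in practice, here's a minimal sketch assuming a recent vLLM with sleep mode (the model name, the level, and the train_step helper are illustrative, not Unsloth's actual code):

    from vllm import LLM, SamplingParams

    prompts = ["Write a fast matmul kernel."]
    llm = LLM(model="Qwen/Qwen2.5-0.5B-Instruct", enable_sleep_mode=True)

    def train_step(rollouts):
        pass  # hypothetical placeholder for the policy-update step

    for step in range(3):
        # Generate rollouts for RL
        rollouts = llm.generate(prompts, SamplingParams(max_tokens=256))
        llm.sleep(level=1)    # offload weights / drop KV cache so the trainer can use the VRAM
        train_step(rollouts)  # the trainer now has (almost) the whole GPU to itself
        llm.wake_up()         # bring the inference engine back for the next rollout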


Oh hey! Thanks for the love :)

The primary goal of the release and our notebook https://colab.research.google.com/github/unslothai/notebooks... was actually to showcase how to mitigate reward hacking in reinforcement learning - for example when RL learns to cheat, say by writing to global variables or editing the timer to game the benchmark. You can edit the notebook to do RL on other powerful models like Qwen, Llama etc. automatically with Unsloth as well via our automatic compiler! We also made attention sinks and MoE inference super optimized for training - note Flash Attention 3 doesn't support the backward pass for attention sinks, so you'll have to use Unsloth.
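
As a rough illustration of the idea (not the notebook's actual code - the patterns and scoring here are assumptions), a reward function can simply refuse to reward completions that show obvious cheating, like patching the timer or mutating globals, and only score genuine speedups:

    import re

    def kernel_reward(completion: str, measured_s: float, baseline_s: float) -> float:
        # Hard penalty for common reward-hacking patterns (global mutation, timer patching)
        hacks = [r"\bglobal\s+\w+", r"time\.time\s*=", r"perf_counter\s*="]
        if any(re.search(p, completion) for p in hacks):
            return -1.0
        # Otherwise reward proportional to the measured speedup over the baseline
        return max(0.0, baseline_s / max(measured_s, 1e-9) - 1.0)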

Tbh gpt-oss in our tests is a truly powerful model, especially the 120B variant - it's extremely popular in Western enterprises, partly because yes, it's from OpenAI, but also because its high reasoning mode, its censored nature and its reasoning capabilities are attractive. A big underutilized feature is its web search and internal intermediate tool calling, which it can do as part of its reasoning chain just like o3 or GPT-5.

Yes, RL isn't an all-powerful hammer, but it can solve so many new problems. For a financial institution, you can make automatic trading strategies via RL. For an intelligence agency, decryption via RL. For a legal startup, possibly case breakthroughs via RL; automatic drug candidates; etc. And yes, big labs want to automate all tasks via massive RL - being able to play Pokemon and all other games is one example. RL opened so many doors since you don't need any data, just one prompt like "make fast matrix multiplication kernels" and reward functions - it can allow many more interesting use cases where data is a constraint!


Can you elaborate on “decryption via RL”?


Definitely not breaking any modern day standards, but from what I understand, some folks are trying it on simple ciphers or combinations of simple ciphers to first see if RL can help.


I think you can train a model to decrypt an encrypted message. My friend tried this only on simple examples though. As long as we have the environment, we can do these things.
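
To make "as long as we have the environment" concrete, a toy setup could look like this (Caesar cipher only, purely illustrative, not anyone's actual experiment):

    import random, string

    ALPHABET = string.ascii_lowercase

    def encrypt(plaintext: str, shift: int) -> str:
        return "".join(
            ALPHABET[(ALPHABET.index(c) + shift) % 26] if c in ALPHABET else c
            for c in plaintext
        )

    def make_example():
        plaintext = random.choice(["attack at dawn", "meet me at noon"])
        shift = random.randrange(1, 26)
        return encrypt(plaintext, shift), plaintext  # (model input, hidden answer)

    def reward(model_guess: str, plaintext: str) -> float:
        # Character-level accuracy as a dense reward for the RL loop
        matches = sum(a == b for a, b in zip(model_guess, plaintext))
        return matches / max(len(plaintext), 1)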


I’m sorry but I don’t buy for a second that you can do meaningful or even close-to-reliable decryption with RLHF on currently known secure ciphers.

Furthermore, I’m very worried that whoever may be paying for this is barking up the wrong tree. I feel that the damage done with extremely bad decryption attempts would massively outweigh the very few times when whatever it “decrypts” is meaningfully close to what the actual text was.

I’m aware of how easy certain things in surveillance are (e.g. n-gram analysis is enough to dox anyone on HN in like 10 words of text) - but even sort-of-decent decryption of SHA-256 would be a literal front-page-of-the-world achievement.


If you’re going to be rude and arrogant, then the level of knowledge you exhibit has to match. SHA-256 "decryption" would be a front-page-of-the-world achievement because it would redefine foundational mathematics, since SHA-256 is not an encryption algorithm. The words you’d be looking for are either a collision of SHA-256, or breaking encryption algorithms like AES, RSA, ECC, etc.

SHA-256 is used as a primitive in the construction of certain encryption algorithms, but by itself it never encrypts anything. If it did, you’d also have invented middle-out compression, since you could "encrypt" arbitrary-length input into 256 bits of output.


Oh yes, if RL breaks SHA-256 that'll be revolutionary - but definitely not that - for now some folks are investigating basic combinations of old-school ciphers. Security applications of RL are most likely related to automatically finding attack surfaces and creating defensive layers - I probably should have re-worded "decryption via RL" to just "security via RL", sorry!


Made a collection of quants for those interested! https://huggingface.co/unsloth/embeddinggemma-300m-GGUF Will try making some notebooks in the next few days :)


That was quick indeed :)


:)


Hey HN! Just sharing some work we did to make gpt-oss finetuning use O(N) and not O(N^2) VRAM via Flex Attention + some bug fixes :)
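
A rough sketch of why that helps, using PyTorch 2.5+'s flex_attention (the shapes and the plain causal mask are illustrative, not Unsloth's actual kernels):

    import torch
    from torch.nn.attention.flex_attention import flex_attention, create_block_mask

    B, H, N, D = 1, 8, 4096, 64
    q, k, v = (torch.randn(B, H, N, D, device="cuda", dtype=torch.bfloat16) for _ in range(3))

    # Naive attention materializes an N x N score matrix per head -> O(N^2) memory:
    #   scores = q @ k.transpose(-1, -2)

    # Compiled flex_attention computes scores block by block, so peak VRAM stays O(N):
    flex_attention = torch.compile(flex_attention)

    def causal(b, h, q_idx, kv_idx):
        return q_idx >= kv_idx

    block_mask = create_block_mask(causal, B, H, N, N)
    out = flex_attention(q, k, v, block_mask=block_mask)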

