

oceanplexian 5 months ago | on: Ask HN: How can ChatGPT serve 700M users when I ca...

I can’t speak to the Tesla stuff, but I run an Epyc 7713 with a single 3090, and by creatively splitting the model between the GPU and 8 channels of DDR4 I get about 9 tokens per second on a q4 quant.
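A rough sanity check on that number, added as an editorial sketch and not taken from the comment: if decoding is limited by how fast the CPU-resident weights can be streamed from RAM, then tokens per second is at most memory bandwidth divided by bytes read per token. The snippet below assumes DDR4-3200 DIMMs at theoretical peak bandwidth (the comment does not state the memory speed) and ignores whatever share of the model sits on the 3090. The function names are made up for illustration.

```python
# Back-of-envelope estimate for bandwidth-bound token generation.
# Assumptions (NOT from the comment): DDR4-3200, theoretical peak bandwidth,
# 64-bit (8-byte) channels, and one pass over the RAM-resident weights per token.

def ddr_peak_bandwidth_gbs(channels: int, mega_transfers_per_s: int, bus_bytes: int = 8) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return channels * mega_transfers_per_s * 1e6 * bus_bytes / 1e9

def max_gb_read_per_token(bandwidth_gbs: float, tokens_per_s: float) -> float:
    """Upper bound on data streamed from RAM per generated token, in GB."""
    return bandwidth_gbs / tokens_per_s

bw = ddr_peak_bandwidth_gbs(channels=8, mega_transfers_per_s=3200)
budget = max_gb_read_per_token(bw, tokens_per_s=9)

print(f"peak bandwidth: {bw:.1f} GB/s")
print(f"RAM read budget per token at 9 tok/s: {budget:.1f} GB")
```

Under those assumptions the ceiling is about 204.8 GB/s, so 9 tokens per second leaves room for at most roughly 22.8 GB of weights read from RAM per token. Real sustained bandwidth is lower than the theoretical peak, so the practical budget is smaller.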

CamperBob2 5 months ago

Impressive. Is that a distillation, or the real thing?
