
A process is described here: https://arxiv.org/pdf/2506.22405

>A physician or AI begins with a short case abstract and must iteratively request additional details from a gatekeeper model that reveals findings only when explicitly queried. Performance is assessed not just by diagnostic accuracy but also by the cost of physician visits and tests performed.
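
To make the setup concrete, here is a minimal sketch of that gatekeeper loop in Python. The names (gatekeeper, diagnostician, run_case), the cost figures, and the stub logic are my own assumptions for illustration, not the paper's actual benchmark code; the point is just the shape of the interaction: findings are revealed only when explicitly requested, and the run is scored on both the final diagnosis and the accumulated cost.

    # Illustrative sketch only; function names, costs, and stub logic are
    # assumptions, not the paper's implementation.

    COSTS = {"visit": 300, "test": 150}  # hypothetical per-action costs (USD)

    def gatekeeper(case_findings, query):
        """Reveal a finding only when it is explicitly requested."""
        return case_findings.get(query, "Not available / not performed.")

    def diagnostician(transcript, pending_questions):
        """Stand-in for the AI or physician being evaluated."""
        if pending_questions:
            return {"type": "test", "query": pending_questions.pop(0)}
        return {"type": "diagnose", "diagnosis": "community-acquired pneumonia"}

    def run_case(abstract, findings, questions, max_turns=10):
        transcript = [f"Case abstract: {abstract}"]
        total_cost = COSTS["visit"]  # every case starts with a physician visit
        for _ in range(max_turns):
            action = diagnostician(transcript, questions)
            if action["type"] == "diagnose":
                # Scored on diagnostic accuracy AND total cost incurred.
                return action["diagnosis"], total_cost
            finding = gatekeeper(findings, action["query"])
            transcript.append(f"{action['query']}: {finding}")
            total_cost += COSTS["test"]
        raise RuntimeError("diagnostician never committed to a diagnosis")

    if __name__ == "__main__":
        diagnosis, cost = run_case(
            abstract="58-year-old with fever and productive cough",
            findings={"chest x-ray": "right lower lobe consolidation"},
            questions=["chest x-ray"],
        )
        print(diagnosis, cost)  # e.g. ('community-acquired pneumonia', 450)

In the real benchmark the diagnostician and gatekeeper are both language models (or a human physician on the diagnostician side), but the scoring idea is the same: fewer questions and tests mean lower cost, so ordering every test is penalized even if it eventually yields the right answer.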



I believe that dataset was built off of cases that were selected for being unusual enough for physicians to submit to the New England Journal of Medicine. The real-world diagnostic accuracy of physicians in these cases was 100% - the hospital figured out a diagnosis and wrote it up. In the real world these cases are solved by a team of human doctors working together, consulting with different specialists. Comparing the model's results to the results of a single human physician - particularly when all the irrelevant details have been stripped away and you're just left with the clean case report - isn't really reflective of how medicine works in practice. They're also not the kind of situations that you as a patient are likely to experience, and your doctor probably sees them rarely if ever.


Either way, the AI model performed better than the humans on average, so it would be reasonable to infer that AI would be a net positive collaborator in a team of internists.


Okay, you have a point. AI probably would do really well when short case abstracts start walking into clinics.


How else would a study scientifically determine the accuracy of an AI model in diagnosis? By testing it on real people before they know how good it is?


Why not? Have the AI do it, then have a human doctor do a follow-up/review? I might not be a fan of this for urgent care, but for general visits I wouldn't mind spending a bit of extra time if the AI was followed by an expert exam.



