
To cover the risk of mistakes, I suggest considering ways of "visually quoting" the documents.

If the summary says "closing timeline: X" and there's an icon I can click that opens an overlay with a cropped screenshot of that part of the original PDF (maybe even with a red circle around that detail), I can trust those summaries a whole lot more.

Gemini 2.5 has image bounding box and segmentation mask features that can help with this (sadly missing from Gemini 3).
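A minimal sketch of the geometry behind such a visual quote, in Python. It assumes the PDF page has already been rendered to an image and that the model reports the region the way Gemini's object-detection docs describe bounding boxes: `box_2d` as `[ymin, xmin, ymax, xmax]`, normalized to a 0 to 1000 scale. The helper names, the 40 px padding, and the 8 px circle margin are illustrative choices, not part of any API.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    """Pixel rectangle as (left, top, right, bottom), the order Pillow's Image.crop() uses."""
    left: int
    top: int
    right: int
    bottom: int


def box_2d_to_pixels(box_2d, width, height):
    """Convert [ymin, xmin, ymax, xmax] on an assumed 0..1000 scale to pixel coordinates."""
    ymin, xmin, ymax, xmax = box_2d
    # Multiply before dividing so exact inputs stay exact before rounding.
    return Rect(
        left=round(xmin * width / 1000),
        top=round(ymin * height / 1000),
        right=round(xmax * width / 1000),
        bottom=round(ymax * height / 1000),
    )


def crop_with_context(box, width, height, pad=40):
    """Grow the box by `pad` pixels per side, clamped to the page, so the excerpt keeps context."""
    return Rect(
        left=max(0, box.left - pad),
        top=max(0, box.top - pad),
        right=min(width, box.right + pad),
        bottom=min(height, box.bottom + pad),
    )


def highlight_in_crop(box, crop, margin=8):
    """Bounds of the red ellipse, relative to the cropped excerpt and clamped to it."""
    crop_w, crop_h = crop.right - crop.left, crop.bottom - crop.top
    return Rect(
        left=max(0, box.left - crop.left - margin),
        top=max(0, box.top - crop.top - margin),
        right=min(crop_w, box.right - crop.left + margin),
        bottom=min(crop_h, box.bottom - crop.top + margin),
    )


# Example: a US Letter page rendered at 200 DPI, with a made-up box for "closing timeline: X".
page_w, page_h = 1700, 2200
box = box_2d_to_pixels([450, 100, 480, 600], page_w, page_h)  # (170, 990, 1020, 1056)
crop = crop_with_context(box, page_w, page_h)                  # (130, 950, 1060, 1096)
circle = highlight_in_crop(box, crop)                          # (32, 32, 898, 114)
```

With Pillow, the results could then feed `excerpt = page.crop(crop)` and `ImageDraw.Draw(excerpt).ellipse(circle, outline="red", width=4)`. Keeping the geometry separate from the drawing makes it easy to test without any images.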





Oh, I didn’t know about the visual bounding boxes. This is super cool!

Quick question: are you talking about this feature?

https://docs.cloud.google.com/vertex-ai/generative-ai/docs/b...

Because it’s just using structured responses, it should be doable with Gemini 3? (We are using Gemini 3 for some document processing, and its visual understanding is just incredible.)


No, I'm talking about the image segmentation feature: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...

But the bounding box stuff might work well enough in Gemini 3 to handle this case as well.


Hmm, so that post also links back to segmentation done via structured outputs? (Though here it's not even enforcing the structure.)

https://ai.google.dev/gemini-api/docs/image-understanding#se...
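If the boxes come back through structured output, the app still has to parse them defensively, especially when the schema isn't enforced and the reply may arrive wrapped in a Markdown code fence. A minimal Python sketch, assuming a hypothetical prompt that asks for a JSON list of objects with `label`, `value`, `page`, and `box_2d` keys (the last again assumed to be `[ymin, xmin, ymax, xmax]` on a 0 to 1000 scale). The schema and function names are illustrative, not a documented Gemini format.

```python
import json
from dataclasses import dataclass

FENCE = "`" * 3  # Markdown code fence, built indirectly so this snippet stays fence-safe


@dataclass
class Citation:
    """One extracted field plus the region of the source page it was read from."""
    label: str
    value: str
    page: int
    box_2d: list  # assumed [ymin, xmin, ymax, xmax] on a 0..1000 scale


def strip_code_fence(text):
    """Remove a surrounding Markdown code fence (with optional language tag), if present."""
    text = text.strip()
    if text.startswith(FENCE):
        text = text.split("\n", 1)[1] if "\n" in text else ""
        text = text.rsplit(FENCE, 1)[0]
    return text


def is_valid_box(box):
    """True for four numbers within 0..1000 that describe a non-empty rectangle."""
    if not (isinstance(box, list) and len(box) == 4):
        return False
    if not all(isinstance(v, (int, float)) and 0 <= v <= 1000 for v in box):
        return False
    ymin, xmin, ymax, xmax = box
    return ymin < ymax and xmin < xmax


def parse_citations(raw):
    """Parse the model reply, keeping only items whose box can be shown to the user."""
    items = json.loads(strip_code_fence(raw))
    if not isinstance(items, list):
        raise ValueError("expected a JSON list of extracted fields")
    citations = []
    for item in items:
        if not isinstance(item, dict) or not is_valid_box(item.get("box_2d")):
            continue  # better no highlight than a misleading one
        citations.append(Citation(
            label=str(item.get("label", "")),
            value=str(item.get("value", "")),
            page=int(item.get("page", 1)),
            box_2d=item["box_2d"],
        ))
    return citations
```

Dropping malformed items rather than guessing keeps the "click to see the source" idea honest: a field without a verifiable location can simply be shown without the icon.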


It's not supported by Gemini 3: https://ai.google.dev/gemini-api/docs/gemini-3#migrating_fro...

> Image segmentation: Image segmentation capabilities (returning pixel-level masks for objects) are not supported in Gemini 3 Pro or Gemini 3 Flash. For workloads requiring native image segmentation, we recommend continuing to utilize Gemini 2.5 Flash with thinking turned off or Gemini Robotics-ER 1.5.


OK, gotcha. I think this is doable: show the excerpt from the original document so the user has confidence that the data is correct.

Thank you for the feedback.



