For covering the risk of mistakes I suggest considering ways of "visually quotin...

lysecret · 2026-01-02T14:23:31 1767363811

Oh, I didn’t know about the visual bounding boxes. This is super cool!

Quick question: are you talking about this feature?

https://docs.cloud.google.com/vertex-ai/generative-ai/docs/b...

Because it’s just using structured responses, so it should be doable with Gemini 3? (We are using Gemini 3 for some docs processing and its visual understanding is just incredible.)

simonw · 2026-01-02T15:13:15 1767366795

No, I'm talking about the image segmentation feature: https://simonwillison.net/2025/Apr/18/gemini-image-segmentat...

But the bounding box stuff might work well enough in Gemini 3 to handle this case as well.

lysecret · 2026-01-02T18:41:51 1767379311

Hmm, so that post also links back to segmentation done by structured outputs? (Though here it's not even enforcing the structure.)

https://ai.google.dev/gemini-api/docs/image-understanding#se...

simonw · 2026-01-02T19:13:10 1767381190

It's not supported by Gemini 3: https://ai.google.dev/gemini-api/docs/gemini-3#migrating_fro...

> Image segmentation: Image segmentation capabilities (returning pixel-level masks for objects) are not supported in Gemini 3 Pro or Gemini 3 Flash. For workloads requiring native image segmentation, we recommend continuing to utilize Gemini 2.5 Flash with thinking turned off or Gemini Robotics-ER 1.5.

beechwood · 2026-01-02T14:14:31 1767363271

Ok, gotcha. I think this is doable. Show the excerpt from the original document so the user has confidence the data is correct.

Thank you for the feedback.
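
For anyone who wants to try the "show the excerpt" idea with bounding boxes, here is a rough sketch of the coordinate step. It assumes the model returns each box as `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 scale, which is how the Gemini image understanding docs describe their bounding box output. The function name `box_to_pixels` is made up for illustration and is not part of any SDK.

```python
from typing import Sequence, Tuple


def box_to_pixels(
    box_2d: Sequence[float], width: int, height: int
) -> Tuple[int, int, int, int]:
    """Convert a normalized box to pixel coordinates.

    Assumption: box_2d is [ymin, xmin, ymax, xmax] on a 0-1000 scale.
    Returns (left, top, right, bottom), the order most image libraries
    expect for a crop rectangle.
    """
    ymin, xmin, ymax, xmax = box_2d

    def scale(value: float, size: int) -> int:
        # Clamp first, in case the model returns slightly out-of-range values.
        value = min(max(value, 0), 1000)
        return round(value * size / 1000)

    # Sort each pair so a flipped box still yields a valid rectangle.
    left, right = sorted((scale(xmin, width), scale(xmax, width)))
    top, bottom = sorted((scale(ymin, height), scale(ymax, height)))
    return left, top, right, bottom


if __name__ == "__main__":
    # Hypothetical box on a 2000x1000 page image.
    print(box_to_pixels([100, 200, 300, 400], width=2000, height=1000))
```

The resulting tuple can be passed to whatever crop call your image library provides, and the cropped region shown next to the extracted value so the user can check it against the original document.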