Launch HN: Sieve (YC W22) – Pluggable APIs for Video Search (sievedata.com)
71 points by mvoodarla on Feb 2, 2022 | 14 comments
Hi HN, we’re Mokshith and Abhi from Sieve (https://sievedata.com). We’re building an API that lets you add video search to internal tools or customer applications, instantly. Sieve can process 24 hours of video in less than 10 minutes, and makes it easy to search video by detected objects / characteristics, motion data, and visual similarity. You can use our models out of the box, or plug-in your own model endpoints into our infrastructure. ('Model' here means any software that produces output given an image.)

Every industry from security, to media, supply chain, construction, retail, sports, and agriculture is being transformed by video analytics—but setting up the infrastructure to process video data quickly is difficult. Having to deal with video ingestion pipelines, computer-vision model training, and search functionality is not pretty. We’re building a platform that takes care of all of this so teams can focus on their domain expertise: building industry-specific software.

We met in high school, and were on the robotics team together. It was our first exposure to computer vision, and something we both deeply enjoyed. We ended up going to UC Berkeley together and worked on computer vision at places like Scale AI, Niantic, Ford, NVIDIA, Microsoft, and Second Spectrum. We were initially trying to solve problems for ourselves as computer vision developers but quickly realized the unique problems in video having to do with cost, efficiency, and scale. We also realized how important video would be in lots of verticals, and saw an opportunity to build infrastructure which wouldn’t have to be rebuilt by a fullstack dev at any company again.

Let’s take the example of cloud software for construction which might include tons of features from asset trackers to rental management and compliance checks. It doesn’t make sense for a construction software company to build their own video processing for telematics—the density and scale of video make this a difficult task. A single 30 FPS camera generates over 2.5M frames within a day of recording. Imagine this across thousands of cameras and many weeks of footage—not to mention the actual vertical-specific software they’re building for end users.
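The "2.5M frames a day" figure follows directly from the frame rate; a quick sanity check:

```python
# Frames produced by a single 30 FPS camera over one day of continuous recording.
fps = 30
seconds_per_day = 24 * 60 * 60  # 86,400 seconds
frames_per_day = fps * seconds_per_day
print(frames_per_day)  # 2592000 — a bit over 2.5M, as the post says
```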

Sieve takes care of everything hard about processing and searching video. Our API allows you to process and search video with just two API calls. We use filtering, parallelization, and interpolation techniques to keep costs low, while being able to process 24 hours of video in under 10 minutes. Users can choose from our pre-existing set of models, or use their own models with our video processing engine. Our pricing can range anywhere from $0.08-$0.45 per minute of video processed based on the models clients are interested in and usage volume. Our FAQ page (https://sievedata.com/faq) explains these factors in more detail.
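The "two API calls" workflow (push a signed video URL, then query) might look roughly like the sketch below. The base URL, endpoint names, header, and payload shapes are all assumptions for illustration, not Sieve's documented API; nothing is sent over the network here.

```python
import json
import urllib.request

API_BASE = "https://api.sievedata.com/v1"  # hypothetical base URL

def build_request(path, payload, api_key="YOUR_API_KEY"):
    """Build one (hypothetical) Sieve API call as an HTTP POST request."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode(),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Call 1: push a signed video URL for processing.
push = build_request("/push_video", {"video_url": "https://example.com/clip.mp4?sig=abc"})

# Call 2: query for intervals of interest by detected object.
query = build_request("/query", {"object": "forklift", "min_confidence": 0.8})

# urllib.request.urlopen(push) would actually send the call; omitted here.
```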

Our backend is built on serverless functions. We split each video into individual chunks which are processed in parallel and passed through multiple layers of filters to determine which chunks are “important”. We’re able to algorithmically ignore parts of video which are static, or change minimally, and focus on the parts that contain real action. We then run more expensive models on the most “important” parts of video, and interpolate results across frames to return information to customers at 30 FPS granularity. Our customers simply push signed video URLs to our platform, and this happens automatically. You can then use our API to query for intervals of interest.
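A toy sketch of the two ideas above — scoring chunks by a cheap motion proxy so static video can be skipped, and interpolating detections between keyframes to recover 30 FPS granularity. This is my own minimal illustration, not Sieve's actual filtering logic:

```python
import numpy as np

def chunk_importance(frames, chunk_size=30):
    """Score each fixed-size chunk by mean absolute inter-frame difference,
    a cheap proxy for motion ("importance")."""
    scores = []
    for i in range(0, len(frames) - chunk_size + 1, chunk_size):
        chunk = frames[i:i + chunk_size].astype(np.int16)
        scores.append(float(np.abs(np.diff(chunk, axis=0)).mean()))
    return scores

def interpolate_box(box_a, box_b, n):
    """Linearly interpolate a detection box across the frames between two
    keyframes, so results come back at per-frame granularity."""
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    return [tuple(a + (b - a) * t / n) for t in range(1, n)]

# A static chunk scores 0.0; a chunk with changing pixels scores higher and
# would be routed to the more expensive models.
static = np.zeros((30, 8, 8), dtype=np.uint8)
busy = np.random.randint(0, 255, (30, 8, 8), dtype=np.uint8)
print(chunk_importance(np.concatenate([static, busy])))
```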

We haven’t built an automated sign up flow yet because we're focused on building out the core product for now. But we wanted to give all of you the chance to try Sieve on your own videos for free, so we've set up a special process for HN users. Try it out here: https://sieve-data.notion.site/Trying-Sieve-s-Video-Search-4.... We'll email you a personal, limited-access API key.

Here's a video demo of using our dashboard to do video search: https://www.youtube.com/watch?v=_uyjp_HGZl4

We’d love to hear what you think about the product and vision, and ideas on how we can improve it. Thanks for taking the time to read this, we’re grateful to be posting here :)



Neat project, you may want to consider adding audio processing too (eg sound detected) as part of the video.

You could go deeper and compare samples of audio that could be uploaded separately (eg, siren sounds), check out MFCC processing https://en.wikipedia.org/wiki/Mel-frequency_cepstrum#Applica... to do Shazam-style audio comparison.
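The matching step the comment describes boils down to comparing feature vectors. Here's a minimal sketch using cosine similarity over toy vectors standing in for MFCCs; real MFCC extraction would come from an audio library (e.g. librosa's `feature.mfcc`), and the 4-dim vectors here are made up:

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_clip(query_vec, library):
    """Return the library key whose reference vector best matches the query."""
    return max(library, key=lambda k: cosine_similarity(query_vec, library[k]))

# Toy 4-dim "MFCC" vectors for two reference sounds.
library = {"siren": [0.9, 0.1, 0.4, 0.2], "speech": [0.1, 0.8, 0.3, 0.6]}
print(match_clip([0.85, 0.15, 0.35, 0.25], library))  # siren
```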

I wonder too if you process it on a per-frame basis or you can take series of frames too (eg analyze the last 5 seconds of frames) to detect things like a "hand wave".


Thanks for the comment! Audio is definitely an interesting angle. We haven't spoken to a lot of folks yet that want that feature, but I'd guess some productivity tools that let you rewatch Zoom calls, baby monitoring, media playback, or a few other things like that might find audio useful. We'll explore it for sure.

We currently process on a per-frame basis but have the option to increase the window size from 1 to any arbitrary size, like 300 frames. This is what currently allows us to detect "actions" for some of our customers. Of course, it's not the same as a sliding window, but that's something we're considering.
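The distinction above — fixed windows versus a true sliding window — can be shown with a small helper. This is an illustrative sketch, not Sieve's implementation:

```python
def frame_windows(n_frames, window=300, stride=None):
    """Partition a frame sequence into (start, end) windows. The stride
    defaults to the window size (non-overlapping batches, as described);
    a smaller stride gives a true sliding window."""
    stride = stride or window
    return [(start, min(start + window, n_frames))
            for start in range(0, n_frames, stride)]

# Non-overlapping 300-frame windows over 30 seconds of 30 FPS video:
print(frame_windows(900))  # [(0, 300), (300, 600), (600, 900)]
# Sliding variant with a 150-frame stride (overlapping windows):
print(frame_windows(900, stride=150))
```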


Bingo. In enterprise environments audio is vastly more valuable than video for searching internal meeting and discussion recordings.

Often a recording title is not enough.


Hey, that's a really nice product. I love that it's API-first. Would you be interested in adding it to our API hub? We're currently in private beta and would love to have your API on it. Why? It's the best developer experience for integrating APIs, and we'd love to send our users your way. Here's a link if you're interested: https://hub.wundergraph.com/


Thanks! The API hub seems interesting. Want to send me an email (see bio)?


Sure!


This is really cool!

I have a few questions

1. Is there really no other competitor or company that tried to tackle this problem in the past? It feels like a really common use case, and someone must've done something about it!

2. Do you have a fixed set of words that the user should use to query? I'm an AI researcher/practitioner who worked in this area. It's super difficult to search for tail objects in images/text.

3. Why API first?


Thanks for the thoughtful questions!

1. People have definitely tried this in the past in terms of having similar tech, but I don't think anyone has approached it from quite the same angle. We're seeing so many more companies like EquipmentShare (https://www.equipmentshare.com/), UpKeep (https://www.upkeep.com/), Spot AI (https://www.spot.ai/) and others which are building core software for industries without necessarily having tons of video expertise in-house. Also, other video-first companies like Gong (https://www.gong.io/), Mux (https://mux.com/), and Loom (https://www.loom.com/) are building platforms that might benefit from visual analysis at scale. Things like serverless functions were also less cool before.

2. There's a bunch of ways to search using Sieve. Some are fixed variables, others are visual similarity based. See here: https://sievedata.com/faq

3. Our target is the full stack developer within companies building cloud-first software for specific industries. Easy-to-use APIs are key because of this. They are the ones with the expertise to package it into a product that really works for the end user, and we're the ones with computer-vision and video expertise.


> Is there really no other competitor or company that tried to tackle this problem in the past?

Microsoft and Google already have very mature offerings in this space.

https://azure.microsoft.com/en-us/services/media-services/#g...


It's a good point. While Microsoft and Google offer video processing / analysis, neither is API-first in the same way we are.

1. We process videos, and can store their metadata / embeddings in perpetuity. This means a simple query API: you don't have to maintain a DB of metadata yourself.

2. "Reverse Image Search" - Say a user cares about something specific, like a blue box that's toppled over. Using high-dimensional representations of certain frames in video, we can surface every time this scenario shows up in the future, based on a user having marked it as "interesting" in the past.
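The "reverse image search" idea above is essentially nearest-neighbor search over frame embeddings. A minimal sketch with cosine similarity (toy 3-dim vectors stand in for real high-dimensional embeddings; not Sieve's actual index):

```python
import numpy as np

def nearest_frames(query_vec, frame_vecs, k=3):
    """Rank stored frame embeddings by cosine similarity to a query embedding."""
    q = np.asarray(query_vec, float)
    m = np.asarray(frame_vecs, float)
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]

# The frame a user marked "interesting" (say, the toppled blue box) becomes
# the query; future frames with similar embeddings surface automatically.
frames = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(nearest_frames([1.0, 0.05, 0.0], frames, k=2))
```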

3. Our "killer" app is the video processing infra that's super modular. We give this flexibility to the full-stack dev to plug in their own models, or use ours.

A final thing to note is that we're going after a completely different set of companies, specifically companies focused on building software for a specific industry (e.g. https://www.equipmentshare.com/) that eventually want to add video analytics as an offering. It's also unclear who Azure or GCP are targeting, though it seems to be general media content.


Thanks for the detailed response. I wish you all the luck in your journey.

> 2. "Reverse Image Search" - Say a user cares about something specific, like a blue box that's toppled over. Using high-dimensional representations of certain frames in video, we can surface every time this scenario shows up in the future, based on a user having marked it as "interesting" in the past.

I believe a lot of retail/consumer companies would be interested just in that. Example use case - restaurants trying to find repeat customers or gauging their mood before/after the meal.


Like the concept, but especially for live feeds this seems expensive at $3456/month/camera? Is this not your use case (it's the first one on your use-cases page)? Or am I missing something? Congrats on the launch!
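For reference, that figure matches continuous, round-the-clock processing at the lowest advertised rate (assuming a 30-day month):

```python
# $0.08/min of processed video, streaming 24/7 for a 30-day month.
# Working in cents avoids float rounding: 8 cents/min * minutes / 100.
minutes_per_month = 60 * 24 * 30
monthly_cost = 8 * minutes_per_month / 100
print(monthly_cost)  # 3456.0
```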


Thanks!

Most use cases have limitations in bandwidth, so not all data is streamed up to the cloud. Typically, people will run simple motion detectors on the physical camera device itself, and save content in the cloud that they think might be valuable. Our service can then be used to make that content searchable, beyond simple motion detection.
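The on-device motion check described above can be as simple as thresholding the mean pixel difference between consecutive frames; a toy sketch (not any vendor's actual detector):

```python
import numpy as np

def has_motion(prev_frame, frame, threshold=8.0):
    """Cheap on-device check: mean absolute pixel difference above a threshold
    marks the clip as worth uploading to the cloud."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(diff.mean()) > threshold

a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 50, dtype=np.uint8)
print(has_motion(a, a))  # False — static scene, nothing uploaded
print(has_motion(a, b))  # True — upload this clip for indexing
```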

There are other use-cases where people store tons of data in the cloud already. Media content is an example of this. In this case, companies might want to analyze all the raw data for which we then charge only for intervals of video where there's real action. If the video is static, we won't charge the full per minute pricing.


This is cool. Something tangential I've always wanted from video apps and APIs is the ability to highlight any video online. To just bracket cool clips in long videos and then be able to view and later edit those highlights together rapidly.



