Launch HN: Sieve (YC W22) – Pluggable APIs for Video Search (sievedata.com)
71 points by mvoodarla on Feb 2, 2022 | 14 comments
Hi HN, we’re Mokshith and Abhi from Sieve (https://sievedata.com). We’re building an API that lets you add video search to internal tools or customer applications, instantly. Sieve can process 24 hours of video in less than 10 minutes, and makes it easy to search video by detected objects / characteristics, motion data, and visual similarity. You can use our models out of the box, or plug-in your own model endpoints into our infrastructure. ('Model' here means any software that produces output given an image.)

Every industry from security, to media, supply chain, construction, retail, sports, and agriculture is being transformed by video analytics—but setting up the infrastructure to process video data quickly is difficult. Having to deal with video ingestion pipelines, computer-vision model training, and search functionality is not pretty. We’re building a platform that takes care of all of this so teams can focus on their domain expertise: building industry-specific software.

We met in high school, and were on the robotics team together. It was our first exposure to computer vision, and something we both deeply enjoyed. We ended up going to UC Berkeley together and worked on computer vision at places like Scale AI, Niantic, Ford, NVIDIA, Microsoft, and Second Spectrum. We were initially trying to solve problems for ourselves as computer vision developers but quickly realized the unique problems in video having to do with cost, efficiency, and scale. We also realized how important video would be in lots of verticals, and saw an opportunity to build infrastructure which wouldn’t have to be rebuilt by a fullstack dev at any company again.

Let’s take the example of cloud software for construction which might include tons of features from asset trackers to rental management and compliance checks. It doesn’t make sense for a construction software company to build their own video processing for telematics—the density and scale of video make this a difficult task. A single 30 FPS camera generates over 2.5M frames within a day of recording. Imagine this across thousands of cameras and many weeks of footage—not to mention the actual vertical-specific software they’re building for end users.
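The "2.5M frames a day" figure follows directly from the frame rate; a quick sanity check:

```python
# Frames produced by a single 30 FPS camera over one day of continuous recording.
fps = 30
seconds_per_day = 24 * 60 * 60  # 86,400 seconds
frames_per_day = fps * seconds_per_day
print(frames_per_day)  # 2592000 — a bit over 2.5M, as the post says
```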

Sieve takes care of everything hard about processing and searching video. Our API allows you to process and search video with just two API calls. We use filtering, parallelization, and interpolation techniques to keep costs low, while being able to process 24 hours of video in under 10 minutes. Users can choose from our pre-existing set of models, or use their own models with our video processing engine. Our pricing can range anywhere from $0.08-$0.45 per minute of video processed based on the models clients are interested in and usage volume. Our FAQ page (https://sievedata.com/faq) explains these factors in more detail.
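The "two API calls" workflow (push a signed video URL, then query) might look roughly like the sketch below. The base URL, endpoint names, header, and payload shapes are all assumptions for illustration, not Sieve's documented API; nothing is sent over the network here.

```python
import json
import urllib.request

API_BASE = "https://api.sievedata.com/v1"  # hypothetical base URL

def build_request(path, payload, api_key="YOUR_API_KEY"):
    """Build one (hypothetical) Sieve API call as an HTTP POST request."""
    return urllib.request.Request(
        API_BASE + path,
        data=json.dumps(payload).encode(),
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )

# Call 1: push a signed video URL for processing.
push = build_request("/push_video", {"video_url": "https://example.com/clip.mp4?sig=abc"})

# Call 2: query for intervals of interest by detected object.
query = build_request("/query", {"object": "forklift", "min_confidence": 0.8})

# urllib.request.urlopen(push) would actually send the call; omitted here.
```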

Our backend is built on serverless functions. We split each video into individual chunks which are processed in parallel and passed through multiple layers of filters to determine which chunks are “important”. We’re able to algorithmically ignore parts of video which are static, or change minimally, and focus on the parts that contain real action. We then run more expensive models on the most “important” parts of video, and interpolate results across frames to return information to customers at 30 FPS granularity. Our customers simply push signed video URLs to our platform, and this happens automatically. You can then use our API to query for intervals of interest.
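A toy sketch of the two ideas above — scoring chunks by a cheap motion proxy so static video can be skipped, and interpolating detections between keyframes to recover 30 FPS granularity. This is my own minimal illustration, not Sieve's actual filtering logic:

```python
import numpy as np

def chunk_importance(frames, chunk_size=30):
    """Score each fixed-size chunk by mean absolute inter-frame difference,
    a cheap proxy for motion ("importance")."""
    scores = []
    for i in range(0, len(frames) - chunk_size + 1, chunk_size):
        chunk = frames[i:i + chunk_size].astype(np.int16)
        scores.append(float(np.abs(np.diff(chunk, axis=0)).mean()))
    return scores

def interpolate_box(box_a, box_b, n):
    """Linearly interpolate a detection box across the frames between two
    keyframes, so results come back at per-frame granularity."""
    a, b = np.asarray(box_a, float), np.asarray(box_b, float)
    return [tuple(a + (b - a) * t / n) for t in range(1, n)]

# A static chunk scores 0.0; a chunk with changing pixels scores higher and
# would be routed to the more expensive models.
static = np.zeros((30, 8, 8), dtype=np.uint8)
busy = np.random.randint(0, 255, (30, 8, 8), dtype=np.uint8)
print(chunk_importance(np.concatenate([static, busy])))
```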

We haven’t built an automated sign up flow yet because we're focused on building out the core product for now. But we wanted to give all of you the chance to try Sieve on your own videos for free, so we've set up a special process for HN users. Try it out here: https://sieve-data.notion.site/Trying-Sieve-s-Video-Search-4.... We'll email you a personal, limited-access API key.

Here's a video demo of using our dashboard to do video search: https://www.youtube.com/watch?v=_uyjp_HGZl4

We’d love to hear what you think about the product and vision, and ideas on how we can improve it. Thanks for taking the time to read this, we’re grateful to be posting here :)



Neat project, you may want to consider adding audio processing too (eg sound detected) as part of the video.

You could go deeper and compare samples of audio that could be uploaded separately (eg, siren sounds), check out MFCC processing https://en.wikipedia.org/wiki/Mel-frequency_cepstrum#Applica... to do Shazam-style audio comparison.
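The matching step the comment describes boils down to comparing feature vectors. Here's a minimal sketch using cosine similarity over toy vectors standing in for MFCCs; real MFCC extraction would come from an audio library (e.g. librosa's `feature.mfcc`), and the 4-dim vectors here are made up:

```python
import numpy as np

def cosine_similarity(a, b):
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def match_clip(query_vec, library):
    """Return the library key whose reference vector best matches the query."""
    return max(library, key=lambda k: cosine_similarity(query_vec, library[k]))

# Toy 4-dim "MFCC" vectors for two reference sounds.
library = {"siren": [0.9, 0.1, 0.4, 0.2], "speech": [0.1, 0.8, 0.3, 0.6]}
print(match_clip([0.85, 0.15, 0.35, 0.25], library))  # siren
```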

I wonder too if you process it on a per-frame basis or you can take series of frames too (eg analyze the last 5 seconds of frames) to detect things like a "hand wave".


Thanks for the comment! Audio is definitely an interesting angle. We haven't spoken to a lot of folks yet that want that feature, but I'd guess some productivity tools that let you rewatch Zoom calls, baby monitoring, media playback, or a few other things like that might find audio useful. We'll explore it for sure.

We currently process on a per-frame basis but have the option to increase the window size from 1 to any arbitrary size, like 300 frames. This is what currently allows us to detect "actions" for some of our customers. Of course, it's not the same as a sliding window, but that's something we're considering.
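The distinction above — fixed windows versus a true sliding window — can be shown with a small helper. This is an illustrative sketch, not Sieve's implementation:

```python
def frame_windows(n_frames, window=300, stride=None):
    """Partition a frame sequence into (start, end) windows. The stride
    defaults to the window size (non-overlapping batches, as described);
    a smaller stride gives a true sliding window."""
    stride = stride or window
    return [(start, min(start + window, n_frames))
            for start in range(0, n_frames, stride)]

# Non-overlapping 300-frame windows over 30 seconds of 30 FPS video:
print(frame_windows(900))  # [(0, 300), (300, 600), (600, 900)]
# Sliding variant with a 150-frame stride (overlapping windows):
print(frame_windows(900, stride=150))
```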


Bingo. In enterprise environments audio is vastly more valuable than video for searching internal meeting and discussion recordings.

Often a recording title is not enough.


Hey, that's a really nice product. I love that it's API-first. Would you be interested in adding it to our API hub? We're currently in private beta and would love to have your API on it. Why? It's the best developer experience for integrating APIs, and we'd love to send our users your way. Here's a link if you're interested: https://hub.wundergraph.com/


Thanks! The API hub seems interesting. Want to send me an email (see bio)?


Sure!


This is really cool!

I have a few questions

1. Is there really no other competitor or company that tried to tackle this problem in the past? It feels like a really common use case, and someone must've done something about it!

2. Do you have a fixed set of words that the user should use to query? I'm an AI researcher/practitioner who worked in this area. It's super difficult to search for tail objects in images/text.

3. Why API first?


Thanks for the thoughtful questions!

1. People have definitely tried this in the past in terms of having similar tech, but I don't think anyone has approached it from quite the same angle. We're seeing so many more companies like EquipmentShare (https://www.equipmentshare.com/), UpKeep (https://www.upkeep.com/), Spot AI (https://www.spot.ai/) and others which are building core software for industries without necessarily having tons of video expertise in-house. Also, other video-first companies like Gong (https://www.gong.io/), Mux (https://mux.com/), and Loom (https://www.loom.com/) are building platforms that might benefit from visual analysis at scale. Things like serverless functions were also less cool before.

2. There's a bunch of ways to search using Sieve. Some are fixed variables, others are visual similarity based. See here: https://sievedata.com/faq

3. Our target is the full stack developer within companies building cloud-first software for specific industries. Easy-to-use APIs are key because of this. They are the ones with the expertise to package it into a product that really works for the end user, and we're the ones with computer-vision and video expertise.


> Is there really no other competitor or company that tried to tackle this problem in the past?

Microsoft and Google already have very mature offerings in this space.

https://azure.microsoft.com/en-us/services/media-services/#g...


It's a good point. While Microsoft and Google offer video processing / analysis, neither is API-first in the same way we are.

1. We process videos, and can store their metadata / embeddings in perpetuity. This means a simple query API: you don't have to maintain a DB of metadata yourself.

2. "Reverse Image Search" - Say a user cares about something specific, like a blue box that's toppled over. Using high-dimensional representations of certain frames in video, we can surface every time this scenario shows up in the future, based on a user having marked it as "interesting" in the past.
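The "reverse image search" idea above is essentially nearest-neighbor search over frame embeddings. A minimal sketch with cosine similarity (toy 3-dim vectors stand in for real high-dimensional embeddings; not Sieve's actual index):

```python
import numpy as np

def nearest_frames(query_vec, frame_vecs, k=3):
    """Rank stored frame embeddings by cosine similarity to a query embedding."""
    q = np.asarray(query_vec, float)
    m = np.asarray(frame_vecs, float)
    sims = (m @ q) / (np.linalg.norm(m, axis=1) * np.linalg.norm(q))
    order = np.argsort(-sims)[:k]
    return [(int(i), float(sims[i])) for i in order]

# The frame a user marked "interesting" (say, the toppled blue box) becomes
# the query; future frames with similar embeddings surface automatically.
frames = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0]]
print(nearest_frames([1.0, 0.05, 0.0], frames, k=2))
```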

3. Our "killer" app is the video processing infra that's super modular. We give this flexibility to the full-stack dev to plug in their own models, or use ours.

A final thing to note is that we're going after a completely different set of companies, specifically companies focused on building software for a specific industry (e.g. https://www.equipmentshare.com/) that eventually want to add video analytics as an offering. It's also unclear who Azure or GCP are targeting, though it seems to be general media content.


Thanks for the detailed response. I wish you all the luck in your journey.

> 2. "Reverse Image Search" - Say a user cares about something specific, like a blue box that's toppled over. Using high-dimensional representations of certain frames in video, we can surface every time this scenario shows up in the future, based on a user having marked it as "interesting" in the past.

I believe a lot of retail/consumer companies would be interested just in that. Example use case - restaurants trying to find repeat customers or gauging their mood before/after the meal.


Like the concept, but especially for live feeds this seems expensive at $3456/month/camera? Is this not your use case (it's the first one on your use-cases page)? Or am I missing something? Congrats on the launch!
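For reference, that figure matches continuous, round-the-clock processing at the lowest advertised rate (assuming a 30-day month):

```python
# $0.08/min of processed video, streaming 24/7 for a 30-day month.
# Working in cents avoids float rounding: 8 cents/min * minutes / 100.
minutes_per_month = 60 * 24 * 30
monthly_cost = 8 * minutes_per_month / 100
print(monthly_cost)  # 3456.0
```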


Thanks!

Most use cases have limitations in bandwidth, so not all data is streamed up to the cloud. Typically, people will run simple motion detectors on the physical camera device itself, and save content in the cloud that they think might be valuable. Our service can then be used to make that content searchable, beyond simple motion detection.
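The on-device motion check described above can be as simple as thresholding the mean pixel difference between consecutive frames; a toy sketch (not any vendor's actual detector):

```python
import numpy as np

def has_motion(prev_frame, frame, threshold=8.0):
    """Cheap on-device check: mean absolute pixel difference above a threshold
    marks the clip as worth uploading to the cloud."""
    diff = np.abs(frame.astype(np.int16) - prev_frame.astype(np.int16))
    return float(diff.mean()) > threshold

a = np.zeros((4, 4), dtype=np.uint8)
b = np.full((4, 4), 50, dtype=np.uint8)
print(has_motion(a, a))  # False — static scene, nothing uploaded
print(has_motion(a, b))  # True — upload this clip for indexing
```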

There are other use-cases where people store tons of data in the cloud already. Media content is an example of this. In this case, companies might want to analyze all the raw data for which we then charge only for intervals of video where there's real action. If the video is static, we won't charge the full per minute pricing.


This is cool. Something tangential I've always wanted from video apps and APIs is the ability to highlight any video online. To just bracket cool clips in long videos and then be able to view and later edit those highlights together rapidly.



