Video object detection is a very interesting problem, and solving it could help a lot of people. I found out about it while talking to a shark researcher (maybe not his exact title). His team has grad students counting sharks in footage from an underwater camera. These videos can be very long, and sometimes hours go by without a single shark. I immediately thought of machine learning; what could go wrong? I started reading about it and found several approaches.
- AutoML solutions (Google, MSFT…): I have used these in the past with images, with good results. The downside is that these services do not provide video support, or at least I was not able to find it.
- TensorFlow: I watched a ton of video examples of the Object Detection API. Be careful with these videos and look for recent ones, because version changes can make an older tutorial very hard to follow. I had some trouble trying to train the model with my own images. It might have been a combination of the documentation, my package management and maybe luck. I ended up looking for another way.
- TensorFlow Object Counting API: I found this repository. It has great examples and it’s built on top of TensorFlow. I still had some problems training on my own images. My only comment would be that this API still lacks the abstraction I wanted to see in an API, at least for the training part.
- Detecto: I found out about this repository, and the first thing I noticed was that it promised the abstraction I was looking for. I managed to train with my own images, and each of the examples is ~5 lines of code. You don’t need to understand PyTorch in order to use it. I was able to run it on Google Colab, where the free GPUs made the training process faster. At some point moving to TensorFlow could make sense, but to start I recommend Detecto.
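To give a sense of how short the Detecto route can be, here is a minimal sketch of training on your own images and counting detections in a single image. The Detecto calls (`core.Dataset`, `core.Model`, `fit`, `predict`, `utils.read_image`) are the ones I know from its documentation, but they may differ between versions. The function names, the `"shark"` label, the folder layout and the 0.6 score threshold are all assumptions made up for illustration, not what I actually ran.

```python
def train_shark_model(image_dir, labels=("shark",)):
    """Fine-tune Detecto's default detector on a folder of labeled images.

    Assumes `image_dir` holds the images together with their Pascal VOC
    XML label files (hypothetical layout).
    """
    # Imported lazily so the counting helper below works without Detecto.
    from detecto import core

    dataset = core.Dataset(image_dir)
    model = core.Model(list(labels))
    model.fit(dataset)
    return model


def count_confident(labels, scores, target="shark", threshold=0.6):
    """Count detections of `target` whose score is at least `threshold`."""
    return sum(
        1
        for label, score in zip(labels, scores)
        if label == target and float(score) >= threshold
    )


def count_in_image(model, image_path, threshold=0.6):
    """Run the model on one image and count the confident shark detections."""
    from detecto import utils

    image = utils.read_image(image_path)
    # For a single image, predict returns labels, boxes and scores.
    labels, boxes, scores = model.predict(image)
    return count_confident(labels, scores, threshold=threshold)
```

With a trained model, something like `count_in_image(model, "frame.jpg")` would give a per-image count (the file name is a placeholder). The threshold is a knob worth tuning: too low and fish get counted as sharks, too high and real sharks are missed.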
I still need to feed more images to the model, but here is an example of the results: