Semantic Segmentation Support

September 21, 2021

Plain old Jersey barrier, not much fun.

Ice cream barrier, more fun!

Why semantic segmentation?

Last month we wrote about our custom electric scooter detector, which we created using the Stray CLI and only a handful of video clips. However, that detector was limited to bounding box detection, so we decided to upgrade the system to also support object instance segmentation masks.

Many tools and services provide 2D bounding box annotation for a few cents per image, whereas segmenting objects in images is considerably more expensive (upwards of $3 per image). Obtaining segmentation masks costs so much more because the annotator has to carefully trace a polygon that encloses the entire object in the image, which can take quite a few clicks.

In this update, we show how you can easily create segmentation masks for thousands of images with a few clicks and train a custom semantic segmentation model.

Creating segmentation masks from 3D annotations

As with the electric scooter detector, we start by collecting a dataset using the Stray Scanner app and importing it into the Stray Studio tool with the Stray Command Line Interface. Instead of electric scooters, this time we decided to collect data of Jersey barriers found around the city. We also found an ice cream variant of such a barrier, which we figured would be a fun addition to the dataset.

We process the dataset into a 3D reconstruction and extract camera poses using the Stray Command Line Tool with the simple stray studio integrate command. The output is the trajectory of the camera and a triangle mesh representing the scene in 3D. After processing the dataset, we open the collected scenes in Stray Studio and add bounding boxes to the items we want our detection model to segment later.

The ice cream barrier in Stray Studio

Once we are happy with the bounding box (hint: verifying it is very easy with the preview command), we are good to go for the segmentation step! The segmentation mask is created by projecting the 3D mesh back onto the 2D images used to create the scene. We store the resulting masks as files in the scene folder for later use. Check out the documentation for more details.
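Conceptually, the projection boils down to transforming each mesh vertex into the camera frame, projecting it through the camera intrinsics, and rasterizing the projected triangles into a binary image. The sketch below illustrates the idea with numpy and OpenCV; it is not the actual Stray implementation, and the function and variable names are our own.

```python
import numpy as np
import cv2

def mesh_to_mask(vertices, faces, T_world_to_cam, K, width, height):
    """Rasterize a triangle mesh into a binary mask for one camera view.

    vertices: (N, 3) mesh vertices in world coordinates
    faces: (M, 3) triangle vertex indices
    T_world_to_cam: 4x4 extrinsic matrix (world -> camera)
    K: 3x3 camera intrinsic matrix
    """
    # Transform the vertices into the camera frame.
    points_h = np.hstack([vertices, np.ones((len(vertices), 1))])
    points_cam = (T_world_to_cam @ points_h.T).T[:, :3]

    # Perspective projection onto the image plane. Depth is clamped to
    # avoid dividing by zero; triangles behind the camera are skipped below.
    depth = np.maximum(points_cam[:, 2:3], 1e-6)
    pixels = (K @ points_cam.T).T[:, :2] / depth

    mask = np.zeros((height, width), dtype=np.uint8)
    for face in faces:
        if np.any(points_cam[face, 2] <= 0.0):
            continue  # triangle is behind the camera
        triangle = pixels[face].astype(np.int32)
        cv2.fillPoly(mask, [triangle], 255)
    return mask
```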

Once the masks have been produced, we can visualize the end results:

The segmentation quality is quite good, even though the meshes in the reconstruction are not 100% perfect.
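To inspect the quality yourself, you can overlay a produced mask on its color frame. Here is a small OpenCV sketch; the file paths are illustrative and depend on how the scene folder is laid out.

```python
import cv2

# Illustrative paths; masks are stored per frame in the scene folder.
image = cv2.imread("scene/color/000000.jpg")
mask = cv2.imread("scene/masks/000000.png", cv2.IMREAD_GRAYSCALE)

# Paint the masked pixels red and blend with the original image.
overlay = image.copy()
overlay[mask > 0] = (0, 0, 255)  # BGR red
blended = cv2.addWeighted(image, 0.6, overlay, 0.4, 0.0)
cv2.imwrite("mask_overlay.png", blended)
```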

We think that the semantic segmentation use case really highlights the benefits of 3D labeling. For a single 3D bounding box, we get thousands of semantic segmentation masks in return, segmenting the object from different viewpoints.

Setting up training for a custom segmentation model

After we have produced the segmentation masks, we can kick off a segmentation training run. As with the electric scooter detector, we choose the Detectron2 library for the detection task. The Stray Command Line Interface provides easy-to-use utilities for model definition and baking (i.e. fine-tuning and training; check out the documentation for further details).
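To give a concrete idea of what such a run involves, here is a minimal sketch written directly against the Detectron2 API, fine-tuning a COCO-pretrained Mask R-CNN on the generated masks. The dataset name, paths, and class count are assumptions for illustration; in practice the baking utilities in the Stray CLI handle this configuration for you.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.data.datasets import register_coco_instances
from detectron2.engine import DefaultTrainer

# Hypothetical dataset registration; the paths depend on your export format.
register_coco_instances(
    "barriers_train", {}, "barriers/annotations.json", "barriers/images")

cfg = get_cfg()
# Start from a Mask R-CNN baseline pre-trained on COCO and fine-tune it.
cfg.merge_from_file(model_zoo.get_config_file(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url(
    "COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
cfg.DATASETS.TRAIN = ("barriers_train",)
cfg.DATASETS.TEST = ()
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # a single "barrier" class
cfg.SOLVER.MAX_ITER = 500_000  # matches the run described below

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```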

The model is configured to predict segmentation masks in addition to bounding boxes. The qualitative results shown above were obtained after training for 500,000 iterations and evaluating on data not included in the training set. Even though the evaluation data comes from a completely different context, the model generalizes well to unseen scenes. We also noticed that errors in the original labels get averaged out during training, hinting that perfect labels are not needed for a well-performing model.
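Once trained, running the model on unseen images takes only a few lines with Detectron2's DefaultPredictor. This sketch continues from the configuration above; the weight and image paths are illustrative.

```python
import cv2
from detectron2.engine import DefaultPredictor

cfg.MODEL.WEIGHTS = "output/model_final.pth"  # written by the training run
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5

predictor = DefaultPredictor(cfg)
image = cv2.imread("evaluation/barrier.jpg")  # an image the model has not seen
outputs = predictor(image)

# One binary mask and confidence score per detected instance.
masks = outputs["instances"].pred_masks.cpu().numpy()
scores = outputs["instances"].scores.cpu().numpy()
```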

Using the tool yourself

We have made all the tools mentioned in this post available for everyone to use, free of charge for now. Install the tools by following our installation guide. Currently, we support macOS and Linux platforms.

Moving forward, we will be expanding our toolkit to make it even more powerful and versatile.

If you have a use case in mind, ideas for improvements, or any other feedback, we would very much like to hear from you!

Meanwhile, you should subscribe to our newsletter to follow along as we develop a simple-to-use toolkit for solving computer vision problems.
