Launching 3D Studio
Our latest tool, Label Studio, allows teams working with computer vision to rapidly annotate large amounts of image data.
In spatial AI applications such as robotics and augmented reality, cameras move through a space and the state of the scene has to be inferred from what they see. Examples include estimating the position of objects, semantic segmentation (which pixel corresponds to which object), detecting planes, completing depth information, and detecting specific points in the scene.
Most algorithms today solve these problems through machine learning. A large dataset is created with input and output examples, and a model is fitted to predict the outputs from the inputs.
Doing this accurately requires large datasets. Typically, these datasets are built by taking image frames and outsourcing them to a large number of workers who annotate the images one by one.
In spatial AI applications, however, image frames are highly correlated and have a specific relation to each other: they view the same scene, but from different viewpoints. Annotating each frame separately ignores that relation.
Annotate large amounts of data with a few clicks
Label Studio takes image frames from your robot or app and stitches them together to build a global 3D reconstruction of the scene. In the process, we build up a graph of camera poses, which allows us to infer the pose of the camera at each frame as it moves through the scene. This lets us annotate large numbers of image frames very quickly: you label the scene once in a 3D graphical user interface, and the annotated 3D labels are projected onto all the images in the dataset.
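To make the projection step concrete, here is a minimal sketch of how a single 3D label can be mapped into several frames once camera poses are known. It assumes a pinhole camera model and camera-to-world poses given as 4x4 matrices; the function name, the intrinsics, and the poses are made up for illustration and are not taken from Label Studio itself.

```python
import numpy as np

def project_point(point_world, T_world_camera, K):
    """Project a 3D world point into one camera image.

    T_world_camera: 4x4 camera-to-world pose (assumed convention).
    K: 3x3 pinhole intrinsics matrix.
    Returns (u, v) in pixels, or None if the point is behind the camera.
    """
    T_camera_world = np.linalg.inv(T_world_camera)
    p = T_camera_world @ np.append(point_world, 1.0)
    if p[2] <= 0:
        return None
    uv = K @ (p[:3] / p[2])
    return float(uv[0]), float(uv[1])

# Hypothetical intrinsics and two hypothetical camera poses.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
pose_a = np.eye(4)      # camera at the world origin, looking along +z
pose_b = np.eye(4)
pose_b[0, 3] = 0.1      # same orientation, moved 0.1 m along +x

# One keypoint, placed once in 3D, yields a 2D label in every frame.
keypoint = np.array([0.0, 0.0, 1.0])
labels_2d = [project_point(keypoint, pose, K) for pose in (pose_a, pose_b)]
print(labels_2d)
```

The keypoint lands at the image center in the first frame and shifts to the left in the second, because the camera moved to the right. The same loop works for any number of frames.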
If one scan of your scene contains 3600 images (60 seconds of video at 60 frames per second), you can label all 3600 images by adding your annotations with just a few clicks in Label Studio, and generating labeled image frames containing 2D input and output examples. You can repeat this for each of the scenes you care about, and very quickly get to the hundreds of thousands of examples required to train modern computer vision algorithms.
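The arithmetic behind those numbers is easy to check. In the sketch below, the target of 200,000 examples is a hypothetical figure chosen only to illustrate "hundreds of thousands".

```python
import math

seconds_per_scan = 60
frames_per_second = 60
frames_per_scan = seconds_per_scan * frames_per_second  # 60 * 60 = 3600

target_examples = 200_000  # hypothetical dataset size
scans_needed = math.ceil(target_examples / frames_per_scan)

print(frames_per_scan, scans_needed)
```

Under these assumptions, 56 one-minute scans would be enough.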
What follows is a demonstration of what Label Studio can do.
Label Studio takes RGB-D image frames as input. These are image frames with a corresponding depth map, as captured by depth cameras such as the Intel RealSense, LiDAR-enabled iOS devices, or the Azure Kinect.
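To illustrate why the depth map matters, here is a rough sketch of how a depth image can be turned into 3D points in the camera frame. It assumes a pinhole camera model, depth values in meters, and a depth of zero meaning "no reading". The intrinsics and the tiny synthetic depth map are invented for the example, and this is not necessarily how Label Studio builds its reconstruction.

```python
import numpy as np

def depth_to_points(depth, K):
    """Back-project an HxW depth map (meters) to an (N, 3) array of camera-frame points."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel column and row indices
    z = depth
    x = (u - K[0, 2]) * z / K[0, 0]
    y = (v - K[1, 2]) * z / K[1, 1]
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    # Assumption: a depth of 0 marks a pixel without a valid measurement.
    return points[points[:, 2] > 0]

# Hypothetical intrinsics for a tiny 5x4 pixel image.
K = np.array([[500.0, 0.0, 2.0],
              [0.0, 500.0, 1.5],
              [0.0, 0.0, 1.0]])
depth = np.full((4, 5), 2.0)  # synthetic depth map, every pixel 2 m away
depth[0, 0] = 0.0             # simulate one missing reading

points = depth_to_points(depth, K)
print(points.shape)
```

Each valid pixel becomes one 3D point, so the 20-pixel map with one missing reading yields 19 points.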
For the purposes of this demo, we used an iPhone 12 Pro and our Stray Scanner app.
On the left, we see the depth output from the iPhone 12 Pro; on the right, the color images.
We save images from a scene in a folder, structured as follows:
In this case, we want to add a 3D bounding box that encompasses the bottle in our scene.
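As a rough illustration of how a 3D box can become a 2D box label in a given frame, the sketch below projects the eight corners of the box into the image and takes their extent. It assumes an axis-aligned box, a pinhole camera, a camera-to-world pose, and that all corners lie in front of the camera. All names and numbers are hypothetical and do not describe Label Studio's internals.

```python
import itertools
import numpy as np

def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box as an (8, 3) array."""
    signs = np.array(list(itertools.product([-0.5, 0.5], repeat=3)))
    return np.asarray(center, dtype=float) + signs * np.asarray(size, dtype=float)

def box_3d_to_2d(center, size, T_world_camera, K):
    """Project a 3D box into an image and return (u_min, v_min, u_max, v_max).

    Assumes every corner is in front of the camera (positive depth).
    """
    corners = box_corners(center, size)
    corners_h = np.hstack([corners, np.ones((8, 1))])       # homogeneous coordinates
    T_camera_world = np.linalg.inv(T_world_camera)
    cam = (T_camera_world @ corners_h.T).T[:, :3]            # corners in the camera frame
    pixels = (K @ cam.T).T
    pixels = pixels[:, :2] / pixels[:, 2:3]                  # perspective divide
    u_min, v_min = pixels.min(axis=0)
    u_max, v_max = pixels.max(axis=0)
    return float(u_min), float(v_min), float(u_max), float(v_max)

# Hypothetical intrinsics, camera pose, and a 20 cm box 1 m in front of the camera.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
T_world_camera = np.eye(4)
bbox = box_3d_to_2d([0.0, 0.0, 1.0], [0.2, 0.2, 0.2], T_world_camera, K)
print(bbox)
```

The resulting 2D box is slightly larger than the projection of the box's center plane, because the corners nearest the camera project farthest from the image center.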
Generating a Labeled Dataset
In this post, we showed how Label Studio can be used to quickly create datasets for spatial AI applications.
Currently, Label Studio supports keypoint, 3D bounding box, and 2D bounding box annotation types. Going forward, we will be adding other annotation types, including semantic segmentation, depth completion, 6D object poses, and optical flow.
If you would like to try out Label Studio for your application, reach out to us here.