
Application of Machine Learning in Data Labeling

Akshay Lal
March 24, 2020

The artificial intelligence market is forecast to grow steadily. Gartner predicts that by 2021, almost 80% of emerging technologies will have AI foundations. With this radical shift to AI, the Global Datasphere is expected to expand from 33 zettabytes in 2018 to 175 zettabytes by 2025. Access to data is a definitive advantage for developing cutting-edge autonomous systems and ML models, but a shortage of accurate, labeled data will substantially slow down AI advancement.


Enterprises rely heavily on human intelligence to acquire training datasets for ML models. This fully manual process is time-consuming and labor-intensive, and the quality of the annotations depends on factors such as tool capabilities, tool UI, workforce effectiveness, data complexity, and the number of annotation classes required. I believe automation can help overcome these barriers of quality and accuracy.


Automation: The Antidote for Overcoming Labeling Inefficiencies


Automation has long been the answer to the inefficiencies of fully manual operations. In data labeling, semi-automated and sometimes fully automated tools make it possible to acquire large, diverse, high-quality ground truth datasets faster, at lower cost, and with better accuracy.


When we first built annotation tools for data labeling, they were completely manual. Annotators would draw box after box and point after point without any feedback or assistance from the tool. Manually drawing and annotating a single box takes hardly any time, but when millions of boxes enter the equation, the time and effort add up very quickly.

We recognized that automation is indispensable for large-scale data labeling when batches awaiting annotation were piling up and resources were being over-consumed to produce complex annotations with high accuracy.


ML Proposals for Faster and More Accurate Labeling


At Playment, we have developed semi-automated, highly interactive annotation tools that enable faster and more accurate labeling with fewer clicks, along with ML assistance for checking the quality of the annotations. Our proprietary labeling models are based on state-of-the-art machine learning architectures and are trained on a variety of datasets.

Generic models trained on common object datasets can be used in a variety of computer vision scenarios such as object detection, object tracking, 3D object detection, and semantic segmentation. Specific models trained on autonomous vehicle datasets are used for AV-related use cases. Our annotation tools allow annotators to view model proposals, adjust the thresholds, accept the proposals that are accurate, and reject or edit the rest.
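
To make the thresholding step concrete, here is a minimal sketch of how a tool might decide which proposals to show an annotator. The `Proposal` class, its fields, and `visible_proposals` are hypothetical names chosen for illustration, and the sketch assumes each proposal carries a confidence score between 0 and 1. It is not a description of Playment's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """A hypothetical model proposal for one object in an image."""
    label: str
    score: float  # assumed model confidence in [0, 1]
    box: tuple    # (x_min, y_min, x_max, y_max) in pixels


def visible_proposals(proposals, threshold):
    """Return proposals whose score meets the threshold, highest score first."""
    kept = [p for p in proposals if p.score >= threshold]
    return sorted(kept, key=lambda p: p.score, reverse=True)


proposals = [
    Proposal("car", 0.92, (10, 20, 110, 90)),
    Proposal("pedestrian", 0.41, (200, 40, 230, 120)),
    Proposal("car", 0.67, (300, 60, 380, 130)),
]

# Raising the threshold shows fewer, more reliable proposals;
# lowering it shows more candidates for the annotator to accept, edit, or reject.
for p in visible_proposals(proposals, threshold=0.5):
    print(p.label, p.score)
```

A lower threshold trades more review work for fewer missed objects, which is why leaving the setting in the annotator's hands can be useful.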


Interpolation

Apart from ML proposals, we also use interpolation methods to label objects in a sequence. With the interpolation feature, the annotator only needs to label every second to fifth frame in a sequence instead of labeling the same object in every frame. This drastically reduces the time taken to label videos and sensor fusion sequences.
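
As an illustration of the idea, the sketch below fills in the frames between annotator-labeled keyframes using plain linear interpolation of 2D box coordinates. Linear interpolation is an assumption made here for simplicity; the post does not say which interpolation method the tool uses, and the function name and box format are hypothetical.

```python
def interpolate_boxes(keyframes):
    """Fill in boxes for every frame between labeled keyframes.

    keyframes: dict mapping frame index -> (x_min, y_min, x_max, y_max).
    Returns a dict covering every frame from the first to the last keyframe.
    """
    if not keyframes:
        return {}
    frames = sorted(keyframes)
    result = {}
    for start, end in zip(frames, frames[1:]):
        box_a, box_b = keyframes[start], keyframes[end]
        for frame in range(start, end):
            t = (frame - start) / (end - start)  # 0 at start, approaching 1 at end
            result[frame] = tuple(a + t * (b - a) for a, b in zip(box_a, box_b))
    # The last keyframe is not covered by the loop above, so copy it over.
    result[frames[-1]] = tuple(float(v) for v in keyframes[frames[-1]])
    return result


# The annotator labels frames 0 and 4; frames 1 to 3 are generated.
labeled = {0: (0, 0, 10, 10), 4: (8, 4, 18, 14)}
for frame, box in sorted(interpolate_boxes(labeled).items()):
    print(frame, box)
```

Linear interpolation works well for objects moving at a roughly constant speed between keyframes; faster or more erratic motion calls for keyframes that are closer together.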

Interactive Instance Segmentation

Segmentation can be executed in a few clicks. When the annotator marks the extreme points of an object, the tool automatically generates a mask for it. This speeds up the segmentation of objects by 10x.
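
The mask itself comes from a trained model, which cannot be reproduced in a few lines. What can be sketched is one plausible preprocessing step: turning the clicked extreme points (leftmost, rightmost, topmost, bottommost) into a padded crop region that a segmentation model could then work on. The function name, the `padding` parameter, and the idea that the tool crops before segmenting are all assumptions for illustration.

```python
def crop_box_from_extreme_points(points, image_width, image_height, padding=10):
    """Return a padded (x_min, y_min, x_max, y_max) box around the clicked points.

    points: iterable of (x, y) pixel coordinates clicked by the annotator.
    The box is clamped so that it never leaves the image.
    """
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min = max(min(xs) - padding, 0)
    y_min = max(min(ys) - padding, 0)
    x_max = min(max(xs) + padding, image_width - 1)
    y_max = min(max(ys) + padding, image_height - 1)
    return (x_min, y_min, x_max, y_max)


# Four clicks: top, left, bottom, and right extremes of an object.
clicks = [(300, 120), (210, 200), (310, 290), (400, 210)]
print(crop_box_from_extreme_points(clicks, image_width=640, image_height=480))
```

Four clicks on the object's extremes pin down its extent far more cheaply than tracing its outline, which is where the time saving comes from.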

One-Click Cuboids

Cuboids in 3D point clouds can be drawn with just a click. When an annotator clicks on a cluster of points, the pre-trained model automatically identifies the best-fitting cuboid. This reduces the time taken to execute 3D point cloud annotations by 25%.
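
The tool described above relies on a pre-trained model. As a much simpler geometric stand-in, the sketch below gathers the points within a fixed radius of the click and fits an axis-aligned cuboid around them. The function name, the `radius` parameter, and the axis-aligned fit are assumptions for illustration; a real tool would also need to estimate the cuboid's orientation.

```python
import math


def fit_cuboid(points, click, radius=2.0):
    """Fit an axis-aligned cuboid around the points near a clicked location.

    points: list of (x, y, z) coordinates.
    click:  (x, y, z) location selected by the annotator.
    Returns (center, size) as two 3-tuples, or None if no points are in range.
    """
    cluster = [p for p in points if math.dist(p, click) <= radius]
    if not cluster:
        return None
    mins = tuple(min(p[i] for p in cluster) for i in range(3))
    maxs = tuple(max(p[i] for p in cluster) for i in range(3))
    center = tuple((lo + hi) / 2 for lo, hi in zip(mins, maxs))
    size = tuple(hi - lo for lo, hi in zip(mins, maxs))
    return center, size


cloud = [
    (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 2.0, 0.0), (0.0, 2.0, 1.0),
    (10.0, 10.0, 10.0),  # far from the click, so it is ignored
]
print(fit_cuboid(cloud, click=(0.5, 1.0, 0.5)))
```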


Advantages of ML-Assisted Annotation Tools 

Shorter Annotation Timelines: With ML-assisted annotation tools, data labeling speed can be increased by roughly 40-60%, substantially shortening project timelines for companies building complex ML models.

Lower Annotation Costs: With pre-trained models, annotators spend far less time executing annotations: the models eliminate unnecessary labeling tasks, so annotators can focus their efforts on labels with a low confidence score. This saves time, which in turn reduces the annotator costs involved in data labeling.

Higher Labeling Accuracies: Human labelers can spend the time saved by automation fixing any errors that have cropped up, which improves the accuracy of the labels. The human annotators further help improve our models by marking inaccurate annotations, which can then be used to retrain the models.

Scope For Scalability: With time, cost, and quality under control, companies can scale their AI projects from tens to thousands to millions of data points without any added hassle.
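
The routing described under Lower Annotation Costs, where annotators concentrate on low-confidence labels, can be sketched as follows. The function name, the `auto_accept_at` cutoff of 0.9, and the (id, confidence) input format are hypothetical; the right cutoff depends on the model and on the accuracy a project requires.

```python
def split_for_review(labels, auto_accept_at=0.9):
    """Split model-generated labels into auto-accepted and human-review groups.

    labels: list of (label_id, confidence) pairs.
    Returns (accepted_ids, review_ids); review_ids are ordered from least
    to most confident so the most doubtful labels are checked first.
    """
    accepted = [label_id for label_id, conf in labels if conf >= auto_accept_at]
    needs_review = sorted(
        (item for item in labels if item[1] < auto_accept_at),
        key=lambda item: item[1],
    )
    return accepted, [label_id for label_id, _ in needs_review]


labels = [("a", 0.95), ("b", 0.30), ("c", 0.75), ("d", 0.99)]
accepted, review = split_for_review(labels)
print("auto-accepted:", accepted)
print("review queue:", review)
```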

If you are looking to evaluate ML-assisted tools or would like to have a chat with experts in the field, feel free to reach out at hello@playment.io