Industrial Webinar "Human and Machine Learning — Collaboration in Data Labelling"
Evgeny Sorokin, an ML engineer at Yandex.Toloka, gave an industrial webinar on crowdsourcing technologies for labelling big data.
Data is one of the three pillars of artificial intelligence (the other two are algorithms and hardware). On the one hand, data is plentiful: social media messages and bank transactions are just two examples. On the other hand, to be used for machine learning, data must be labelled.
There are several solutions to this problem: labelling within the company, outsourcing, synthetic data and crowdsourcing. Labelling within the company guarantees predictable results, but it is time-consuming and does not scale. Outsourcing keeps data engineers from being distracted from their own tasks, but it can be expensive and slow, with unpredictable results. Synthetic data is generated with the required parameters and saves time and money. However, synthetic data often turns out to be of lower quality than real data.
Crowdsourcing offers scalable solutions in a short time. However, it still requires quality control, which is provided with the help of machine learning.
To learn more, watch the video: