Machine learning enhanced image recognition & human automation
Fotolia by Adobe is Europe's leading online stock image company. It currently offers over 57 million high-resolution, royalty-free images and videos for licensing.
Problem / Opportunity
Fotolia has a vast, ever-expanding picture catalogue. Making these pictures discoverable to users requires very detailed categorisation along many dimensions. In this instance, they needed pictures of people categorised by the number of people in the picture and by the ethnicity, gender and age group of each person. Fotolia used to employ a dedicated workforce to enrich their pictures with this highly specialised data. The workforce could not keep up with the influx of new images, nor was it cost-effective. This is when they approached us.
We teamed up their search and data owners with our machine learning and image recognition experts to form an innovation task force. We sketched out the integration story, a suitable outcome data format and the ideal cost per categorised picture, so that the solution would pay for itself in a short time.
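To make the idea of an outcome data format concrete, here is a minimal sketch of what one categorised picture record could look like. The class names, field names and category values are hypothetical illustrations based only on the dimensions listed above (number of people, plus ethnicity, gender and age group per person); the format actually agreed with Fotolia is not documented here.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import List


@dataclass
class Person:
    # Hypothetical field names; values are free-form strings in this sketch.
    gender: str
    age_group: str
    ethnicity: str


@dataclass
class PictureCategorisation:
    picture_id: str
    people: List[Person] = field(default_factory=list)

    @property
    def people_count(self) -> int:
        # Derived from the list so the count can never disagree with the details.
        return len(self.people)

    def to_json(self) -> str:
        payload = asdict(self)  # recursively converts nested Person objects
        payload["people_count"] = self.people_count
        return json.dumps(payload, sort_keys=True)


# Example record for a picture showing two people (placeholder values).
record = PictureCategorisation(
    picture_id="img-001",
    people=[
        Person(gender="female", age_group="adult", ethnicity="asian"),
        Person(gender="male", age_group="child", ethnicity="asian"),
    ],
)
print(record.to_json())
```

Deriving the people count from the per-person list, rather than storing it separately, is one simple way to keep the two dimensions consistent.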
We started by modelling the decision tree each picture would have to pass through to be properly categorised. We then ran several tests through a variety of image recognition systems. The results made it clear that we would also need machine learning techniques to increase accuracy and improve results over time: the output from the image recognition systems was often ambiguous, but there were observable patterns in these ambiguities. Using this insight, we assembled a very large set of training images which had been pre-categorised by workers on Mechanical Turk (mTurk). These images were then run through image recognition, and we built a machine learning model that mapped the patterns in the ambiguous image recognition data to the correct categorisations produced by mTurk.
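The core idea, learning which correct category a given ambiguous recogniser output usually corresponds to, can be illustrated with a deliberately simple sketch. This is not the model that was built for Fotolia: it is a frequency lookup that stands in for the real machine learning step, and the tag names, labels, function names and the human-review fallback are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_correction_model(recogniser_outputs, crowd_labels):
    """Learn, for each pattern of recogniser tags, the most frequent crowd label."""
    counts = defaultdict(Counter)
    for output, label in zip(recogniser_outputs, crowd_labels):
        # A frozenset makes the pattern independent of tag order.
        counts[frozenset(output)][label] += 1
    return {pattern: c.most_common(1)[0][0] for pattern, c in counts.items()}


def categorise(model, output, fallback="needs_human_review"):
    """Map a recogniser output to a category; unseen patterns go to the fallback."""
    return model.get(frozenset(output), fallback)


# Toy training set: ambiguous recogniser tags paired with crowd-sourced labels.
outputs = [
    ("woman", "girl"),
    ("woman", "girl"),
    ("woman", "girl"),
    ("man",),
    ("man", "boy"),
]
labels = ["child", "child", "adult", "adult", "child"]

model = train_correction_model(outputs, labels)

print(categorise(model, ("girl", "woman")))   # pattern seen in training
print(categorise(model, ("crowd", "blur")))   # unseen pattern, uses the fallback
```

In this toy data the ambiguous pair "woman"/"girl" was labelled "child" by the crowd more often than "adult", so the lookup resolves that ambiguity to "child". A production system would presumably use a proper classifier over richer features such as confidence scores, but the training signal has the same shape: recogniser output in, crowd-verified category out.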
The entire system was exposed via APIs, enabling new pictures to flow from the Fotolia website through the machine learning categorisation and back to the website in a matter of seconds.
Fotolia used the system to categorise 250k pictures in the first month, for just under 6 cents per picture. This cut their picture categorisation costs by nearly 90%. With our image pre-processing and dynamic quality control, we were also able to lower the categorisation error rate by 20%.
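As a rough back-of-the-envelope check of these figures, the sketch below treats "just under 6 cents" as 6 cents and "nearly 90%" as 90%, so every result is an approximation. The previous cost per picture is not stated above; the value computed here is only what the two rounded figures imply.

```python
pictures = 250_000
cost_per_picture_cents = 6   # "just under 6 cents", rounded up
saving_percent = 90          # "nearly 90%", rounded up

# Total spend in the first month, in whole currency units (cents / 100).
automated_total = pictures * cost_per_picture_cents / 100

# If the new cost is the remaining 10% of the old cost, the old cost follows.
implied_previous_cents = cost_per_picture_cents * 100 / (100 - saving_percent)
previous_total = pictures * implied_previous_cents / 100

saved_total = previous_total - automated_total

print(f"Automated cost for the month: {automated_total:,.0f}")
print(f"Implied previous cost per picture: {implied_previous_cents:.0f} cents")
print(f"Implied saving for the month: {saved_total:,.0f}")
```

Under these rounded assumptions the month cost about 15,000 currency units, against an implied 150,000 at roughly 60 cents per picture with the manual workforce.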