Clarifai is a promising AI start-up. In a short(ish) time, it has made major progress on an important problem, and it is rapidly rolling out products with lots of business potential. But there is more that it could do.
As I understand it, the base version of the Clarifai API tries to do two things at once: a) learn various recognizable patterns in images, and b) rank the patterns based on ‘appropriateness’ and probab_true. I think Clarifai will have to split these two things over time and let people specify which abstract dimensions are ‘appropriate’ for them. As the idiom goes, a picture is worth a thousand words. An image can carry information about social class, race, and country, but also about shapes, patterns, colors, perspective, depth, time of day, etc. Clarifai should allow people to pick the dimensions appropriate for their task. Defining the dimensions would be hard, but that shouldn’t stymie the effort, and ad hoc advances may be useful. For instance, one dimension could be abstract shapes and colors; another could be the more ‘human’ dimension.
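To make the split concrete, here is a minimal sketch of what separating recognition from ranking might look like. Everything in it is an assumption for illustration: the tag-to-dimension mapping, the `rank_tags` function, and the example probabilities are made up and are not part of any Clarifai API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical mapping from tag to abstract dimension (an assumption, not Clarifai's).
TAG_DIMENSIONS: Dict[str, str] = {
    "circle": "shape",
    "red": "color",
    "stripes": "pattern",
    "woman": "human",
    "necklace": "human",
    "dusk": "time_of_day",
}


def rank_tags(tags: List[Tuple[str, float]], wanted: Set[str]) -> List[Tuple[str, float]]:
    """Keep only tags in the dimensions the user asked for, ranked by probability."""
    kept = [(tag, prob) for tag, prob in tags if TAG_DIMENSIONS.get(tag) in wanted]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Step 1 (recognition): made-up (tag, probability) pairs for one image.
raw = [("woman", 0.97), ("red", 0.91), ("circle", 0.62), ("dusk", 0.40), ("necklace", 0.88)]

# Step 2 (ranking): the same raw output, viewed along different user-chosen dimensions.
shape_color = rank_tags(raw, {"shape", "color"})
human = rank_tags(raw, {"human"})
```

The point of the design is that the recognition output stays the same; only the user-supplied set of dimensions changes what counts as ‘appropriate’.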
Extending the logic, Clarifai should support the building of abstract data science applications that solve a particular problem. For instance, say a user is only interested in learning whether a photo features a man or a woman, and wants to build a Clarifai-based classifier. (That person is me. The task is inferring the gender of first names. See here.) Clarifai could, in principle, allow the user to train a classifier that uses all the other information in the images, including jewelry, color, perspective, etc., and provide an out-of-sample error for that particular task. The crucial point is to give users fuller access to what Clarifai can do and then let them manage it to their ends. To that end, input about user objectives needs to be built into the API. Basic hooks could be developed for classification and clustering inputs.
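Here is a rough sketch of the workflow I have in mind: train a task-specific classifier on per-image features and report its error on held-out images. The features are synthetic stand-ins for whatever Clarifai extracts, the function names are hypothetical, and plain logistic regression is my choice for illustration, not a claim about how Clarifai works.

```python
import math
import random
from typing import List, Tuple

Example = Tuple[List[float], int]


def make_data(n: int, rng: random.Random) -> List[Example]:
    """Synthetic stand-in for image features; the label shifts the feature means."""
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        x = [rng.gauss(1.5 if label else -1.5, 1.0),
             rng.gauss(1.0 if label else -1.0, 1.0)]
        data.append((x, label))
    return data


def predict_proba(w: List[float], b: float, x: List[float]) -> float:
    """Logistic model; z is clamped to avoid overflow in exp."""
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def train(data: List[Example], epochs: int = 50, lr: float = 0.1) -> Tuple[List[float], float]:
    """Fit weights with stochastic gradient descent on the log loss."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in data:
            err = predict_proba(w, b, x) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b


def error_rate(w: List[float], b: float, data: List[Example]) -> float:
    """Share of examples whose thresholded prediction disagrees with the label."""
    wrong = sum((predict_proba(w, b, x) >= 0.5) != bool(y) for x, y in data)
    return wrong / len(data)


rng = random.Random(0)
data = make_data(400, rng)
train_set, test_set = data[:300], data[300:]  # hold out 100 examples
w, b = train(train_set)
out_of_sample_error = error_rate(w, b, test_set)
```

The number that matters to the user is `out_of_sample_error`: it is computed only on images the classifier never saw during training, which is what makes it an honest estimate for the particular task.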
More generally, Clarifai should eventually support more user inputs and a greater variety of outputs. Limiting the product to tagging is a mistake.
There are three other general directions Clarifai could take. A product that automatically sections an image into multiple images and tags each section would be useful. This would allow a user, for instance, to count the number of women in a photo. Another direction would be to provide the ‘best’ set of tags that collectively describe a set of images. (This may seem to violate the spirit of what I note above, but it needn’t: a user could want just this.) By the same token, Clarifai could build general-purpose discrimination engines that return the list of tags that best distinguishes one image, or set of images, from another.
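The last two ideas can be sketched in a few lines. This is only one possible reading: I take ‘best’ to mean the tags that cover the most images (picked greedily), and ‘discriminating’ to mean the tags whose frequency differs most between two groups. The function names and the toy tag sets are hypothetical.

```python
from typing import Dict, List, Set


def covering_tags(image_tags: Dict[str, Set[str]], k: int) -> List[str]:
    """Greedily pick up to k tags that together cover as many images as possible."""
    uncovered = set(image_tags)
    all_tags = sorted(set().union(*image_tags.values()))
    chosen: List[str] = []
    while uncovered and all_tags and len(chosen) < k:
        # Tag that appears in the most still-uncovered images (ties: alphabetical).
        best = max(all_tags, key=lambda t: sum(t in image_tags[i] for i in uncovered))
        newly_covered = {i for i in uncovered if best in image_tags[i]}
        if not newly_covered:
            break
        chosen.append(best)
        uncovered -= newly_covered
    return chosen


def discriminating_tags(group_a: List[Set[str]], group_b: List[Set[str]], k: int) -> List[str]:
    """Rank tags by the absolute gap in how often they occur in each (non-empty) group."""
    def freq(group: List[Set[str]], tag: str) -> float:
        return sum(tag in tags for tags in group) / len(group)

    tags = sorted(set().union(*group_a, *group_b))
    ranked = sorted(tags, key=lambda t: -abs(freq(group_a, t) - freq(group_b, t)))
    return ranked[:k]


# Toy tag sets standing in for what a tagger might return.
images = {
    "img1": {"beach", "sea", "people"},
    "img2": {"beach", "sunset"},
    "img3": {"city", "people"},
    "img4": {"city", "night"},
}
summary = covering_tags(images, 2)
contrast = discriminating_tags([images["img1"], images["img2"]],
                               [images["img3"], images["img4"]], 2)
```

Note that ‘people’ appears in half of each group, so it helps describe the whole set but does nothing to tell the two groups apart; the two objectives really are different.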
Beyond this, the obvious: Clarifai could also provide synonyms of tags to make them easier to use. And it could allow users to specify whether they want tags in, say, ‘UK English’.