
a screenshot of the OpenAI CLIP website

CLIP itself was trained on some 400 million image-text pairs scraped from the Internet; it is, however, routinely benchmarked against ImageNet, a dataset of more than 14 million images (far smaller than LAION-5B), labelled by Amazon Mechanical Turk crowdworkers and organised into around 21,000 categories. The categories of ImageNet were created using WordNet, a lexical database in which English nouns, verbs, adjectives and adverbs are grouped into sets expressing distinct concepts. The inability of these inherited categories to describe abstract concepts such as ‘fear’ or ‘happiness’, and their encoding of racial and gender bias, have been publicly criticised and debated.
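To make the WordNet connection concrete, here is a minimal sketch using NLTK's interface to WordNet; the convention of deriving ImageNet identifiers from the letter ‘n’ plus the zero-padded synset offset is the commonly used mapping, and the query word is just an illustrative example.

```python
# Minimal sketch: the WordNet synsets behind ImageNet's categories,
# explored via NLTK (pip install nltk).
import nltk
nltk.download("wordnet", quiet=True)
from nltk.corpus import wordnet as wn

# Each ImageNet category corresponds to one WordNet synset: a set of
# synonyms expressing a single distinct concept.
for synset in wn.synsets("fear", pos=wn.NOUN):
    # ImageNet identifiers ("wnids") are conventionally the letter 'n'
    # followed by the synset's 8-digit WordNet offset.
    wnid = f"n{synset.offset():08d}"
    print(wnid, synset.lemma_names(), "-", synset.definition())
```

WordNet happily lists synsets for abstract nouns like ‘fear’, which is exactly the kind of concept that critics argue cannot be meaningfully illustrated with crowdsourced photographs.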

a screenshot of ImageNet and its categories
Here is an interesting project about the bias of ImageNet & WordNet.

And some examples of ImageNet categories:

Let's try a quick writing exercise about the relationship between text & image. Here is an image from LAION (don't peek at the alt text!): → Caption Writing Exercise
But let's take a look at CLIP and use it to test our own image descriptions.
→ Test your image descriptions with CLIP
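If you would rather run the test locally, here is a minimal sketch of the same idea using the Hugging Face transformers implementation of CLIP; the model checkpoint, the example image URL and the candidate captions are placeholders to swap for your own.

```python
# Minimal sketch: scoring your own captions against an image with CLIP,
# via Hugging Face transformers (pip install transformers torch pillow requests).
import requests
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Placeholder image: swap in the image you want to describe.
url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

# Your own candidate descriptions of the image.
captions = [
    "two cats sleeping on a pink couch",
    "a screenshot of a website",
    "a portrait of a happy person",
]

inputs = processor(text=captions, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)

# CLIP embeds the image and every caption in a shared space; softmax over
# the image-to-text similarities gives a probability for each caption.
probs = outputs.logits_per_image.softmax(dim=1)[0]
for caption, prob in zip(captions, probs.tolist()):
    print(f"{prob:.2%}  {caption}")
```

Note that the probabilities only rank the captions you supply against each other: CLIP tells you which of your descriptions fits best, not whether any of them is actually a good caption.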

