LAION-5B contains over 5 billion image-text pairs. I have downloaded a small sample of its
'aesthetically pleasing' subset. To produce an aesthetic score, LAION's creators used a model they trained on the
Simulacra Aesthetic Captions dataset, a dataset of AI-generated images and their prompts that users rated on a scale of 1-10.
The images that I downloaded also have similarity scores (from the CLIP filtering) and "unsafe" scores. The unsafe score comes from yet
another AI model, trained to detect "NSFW" images. The automation of NSFW ratings is something that my friend Livia Foldes has been thinking a lot about in relation to sex workers' rights with her group
Decoding Stigma. As you can imagine, there are a lot of problems with automatic detection of NSFW images, from the use of nonconsensual imagery in training data to false positives.
But let's take a look at this subset that I've downloaded in order of the NSFW/unsafe score...
→ See Sample of LAION dataset sorted by unsafe score
Of course, aesthetics is also an interesting thing to try to quantify...
→ See Sample of LAION dataset sorted by aesthetic score
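If you want to try this kind of ordering yourself, here is a minimal sketch of sorting image records by a score. The field names `punsafe` and `aesthetic`, the helper `sort_by_score`, and the three rows are all placeholders made up for illustration; they are not values from my download, and your metadata may use different column names.

```python
from operator import itemgetter

# Placeholder records: field names and numbers are invented for illustration,
# not taken from the actual downloaded subset.
rows = [
    {"url": "a.jpg", "punsafe": 0.02, "aesthetic": 6.1},
    {"url": "b.jpg", "punsafe": 0.87, "aesthetic": 5.4},
    {"url": "c.jpg", "punsafe": 0.31, "aesthetic": 7.2},
]


def sort_by_score(records, key, descending=True):
    """Return a new list of records ordered by the given score field."""
    return sorted(records, key=itemgetter(key), reverse=descending)


# Highest score first in both cases.
by_unsafe = sort_by_score(rows, "punsafe")
by_aesthetic = sort_by_score(rows, "aesthetic")

print([r["url"] for r in by_unsafe])
print([r["url"] for r in by_aesthetic])
```

Sorting returns a new list each time, so the same records can be viewed in either order without changing the original.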
Spend some time exploring this subset. For reference, I have downloaded 1,454 images, which is only about 0.00002908% of the full dataset.
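To check that percentage, here is a small sketch of the arithmetic. It assumes the full dataset is exactly 5 billion pairs, a round figure used only for this estimate; the function name `share_percent` is just a label for the example.

```python
def share_percent(sample_size, total_size):
    """Return sample_size as a percentage of total_size."""
    return sample_size / total_size * 100


# Assumption: 5 billion is a round stand-in for "over 5 billion" pairs.
downloaded = 1454
full_dataset = 5_000_000_000

print(f"{share_percent(downloaded, full_dataset):.8f}%")  # prints 0.00002908%
```

Since the real dataset is larger than 5 billion pairs, the true share is a little smaller still.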