← back

a screenshot of different images from the LAION database

LAION-5B contains over 5 billion image-text pairs. I have downloaded a small sample of its 'aesthetically pleasing' subset. To produce an aesthetic score, LAION's creators used a model they trained on the Simulacra Aesthetic Captions dataset, a dataset of AI-generated images and their prompts that users rated on a scale of 1-10. The images that I downloaded also have similarity scores (from the CLIP filtering) and "unsafe" scores. The unsafe score comes from yet another AI model trained to detect "NSFW" images. The automation of NSFW ratings is something that my friend Livia Foldes has been thinking a lot about in relation to sex workers' rights with her group Decoding Stigma. As you can imagine, there are a lot of problems with automatic detection of NSFW images, from the use of nonconsensual imagery in training data to false positives.

But let's take a look at this subset that I've downloaded in order of the NSFW/unsafe score...
→ See Sample of LAION dataset sorted by unsafe score

Of course, aesthetic is also an interesting thing to try to quantify...
→ See Sample of LAION dataset sorted by aesthetic score
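If you want to reproduce the two sorted views yourself, here is a minimal sketch of the idea. The rows, URLs, and field names (`aesthetic`, `punsafe`, `similarity`) are made up for illustration; the real metadata files may name their columns differently.

```python
# Hypothetical metadata rows: the field names and values are illustrative only,
# not taken from the actual LAION metadata.
samples = [
    {"url": "https://example.com/a.jpg", "aesthetic": 6.2, "punsafe": 0.01, "similarity": 0.31},
    {"url": "https://example.com/b.jpg", "aesthetic": 7.5, "punsafe": 0.87, "similarity": 0.29},
    {"url": "https://example.com/c.jpg", "aesthetic": 6.9, "punsafe": 0.40, "similarity": 0.35},
]


def sort_by_score(rows, key, descending=True):
    """Return a new list of rows ordered by the given score field."""
    return sorted(rows, key=lambda row: row[key], reverse=descending)


# Highest score first in both views.
by_unsafe = sort_by_score(samples, "punsafe")
by_aesthetic = sort_by_score(samples, "aesthetic")

for row in by_unsafe:
    print(f'{row["punsafe"]:.2f}  {row["url"]}')
```

Sorting is the easy part. The scores themselves come from models whose judgments are worth questioning as you scroll.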

Spend some time exploring this subset. For reference, I have downloaded 1,454 images, which is only about 0.00002908% of the full dataset.
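The percentage above can be checked with a quick calculation. This sketch assumes the full dataset is rounded to exactly 5 billion pairs; since LAION-5B contains somewhat more than that, the true share is even smaller.

```python
subset_size = 1454
# Assumption: "over 5 billion" rounded down to exactly 5 billion pairs.
full_size = 5_000_000_000

# Share of the full dataset, expressed as a percentage.
percentage = subset_size / full_size * 100
print(f"{percentage:.8f}%")  # prints 0.00002908%
```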
