Recent advances in artificial intelligence (AI) have made systems far more powerful than their predecessors. “Text-to-image” AI can convert a text prompt like “origami elephant” into a picture within a few minutes. Current systems include Midjourney, DALL-E, Craiyon and Stable Diffusion.
The elephant in the picture did not exist before: the creation is based on learning from millions of images and textual descriptions. In this case, the AI system has learned what is associated with “origami” and “elephant” and has tried to create a picture that satisfies those associations.
The more often a subject appears in the training data, the better the results are likely to be. The results can be plausible, even convincing and impressive. For example, the wrinkles in the paper of the elephant replicate the unintended distortion of real paper, and the slight asymmetry of the legs is pleasing, giving the appearance of walking. However, the difference in the lengths of the tusks feels too great. Papercrafts other than origami can work well, e.g. quilling, papercutting and silhouettes. A strength of AI not shown here is its facility with different media, e.g. oil painting, pen and ink, or charcoal.
Some of the origami that AI generates can be hideous and grotesque, revealing the AI’s biases and gaps in its training. Even with common subjects like humans, AI systems can produce strange results, such as distorted eyes or hands. These problems are likely to be solved as the systems improve.
Beyond being a fun toy to play with, what are the implications of text-to-image AI systems? Will they be a niche technology, important to only a limited number of people (like 3D printing and the blockchain), or common and widely used (like social media and streaming media)?
Predicting the future is a fool’s game, but we should try to plan for and manage the negative effects of technology. Such planning might have reduced the harms of social media, such as fake news and polarisation. Current text-to-image AI may reduce the work available to professional artists and illustrators: instead of briefing and paying an illustrator to create an image, a few minutes with an AI system may be enough to produce an acceptable one.
Some argue that AI is a tool like photo-editing software, but AI is far more powerful and needs less skill to operate. More realistic is the comparison with photography’s effect on painters, or perhaps the motor car and horse-drawn transport: the old industry became obsolete except for some niche uses. If photography moved the focus of painting away from realism (ushering in Impressionism, Cubism and Surrealism), then perhaps AI will move the focus of image-making away from making images towards curating them and collaborating with AI.
Some of the people using text-to-image AI systems would never have commissioned a human, e.g. they are creating a cover for a self-published book or making 50 characters for a self-designed game. This is like using AI translation on a casual basis: translating a snippet of text that one would never pay a human translator for. However, the issue may become more serious for illustrators as the AI improves.
There is an ethical issue in that some of the images in the training database are by living artists. Style cannot usually be copyrighted, so will some artists see their work dry up as AI makes far more images than they ever could? On the other hand, if the AI can so easily make images in the same style, does that mean the original artist’s work lacks originality and a breadth of ideas?