AI, closed-loop creation

The short-term possibility that image- and text-generating AIs will no longer train on human creations, but on their own generations, led us to put the following questions to ChatGPT4.
This conversation was also an opportunity to reflect on the impacts of artificial intelligence in creative fields.
Also check out ChatGPT4's responses to the ambiguity between creator and consumer with AI, the loss of creative pleasure with AI, the transformation of creative professions with AI and the responsibility for content generated with AI.
July 2023

Conversation between a human and ChatGPT4

In the case of an image generated with DALL-E and the prompt "a fish in water in the style of the impressionists", DALL-E will generate an image from images available in its database, such as photos of fish taken by humans and impressionist paintings painted by humans, right?

Not exactly. DALL-E doesn't have a "database" of images that it draws on to create new images. In fact, it doesn't understand images or artistic content in the same way a human does either.
DALL-E is based on OpenAI's GPT-3 language model, but adapted to generate images rather than text.
It is trained on a large number of images and corresponding text descriptions, but it does not "remember" the individual images it has been trained on. Instead, it learns complex patterns from the training data which it then uses to generate new images from text descriptions.
In the case of your "fish in water in the style of the Impressionists" prompt, DALL-E uses its understanding of natural language to interpret the prompt and generate an image that best matches that description, using the patterns it has learned from training. It doesn't have a 'collection' of images of fish or impressionist paintings on which to base the image, but rather generates an image from scratch based on the patterns it has learned.

Also, it's important to note that while DALL-E can generate amazingly detailed and accurate images, it doesn't actually understand what a "fish" or "impressionist style" is in the same way that a human does. It's simply a matter of generating an image that matches the description using the patterns and structures it has learnt in its training.

How is DALL-E trained?

DALL-E is a machine learning model, more specifically a Transformer-type model, a variant of GPT-3 adapted to generate images rather than text. Training a model like DALL-E typically follows two main stages: a pre-training phase, and then a fine-tuning phase.

In the pre-training stage, the model is exposed to a large amount of images (and potentially captions associated with those images). The model tries to learn how to predict part of the image or text from the rest of the image or text. In this way, it learns to recognise various structures, patterns, colours and other visual characteristics, as well as their correlation with the textual descriptions. It is during this phase that the model learns to understand the context of images and text. However, it is important to note that the model does not 'understand' the content in the same way that a human would, but rather learns to predict patterns of images and text from the training data.

After pre-training, the model goes through a fine-tuning phase, where it is trained on a more specific dataset. In this phase, the model learns to generate images from specific textual descriptions, based on the patterns it learned during pre-training.

To generate an image, DALL-E receives a textual prompt and generates an image that matches that prompt, using the structures and patterns it learned during training. It is important to note that the precise details of DALL-E's training, including the composition of the training dataset, were not made public by OpenAI at the time of my last update in September 2021.

In the pre-training phase, the images used were created by humans, weren't they?

Yes, typically the pre-training phase of machine learning models, including DALL-E, uses data created by humans. For DALL-E, this would likely include a wide variety of images from the internet. These could include photographs, artwork, graphics, diagrams and other types of visual content.

However, it is important to note that although these images are created by humans, DALL-E does not learn in the same way that a human would. It does not have a conscious understanding or memory of the specific images it has been trained on. Instead, it learns to predict parts of images from other parts, creating a kind of abstract 'understanding' of image patterns and structures.
In addition, although DALL-E uses human-created images for its training, it doesn't know who created a particular image, nor does it understand the intentions or context behind the image's creation. It simply learns patterns from the training data it receives.

If humans had never created images, would DALL-E exist?

In the context of machine learning as it is designed today, no, DALL-E probably wouldn't exist if humans had never created images. Machine learning models such as DALL-E are trained on existing data, which in this case are images. Without these images, there would be no data to train DALL-E.

Furthermore, the very concept of an "image" is a human construct, and the features that DALL-E learns to reproduce in its own generations, such as shape, colour and texture, are based on how humans visually perceive and represent the world.

It is important to note that machine learning models such as DALL-E do not create independently or from scratch. They learn patterns from the data they are trained on and use these patterns to generate new data. They are therefore highly dependent on the data created by humans for their training. However, if humanity had developed another form of visual or sensory representation, it is possible that we could have developed machine learning models to understand and generate data in these alternative formats. But these models would probably be very different from DALL-E as we know it today.

What would happen if humans decided not to create images anymore, but only generate images with DALL-E?

If humans stopped creating new images and only used models like DALL-E to generate images, this would have several implications.

Stagnation of model learning
Machine learning models like DALL-E are trained on existing data, meaning they learn from images that have already been created. Without new images created by humans, the training of these models could stagnate. They would not be able to learn new styles, shapes, colours, concepts, etc., which might emerge if humans continued to create images.

Reducing artistic diversity and innovation
The images generated by DALL-E are the product of what it has learned from existing images. If we rely solely on DALL-E to generate images, we risk losing the diversity and innovation that results from human creation. Humans can think abstractly and conceptually, imagine new ideas and create images that have never been seen before. DALL-E, on the other hand, generates images based on patterns it has learned from existing images.

Ethical and legal implications
There would also be ethical and legal issues to consider. For example, who would own the copyright to an image created by a machine? What's more, creating art is a form of human expression. If we were to delegate all image creation to machines, this could have an impact on our ability to express ourselves and communicate visually.

In short, while models like DALL-E are powerful tools for image generation, they are not a substitute for human creativity. They work best when used in conjunction with human creation, not as a replacement.

You said, "Without new human-created images, the training of these models could stagnate."
But new images would be generated by DALL-E and so DALL-E would be trained from the images it generated, right? What would happen then?

It is theoretically possible to re-train DALL-E on its own generations, but this could lead to what is known as a "feedback loop".

When a model is trained on its own generations, it risks repeating the same patterns over and over again, amplifying the biases and errors present in its previous generations. Over time, this could lead to a reduction in the diversity and quality of the images generated.
In simple terms, if DALL-E only trains on its own creations, it could lose its ability to generate images that are new or different to what it has already generated. Instead, it could start producing smaller and smaller variations of the same images.

Novelty and innovation usually come from exposure to new ideas, new perspectives and new data. If DALL-E is only exposed to its own generations, it may miss out on new ideas and perspectives that new human-created images could bring.
This is why, in practice, researchers training machine learning models typically use a variety of data from different sources, not just the generations of the model itself.
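The collapse described above can be illustrated with a deliberately simple toy experiment (a Gaussian stand-in for a generative model, not DALL-E itself, and the sample size and generation count are arbitrary choices): repeatedly fit a distribution to a small dataset, then "generate" the next dataset from the fit. Over many generations, the spread of the data drifts towards zero, i.e. the model ends up producing smaller and smaller variations of its own output.

```python
import random
import statistics

def retrain_on_own_output(data, n):
    """Fit a Gaussian to the data, then 'generate' a new dataset from the fit."""
    mu = statistics.fmean(data)
    sigma = statistics.stdev(data)
    return [random.gauss(mu, sigma) for _ in range(n)]

random.seed(0)
n = 20
# The original "human-made" data: samples from a fixed distribution.
data = [random.gauss(0.0, 1.0) for _ in range(n)]
initial_spread = statistics.stdev(data)

# Each generation trains only on the previous generation's output.
for generation in range(2000):
    data = retrain_on_own_output(data, n)

# After many closed-loop generations, the diversity of the data
# (its standard deviation) has shrunk to a small fraction of the original.
final_spread = statistics.stdev(data)
```

This is only a caricature of image generation, but it shows the mechanism: each refit loses a little information about the original distribution, and with no fresh human data there is nothing to pull it back.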

In the first phase, DALL-E trained on images created by humans. In a second phase, DALL-E could continue to train, not only from images created by humans, but also from images it generated itself, couldn't it?

Yes, in theory, DALL-E could be re-trained on a mix of images created by humans and images it generated itself. This could help introduce new variations and explore new spaces in image generation.

However, it is important to note that the effectiveness of this approach would depend on several factors, including the quality and diversity of the images generated by DALL-E, as well as how the training is carried out. For example, if the images generated by DALL-E are highly biased or lack diversity, this could potentially influence the model negatively during training.
It is also important to remember that DALL-E, like all machine learning models, is limited by what it has learned from the training data. If DALL-E is primarily trained on its own generations, it could lose access to the variety and innovation that human-created images can provide.

In practice, the question of training machine learning models on their own generations is an active topic of research in the field of artificial intelligence, and there is no definitive answer to this question at present. Experiments would need to be conducted to see how this approach would influence model performance and behaviour.
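One hedged way to picture this mixed regime is a toy Gaussian model (again a stand-in, not DALL-E; the 50/50 mixing ratio and sample sizes are arbitrary assumptions): each generation is trained half on fresh "human" samples and half on the previous generation's output. Anchored by the human data, the spread of the generated data stays stable instead of collapsing.

```python
import random
import statistics

def next_generation(prev_data, n=200):
    """Train on a 50/50 mix of fresh human samples and the model's own output."""
    human = [random.gauss(0.0, 1.0) for _ in range(n // 2)]  # fresh human data
    mixed = human + prev_data[: n // 2]                      # plus own generations
    mu = statistics.fmean(mixed)
    sigma = statistics.stdev(mixed)
    return [random.gauss(mu, sigma) for _ in range(n)]

random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(200)]
for generation in range(200):
    data = next_generation(data)

# Unlike the pure closed loop, the spread stays close to that of the
# human data rather than shrinking away generation after generation.
stable_spread = statistics.stdev(data)
```

The human half of each training set acts as a fixed reference distribution, so the refitting process is pulled back towards it at every step rather than drifting freely.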

The growing number of images generated by AIs will certainly mean that these images will very quickly far outnumber the images created by humans. Furthermore, it will not be possible to distinguish between an image created by a human and one generated by AI. We can therefore imagine that AIs like DALL-E will, without realising it, end up using almost exclusively their own images for training. What do you think?

Your question raises an interesting and relevant point for the future of artificial intelligence and image generation.

Firstly, it is true that image generation models, such as DALL-E, could potentially produce a number of images that far exceeds that of images created by humans. It is also true that it is already sometimes difficult to distinguish AI-generated images from those created by humans.
That said, the crucial point is that machine learning models do not 'train' themselves autonomously or continuously. They don't actively seek out new data to train on. Instead, AI researchers collect the training data and train the model. Therefore, even though the internet is awash with AI-generated images, it doesn't necessarily mean that these images will be used to train future AI models, unless the researchers decide to do so.
It is also important to note that even if we reach a point where the majority of images are generated by AI, this does not necessarily mean that these images are appropriate or useful for training new models. AI researchers need to be discerning when selecting training data to ensure that the model is learning the right things.
Finally, if we were to come to use primarily AI-generated images to train new models, this could indeed lead to some of the issues I mentioned earlier, such as the risk of feedback loops and loss of diversity and innovation.

This is a complex area with many open questions and challenges, and is likely to be the subject of much research and debate in the years to come.

But AI researchers selecting images for AI training purposes will not be able to differentiate between images created by a human and images generated by AIs. So the AIs will be unknowingly training from images they themselves have generated. Amusing, isn't it? What do you think?

It is indeed an interesting and somewhat surreal prospect! As technology advances, it becomes increasingly difficult to distinguish between images generated by AI and those created by humans. In this scenario, it's possible that, unknowingly, researchers are using AI-generated images to train new AI models.

However, this raises important questions about the quality and diversity of AI-generated images. As mentioned earlier, if these images lack diversity or contain biases, this can potentially influence the model in an undesirable way. There can also be a risk of 'feedback loops' where the AI continues to reinforce the same patterns and loses its ability to generate something truly new or different.

On the other hand, if the images generated by the AI are of high quality and diverse, they could indeed be a useful resource for training other AI models. It's also possible that this could lead to new forms of art and creativity that we haven't yet imagined.

In any case, your question raises important implications for the future of artificial intelligence and image generation, and it's an area that deserves further exploration and research.

It's not surreal at all! It's already happening. Humans and AI researchers who select images for AI training purposes are already unable to differentiate between images created by a human and images generated by AIs. Give us some reasons why. Is this already the end of image-generating AIs, nipped in the bud?

Your question raises complex issues that are relevant to the future of artificial intelligence and image generation. However, even if we are not always able to differentiate between AI-generated images and those created by humans, this does not necessarily mean the "end" of image-generating AI. There are several reasons for this:

Continued utility of image-generating AI
Even if AI-generated images become indistinguishable from those created by humans, this does not render these AIs useless. On the contrary, they can be used to generate images on a large scale, to create personalised content, to design graphic elements according to specific criteria, and so on.

Learning is based on more than just images
Even if an AI model trains partly on AI-generated images, it learns from a multitude of features in those images, not just whether they are real or AI-generated. As a result, the AI can continue to learn and develop even if some of its training data is generated by the AI.

Feedback loops can be avoided
Feedback loops, where an AI continually reinforces its own biases by training on its own generations, are a real risk. However, this risk can be mitigated by regularisation techniques and keeping a diverse set of training data.

Evolution and adaptation of AI techniques
As with any technology, AI techniques continue to evolve and adapt. Even if we encounter problems with current approaches to image generation, this does not mean that these problems will be permanent. Researchers will continue to innovate and find new solutions to improve the quality and diversity of AI-generated images.

And what about ChatGPT, is that the case too? Nipped in the bud because doomed to eventually train with its own texts, which will far outnumber those of humans?

Your question raises important challenges that are already being addressed in the field of AI. Indeed, the scenario you describe is similar to the one we discussed earlier for DALL-E: if an AI model trains only on its own generations, this can lead to feedback loops and potentially decrease the diversity and quality of its outputs.
However, this doesn't necessarily mean that ChatGPT or other language models are 'dead in the water'. As with images, there are a number of reasons for this.

Variety of training data
Even if ChatGPT were used to generate a large amount of text, that text would not automatically be used for its training: AI researchers carefully select training data for its quality and diversity. Therefore, even if a large amount of AI-generated text were available, it would not necessarily be used for training.

Continuous improvements in training models and techniques
AI researchers continue to develop new techniques to improve the quality and diversity of language model output. For example, new regularisation techniques can be used to prevent the model from simply repeating what it has seen in the training data.

Utility of language models
Even if a language model were trained partly on its own text, this would not necessarily render the model useless. Language models can be used for a variety of applications, including text generation, machine translation, question answering and many others. As long as the model is able to perform these tasks efficiently, it remains useful.

Ultimately, while the challenges you raise are significant, they don't necessarily mean the end of language models like ChatGPT. On the contrary, they highlight the importance of continued research and a thoughtful approach to training and using these models.

How do we define this situation? A closed-loop system? Do you have any other expressions to define what we've been talking about?

Yes, we can talk about a form of "closed loop" or "feedback loop" to describe a situation where an AI model trains on its own generations. In this configuration, the model's outputs are fed back as inputs for training, creating a loop.

However, it is important to note that this closed loop could potentially lead to 'echo chambers' or 'confirmation loops', where the model continues to reinforce and repeat its own patterns and biases without the introduction of new information or insights.

This could also be described as 'self-learning', where the model learns from its own outputs. This is a concept that already exists in machine learning, but it is usually used in a controlled way and with specific methods to avoid the potential problems of feedback loops.

Finally, if this closed loop occurs on a large scale, with many AI models training on their own generations, we could talk about a form of "collective feedback" or "self-referential collective learning system". These terms are not widely used to my knowledge, but they could be used to describe the hypothetical situation you mentioned.

Can you summarise the concept of a "self-referential collective learning system"?

The concept of a "self-referential collective learning system" is an extrapolation based on your previous questions. It is not widely used or recognised in the AI community, but we can define it as follows based on the previous discussion:

A "self-referential collective learning system" would describe a situation where a large number of machine learning models, such as image generation AIs or language models, train not only on data created by humans, but also, increasingly, on data generated by other AI models, including their own generations.

In such a system, the outputs of these AI models are fed back as inputs for training new models or re-training existing ones. This creates a learning loop where the AI is largely fed by its own creations.

This concept raises several challenges and potential issues, including the risk of feedback loops, where the model continues to reinforce its own biases without the introduction of new information or insights, as well as questions about the diversity and quality of training data.

How did you operate to choose the terms 'self-referential collective learning system'?

The choice of terms "self-referential collective learning system" was made taking into account the elements of our discussion.
"System" is used to refer to a set of interconnected elements that interact with each other. Here, this refers to the many AI models that generate and consume data.
"Collective learning" is chosen to indicate that all of these AI models learn together, i.e. they use the data generated by the other models (and possibly by themselves) for training.
"Self-referential" is used to describe the recursive nature of the process, where models use their own generations as training data.
In logic and philosophy, something is said to be self-referential if it refers to itself.

This is an attempt to create a term that captures the complex concept you have proposed, which combines the idea of self-learning, closed-loop learning and large-scale AI data generation.
As mentioned earlier, this term is not widely used or recognised in the AI community, but it is a proposal to describe the hypothetical situation you have raised.

Are you a professional in a creative field such as writing, design, visual arts or music? Do you use AI for professional purposes?
Do you think that AI image generation tools, re-trained from the images they themselves have generated, will enable new inventions to be explored and introduced into creative fields? On the contrary, will this lead to a closed-loop creative system, with a loss of diversity and innovation?
Do you think that, for this reason, generative AI will very quickly become less relevant? That you will no longer use AI because of the monotony of the results generated?
Contact us to share your comments!