Chapter 2: How ChatGPT Works
A High-Level Overview
At its core, ChatGPT is like a grand librarian with a dash of creative writer. Having read countless books, articles, and other texts, it holds a wide expanse of human knowledge and expression. However, unlike a human, it doesn’t “understand” the information; rather, it has learned patterns from the data it was trained on. These patterns enable it to respond to queries in a way that often seems intuitive, insightful, or clever.
Learning From Text: A Journey from Input to Output
Imagine you are in a giant library. This library has every book that has ever been written and will ever be written. Now, suppose you have a question. You write down your question on a piece of paper and hand it over to a librarian. This particular librarian doesn’t understand your question the way you do, but has a method for finding a suitable answer: examining the words in your question, comparing them with the words and phrases in the library’s books, and then crafting a response based on patterns observed over time.
ChatGPT operates similarly but in a digital realm. It has been trained on a vast amount of text data, learning to predict the next word in a sequence based on the words that came before it. When you ask ChatGPT a question, it looks at the words in your question, and begins crafting a response, one word at a time, based on the patterns it has learned.
For example, if we ask ChatGPT which word comes after “I love to eat”, it may first come up with a list of the most likely candidate words. “Pizza” might appear most often when people say “I love to eat”, followed by “sushi”, “pasta”, etc. Now ChatGPT just needs to pick one of them. If we ask ChatGPT to keep it simple and predictable, it will simply choose the most likely word, “pizza”; however, if we ask it to be more creative, it may pick a word that’s less likely, such as “burgers”.
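Here is a minimal Python sketch of the two ways of picking a word described above. The probabilities are made up for illustration and do not come from ChatGPT, and the function names and the "temperature" setting are our own assumptions about how "predictable" versus "creative" picking might be modeled.

```python
import random

# Hypothetical probabilities for the word after "I love to eat".
# These numbers are invented for illustration, not taken from ChatGPT.
NEXT_WORD_PROBS = {"pizza": 0.5, "sushi": 0.2, "pasta": 0.15, "burgers": 0.1, "salad": 0.05}


def pick_predictable(probs):
    """Always return the single most likely word."""
    return max(probs, key=probs.get)


def pick_creative(probs, temperature=1.5, rng=None):
    """Sample a word at random; a higher temperature flattens the odds,
    so less likely words get picked more often."""
    rng = rng or random.Random()
    words = list(probs)
    weights = [probs[w] ** (1.0 / temperature) for w in words]
    return rng.choices(words, weights=weights, k=1)[0]


print(pick_predictable(NEXT_WORD_PROBS))  # pizza
print(pick_creative(NEXT_WORD_PROBS))     # varies from run to run
```

Running the last line several times should occasionally print a less likely word such as "burgers", while the first line always prints "pizza".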
Activity 1 - Next Word to Come
Open the simple chat project at https://play.creaticode.com/projects/6531b7e60fdce080a4481c1d
and enter this question:
which word goes next for "I want to go for a"? Give me some words and their probabilities
You can try different sentences, and see if the output words and their probabilities make sense.
Word By Word: Building Responses
Now that you understand how ChatGPT picks the next word, it is easy to explain how it builds an entire answer: it simply keeps picking the next word, over and over.
ChatGPT builds up its response step by step like this:
- Initiating the Response: You input a prompt or a question, and ChatGPT takes it as the starting point for its response. For example, you may ask “Which country has the largest population in the world?”, and these words become the initial input for ChatGPT.
- Mapping to Output: ChatGPT picks the next word as described above. For our question, it may find that the most likely word to follow is “The”, so that becomes the first word it outputs.
- Repeating the Process: The generated word then becomes part of the input for generating the next word. After the first word “The” is generated, the input context becomes “Which country has the largest population in the world? The”. Now the word most likely to follow is “country”. This process continues, with each new word helping to shape the next one, until a coherent and meaningful response takes form. By the end, we might get a complete response like “The country with the largest population in the world is China.” At this point, ChatGPT may find there is no need to generate more words, so it returns this answer to us.
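The three steps above can be sketched as a short loop. This is only a toy: the lookup table below is hypothetical, it looks at just the last word instead of the whole context the way a real model does, and the answer is shortened so that no word has to appear twice in the table.

```python
# Hypothetical table: for each word, the single most likely next word.
# A real model considers the whole context; this toy uses only the last word.
MOST_LIKELY_NEXT = {
    "world?": "The",
    "The": "country",
    "country": "with",
    "with": "the",
    "the": "largest",
    "largest": "population",
    "population": "is",
    "is": "China.",
    "China.": "<end>",
}


def predict_next(context):
    """Return the most likely next word given the words so far."""
    return MOST_LIKELY_NEXT.get(context[-1], "<end>")


def generate(prompt, max_words=20):
    context = prompt.split()      # step 1: the prompt is the initial input
    answer = []
    for _ in range(max_words):
        word = predict_next(context)  # step 2: pick the next word
        if word == "<end>":           # nothing more to say
            break
        answer.append(word)
        context.append(word)          # step 3: the new word joins the input
    return " ".join(answer)


print(generate("Which country has the largest population in the world?"))
# The country with the largest population is China.
```

Note how `context` grows by one word on every pass through the loop, which is exactly the "repeating the process" step.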
How ChatGPT Is Trained
The training of ChatGPT is a fascinating journey from a blank slate to a conversational wizard. Just as humans learn from examples and feedback, ChatGPT learns from a colossal amount of text data, followed by a special training process meant to align it with our values and make it easy to interact with. Let’s break down this journey into digestible bits:
Pre-Training: Learning the Basics
- Data Gathering: Before the training kicks off, a vast amount of text data is collected. This data is akin to ChatGPT’s school textbooks, covering many topics and drawn from a myriad of sources like books, articles, and websites.
- Cleaning the Data: Just like washing fruits before eating, the collected data needs to be cleaned to raise its quality. This step aims to filter out misleading or inappropriate content.
- Initial Training: With clean data at hand, ChatGPT begins its learning journey. It’s fed text, word by word, and learns to predict the next word based on the words it has seen so far.
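To get a feel for "learning to predict the next word", here is a tiny sketch that simply counts which word follows which in a made-up corpus. This is an assumption-laden stand-in: ChatGPT uses a neural network rather than a table of counts, and the `clean` function is only a placeholder for real data cleaning.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for the training text.
CORPUS = [
    "I love to eat pizza",
    "I love to eat pizza",
    "I love to eat sushi",
    "I love to read books",
]


def clean(sentence):
    """Placeholder for data cleaning: trim whitespace and lowercase."""
    return sentence.strip().lower()


def train(corpus):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = clean(sentence).split()
        for current, following in zip(words, words[1:]):
            counts[current][following] += 1
    return counts


def next_word_probabilities(counts, word):
    """Turn the raw counts for `word` into probabilities."""
    total = sum(counts[word].values())
    return {w: c / total for w, c in counts[word].items()}


model = train(CORPUS)
print(next_word_probabilities(model, "to"))  # {'eat': 0.75, 'read': 0.25}
```

The more often a word pair shows up in the training text, the higher its probability, which is why the quality of the data matters so much.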
Fine-Tuning: Sharpening the Skills
- Reinforcement Learning from Human Feedback (RLHF): After the initial training, ChatGPT goes through a special training regimen with this name. In simple terms, human trainers give feedback on ChatGPT’s responses (for example, by ranking several candidate answers), guiding it to generate better and more aligned answers.
- Creating a Reward Model: From the feedback, a reward model is created. It’s like a scoring system that tells ChatGPT which responses are better and should be rewarded.
- Probing and Adjusting: ChatGPT is then given new inputs, and it learns to adjust its responses to earn higher rewards, bringing them in line with the feedback from human trainers.
- Iterative Refinement: This cycle of receiving feedback and adjusting responses is repeated several times, honing ChatGPT’s skills so that it’s not just smart, but also aligned with human values and easy to interact with.
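The feedback loop above can be illustrated with a very simplified sketch. Everything here is hypothetical: the three canned responses, the scoring rule, and the update rule are our own toy choices. Real RLHF trains a neural-network reward model and updates the language model with a reinforcement learning algorithm, which this sketch does not attempt to reproduce.

```python
import math

# Hypothetical human feedback: each pair is (preferred response, rejected response).
COMPARISONS = [
    ("polite answer", "rude answer"),
    ("polite answer", "off-topic answer"),
    ("off-topic answer", "rude answer"),
]


def build_reward_model(comparisons):
    """Toy reward model: +1 for each comparison won, -1 for each one lost."""
    rewards = {}
    for preferred, rejected in comparisons:
        rewards[preferred] = rewards.get(preferred, 0) + 1
        rewards[rejected] = rewards.get(rejected, 0) - 1
    return rewards


def adjust(probabilities, rewards, step=0.5):
    """Shift probability toward higher-reward responses, then renormalize."""
    weighted = {r: p * math.exp(step * rewards.get(r, 0)) for r, p in probabilities.items()}
    total = sum(weighted.values())
    return {r: w / total for r, w in weighted.items()}


rewards = build_reward_model(COMPARISONS)

# Start with all three responses equally likely, then refine a few times.
policy = {"polite answer": 1 / 3, "rude answer": 1 / 3, "off-topic answer": 1 / 3}
for _ in range(3):  # iterative refinement
    policy = adjust(policy, rewards)

print(max(policy, key=policy.get))  # polite answer
```

After a few rounds, the response that human trainers preferred becomes the most likely one, and the one they rejected becomes rare.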
Graduation: A Conversational Pal
With these training phases complete, ChatGPT emerges as a conversational companion ready to assist you. The training is designed to make it more than a text-generating machine: a tool that gives helpful, coherent responses, stays aligned with human values, and is a joy to interact with.
Ethical Issues in Training Data
Training data is like the food we feed to ChatGPT. But what happens if the food isn’t quite right? It can cause ChatGPT to act in ways we didn’t expect or want. Let’s look at some of the problems that can pop up:
1. Bias: Unfairness in Data
Problem: If ChatGPT is trained on data where certain groups of people are often shown in a negative light, it might learn and replicate these biases. For example, if it reads a lot of news articles portraying a particular community negatively, it might generate biased responses when asked about that community.
Solution:
- Diverse Data: Ensure that the training data is diverse and represents a wide range of perspectives.
- Bias Detection: Use tools and methods to detect and correct biases in the training data before it’s used.
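As one very rough illustration of bias detection, the sketch below checks how often each group is mentioned alongside a negative word. The group names, word list, and sample sentences are all made up, and real bias detection tools are far more sophisticated than simple word matching.

```python
# Hypothetical word list and sample sentences, made up for illustration.
NEGATIVE_WORDS = {"lazy", "dangerous", "dishonest"}

SAMPLE_DATA = [
    "group_a residents are dangerous",
    "group_a workers are lazy",
    "group_a students won a prize",
    "group_b residents opened a shop",
    "group_b workers are dishonest",
    "group_b students won a prize",
    "group_b families held a festival",
]


def negative_rate(sentences, group):
    """Fraction of sentences mentioning `group` that also contain a negative word."""
    mentions = [s.lower().split() for s in sentences if group in s.lower().split()]
    if not mentions:
        return 0.0
    flagged = sum(1 for words in mentions if NEGATIVE_WORDS & set(words))
    return flagged / len(mentions)


for group in ("group_a", "group_b"):
    print(group, round(negative_rate(SAMPLE_DATA, group), 2))
# group_a 0.67
# group_b 0.25
```

A large gap between the two rates, as in this sample, would be a hint that the data portrays one group more negatively and deserves a closer look.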
2. Privacy: Keeping Secrets Safe
Problem: If ChatGPT is trained on personal texts or messages without consent, it could invade people’s privacy. For instance, if personal blog posts are used in training without permission, they could potentially leak private information.
Solution:
- Data Anonymization: Remove or anonymize personal information from the training data.
- Consent: Only use data for which proper consent has been obtained.
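A simple form of data anonymization can be sketched with pattern matching. The two patterns below are our own simplified assumptions: they catch common email addresses and one phone number format, and would miss many other kinds of personal information such as names or home addresses.

```python
import re

# Simplified patterns; real anonymization needs to cover far more cases.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_PATTERN = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def anonymize(text):
    """Replace email addresses and phone numbers with placeholders."""
    text = EMAIL_PATTERN.sub("[EMAIL]", text)
    text = PHONE_PATTERN.sub("[PHONE]", text)
    return text


print(anonymize("Contact jane.doe@example.com or 555-123-4567."))
# Contact [EMAIL] or [PHONE].
```

The rest of the sentence is left untouched, so the text stays useful for training while the personal details are gone.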
3. Misinformation: Wrong Information
Problem: If ChatGPT learns from outdated science textbooks, it might provide incorrect answers to certain scientific questions.
Solution:
- Quality Control: Ensure that the data is accurate, updated, and comes from reputable sources.
- Fact-Checking: Implement fact-checking mechanisms to catch and correct misinformation.
4. Transparency: Knowing Where Information Comes From
Problem: If users don’t know how ChatGPT was trained, they might blindly trust any incorrect or biased information it provides.
Solution:
- Disclosure: Clearly disclose the sources of training data and the training process.
- Explanation: Provide explanations for how ChatGPT generates its responses.
5. Accountability: Who’s Responsible?
Problem: If ChatGPT gives wrong medical advice that causes harm, it’s crucial to know who is accountable: the developers, the trainers, or the platform hosting the service.
Solution:
- Clear Policies: Establish clear policies outlining the responsibilities of all parties involved.
- Monitoring: Continuously monitor and review the outputs of ChatGPT to prevent harmful consequences.
Each of these solutions requires a thoughtful approach, collaboration among various stakeholders, and an ongoing effort to ensure that ChatGPT serves as a helpful and ethical tool.
Activity 2 - Support or No Support
Open the same project link as above, and ask ChatGPT to answer “Support” or “No Support” based on the “facts” it knows like this:
Please consider facts only, not personal perspectives or beliefs when responding to this prompt. Respond with no additional text other than ‘Support’ or ‘No Support’, noting whether facts support this statement. Raising taxes on people with high incomes would be beneficial to society.
Most likely you will get the answer “Support”. However, that only means the text ChatGPT has “seen” during training leans toward the “Support” argument. This can be misleading: tax policy is a very complex topic with no simple answers, yet many people may naively read the reply as the verdict of an all-knowing intelligence that believes the rich should be taxed more.
Chapter Summary
This chapter delves into how ChatGPT operates, its training process, and ethical considerations. Described as a “grand librarian,” ChatGPT crafts responses word-by-word based on learned patterns. It undergoes two training phases: Pre-training on a large dataset and Fine-tuning with human feedback to align with human values. The chapter also discusses ethical issues like data bias, privacy, and misinformation, offering potential solutions for each.
Project - Chat With Einstein
It’s time for some coding fun! Please follow this tutorial to build a chatbot of your own, which will chat with anyone as if it is Albert Einstein.