
  • RE: Creating Classrooms

    @jcumberbatch-4c739bc4

    Hi there, you don’t have to have the paid version to create classes. Please follow this tutorial:

    https://www.forum.creaticode.com/topic/547/teacher-only-how-to-manage-students-in-the-my-class-page

    If you have any problem following those steps, please let us know, and we will be happy to help you out.

    CreatiCode

    posted in Help
  • RE: Project Crashing

    @tyller_

    It looks like this project was only saved partially, so it couldn’t be correctly parsed. We’ll investigate how to recover it. Sorry about that.

    CreatiCode

    posted in Help
  • RE: login issues

    @dawnmercurio-gmail

    We have streamlined the Google Login interface, and it will also help avoid the automatic logout that happens sometimes. Please test to see if that solves the issue for your account.

    Thanks
    CreatiCode

    posted in Help
  • RE: login issues

    @dawnmercurio-gmail

    Hi Dawn, sorry about the issue. We’ll contact you separately to discuss how to resolve it.

    CreatiCode

    posted in Help
  • RE: Crediting this website.

    @106809-mygccs

    Hi, thanks for checking with us. In general, you can only use the assets within the CreatiCode platform. If you have a specific asset in mind, please let us know which one it is, and we will check if you can use it elsewhere.

    posted in Help
  • RE: Pen drawing clears when switching screen sizes

    @tyller_

    Thanks for reporting this issue. We will look into it.

    posted in Feedback
  • RE: 3D

    @106809-mygccs

    CreatiCode is a coding platform, not a model-creation tool. You can look into platforms like https://www.tinkercad.com/ for creating models, and then import those models into your 3D scene (see https://www.forum.creaticode.com/topic/413/3d-using-a-tinkercad-model-difficulty-2).

    posted in Help
  • RE: AI Blocks.

    @106809-mygccs

    We would suggest you start with the ChatGPT blocks in the AI category. You can follow through the “ChatGPT AI” tutorials to learn how to use them.

    posted in Help
  • ChatGPT AI: Prepare Knowledge Data Using Web Content (Difficulty: 4)

    Introduction

    In a previous tutorial, you learned how to teach ChatGPT new knowledge by semantic search (search for questions with similar meaning to the user question).

    The performance of such a chatbot mostly depends on the quality of the knowledge data we provide to the model. This process is often very time-consuming, involving collecting, cleaning, and formatting the data.

    In this tutorial, you will learn to prepare the knowledge data for a new chatbot, which will answer questions about the VISIONS organization. The basic idea is to download data from its website and generate questions and answers using that data. This is a common use case: most businesses and organizations already have websites but still need to build chatbots to provide a better user experience.

     
     

    Step 1 - Starting Project

     

    You will first need to complete this tutorial: ChatGPT AI: QA Bot Using Semantic Search (RAG) (Difficulty: 4)

    Save a copy of that project and rename it as “Chatbot Using Web Data.”

     
     

    Step 2 - Baseline Test for the VISIONS Organization

     

    As a baseline, we must test what ChatGPT knows about the VISIONS organization. If it can answer most questions without help, it is unnecessary to inject additional knowledge.

    To run the test, we can use the plain ChatGPT assistant in this project:

    https://play.creaticode.com/projects/6531b7e60fdce080a4481c1d

     

    For example, we can see that ChatGPT already knows about the VISIONS organization:

    10ec4215-e013-4cfd-bcb0-dc1daa7f8086-image.png

     
     
     
    However, if we ask for more specific information, it fails:

    e4e998cc-69b8-4372-b021-8d66ca5b5775-image.png

     
    This proves that ChatGPT needs our help to answer more specific questions about the VISIONS organization, even though such information is readily available on their website.

     

    [Advanced] Why doesn’t ChatGPT know the answer?

     
    ChatGPT does not memorize any sentences. Instead, it memorizes the probabilities of the next words. If some words appear together very often in the training data, that pattern gets stored inside the model.

    For example, if many websites contain a sentence like “The phone number of VISIONS is 1.888.245.8333”, then the next time ChatGPT sees “The phone number of VISIONS is”, it will predict the correct phone number as the continuation.

    However, since that sentence is not commonly seen in the training data, the next word with the highest probability is probably “sorry” or “I”, and the actual phone number has a much lower probability.

     
     

    Step 3 - Fetch Data from the Website

     

    Now, let’s look at the VISIONS organization website. Open this URL https://visionsvcb.org in a new tab, and you should see the following:

    e.gif

     

     
    This web page contains a lot of information and links to other web pages. To download the information, we can run the following block (click the green ‘Add Extension’ button at the bottom left and select the ‘Cloud’ category):

    e.gif

     
    Note that this block does two things for you:

    1. Download the full content of the page;
    2. Convert the content into the Markdown format.

     
    The Markdown format is very simple. It is almost the same as the text you see on the web page, but additional information is included, such as the URL of the links on the web page. This will be very useful in the next step.

    Note that the content for the URL is cached, so if you run fetch from the same URL repeatedly on the same day, it will be very fast after the first time.

    Also, the website’s content will go through a moderation process, so if any inappropriate content is found, this block will reject the fetch request.
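    This caching behavior is essentially memoization. Here is a minimal Python sketch of the idea, not CreatiCode’s actual implementation; the function below is a hypothetical stand-in for the server-side download and HTML-to-Markdown conversion:

```python
from functools import lru_cache

download_count = {"n": 0}

@lru_cache(maxsize=None)
def fetch_markdown(url):
    # Hypothetical stand-in for the Cloud block: the real download and
    # HTML-to-Markdown conversion happen on CreatiCode's servers.
    download_count["n"] += 1          # count actual (non-cached) downloads
    return "# Page at " + url         # pretend this is the page's Markdown

# Repeated fetches of the same URL hit the cache,
# so only one real download occurs.
first = fetch_markdown("https://visionsvcb.org")
second = fetch_markdown("https://visionsvcb.org")
```

    The second call returns instantly from the cache, which is the same effect the tutorial describes for repeated fetches of one URL on the same day.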

     
     

    Step 4 - Learn about Web Pages and Links

     

    Like most websites, the VISIONS page we have downloaded contains many links, such as these:

    ... 
    [Home](https://visionsvcb.org)  
    [About Us](#)  
    [Annual Reports and Financials](https://visionsvcb.org/about/annual-report)  
    [Board of Directors](https://visionsvcb.org/about/board-of-directors)  
    [Management Team](https://visionsvcb.org/about/management-team)  
    [Mission Statement](https://visionsvcb.org/about/mission-statement)  
    [VISIONS History](https://visionsvcb.org/about/visions-history) 
    ...
    

     
    Each link leads to a new page, which may link to other pages. Some of these links may be duplicates as well. For example, page 1 may contain a link to page 2, and page 2 may contain links to both page 1 and page 3. With more pages, they will form a web of pages with many links between them:

    2c117e90-d904-4243-b82a-0b5d87801d48-image.png

     
    In the next few steps, we will write code to parse the links on each page and put these links into a queue with no duplication. We will visit each link in this queue to get more content and extract more links.

     
     

    Step 5 - The URLs List

     

    To store all the links we will visit, please create a new list named “URLs”. The first URL will be the main URL of the site: “https://visionsvcb.org”. Note that we delete all items from the list first, which ensures we always start with only the main URL.

    25317d16-f2a8-49b4-8b40-4c8fc1fe0ee0-image.png

     
     

    Step 6 - Iterate Through Each URL

     

    Next, we use a for-loop to visit every URL in the list. Since the list will grow as we discover more links on the pages we visit, we don’t know in advance how many URLs there will be. For now, we will run the loop just once to confirm our code works for a single URL.

    We will also add a guard to ensure the URL index never exceeds the length of the list, so we always read a valid URL.

    88c6edc8-f534-4b30-bff3-7e09126711ec-image.png

     
     

    Step 7 - Fetch the Content of One URL

     

    Next, we will fetch the content from the URL at the current index and store it in the “result” variable:

    7c2f1c19-330a-4776-8e8a-0ecb53d22e03-image.png
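    Putting Steps 5–7 together, the overall control flow can be sketched in Python. This is only an illustration; `fetch_page` and `extract_links` are hypothetical stand-ins for the CreatiCode blocks:

```python
def crawl(start_url, fetch_page, extract_links, max_pages=100):
    # Step 5: the URLs list starts with only the main site URL.
    urls = [start_url]
    index = 0
    # Step 6: the guard keeps the index within the (growing) list.
    while index < len(urls) and index < max_pages:
        result = fetch_page(urls[index])      # Step 7: fetch the current URL
        for link in extract_links(result):
            if link not in urls:              # avoid duplicates
                urls.append(link)
        index += 1
    return urls
```

    Because the “URLs” list grows while we iterate over it, a loop with an explicit index plus a guard (rather than iterating over a fixed-length list) mirrors the tutorial’s approach.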

     
     

    Step 8 - Define the “extract links” Block

     

    Since the logic to extract all the links from the page’s content is fairly standalone, it’s better to define a new block dedicated to it: “extract links”. It will take the result from fetching the content and add all the URLs on that page to the “URLs” list.

    31ac4877-141f-4e73-bb57-95ae38308c96-image.png

     
     

    Step 9 - Find All Links

     

    To find all the links in the page’s content, we need to look for a certain pattern. In the Markdown format, every link URL is wrapped in a pair of parentheses and starts with “http://” or “https://”. Therefore, we can use a regular expression to find all the text matching this pattern and store the matches in a new list named “links”.

    You can copy the exact regular expression from here: \(https?:\/\/[^\)]+\)
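    If you want to experiment with this pattern outside CreatiCode, the same regular expression works with Python’s `re` module:

```python
import re

# The tutorial's pattern: "(", then http:// or https://,
# then any characters that are not ")", then the closing ")".
LINK_PATTERN = r"\(https?:\/\/[^\)]+\)"

def find_links(markdown_text):
    """Return every "(https://...)" match in the page's Markdown."""
    return re.findall(LINK_PATTERN, markdown_text)
```

    Note that a plain anchor like “[About Us](#)” is not matched, because the text inside the parentheses does not start with “http”.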

     
    b73f85ef-1e31-4913-b0c6-936c3c4e8c0b-image.png

     
    Now, if we run the program, it will fetch the content from the main site and extract all 60 links on that page into the “links” list:

    a015537e-c809-42e7-a8fe-75637aaa1465-image.png

     
     

    Step 10 - Go Through the List of Links

     

    To clean up this list of links, we need another for-loop to process each link. We will store each link in the “link” variable:

    f7860de2-fce8-4e27-8bc6-8627bf2254cd-image.png

     
     

    Step 11 - Remove the Parentheses

     

    Each link still contains a pair of parentheses. To remove them, we extract the substring of the link text from the second character to the second-to-last character. The result will be stored in a variable named “URL”.

    e96d987c-ba6f-4335-9d75-408140442104-image.png

     
     

    Step 12 - Store URL Without Duplication

     

    Now, we can add the new URL to the list of URLs, but we need to ensure the list doesn’t already contain this new URL.

    708fe245-9bd1-4883-90cc-dbf1b7d1701b-image.png

     
     

    Step 13 - Limit to the Main Site

     

    There is still one small issue. Some links on the page are not from the same domain, such as “(https://accessibility-helper.co.il)”. Note that we should only download the data from the same main site. Otherwise, the list of URLs may grow exponentially as we visit more and more websites. Therefore, we need to add another condition: we only add a URL to our list if it contains “visionsvcb.org”. (Note that when you use this project for other websites, this main URL also needs to be changed.)
     
    15e87c61-c8fd-4107-864a-297cf5cc111d-image.png

     
    After this, run the program again, and the “URLs” list will grow to 42 items:

    339a1833-4238-4a34-a8f6-73ad82a9e106-image.png
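    Steps 10–13 can be summarized in a short Python sketch (the function name is illustrative, not a CreatiCode API):

```python
def add_clean_links(links, urls, site="visionsvcb.org"):
    """Strip parentheses, skip duplicates, and keep only same-site URLs."""
    for link in links:
        url = link[1:-1]        # Step 11: drop the surrounding "(" and ")"
        if url in urls:
            continue            # Step 12: no duplicates
        if site not in url:
            continue            # Step 13: stay on the main site
        urls.append(url)
    return urls
```

    For example, an off-site link like “(https://accessibility-helper.co.il)” is discarded, while new same-site links are appended to the “URLs” list.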

     
     

    Step 14 - Test: Fetch from First 3 URLs

     

    For a test, let’s try to fetch from the first 3 URLs in the list:

    184aefa0-fa01-446d-b716-4133447292a8-image.png
     
    After we run this program again, we find the “URLs” list grows to 61 items:

    e.gif

     
    We also find a new problem. A lot of the URLs are PDF files. That might be useful if we are creating a document-retrieval app, but for this project, we should exclude them since we only need web content. We can add an additional condition in the “extract links” block like this:

    93d87cdc-1cd0-4225-8a11-6f828c1ead3e-image.png

     
    Now, if we rerun the program, the “URLs” list will only have 42 items in total. That means no new links were discovered when we visited the second and third URLs.

     
     

    Step 15 - Add the “Generate QA” Block

     

    Now that we can go through all the pages on a website, the next step is to generate questions and answers from each page. Let’s focus on generating them from the first page. Please define a new block “generate QA” and call it in the repeat loop. We will pass in the “result” variable, which holds the text of the current page.

     

    12fc11e4-2c47-4053-800d-8bfac5122500-image.png

     
     

    Step 16 - Cut the Page Content into Chunks

     

    Next, we will feed the page content to ChatGPT to generate pairs of questions and answers. Note that we cannot simply send all the content to ChatGPT in one request, because it might exceed ChatGPT’s limit on request length. Instead, we will have to cut the content into chunks.

    For example, suppose we limit each chunk to at most 5000 characters. If the content has 12000 characters in total, we will send 3 requests to ChatGPT: character 1 to 5000, character 5001 to 10000, and character 10001 to 12000.

    This idea can be implemented using a for-loop like this:

    21a2ac51-53c7-4f82-b9d6-8810d32b3aed-image.png

     
    The “start index” is the position of the first character of each chunk, and the last character’s position will be “start index + 4999”. Therefore, the “chunk” variable will contain the content of each chunk of the page’s content.
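    The chunking logic is easy to express in Python. This sketch mirrors the loop above (the 5000-character limit comes from the example; it is not a fixed requirement):

```python
def chunk_text(content, chunk_size=5000):
    """Cut the page content into pieces of at most chunk_size characters."""
    chunks = []
    # "start index" takes the values 0, 5000, 10000, ... (0-based here,
    # while the tutorial's blocks count characters from 1).
    for start in range(0, len(content), chunk_size):
        chunks.append(content[start:start + chunk_size])
    return chunks
```

    With 12000 characters of content, this produces three chunks of 5000, 5000, and 2000 characters, matching the example above.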

     
     

    Step 17 - Ask ChatGPT to Generate Questions and Answers

     

    It is fairly straightforward to ask ChatGPT to generate some questions and answers from each chunk. However, to make it easier to parse ChatGPT’s response, we also need to specify the output format. For example, we can use a prompt like this:

    You will generate questions and answers based on the content of a web page. Start each question with "--QUESTION:" and start the answer with "--ANSWER:". Here is the web content:
    

     
    Here is the code to compose and send the request:

    rq.gif

     
    Note that each request starts a new chat, so ChatGPT won’t be distracted by previous messages. Otherwise, ChatGPT may not focus on the current chunk it is handling.

     
     

    Step 18 - Split the Response into a List

     

    The response we get from ChatGPT will look like this:

    59bc00ae-320c-4e1e-a584-8b1286ca65ab-image.png

     
    Our next task is to put the questions and answers into a table format so that we can use them to build the semantic database later. First, we need to split the response by the special symbol “--” and put the parts into a list named “qa”:

    8f6ca4a5-cdc7-44af-9949-049cb0c94d13-image.png

     
    The list will look like this, with questions and answers as individual items:

    e681a5d7-78d1-417f-9c0f-25f24ff80470-image.png

     
     

    Step 19 - Clear the Data Table

     

    Since we will accumulate questions and answers and store them in the “data” table, we need to clear the table’s contents at the beginning.

    a3d53427-8976-461e-a9ab-c0722f5edd36-image.png

     
     

    Step 20 - Iterate through the “QA” List

     

    To add all the questions and answers from the “qa” list to the “data” table, we can use a for-loop to go through every item in the “qa” list. We already know the first item of the list is always empty, so we should start with the second item. Also, we will consume 2 items at a time, one for the question and one for the answer, so we should increase the index “i” by 2 each time:

    3d42078d-35ee-4a24-8736-d46e214c1a1c-image.png

     
     

    Step 21 - Add One Pair of Question and Answer

     

    Now we can read the question from the item at index “i” and its answer from the item at index “i + 1”. Then we add both of them as a new row to the “data” table:

    8fd0b045-3713-4b1d-842e-02825eb59683-image.png

     
    For example, if there are 3 pairs of questions and answers, we would get 3 rows in the “data” table:

    c1e029a4-671e-4aa0-8cc4-6aef69f3c5f6-image.png

     
     

    Step 22 - Remove the prefix

     

    There is a minor issue: the questions contain the prefix “QUESTION:”, and the answers contain the prefix “ANSWER:”. We can remove them using the “replace” operator:

    c14aa852-383f-4ff0-8dd5-aaf589a66b4c-image.png

     
    Now if we clear the data table (manually remove the existing items) and run the for-loop by itself again, we would get clean questions and answers:

    92c5f25e-6362-409f-bee1-b834e725f9df-image.png
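    Steps 18 through 22 taken together amount to a small parsing routine. A Python sketch (the function name is illustrative):

```python
def parse_qa(response):
    """Turn a "--QUESTION: ... --ANSWER: ..." response
    into a list of (question, answer) pairs."""
    qa = response.split("--")   # Step 18: the first item is always empty
    rows = []
    # Step 20: start at the second item and consume two items per pair.
    for i in range(1, len(qa) - 1, 2):
        question = qa[i].replace("QUESTION:", "").strip()   # Step 22: drop prefixes
        answer = qa[i + 1].replace("ANSWER:", "").strip()
        rows.append((question, answer))                     # Step 21: one row per pair
    return rows
```

    Each returned pair corresponds to one row of the “data” table, with the “QUESTION:” and “ANSWER:” prefixes already stripped.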

     
     

    Step 23 - Test Run

     

    Now our program is ready. Let’s test it with the first 3 URLs:

    4ba63f5c-d54b-4246-9255-f9f99b0fec09-image.png

     
    It will take some time to run this since we need to use ChatGPT to generate the questions and answers from each page. When the run finishes, we will get some questions in the “data” table. The exact number of questions may be different when you run it.

    7611ce4a-f9dc-4247-920f-92edc6b71092-image.png

     
    The rest of the process is the same as before: we can create a semantic database using the “data” table (no need to do this repeatedly if the data stays the same), then we can query this database when the user asks a new question, and feed the query result to ChatGPT as reference.

     
     

    Next Step

     

    You can now try to change the program to fetch data from any other website, and then publish a chatbot using that data. Note that you will need to specify the new website in 2 places:

    • In the beginning, when you specify the initial URL:

    7325524f-a056-44ef-b641-7396b9a3be12-image.png

    • In the “extract links” block’s definition, where you remove URLs not related to the target website:

    16583fbb-13e1-46cc-9e6e-193ef8c025bc-image.png

    posted in Tutorials