Update — For the full story about Ranko Bon and the inspiration behind building RankoBot, check out my co-founder’s Medium post.
My company’s co-founder approached me with a proposition: use ChatGPT as an interactive tool for exploring his father’s journal entries. His father, Ranko Bon, has been a prolific writer for half a century, amassing over 5 million words across 19,000 journal entries, all housed in a WordPress blog. The idea was to turn ChatGPT into a conversational interface for asking questions about the author and his vast body of work.
I’ve found ChatGPT and Large Language Models (LLMs) fascinating and useful, yet their practical value is often limited by a lack of context about your own data. That realization suggested an ideal hobby project: building a chatbot that lets you converse with a set of articles, making them more useful and offering a hands-on deep dive into AI conversation.
RankoBot App— https://rankobot.com
Ranko Bon’s Journal — https://residua.org
Overview of how the application works
- Use the OpenAI API to process the text of each journal entry and obtain its vector embedding.
- Archive the journal entries along with their respective vector embeddings for future accessibility.
- Receive and interpret the user question.
- Use the OpenAI API to process the user’s question and obtain its vector embedding.
- Query the archived articles using the vector embedding of the question, identifying the top journal entries that closely align with the inquiry.
- Query ChatGPT, employing the text from the relevant journal entries as context and incorporating the user’s question.
- Retrieve the ChatGPT response and display it to the user.
Text Processing Steps with ChatGPT
Let’s delve into the inner workings of ChatGPT, a powerful language model developed by OpenAI, by dissecting its four fundamental steps in processing text: Tokenization, Vectorization, Contextualization, and Decoding.
1. Tokenization: This initial step involves breaking down the input text into smaller segments known as tokens. Tokens can vary in size, ranging from a single character to an entire word.
2. Vectorization: The model then converts these tokens into numerical vectors via an embedding layer, essentially a lookup into a matrix in which each row corresponds to a unique token in the model's vocabulary and each column to an embedding dimension.
3. Contextualization: The vectors pass through multiple layers of the model. In the case of GPT, these layers are known as transformer layers. Each layer computes new vectors based on the surrounding vectors, allowing the model to grasp the context of each token.
4. Decoding: In this final stage, the model transforms the contextualized vectors back into tokens. This forms the output of the model.
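As a toy illustration of steps 1 and 2, tokenization can be sketched as splitting text into pieces and vectorization as looking up one row per token in an embedding matrix. The vocabulary and the 3-dimensional embedding values below are invented for the example; real models use subword tokenizers and vocabularies of tens of thousands of tokens:

```python
# Toy sketch of tokenization + vectorization. The vocabulary and
# embedding values here are made up purely for illustration.

EMBEDDING_MATRIX = {
    "the": [0.1, 0.0, 0.2],
    "cat": [0.9, 0.3, 0.1],
    "sat": [0.2, 0.8, 0.5],
}

def tokenize(text):
    """Step 1: break the input text into tokens.
    (Naive whitespace split; real models use subword tokenization.)"""
    return text.lower().split()

def vectorize(tokens):
    """Step 2: map each token to its row in the embedding matrix."""
    return [EMBEDDING_MATRIX[t] for t in tokens]

vectors = vectorize(tokenize("The cat sat"))
print(vectors[1])  # the row stored for "cat"
```

Steps 3 and 4 (contextualization and decoding) happen inside the model itself, so they are not sketched here.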
We use PostgreSQL as our database to store and query the vector embeddings, via the PgVector extension. To query embeddings through Eloquent models, we use the pgvector-php Laravel library.
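On the database side, the setup looks roughly like the following. The table and column names are illustrative, not the app's actual schema; the vector dimension matches OpenAI's text-embedding models (1536), and `<=>` is pgvector's cosine-distance operator:

```sql
-- Enable the pgvector extension (once per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- Store each journal entry alongside its embedding
CREATE TABLE journal_entries (
    id        bigserial PRIMARY KEY,
    title     text,
    body      text,
    embedding vector(1536)
);

-- Retrieve the five entries nearest to a question's embedding
SELECT id, title
FROM journal_entries
ORDER BY embedding <=> '[0.012, -0.034, ...]'::vector  -- placeholder literal
LIMIT 5;
```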
Process the post text and store the embeddings
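The actual implementation is in Laravel/PHP; here is a language-agnostic Python sketch of this step. The `embed` callable stands in for the OpenAI embeddings call, and the post structure and chunk size are assumptions for illustration:

```python
def chunk_text(text, max_chars=2000):
    """Split a long journal entry into chunks small enough to embed.
    (Real limits are counted in tokens, not characters; this
    character-based split is a simplification.)"""
    return [text[i:i + max_chars] for i in range(0, len(text), max_chars)]

def build_embedding_records(posts, embed):
    """For each post, compute one embedding per chunk and pair it with
    the chunk text, ready to be inserted into the database.

    `embed` is any callable mapping text -> list of floats; in the real
    app it would wrap a call to OpenAI's embeddings endpoint.
    """
    records = []
    for post in posts:
        for chunk in chunk_text(post["body"]):
            records.append({
                "post_id": post["id"],
                "text": chunk,
                "embedding": embed(chunk),
            })
    return records

# Usage with a stand-in embedding function:
fake_embed = lambda text: [float(len(text)), 0.0, 0.0]
records = build_embedding_records(
    [{"id": 1, "body": "x" * 2500}], fake_embed
)
print(len(records))  # 2 chunks for a 2500-character post
```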
Process the user’s question, search for similar documents, and then ask the question to ChatGPT
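Again sketched in Python rather than the app's actual Laravel code: once the question has its own embedding, the stored chunks are ranked by similarity and the best matches are folded into the prompt sent to ChatGPT. In the real app the ranking happens in SQL via pgvector; the in-memory cosine similarity below just shows the idea, and the prompt wording is a hypothetical example:

```python
from math import sqrt

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def top_entries(question_vec, records, k=3):
    """Rank stored chunks by similarity to the question embedding
    and keep the top k (done in SQL by pgvector in the real app)."""
    ranked = sorted(
        records,
        key=lambda r: cosine_similarity(question_vec, r["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question, entries):
    """Assemble the chat prompt: journal text as context, then the question."""
    context = "\n---\n".join(e["text"] for e in entries)
    return (
        "Answer the question using only the journal entries below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

# Usage with toy 2-dimensional embeddings:
stored = [
    {"text": "Entry about architecture", "embedding": [1.0, 0.0]},
    {"text": "Entry about music", "embedding": [0.0, 1.0]},
]
best = top_entries([1.0, 0.1], stored, k=1)
print(build_prompt("What did Ranko write about?", best))
```

The string returned by `build_prompt` is what gets sent to ChatGPT, whose reply is then displayed to the user.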
Hopefully this post serves as an effective springboard for you to engage with your own documents contextually, harnessing the capabilities of Laravel and ChatGPT. Your thoughts and suggestions for enhancements are highly welcomed. Also, if you happen to notice any discrepancies, please don’t hesitate to point them out. Together, we can refine and improve this exciting exploration into the world of AI and text processing.