Natural language processing (NLP) is one of the most relevant research fields within computer science. It is dedicated to finding the best way to make a machine understand language the way we humans understand it.

Inevitably, this field has always faced challenges that stem from the flexibility with which we humans communicate and, more importantly, understand what is communicated to us. The rigidity of machines makes aspects such as context, sarcasm, or ambiguity real headaches for the people who research and dedicate their careers to this topic.

To make this clearer, consider the following example:

  • John and Sonia are married.

Does it mean that they are a couple and married to each other, or that each of them is married to someone else? If I knew John or Sonia, my interpretation of this sentence would be straightforward; I would have no doubt about it. But that means I would be relying on context to understand exactly what the phrase refers to. In short, if a human has doubts about the interpretation of a phrase, a machine will have them too (and many more, I would dare to say).

That said, when faced with this ambiguity we do lean toward the more likely option, because we are used to speaking in a certain way. In this example, the first interpretation that comes to mind, at least for me, is that the two of them are a couple and married to each other; otherwise, we would usually add something to the sentence, for example: “John and Sonia are married to their respective partners”, so as not to give rise to a misunderstanding.

Generative AI Models

Lately, we are constantly bombarded by advances in generative artificial intelligence, especially since the rise of ChatGPT as a personal assistant for our daily tasks. Let’s look at the general idea behind large language models (LLMs) such as GPT-3.5 (ChatGPT, OpenAI), PaLM (Google), Falcon (TII, published on Hugging Face), etc. These models are built around one main task: predicting the next word in a sentence given all the previous ones. In other words, they look for the word most likely to fit in the sentence out of all the available words, i.e., out of the whole vocabulary.
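To make this concrete, here is a minimal sketch of that core task, using the small GPT-2 model (via the Hugging Face transformers library) as a stand-in for far larger LLMs: given the beginning of a sentence, the model assigns a probability to every token in its vocabulary as the possible continuation.

    # Minimal sketch of next-word prediction, using the small GPT-2
    # model as a stand-in for much larger LLMs.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    prompt = "John and Sonia are"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

    # Turn the scores for the last position into a probability
    # distribution over the whole vocabulary, then show the top candidates.
    probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(probs, k=5)
    for p, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id))!r}: {p.item():.3f}")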

That prediction task is very similar to the way we act as receivers of information: we look for the meaning that the sender of the message is referring to from among all the possible meanings, and we continue the conversation from there.

To be a bit more technical, an LLM is a neural network with a very large number of parameters that has been trained on enormous amounts of data (on the order of petabytes). This huge amount of data contains the information these models need to perform tasks such as summarization, text classification, entity extraction, and even translation, among others.

In other words, the “classic” natural language processing tasks, which we used to split across different, smaller, more specialized language models, have been unified into one huge “all-in-one” model.
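For contrast, this is roughly what the “classic” approach looks like in code: a separate, specialized model per task. The sketch below uses Hugging Face transformers pipelines, which download the library’s default dedicated model for each task; the input text is just an illustrative placeholder.

    # The "classic" approach: a separate specialized model per task.
    # Each pipeline downloads its own dedicated model the first time.
    from transformers import pipeline

    summarizer = pipeline("summarization")
    ner = pipeline("ner", aggregation_strategy="simple")
    translator = pipeline("translation_en_to_de")

    text = ("Natural language processing is dedicated to making machines "
            "understand language the way humans do.")

    print(summarizer(text, max_length=30, min_length=5)[0]["summary_text"])
    print(ner("John and Sonia live in Madrid."))
    print(translator(text)[0]["translation_text"])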

Put into practice, this all-in-one ability makes the model seem very smart when it answers different types of queries correctly, because knowing how to perform all these tasks at once feels very human. The trick is that the answer to the query is already somewhere in the huge amount of information the model contains. However, information coming from any AI is never exempt from being checked and contrasted: nobody assures you that, within so much information, the model has made a correct and precise interpretation of the query we posed. We must remember that the only task required of the model is to respond, not to respond correctly, so it is always important to verify the information.

Today, these models require infrastructure to match their size, which translates into large instances with very high computational capacity. These requirements mean that only companies with large internal infrastructure are able to train them. For reference, training an LLM can take months to complete! In addition, the cost of training such a model is estimated at several million dollars. I believe these figures reflect the magnitude and complexity of these models. Likewise, if we as users want to make use of these tools, we must either have infrastructure at our disposal that can store and run these models on dedicated machines, or access them via an API (as with ChatGPT), understanding that the confidentiality of the data we expose may be compromised when an API is used.

How to work with LLMs?

Every day, more and more applications emerge that can be built with generative AI. The way to work with these models is to write a series of instructions, also called prompts. These prompts are templates that specify to the model what to look for and how to respond. A very simple example of a prompt might be:

  • Given the following text fragment: [Here would go the text] Generate a summary.
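In code, a prompt like this is usually kept as a reusable template and filled in at request time. Here is a minimal sketch; call_llm is a hypothetical stand-in for whatever client is actually used (the OpenAI API, a self-hosted model, etc.), not a real library function.

    # A reusable prompt template. Note: `call_llm` below is hypothetical;
    # plug in your actual client (OpenAI API, self-hosted model, etc.).
    SUMMARY_PROMPT = (
        "Given the following text fragment:\n"
        "{text}\n\n"
        "Generate a summary of at most {max_sentences} sentences."
    )

    def build_summary_prompt(text: str, max_sentences: int = 3) -> str:
        # Fill the template with the concrete text to summarize.
        return SUMMARY_PROMPT.format(text=text, max_sentences=max_sentences)

    # prompt = build_summary_prompt(document)
    # answer = call_llm(prompt)  # hypothetical call to your LLM of choice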

These instruction templates must be very clear and concrete: the more specific, the better. Keep in mind that there are limits on the amount of text that can be fed to a language model in a single request (its context window), so precision is very important in this respect.
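One way to respect that limit in practice is to count tokens before sending the prompt. The sketch below uses OpenAI’s tiktoken tokenizer; the 4,096-token limit is that of the original gpt-3.5-turbo and is only an illustrative default, as every model has its own context window.

    # Count tokens before sending a prompt, so it fits the context window.
    # The 4,096-token default is the original gpt-3.5-turbo's; adjust per model.
    import tiktoken

    encoding = tiktoken.encoding_for_model("gpt-3.5-turbo")

    def fits_in_context(prompt: str, limit: int = 4096) -> bool:
        n_tokens = len(encoding.encode(prompt))
        print(f"Prompt uses {n_tokens} of {limit} available tokens")
        return n_tokens <= limit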

If the task is very complex or very specific, the language model can be re-trained (fine-tuned) so that it specializes in the desired task. For example, if I want to create a model to answer medical questions, it makes sense to re-train it only on verified medical documents. The model will then specialize in answering purely medical questions; it may lose some generality, but it becomes a better version of itself at the task it has been focused on.
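As a rough sketch of what such a re-training could look like with the Hugging Face transformers library: the file medical_qa.jsonl and its "text" field are assumptions made up for illustration (any corpus of verified medical documents would play the same role), and the small GPT-2 model once again stands in for a real base LLM.

    # Minimal fine-tuning sketch. `medical_qa.jsonl` (one {"text": ...}
    # record per line) is a hypothetical corpus of verified medical documents.
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    base = "gpt2"  # small stand-in for a real base LLM
    tokenizer = AutoTokenizer.from_pretrained(base)
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token
    model = AutoModelForCausalLM.from_pretrained(base)

    dataset = load_dataset("json", data_files="medical_qa.jsonl")["train"]
    dataset = dataset.map(
        lambda ex: tokenizer(ex["text"], truncation=True, max_length=512),
        remove_columns=dataset.column_names,
    )

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="med-gpt2", num_train_epochs=1),
        train_dataset=dataset,
        # mlm=False -> plain next-word (causal) language modeling objective
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()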

The results of these language models are making more and more of an impact. According to OpenAI, these were the results of their GPT-3.5 and GPT-4 models on college-level exams, compared with the students who took those exams.

There it can be seen that, in most cases, the performance of GPT-4 matches that of the best students who took the exam.

Similarly, Google’s Med-PaLM 2, a model specialized in medical matters, achieved 85.4% accuracy on questions from the United States Medical Licensing Examination (USMLE).

Seeing how these models perform across so many areas gives us a first idea of how useful they can be, provided they are used correctly and ethically.

At the enterprise level, more and more use cases are emerging that generative AI can solve, for example: document management (i.e., searching for information across a given set of documents), generating image or text content, creating your own virtual assistant, and so on. Now it is up to us to extract value from these tools and see how far we can go.

 

Image: Freepik | Rawpixel

Author

  • Carlos Castro

Data Scientist at Keepler. "I am a mathematician dedicated to data science, I love learning new algorithms and machine learning techniques that help develop customized data-driven solutions."