A Beginner’s Guide To: Large Language Models (LLM)

17 Mar

What is an LLM?

Since the rise of OpenAI’s ChatGPT, there has been a lot of talk about large language models (LLMs) and how they are created. In a nutshell, an LLM is an AI foundation model trained on vast datasets drawn from diverse sources, from literature and scientific papers to websites and social media content. This training enables it to grasp a wide array of human knowledge and language nuances.

Just how big are LLMs?

One of the defining characteristics of LLMs is their immense size, both in terms of the dataset they are trained on and the number of parameters they contain. Parameters are the numerical values inside the model that are learned from training data; they can be thought of as the knowledge and rules the model uses to generate responses. It is rumored that GPT-4, the model behind ChatGPT, has around 1.76 trillion parameters (as at March 2024), although OpenAI has not confirmed the figure.
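To give a feel for these numbers, here is a rough sketch in Python. It counts the parameters in one simple "fully connected" layer (one weight for every input-output pair, plus one bias per output) and estimates how much memory a given parameter count would need. The function names are our own, and the 2 bytes per parameter is an assumption (a common 16-bit storage format), not a published detail of any particular model.

```python
def dense_layer_parameters(inputs, outputs):
    """Parameters in one fully connected layer:
    one weight per input-output pair, plus one bias per output."""
    return inputs * outputs + outputs


def memory_in_terabytes(parameter_count, bytes_per_parameter=2):
    """Rough storage needed for the parameters alone.
    bytes_per_parameter=2 is an assumption (16-bit numbers)."""
    return parameter_count * bytes_per_parameter / 1e12


# A single layer with 1,000 inputs and 1,000 outputs already has
# just over a million parameters.
small_layer = dense_layer_parameters(1000, 1000)
print(f"{small_layer:,} parameters in one small layer")

# The rumored (unconfirmed) GPT-4 figure quoted in this post.
rumored_parameters = 1_760_000_000_000
print(f"{memory_in_terabytes(rumored_parameters):.2f} TB just to store them")
```

Under those assumptions, simply storing the rumored parameter count would take several terabytes, before any training data is considered.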

Example of an LLM in practice

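A familiar example of language modelling in practice is predictive text, like when you are typing a text message on your phone and the next word in a sentence is recommended. The model does this by assigning a probability to each word that could come next, based on how often that word has followed the preceding words in its training data, and then suggesting the most likely candidates. LLMs apply the same idea at a far larger scale, and this is the very start of how content generation has evolved.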
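The idea of scoring the next word can be sketched in a few lines of Python. This is a deliberately tiny illustration, not how a real LLM works internally: it only looks at the single previous word and simply counts word pairs, whereas an LLM learns its probabilities with a neural network and considers much more context. The example sentences and function names are made up for the demonstration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count how often each word is followed by each other word."""
    words = text.lower().split()
    follow_counts = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        follow_counts[current_word][next_word] += 1
    return follow_counts


def suggest_next_words(follow_counts, word, top_n=3):
    """Return up to top_n (word, probability) pairs, most likely first."""
    counts = follow_counts.get(word.lower())
    if not counts:
        return []  # the word never appeared (or was never followed) in training
    total = sum(counts.values())
    return [(candidate, count / total)
            for candidate, count in counts.most_common(top_n)]


# Made-up "training data": a handful of short messages.
training_text = "see you soon . see you later . see you soon . talk to you soon"
model = train_bigram_model(training_text)

# After "you", the word "soon" appeared 3 times out of 4 and "later" once.
print(suggest_next_words(model, "you"))
```

With this training text, the suggestion after "you" is "soon" with a probability of 0.75, followed by "later" at 0.25. A word the model has never seen returns no suggestions at all, which hints at why the size and variety of the training data matter so much.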

What are the benefits of an LLM?

The extensive training of LLMs enables the models to understand context and generate coherent and relevant text across various topics and languages. The sheer volume of data ensures that LLMs have a broad knowledge base to draw from, covering a vast spectrum of human knowledge. LLMs are great for language translation, general content creation or summarisation, and are commonly used for code generation.

What are the limitations of an LLM?

Data bias and representation issues arise in LLMs because they learn from pre-existing data, which often embeds societal biases and stereotypes. This can lead to generated content that unintentionally perpetuates these biases. For example, if you ask an LLM to describe a nurse, it might draw on its training data and generate a response like ‘a nurse is a female healthcare professional who assists doctors and provides compassionate care to patients’, assuming the nurse’s gender without being asked.

Furthermore, although LLMs excel at reproducing textual patterns, they fall short in truly understanding the world, lacking common sense and real-world experience. If you asked an LLM to create a recipe using strawberries and curry powder, it would do so without knowing whether the resulting dish would actually taste good.

The future of LLMs

While LLMs continue to expand their influence, a notable trend for 2024 is the rise of smaller AI models. These compact yet nimble models are poised to redefine how we approach natural language understanding and generation. Keep an eye out for our upcoming blog post, where we’ll delve into the opportunities that smaller AI models can provide.

Image created with ChatGPT (GPT-4). Prompt: Create a picture of curried strawberry dip? Put it in a dish in the centre of the plate surrounded by sliced vegetables and pita chips.

Emma Cole
