What Are Tokens, Context And Messages ? – AI Engine

Tokens

Think of tokens as the currency AI models use to digest and generate text. Each piece of text is broken down into these units called tokens. The number of tokens can vary based on how common a sequence of characters is and how the AI model has been trained to understand it. It's like the AI's vocabulary!

To get a better idea, why not give it a go yourself? Head over to the OpenAI Tokenizer and plug in some text to see how it's tokenized. It's pretty cool to see how the AI breaks down your words!

Context

Imagine having a conversation with someone who forgets everything you said a minute ago. Frustrating, right? That's where context comes into play. The AI keeps track of what has been said previously, storing it in its memory bank to maintain a coherent conversation. This history, or "Context", isn't just limited to what you say, though. It also includes instructions, known as system messages, which can be set up when you're configuring your chatbot or when you're pulling in dynamic content like a web search. This way, the AI isn't flying blind—it's got the backstory!

Messages

Each entry in the context is a message. Think of messages as pieces of a puzzle that the AI puts together to see the full picture. Each message takes up a certain number of tokens, so you'll want to balance the length and number of messages to avoid overloading the AI, especially since there's a maximum context size it can handle.

Understanding this in AI Engine

Let's break down some of the technical jargon and make it easier to understand:

Max Tokens: This is the limit of how much the AI can process or generate at one time. Different models have different token limits. Most users just want to use the maximum allowed, and honestly, they don't need to worry about this setting too much. OpenAI needs this value, but we can simplify things for you, look at our recommendations!

Max Messages: This is all about the conversation history. Think of it like the AI's short-term memory of what you've been discussing. Remember the puzzle pieces we talked about ? You can limit how much of them the AI can play at once with.

Input Max Length: This one refers to the length of the new message you're sending to the AI. Keep it concise to stay within limits! This affects the number of charaters allowed in the input box for your chatbot.

Context Max Length: This isn't about tokens anymore; it's the total size of the context if there is one. The context can include previous messages, instructions, embeddings, or other dynamic content.

And a little heads up: it's now called Context Max Length, not Context Max Tokens, for a bunch of reasons. It's simpler and faster this way. Plus, calculating tokens on the fly can be tricky and might vary between models and systems, so let's stick to what's stable and clear.

Recommended Values

For OpenAI, the best reference is this: https://platform.openai.com/docs/models/overview.

For English, 4 characters is equal to 1 token. Keep in mind that while we're using multiples of 4 for simplicity and ease of comprehension, this method is versatile and can be applied to any numerical value.

gpt-4-1106-preview (GPT-4 Turbo)

GPT 4 Turbo supports 128,000 tokens in input, and 4,096 as output. Based on that, we could go for the following values.

Max Tokens: 4096 tokens Max Messages: 16 messages Context Max Length: 8192 characters (~ 2048 tokens)

One message is rarely longer than 512 characters (~ 128 tokens). By 16 (Max Messages), it would account for 8192 characters (same as for the Context Max Length). Therefore, with those values, the maximum number of tokens would seem to be 2048 * 2 = 4096 tokens. We are very far from the 128,000 tokens, so those values are extremely safe.

Based on your requirement, you can easily increase the Context Max Length and/or the Max Messages.

For Max Tokens, which is only used for output, it simply means that we don't limit the AI, and it can use as many tokens as it needs to build its reply.

gpt-4-32k (GPT-4 32k)

It has 32,768 tokens in input. Therefore, the values used in GPT-4 Turbo are also okay.

gpt-3.5-turbo-16k (GPT 3.5 Turbo 16k)

It has 16,385 tokens. The values used in GPT-4 Turbo are also okay, but they should be basically the maximum.