Copilot Toolkit: How to count tokens … precisely

The Challenge of Token Counting in the AI Module

This was supposed to be the third blog post in a row about the Copilot Toolkit for Business Central.

I decided to write this one today because of a commit published yesterday. As you probably know, Business Central 23 now has two copilots.

These copilots rely on Azure OpenAI to generate suggestions. To get a suggestion, we tell the model what we want to achieve; these instructions are called prompts.

The size of a prompt is limited, and the limit depends on the model: currently from 4k to 128k tokens.

Tokens represent commonly occurring sequences of characters. For example, the string " tokenization" is decomposed as " token" and "ization", while a short and common word like " the" is represented as a single token. As a rough rule of thumb, 1 token is approximately 4 characters or 0.75 words for English text.
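The character-based rule of thumb above can be turned into a quick estimate in AL. This is a minimal sketch, not part of the AI module: the procedure name `CharBasedTokenEstimate` is hypothetical, and the fixed ratio of 4 characters per token is taken straight from the rule of thumb, so it is only a rough guide for English text.

```al
// Hypothetical helper: estimates tokens from the "1 token ~ 4 characters" rule of thumb.
// It is only a rough guide for English text; it does not tokenize like the model does.
procedure CharBasedTokenEstimate(Input: Text): Integer
var
    CharsPerToken: Decimal;
    TokenCount: Integer;
begin
    CharsPerToken := 4;
    // StrLen returns the number of characters; round up so a partial token counts as a whole one.
    TokenCount := Round(StrLen(Input) / CharsPerToken, 1, '>');
    exit(TokenCount);
end;
```

For example, `CharBasedTokenEstimate('Hello world!')` returns 3, because the text is 12 characters long.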

OpenAI provides a dedicated Python library, tiktoken, that makes it simple for developers to count tokens before sending a request to Azure OpenAI or OpenAI endpoints. This helps avoid unnecessary ‘out of limit’ errors and makes full use of the context window.

In AL, we don't have such a tool. The base application uses a function that approximates the token count, a necessary step before sending data to Azure OpenAI endpoints.

Yesterday, Microsoft added this exact function to the AI module.

```al
// AzureOpenAIImpl.Codeunit.al

[NonDebuggable]
procedure ApproximateTokenCount(Input: Text): Decimal
var
    AverageWordsPerToken: Decimal;
    TokenCount: Integer;
    WordsInInput: Integer;
begin
    AverageWordsPerToken := 0.6; // Based on OpenAI estimate
    // Note: AL text literals do not support C-style escapes such as \n,
    // so '/n' is the two-character text "/n", not a line break.
    WordsInInput := Input.Split(' ', ',', '.', '!', '?', ';', ':', '/n').Count;
    TokenCount := Round(WordsInInput / AverageWordsPerToken, 1);
    exit(TokenCount);
end;
```

However, this approximation, while useful, can lead to errors or wasted capacity because of the context window limit.

Underestimating the token count leads to out-of-limit errors, while overestimating it leaves valuable space in the context window unused, space that could otherwise hold more detailed input and yield better AI results.
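One way to act on this is to check, before sending a request, whether the prompt plus the tokens reserved for the answer fit into the model's context window. The sketch below is a hypothetical helper, not part of the AI module; the procedure and parameter names, and the idea of a percentage safety margin for approximate counts, are assumptions for illustration.

```al
// Hypothetical helper: returns true when the prompt and the tokens reserved
// for the completion fit into the model's context window.
// SafetyMarginPct leaves headroom when PromptTokens is only an approximation;
// pass 0 when the count is precise.
procedure FitsContextWindow(PromptTokens: Integer; MaxCompletionTokens: Integer; ContextWindowTokens: Integer; SafetyMarginPct: Decimal): Boolean
var
    UsableTokens: Decimal;
begin
    // Shrink the usable window by the safety margin (e.g. 10 = keep 10% free).
    UsableTokens := ContextWindowTokens * (1 - SafetyMarginPct / 100);
    exit(PromptTokens + MaxCompletionTokens <= UsableTokens);
end;
```

With a 4,096-token window and 800 tokens reserved for the answer, a 3,000-token prompt does not fit under a 10% margin (3,800 > 3,686.4), but it does fit when the count is precise and no margin is needed (3,800 ≤ 4,096). That gap is exactly the space an overestimate wastes.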

Precise Token Counting

To address this challenge, I’ve developed an Azure Function that accurately counts tokens using the OpenAI tiktoken module. 

https://azure-openai-tokenizer.azurewebsites.net/api/tokensCount

This function offers a more precise alternative to the approximate method, so you can use the full token limit without exceeding it. Here's how you can integrate it into your Business Central application:

```al
procedure PreciseTokenCount(Input: Text): Integer
var
    RestClient: Codeunit "Rest Client";
    Content: Codeunit "Http Content";
    JContent: JsonObject;
    JTokenCount: JsonToken;
    Uri: Label 'https://azure-openai-tokenizer.azurewebsites.net/api/tokensCount', Locked = true;
begin
    // Request body: {"text": "<input>"}
    JContent.Add('text', Input);
    Content.Create(JContent);
    // Send the text to the tokenizer function and read "tokensCount" from the JSON response.
    RestClient.Send("Http Method"::GET, Uri, Content).GetContent().AsJson().AsObject().Get('tokensCount', JTokenCount);
    exit(JTokenCount.AsValue().AsInteger());
end;
```

This function connects to the Azure Function endpoint, sends the input text, and receives the precise token count in response. It’s a simple yet effective way to enhance the accuracy of your AI-driven features in Business Central.
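Because the precise count depends on an external service, you may want a fallback for when the call does not succeed. The following is a hedged sketch, not a tested implementation: the procedure name `PreciseTokenCountOrApprox` is hypothetical, it assumes the `ApproximateTokenCount` procedure shown earlier has been copied into the same codeunit, it sends the same GET request as `PreciseTokenCount`, and it relies on the `GetIsSuccessStatusCode` method of the "Http Response Message" codeunit from the Rest Client module. Depending on how that module reports connection failures, a network-level problem may still surface as a runtime error that this sketch does not catch.

```al
// Hypothetical variant of PreciseTokenCount that falls back to the approximation.
procedure PreciseTokenCountOrApprox(Input: Text): Integer
var
    RestClient: Codeunit "Rest Client";
    Content: Codeunit "Http Content";
    Response: Codeunit "Http Response Message";
    JContent: JsonObject;
    JResponse: JsonToken;
    JTokenCount: JsonToken;
    FallbackCount: Integer;
    Uri: Label 'https://azure-openai-tokenizer.azurewebsites.net/api/tokensCount', Locked = true;
begin
    JContent.Add('text', Input);
    Content.Create(JContent);
    Response := RestClient.Send("Http Method"::GET, Uri, Content);

    // Only trust the response when the call succeeded and it contains a tokensCount property.
    if Response.GetIsSuccessStatusCode() then begin
        JResponse := Response.GetContent().AsJson();
        if JResponse.IsObject() then
            if JResponse.AsObject().Get('tokensCount', JTokenCount) then
                exit(JTokenCount.AsValue().AsInteger());
    end;

    // Fallback: use the word-based approximation when the service gives no usable answer.
    FallbackCount := ApproximateTokenCount(Input);
    exit(FallbackCount);
end;
```

Keep in mind that the fallback inherits the approximation's inaccuracy, so it pairs well with a safety margin when checking the context window.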

Let's see it in action

I made a simple test:

```al
codeunit 50100 "GPT Tokens Count Impl."
{
    trigger OnRun()
    var
        Text: Text;
    begin
        Text := 'Helllllo AL developers! Welcome to the AI world!!!';
        Message('%1\\Approximate token count: %2\Precise token count: %3', Text, ApproximateTokenCount(Text), PreciseTokenCount(Text));
    end;
    ...
}
```

In this example, the approximate count is almost twice the precise one. With the approximate method, we lose 8 tokens that we could have added to the prompt to get better results. The opposite can also happen: when the approximation underestimates the count, the request can fail with an out-of-limit error.

In a large prompt, such as the one used for bank account reconciliation, this difference could be critical.
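The approximate figure in this test can be reproduced by hand, which also shows how punctuation inflates it. Assuming `Split` keeps an empty entry between consecutive separators (as .NET's `String.Split` does by default), the test text splits into 8 words plus 4 empty entries: one between the "!" and the following space after "developers", and three from the trailing "!!!". With `ApproximateTokenCount`'s ratio of 0.6 words per token:

```latex
W = \underbrace{8}_{\text{words}} + \underbrace{4}_{\text{empty entries}} = 12,
\qquad
\text{TokenCount} = \operatorname{round}\!\left(\frac{W}{0.6}\right)
                  = \operatorname{round}\!\left(\frac{12}{0.6}\right) = 20
```

Taking the 8-token gap from the test at face value, the precise count is 20 − 8 = 12 tokens, and 20 / 12 ≈ 1.67, consistent with the "almost twice" observation.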

List the Tokens

Sometimes it's useful to see how the model splits text into tokens. For this, I also implemented a second Azure Function:

https://azure-openai-tokenizer.azurewebsites.net/api/tokens

Here’s how you can integrate this solution into your Business Central application:

```al
procedure ListTokens(Input: Text) TokenList: List of [Text]
var
    RestClient: Codeunit "Rest Client";
    Content: Codeunit "Http Content";
    JContent: JsonObject;
    JTokenArray: JsonToken;
    JToken: JsonToken;
    Uri: Label 'https://azure-openai-tokenizer.azurewebsites.net/api/tokens', Locked = true;
    i: Integer;
begin
    // Request body: {"text": "<input>"}
    JContent.Add('text', Input);
    Content.Create(JContent);
    // The response contains a "tokens" array with the text of each token.
    RestClient.Send("Http Method"::GET, Uri, Content).GetContent().AsJson().AsObject().Get('tokens', JTokenArray);

    // Copy every array element into the returned list.
    for i := 0 to JTokenArray.AsArray().Count - 1 do begin
        JTokenArray.AsArray().Get(i, JToken);
        TokenList.Add(JToken.AsValue().AsText());
    end;
end;
```
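To inspect the tokens, a small helper can wrap each one in brackets so that leading spaces and word fragments are easy to spot. This is a hypothetical sketch: `ShowTokens` is not part of the post's code, and it assumes `ListTokens` is declared in the same codeunit.

```al
// Hypothetical helper: displays the tokens returned by ListTokens, each wrapped
// in brackets so leading spaces and word fragments stay visible.
procedure ShowTokens(Input: Text)
var
    TokenList: List of [Text];
    Token: Text;
    Result: TextBuilder;
begin
    TokenList := ListTokens(Input);
    foreach Token in TokenList do
        Result.Append('[' + Token + ']');
    // In Message, a backslash starts a new line.
    Message('%1 tokens:\%2', TokenList.Count(), Result.ToText());
end;
```

The token count in the message is simply the length of the list, so it should match what the tokensCount endpoint returns for the same text.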

The result is the list of tokens that make up the input text.

Open Source and Privacy-Focused

The Azure Function for token counting doesn’t collect any user data, and it’s entirely free to use with no authentication required. The source code is open for anyone to review, modify, or deploy independently. You can find the entire project on my GitHub: Azure-Open-AI-Tokenizer.

Final Thoughts

By ensuring that each token is counted accurately, developers and users alike can maximize the potential of AI features without worrying about the limitations imposed by approximate methods.

I'm committed to keeping these endpoints alive and free, at least until Microsoft adds similar functionality to the AI module itself.

In the next blog, we’ll dive deeper into other components of the Copilot Toolkit, focusing on how they streamline the incorporation of AI into Business Central. Stay tuned for more insights and practical guides on leveraging AI to its fullest potential in your Business Central applications.


About Me

DMITRY KATSON

A Microsoft MVP, Business Central architect and project manager, blogger and speaker, husband and father of two. With more than 15 years in business, I went from developer to company owner. I have a great team, but I still love to get my hands on code and create AI-powered Business Central apps that just work.