Food 2 Recipe: Azure OpenAI & Bing Visual Search for Business Central

Is Image Analysis good enough?

Last week I was at Directions ASIA (which was a blast, btw). During the keynote, Microsoft showed a very nice demo: "How to create an item from a photo" and then "How to use the new Copilot to generate marketing text". This is the demo video (and the full one can be found here).

Do you find it good? Well, yeah! It looks amazing.

When I returned home, I decided to try it on a food image, as I work a lot with food solutions and wanted to propose something special to a partner. However, out of the box this didn't work so well.

I dug a bit and found that the Image Analysis endpoint used by the Base Application is /vision/v3.2/analyze. If I call this endpoint (sorry, simple Python here), providing a link to the same image, I get the following result.

import requests
import json

# Placeholders - substitute the values of your Computer Vision resource and image
endpoint = 'https://<your-resource-name>.cognitiveservices.azure.com'
api_key = '<your-subscription-key>'
image_url = '<url-of-the-image>'

headers = {'Content-Type': 'application/json', 'Ocp-Apim-Subscription-Key': api_key}

# Define the request body
body = {'url': image_url}

# Make the request to the Image Analysis API
response = requests.post(endpoint + '/vision/v3.2/analyze?visualFeatures=Description', headers=headers, json=body)

# Get the response content as JSON
result = response.json()

# Print the results
print(json.dumps(result, indent=4))

Response:

{
    "description": {
        "tags": [
            "wooden",
            "dish",
            "food",
            "wood",
            "pizza"
        ],
        "captions": [
            {
                "text": "a pan of food",
                "confidence": 0.5040134787559509
            }
        ]
    },
    "requestId": "33a28b7d-65e4-4d5a-a1a1-8d7aca3bff20",
    "metadata": {
        "height": 711,
        "width": 474,
        "format": "Jpeg"
    },
    "modelVersion": "2021-05-01"
}

So, Image Analysis doesn't help us much here. We could use Custom Vision and train a model on domain images, but… yeah, I wanted to keep things simple.

I thought about another solution, similar to the approach I use in the CentralQ.ai service.

Why not use Azure Bing Visual Search to get a bunch of search results about the image, and then ask ChatGPT to combine the results and answer one question: "Guess what's in the picture?"

Why not, let’s go!

Concept

Here is the main idea. When a user creates an item from an image, we will:

  • send this image to Azure Bing Visual Search
  • get the 3-5 top search results describing the image
  • send the descriptions to ChatGPT, asking it to name the image
  • save the name in the Item Card

We can do even more! As this is a picture of food, why not also get a recipe?

  • take the name from the ChatGPT response
  • ask ChatGPT to provide us a recipe
  • save the recipe in the Item Card notes

Azure Bing Visual Search

To use Azure Bing Visual Search, we need to create a Bing Search v7.0 resource and choose the S9 pricing tier (only this tier supports Visual Search).

The detailed workflow can be found here, but it's very straightforward. Then you can get the subscription key, call the https://api.bing.microsoft.com/v7.0/images/visualsearch endpoint, and you are ready to go.

Now let's write some simple AL code to call Azure Bing Visual Search.

As the input, we use a picture InStream that we get from the item's picture.

    local procedure SearchItemPicture()
    var
        PicInStream: InStream;
        TenantMedia: Record "Tenant Media";
        BingVisualSearchImpl: Codeunit "GPT Bing Visual Search Impl.";
    begin
        if Rec.Picture.Count = 0 then
            exit;

        TenantMedia.Get(Rec.Picture.Item(1));
        TenantMedia.CalcFields(Content);
        if not TenantMedia.Content.HasValue then
            exit;

        TenantMedia.Content.CreateInStream(PicInStream);

        BingVisualSearchImpl.Search(PicInStream);
    end;

Then we construct the web service call to Azure Bing Visual Search, using the URL mentioned above and the key from the Azure Bing Search resource.

    procedure Search(var PictureInStream: InStream): Text
    var
        BingSearchSetup: Record "GPT Bing Search Settings";
        Request: HttpRequestMessage;
        Response: HttpResponseMessage;
        Headers: HttpHeaders;
        Content: HttpContent;
        Client: HttpClient;
        ResponseText: Text;
        ImageDescriptions: TextBuilder;
    begin
        // configure url and request type
        Request.Method('POST');
        Request.SetRequestUri(BingSearchSetup.GetEndpoint());

        // configure headers
        Request.GetHeaders(Headers);
        Headers.Add('Accept', 'application/json');
        Headers.Add('Ocp-Apim-Subscription-Key', BingSearchSetup.GetSecret());

        // configure content
        BuildFormDataContent(PictureInStream, Content);

        Request.Content := Content;

        Client.Send(Request, Response);

        Response.Content.ReadAs(ResponseText);

        // combine the search result names into one text and return it
        ImageDescriptions := BuildDescriptions(ResponseText);
        exit(ImageDescriptions.ToText());
    end;

There is one challenging part: according to the documentation, the content type should be form-data, and that's not something I'm used to working with in AL.

Thanks to this blog, I ended up with the correct syntax.

    local procedure BuildFormDataContent(var PictureInStream: InStream; var Content: HttpContent)
    var
        TempBlob: Codeunit "Temp Blob";
        ContentHeaders: HttpHeaders;
        PayloadInStream: InStream;
        PayloadOutStream: OutStream;
        CR: Char;
        LF: Char;
        NewLine: Text;
    begin
        CR := 13;
        LF := 10;
        NewLine += '' + CR + LF;

        Content.GetHeaders(ContentHeaders);
        ContentHeaders.Clear();
        ContentHeaders.Add('Content-Type', 'multipart/form-data; boundary=boundary');

        // build the multipart/form-data payload: boundary, part headers, image bytes, closing boundary
        TempBlob.CreateOutStream(PayloadOutStream);
        PayloadOutStream.WriteText('--boundary' + NewLine);
        PayloadOutStream.WriteText('Content-Disposition: form-data; name="image"; fileName="image.jpeg"' + NewLine);
        PayloadOutStream.WriteText('Content-Type: application/octet-stream' + NewLine);
        PayloadOutStream.WriteText(NewLine);
        // copy all bytes of the picture into the payload
        CopyStream(PayloadOutStream, PictureInStream);
        PayloadOutStream.WriteText(NewLine);
        PayloadOutStream.WriteText('--boundary--' + NewLine);

        // read the payload back and write it into the request content
        TempBlob.CreateInStream(PayloadInStream);
        Content.WriteFrom(PayloadInStream);
    end;

Now we just need to take the JSON response, which comes in the format sketched below, get the top 5 search results, and construct a simple description text.
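
Trimmed to the part we actually read, the Visual Search response has roughly this shape (a sketch inferred from the token paths used in the code below; real responses contain many more tags, actions and fields):

{
    "tags": [
        {
            "actions": [
                {
                    "data": {
                        "value": [
                            { "name": "first search result description" },
                            { "name": "second search result description" }
                        ]
                    }
                }
            ]
        }
    ]
}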

    local procedure BuildDescriptions(var ResponseText: Text) ImageDescriptions: TextBuilder
    var
        BingVisionResponseJObj: JsonObject;
        SearchResults: JsonArray;
        Jtok: JsonToken;
        NameJTok: JsonToken;
        i: Integer;
    begin
        BingVisionResponseJObj.ReadFrom(ResponseText);

        BingVisionResponseJObj.SelectToken('tags[0].actions[0].data.value', Jtok);
        SearchResults := Jtok.AsArray();

        for i := 0 to GetMaxNumberOfResults(SearchResults) do begin
            SearchResults.Get(i, Jtok);
            Jtok.SelectToken('name', NameJTok);
            // separate the descriptions with '|', as we promise the model in the prompt later
            if i > 0 then
                ImageDescriptions.Append('|');
            ImageDescriptions.Append(NameJTok.AsValue().AsText());
        end;

        Message(ImageDescriptions.ToText());
    end;

Great! Let's quickly test it on the image we used previously.

It appears that Bing Visual Search knows more about our food than Image Analysis does. You can also notice that Visual Search found non-English descriptions. If you want English-only results, you can add the setLang query parameter to the request. But we don't need that. We are good with multi-language descriptions, as… ChatGPT understands all languages 🙂

Azure OpenAI

Now we need to send the descriptions to Azure OpenAI and ask the model to "guess the name from the description".

Why Azure OpenAI and not OpenAI? The answer is… just because 🙂 Seriously, both services share the same models, so you will get similar results for the same price. I found that Azure OpenAI does the job quicker, and as I have access to both, I chose Azure OpenAI because of the lower latency.

If you don't have access to Azure OpenAI, you can request it here. You will get access to many models, including gpt-3.5. If you need gpt-4, request it here additionally.

If you don't want to wait, or want to use the OpenAI APIs directly, you can sign up here.

To create an Azure OpenAI resource, once you get access, just go to the Azure Portal.

This will create an Azure OpenAI resource.

Now you need to create a model deployment. Don't panic, there is no need to train anything, just pick a model from the pre-trained list. For our task, select gpt-35-turbo (the same model that is used in chat.openai.com). Check out the other models here.

We are ready to call the AI model. The endpoint for the web request should be constructed as https://[resourceName].openai.azure.com/openai/deployments/[ModelDeploymentName]/chat/completions?api-version=2023-03-15-preview

In our case, we should replace:

  • [resourceName] with 'azureopenaiblog'
  • [ModelDeploymentName] with 'gpt35'

The final URL will be https://azureopenaiblog.openai.azure.com/openai/deployments/gpt35/chat/completions?api-version=2023-03-15-preview
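
The "GPT Azure OpenAI Settings" record used later exposes this URL through a GetEndpoint() helper that is not shown in this post. A minimal sketch of how such a helper could compose it (the "Resource Name" and "Deployment Name" setup fields are my assumption):

    procedure GetEndpoint(): Text
    var
        // url pattern from above; %1 = resource name, %2 = model deployment name
        EndpointPatternTok: Label 'https://%1.openai.azure.com/openai/deployments/%2/chat/completions?api-version=2023-03-15-preview', Locked = true;
    begin
        exit(StrSubstNo(EndpointPatternTok, Rec."Resource Name", Rec."Deployment Name"));
    end;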

Let’s call it from Business Central.

    procedure GetAIGeneratedItemName(InputText: Text) Name: Text
    var
        Prompt: Text;
        Completion: Text;
    begin
        Prompt := 'You are ChefGPT. You will be provided with descriptions of the same food, separated by ''|''. ' +
                'You should analyse the descriptions and tell me the name of the food. ' +
                'Don''t add additional comments like ''the food is'', return just the food name. ' +
                'Description: ' +
                '_______';

        Completion := GenerateCompletion(Prompt, InputText);

        exit(Completion);
    end;

Prompt

A prompt is an instruction to the model. It should be clear, describe the input and the output, and state any additional requirements the model should take into consideration.

Microsoft has issued prompt guidelines; you can take a look here.

The prompt is the most important part of AI-generation tasks. A little change in the prompt text can result in very different output. In general, you should play with your prompts to make them best suit your tasks.

You can try different prompts in the Azure OpenAI Playground.


In Business Central, there is a system module "Entity Text" that is used to power Copilot (to generate marketing text, as in the first video of this blog). This module has an "Azure OpenAi Impl." codeunit. However, at least for now (May 2023), this module is internal and we can't call it.

So, we create our own AI-completion caller logic. I will make it similar to what Microsoft has done, so we can easily switch to the standard module when it's ready.

    internal procedure GenerateCompletion(Prompt: Text; InputText: Text): Text;
    var
        Configuration: JsonObject;
        Completion: Text;
        NewLineChar: Char;
    begin
        Configuration.Add('max_tokens', 800);
        Configuration.Add('temperature', 0);
        Configuration.Add('messages', GetMessages(Prompt, InputText));

        Completion := SendCompletionRequest(Configuration);

        // the completion comes back as a JSON string value, so strip the surrounding
        // quotes and unescape the newlines and double quotes
        NewLineChar := 10;
        if StrLen(Completion) > 1 then
            Completion := CopyStr(Completion, 2, StrLen(Completion) - 2);
        Completion := Completion.Replace('\n', NewLineChar);
        Completion := DelChr(Completion, '<>', ' ');
        Completion := Completion.Replace('\"', '"');
        Completion := Completion.Trim();

        exit(Completion);
    end;

I will not describe here what max_tokens and temperature are. That's a topic for another blog, but if you are interested, you can read about them here.

The most important part here is how you build the messages.

    internal procedure GetMessages(Prompt: Text; InputText: Text): JsonArray;
    var
        Messages: JsonArray;
        Message: JsonObject;
    begin
        Clear(Message);
        Message.Add('role', 'system');
        Message.Add('content', Prompt);
        Messages.Add(Message);

        Clear(Message);
        Message.Add('role', 'user');
        Message.Add('content', InputText);
        Messages.Add(Message);

        exit(Messages);
    end;

The prompt we constructed goes into the system role. The Bing Visual Search result goes into the user role. This is very similar to what we've seen in the Azure OpenAI Playground.
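
For illustration, the request body our code builds looks roughly like this (the user content is a hypothetical set of Bing descriptions):

{
    "max_tokens": 800,
    "temperature": 0,
    "messages": [
        {
            "role": "system",
            "content": "You are ChefGPT. You will be provided with descriptions of the same food, separated by '|'. ..."
        },
        {
            "role": "user",
            "content": "Homemade pizza on a wooden board|Pizza fatta in casa"
        }
    ]
}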

Now we just need to send our request to the endpoint (constructed above), and let's see the magic happen!

    internal procedure SendCompletionRequest(Configuration: JsonObject): Text;
    var
        AzureOpenAISettings: Record "GPT Azure OpenAI Settings";
        Request: HttpRequestMessage;
        Response: HttpResponseMessage;
        Headers: HttpHeaders;
        Content: HttpContent;
        Client: HttpClient;
        Payload: Text;
        ResponseText: Text;
        ResponseJson: JsonObject;
        CompletionToken: JsonToken;
        Completion: Text;
    begin
        Configuration.WriteTo(Payload);

        Request.Method('POST');
        Request.SetRequestUri(AzureOpenAISettings.GetEndpoint());

        Client.DefaultRequestHeaders().Add('api-key', AzureOpenAISettings.GetSecret());
        Content.WriteFrom(Payload);

        Content.GetHeaders(Headers);
        Headers.Remove('Content-Type');
        Headers.Add('Content-Type', 'application/json');

        Request.Content(Content);
        Client.Send(Request, Response);

        if not Response.IsSuccessStatusCode() then
            Error(CompletionFailedErr, Response.HttpStatusCode, Response.ReasonPhrase);

        Response.Content().ReadAs(ResponseText);

        ResponseJson.ReadFrom(ResponseText);
        ResponseJson.SelectToken('$.choices[:].message.content', CompletionToken);
        CompletionToken.WriteTo(Completion);

        exit(Completion);
    end;
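
For reference, the chat completions response we parse has roughly this shape (trimmed; the content value is a hypothetical example). The '$.choices[:].message.content' token path selects the message content of the returned choice:

{
    "choices": [
        {
            "index": 0,
            "finish_reason": "stop",
            "message": {
                "role": "assistant",
                "content": "Pizza Margherita"
            }
        }
    ]
}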

We are now ready to put it all together in one line of code.

AzureOpenAIImpl.GetAIGeneratedItemName(BingVisualSearchImpl.Search(PicInStream))

Pretty awesome! 

We combined Upload Picture + Bing Visual Search + ChatGPT into one smooth flow and got the name of the food in the picture.
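
To close the loop from the concept list ("save the name in the Item Card"), here is a minimal sketch of wiring that one-liner into the item; the procedure name and the truncation choice are my assumptions:

    local procedure SetAIGeneratedItemName(var Item: Record Item; PicInStream: InStream)
    var
        BingVisualSearchImpl: Codeunit "GPT Bing Visual Search Impl.";
        AzureOpenAIImpl: Codeunit "GPT Azure OpenAI Impl.";
        ItemName: Text;
    begin
        ItemName := AzureOpenAIImpl.GetAIGeneratedItemName(BingVisualSearchImpl.Search(PicInStream));

        // Description has a limited length, so truncate the completion if needed
        Item.Validate(Description, CopyStr(ItemName, 1, MaxStrLen(Item.Description)));
        Item.Modify(true);
    end;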

Do you think it's "uff… too much engineering for such a small task"?

OK, how about this?

Try to do the same by adding a new prompt and saving the results to Record Links (here is an example of saving notes), as sketched below.
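
For example, the recipe part could reuse GenerateCompletion with a new prompt. A sketch (the prompt wording here is mine, not from the app):

    procedure GetAIGeneratedRecipe(FoodName: Text): Text
    var
        Prompt: Text;
    begin
        Prompt := 'You are ChefGPT. You will be provided with the name of a food. ' +
                'Return a short recipe for it: the list of ingredients and the preparation steps. ' +
                'Return just the recipe, without any additional comments.';

        exit(GenerateCompletion(Prompt, FoodName));
    end;

The result can then be saved to the item's Record Links, as in the notes example linked above.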

Have any other ideas? You are welcome to make a pull request or leave a comment!
