Leveraging AI for Meal Image Analysis

Create an AI-powered image analyzer using OpenAI's GPT-4 model, Python Flask, and JavaScript Fetch. Learn advanced prompt engineering techniques to extract detailed meal information in JSON format from images.

Key Insights

Implement precise prompt engineering to instruct the GPT-4 model exactly how to format its JSON response, clearly defining properties like meal name, savory description, and total calories.
Effectively manage deeply nested dictionaries and lists in Python, carefully commenting each closure to maintain readability and prevent errors in complex structures.
Utilize base64 encoding to embed images directly into API requests, enabling seamless transmission from JavaScript Fetch to a Python Flask backend, which communicates with OpenAI's API.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

Back to biz where we were. Okay, open the image in binary format and encode it. Okay, we did that.

Explain what binary Base64 is a little bit. Can we get this in one line? Nope. Well, we could cheat—make it one line smaller.

Okay, write the text prompt. Okay, so we're going to do that response: `client.chat.completions.create`, which we got started on here. We're going to say the model equals GPT-4.0, and the response format equals response—as I say—oh, type: JSON object, right? JSON object. And then messages, which we start off with as a list. And we know we need these two dictionaries: one for the system, one for the user. For the first item in the messages list, provide the system prompt informing the AI of its culinary expertise.

We're going to say—state this one out for now. We're going to say: role = system, content = "You are a dietitian and chef. A user sends you an image of a meal, and you answer with the name of the meal, a savory description, and the total calories." The restaurant menu–style name, you know—savory description of the meal and the total calories of the meal. And just keep repeating that. Probably don’t need to.

Okay, so what that’s going to get you is that prompt will get us a string of info. I mean, we're saying we want JSON, but we're not really telling it how to answer with the JSON, and we're leaving it to the AI perhaps to figure out how to break that up into key-value pairs. Better would be if we just explicitly told the AI what we want.

Python for Data Science Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes, 1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

That’s prompt engineering. We specify JSON here in the response format for a reason. We want the AI's answer in parsable chunks of properties—key-value pairs—for output to their respective HTML tags, right? We want to be able to get the name, output it to the name; get the description, output it to the description, right? We need to be more specific in our prompt.

We need to spell out the JSON. So Step Six: prompt engineering—tell AI exactly how to answer in the JSON format. We're going to engineer the system prompt now, spelling out precisely how we want the AI to provide its JSON answer.

JSON requires double quotes. We're going to literally write the JSON, which has to use double quotes for the keys and the string values. And of course, our content string—"You are a dietitian and chef"—all that stuff—that is also in double quotes.

We've got nested double quotes. We need to escape the inner JSON double quotes. And we do that in a kind of cool move by wrapping the content part—the outer string—in triple double quotes.

And then cooler still is we have to escape the curly braces of the JSON that we're describing to the AI. And we can't let it get confused with the curly braces of the prompt. We're going to escape the inner JSON curly braces with double curly braces so that they don't conflict with the dictionary's own outer curly braces.

So here’s the move. First, we're going to go to the content and do triple double quotes so that we can come in with double quotes. And then we're going to add on: respond in JSON format with keys called "meal name", "meal description", "total calories".

Meal name, meal description, and total calories. Then we're going to give it the JSON. We're going to write the JSON here.

But we're going to do it in sets of double quotes to escape it. The prompt itself has curly braces. So the prompt has curlies.

There's inner curlies—we escape them as double curlies. I'm going to say: meal name = "Name of the meal", just the value.

Just write it how you want it to be. Meal description = "Savory description of the meal", total calories = "Total calories in the meal". We end with this double curly + triple quote as we come out of the escapes, right? So there it is.

Then you're going to come out of the user prompt, come out of the messages, come out of—we should actually comment this stuff. We'll say: end user dictionary, end messages list, end create method. Because there's going to be more, and it's already getting kind of complicated.

More prompt engineering. Now the user prompt—we need the user prompt, which will remind AI about JSON. Remind AI to use JSON.

In this project, we are not chatting with the AI, right? The user prompt is not getting into a back-and-forth chat. We're just sending an image—no text—and getting back an answer. Nevertheless, the messages list still requires a user dictionary.

So the user part—it’s got to be: role = user, content = […]. Here, the content is itself a list of dictionaries, as we must provide the user prompt text as well as the Base64 image data. In the prompt, we're going to reiterate the importance of answering in JSON.

Now proceed with extra care at this point. It's getting easy to mess up. The `create()` is going to end with brackets, curly braces, and parentheses—nested six deep.

So it would be helpful to comment these closers line by line, which I have begun to do, right? You don’t usually see—programmers don’t typically comment their closers—but I like to. So there’s three of them now. There’s going to be three more. It’s going to be really hard to read at the end.

So after the total calories in meal thing, we’re going to add another dictionary. So the end of the user dictionary—that’s the end of the user dictionary. No, excuse me—that is not the end.

That’s just the end of the content. That’s the end of the user dictionary. We’re going to next open up another little dictionary: role, content.

And this is—this is—that was for the system. Whoa—that’s actually the system. That’s a system dictionary.

Let’s end system dictionary, right? System. We have another dictionary now. And it’s going to be: role = user, content = […].

And the content is going to be quite short: "Answer in JSON. Provide information about this meal as JSON per the instructions."

Like just refer to the instructions: "Provide information about this meal as JSON per the instructions." And it’ll figure it out.

What instructions? The one that I gave you as a system. Oh—and this itself needs to be a list. Going to be a list with type = "text", as opposed to "image".

Because we also have to provide image content. Actually, it’s just going to be the text, right? Yep. Type is text.

And then the text is this string. That is itself a dictionary inside a list. Close or end user text dictionary.

And then this would be: end user role dictionary, end system role dictionary. Role-content, role-content. So this is the type = text.

This one right here is inside. Then we need another one. This is for the image.

So in the content of the user, we’re providing not just one piece of information. We’re not just providing content as text. We’re providing content as text and an image.

So they’re broken out into their own little sub-dictionaries, which live in their own little list. Now you’re looking at: end user content list. There it is.

And next we’re going to add the Base64 image data to the user prompt. So right after the end user content text dictionary or type = text, here we go again. We’re going to do a type called "image_url".

We need to spell out—and then there’s another property called "image_url", the value of which is that Base64 encoding.

Well—the value of which is actually a dictionary with a "url" property, the value of which is the info about the Base64. So it gets pretty nested and quite complicated. And no, you’re not going to remember how to do this.

So you just have to know that you have to do this. And you break out the code. So the URL value is going to be—we’re going to do a little f-string here.

It’s going to be: `data:image/jpeg;base64, ` and then the Base64 image variable.

And then we close. Well, this thing is closed then. All right.

This one, right—it’s got its own: end URL dictionary, end type image URL dictionary. Now, if you look, you're six-deep here.

So it's getting pretty nested. We could outdent a little bit. We're allowed to do that.

That’s why you label this stuff. Otherwise, it’s just curly-curly, square-square-bracket, curly-square-bracket, parentheses—and you have no idea what any of it is. So yes, we want to comment this stuff—these closing braces.

So our minor masterpiece of prompt engineering and nested coding returns a response. Unpack that. That returns an AI response—unpack that—and save the result as meal JSON. This would be your AI answer. And remember—it’s `response.choices[0].message.content`—choice zero, first item in the choices array.

That is what the AI says. We'll say `meal_json`—this is the AI’s answer.

AI answer. Well—we’ll just call it AI’s `meal_json`. I think you should call it AI, because it’s not just any old JSON.

And we are going to return that. I'm going to say: `return json.loads(ai_meal_json)`. You could take this entire `response.choice[…]` and pass it in there, but it’s nicer to do it in two steps—easier to read.

This is going back to the JS: fetch → then → then, right? Finally, remove the temp file. Literally finally.

Finally: `os.remove(temp_file.name)`. And I don’t think we want the exception right now. We can still do the exception, but then we have to do `finally` at the end.

We could say after the exception—the except part—move the temp file from the system. I mean, as we are done with it, just delete the temp file from memory.

Since we are done with it—we are now done with it. And then you end. I suppose we could keep the except part.

A little context here isn’t bad. Yeah, we have room too. Good.

All right. Finally. And then this—then we end it all.

Gosh. So difficult. A lot of stuff.

Wow. You can hang with this. If you’ve hung with it this far—congrats.

Nice. Good job. Seriously.

I mean—who’s even listening to this? If you’re listening to this, you stuck with it a long time. But you’re getting the payoff now, right? Okay. Run the app and test it on a meal image.

Run the app—Server 03. Should load the page.

Click Choose File. Browse for food image. We have provided some, right? You have a folder full.

Add more that you like. When the image appears, click Analyze to submit it to the AI for analysis. Oh, well, we get it.

After a delay, the AI’s should appear. Be patient. If there are no errors in the console, it’s just thinking.

We don’t have a spinner to indicate progress. So just give it a little time. It could take up to 30 seconds.

It could be as fast as five seconds. We did not do a little spinner GIF. Oh well.

So that ought to work. Let’s quit the server. Let’s see if it goes.

Okay. Por favor. Let’s do it.

Tell me about this Thanksgiving dinner. Classic. Oh—undefined.

Okay. We got the name and it got the calories, but it’s undefined on the description. So I probably have the description name wrong.

Okay. Let’s fix that. Meal description.

What are we referring—what are we trying to call it in the JS? Oh—it’s meal description. And we should also say calories. Label that.

One more time. So I had the wrong name—the wrong property name in the JavaScript.

There. Boom. Quinoa Stuffed Avocado Boats.

Indulge in creamy avocado halves generously filled with fluffy quinoa. See a savory description.

Cherry tomatoes, fresh herbs, and a sprinkle of seeds for delightful crunch. A refreshing and nutritious dish.

Perfect for any meal. Yeah. I have to agree.

Those Avocado Boats look pretty good. And low-cal. Oh, there you go.

We already have one. I'm going to switch. I like the Avocado Boats.

Well, maybe do two. Why not have two, right? Just to show the AI can handle whatever.

Vegetarian, meat—you name it. All right. And then finishing up with our final code.

There it is. And that completes Lesson 14. We have an image analyzer working using Fetch, Python, JS Fetch, Python Flask, and of course, star of the show, the OpenAI API model GPT-4.0.

Key Insights

Brian McClain

How to Learn Python