Leveraging AI for Meal Image Analysis

Create an image analysis application using JavaScript Fetch, Python Flask, and OpenAI's GPT-4 API to identify meals, provide savory descriptions, and calculate calorie totals from uploaded meal images.

Create an AI-powered image analyzer using OpenAI's GPT-4 model, Python Flask, and JavaScript Fetch. Learn advanced prompt engineering techniques to extract detailed meal information in JSON format from images.

Key Insights

  • Implement precise prompt engineering to instruct the GPT-4 model exactly how to format its JSON response, clearly defining properties like meal name, savory description, and total calories.
  • Effectively manage deeply nested dictionaries and lists in Python, carefully commenting each closure to maintain readability and prevent errors in complex structures.
  • Utilize base64 encoding to embed images directly into API requests, enabling seamless transmission from JavaScript Fetch to a Python Flask backend, which communicates with OpenAI's API.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

Back to biz where we were. Okay, open the image in binary format and encode it. Okay, we did that.

Explain what binary base64 is a little bit. Can we get this in one line? Nope. Well, we could cheat, make it one smaller.

Okay, write the text prompt. Okay, so we're going to do that response client creations, completions create, which we got started on here, and we're going to say the model equals GPT-4 little o and the response format equals response, as I say, oh type JSON object, right, JSON object, and then messages, which we start off with as a list, and we know we need these two dictionaries, one for the system, one for the user. For the first item in the messages list, provide the system prompt informing the AI of its culinary expertise.

We're going to say, state this one out for now, so we're going to say role system content, you are a dietitian and chef, a user sends you an image of a meal, and you answer with the name of the meal, a savory description, and the total calories. The restaurant menu style style name, you know, savory description of the meal, and the total calories of the meal, and just keep repeating that. Probably don't need to.

Okay, so what that's going to get you is that prompt will get us a string of info. I mean, we're saying we're on JSON, but we're not really telling it how to answer with the JSON, and we're leaving it to the AI, perhaps to figure out how to break that up into key value pairs. Better would be if we just explicitly told the AI what we want.

Python for Data Science Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes,  1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

That's prompt engineering. We specify JSON here in the response format for a reason. We want the AI's answer in parsable chunks of properties, key value pairs, for output to their respective HTML tags, right? We want to be able to get the name, output it to the name, get the description, output it to the description, right? We need to be more specific in our prompt.

We need to spell out the JSON. So step six, prompt engineering, tell AI exactly how to answer in the JSON format. We're going to engineer the system prompt now, spelling out precisely how we want the AI to provide its JSON answer.

JSON requires double quotes. We're going to literally like write the JSON, which has to be in double quotes, the keys and the string values. And of course, our content string, you are a dietitian and chef, all that stuff, that is also in double quotes.

We've got nested double quotes. We need to escape the inner JSON double quotes. And we do that in kind of a cool move by wrapping the content part, the outer string, in triple double quotes.

And then cooler still is we have to escape the curly braces of the JSON that we're describing to the AI. And we can't let it get confused with the curly braces of the prompt. We're going to escape the inner JSON curly braces with double curly braces so that they don't conflict with the dictionary's own outer curly braces.

So here's the move. So first, we're going to go to the content and do triple curlies so that we can come in with double quotes. And then we're going to add on respond in JSON format with keys called meal name, meal description, total calories.

Meal name, meal description, and total calories. Then we're going to give it the JSON. We're going to write the JSON here.

But we're going to do it in sets of double double quotes to escape it. The prompt itself has these not quotes, curlies. So the prompt has curlies.

There's inner curlies. We escape them as double curlies. I'm going to say meal name, um, name of the meal, just the value.

Just write it how you want it to be. Meal description, savory description of the meal, total calories, total calories in the meal. We end with this double curly triple quote as we come out of the escapes, right? Whereas there it is.

And then you're going to come out of the user prompt, come out of the messages, come out of, we should actually comment this stuff. We'll say end user dictionary and messages list and create method. Because there's going to be more and it's going to get, it's already getting kind of complicated.

More prompt engineering. Now the user prompt, we need the user prompt, which will remind us, remind AI about JSON. Remind AI to do JSON, to use JSON.

In this project, we are not chatting with the AI, right? We're not getting into, the user prompt is not getting into a back and forth chat. We're just sending an image, no text and getting back an answer. Nevertheless, the messages list still requires a user dictionary.

So the user part, it's got to be the role user, you know, user role content. Here the content is itself a list of dictionaries, as we must provide the user prompt text, as well as that base 64 image data. In the prompt, we're going to reiterate the importance of answering in JSON.

Now proceed with extra care at this point. It's getting easy to mess up. The create is going to end, will end with brackets, curly braces, and parentheses nested six deep.

So it would be helpful to comment these closers line by line, which I have begun to do, right? You don't usually see, for some reason, programmers don't typically comment their closers, but I like to. So there's three of them, there's three now, there's going to be three more. It's going to be really hard to read at the end.

So after the total calories in meal thing, we're going to add another dictionary. So the end of the user dictionary, that's the end of the user dictionary. No, excuse me, that is not the end.

This, that's just the end of the content. That's the end of the user dictionary. We're going to next open up another little dictionary role content.

And this is, this is, that was for the system. Whoa, that's, that's actually the system. That's a system dictionary.

Let's end system dictionary, right? System. We have another dictionary now. And it's going to be role user comma content.

And the content is going to be quite short. Answer in JSON. Provide information about this meal as JSON per the instructions.

Like just refer to the instructions. Provide information about this meal as JSON per the instructions. And it'll figure it out.

What instructions? The one that I gave you as a system. Oh, and this itself needs to be a list. Going to be a list with type text as opposed to image.

Because we also have to provide image content. Actually, it's just going to be the text, right? Yep. Type is text.

And then the text is this string. That is itself a dictionary inside a list. Close or end user text dictionary.

And then this would be end user role dictionary. End system role dictionary, right? Role content, role content. So this is the type text.

This one right here is inside. Then we need another one. This is for the image.

So in the content of the user, we're providing not just one piece of information. We're not just providing content as text. We're providing content as text and an image.

So they're broken out into their own little sub dictionaries which live in their own little list. Now you're looking at end user content list. There it is.

And next we're going to add the base 64 image data to the user prompt. So right after the end user content text dictionary or type text, whatever, here we go again. We're going to do a type called image URL.

We need to spell out. And then there's another property called image URL. The value of which is that base 64 encoding.

Well, the value of which is actually a dictionary with a URL property, the value of which is the info about the base 64. So it gets pretty nested and quite complicated. And no, you're not going to remember how to do this.

So you just have to know that you have to do this. And you break out the code. So the URL value is going to be, we're going to do a little F string here.

It's going to be data colon image JPEG. That's how we're going to send the data. We're going to make a JPEG.

Doesn't matter what we sent over here. This is our temporary image. We're telling it how to receive and understand this image.

So that'll be F data image. Okay. Let's say F data colon image JPEG semi-colon another value base 64 comma, and then base 64 image variable.

And then we close. Well, this thing is closed then. All right.

This one, right. It's got its own end URL dictionary and type image URL dictionary. So it's just now, if you look, now you're into six deep here.

So it's getting pretty nested. We could out down a little bit. We're allowed to do that.

That's why you label this stuff. Otherwise it's just curly, curly, square, square bracket, curly, square bracket, parentheses, and you have no idea what any of it is. So yes, we want to, we want to comment this stuff, these closing braces.

So our minor masterpiece of prompt engineering and nested coding returns response unpack that returns an AI response, unpack that and say the result as meal JSON. This would be your AI answer. And remember it's response.choices zero message content choices zero first item in the choices array.

That is what the AI says. We'll say meal JSON. This is the AI's answer.

AI answer. Well, we'll just call it AI's, AI meal JSON. I think you should call it AI because it's not just any old JSON.

And we are going to return that. I'm going to say return JSON loads AI meal JSON. You could take this entire response.choice and pass it pass it in there, but it's nicer to do it in two steps, easier to read.

This is going back to the JS. Fetch, then, then, right? Finally remove the temp file. Literally finally.

Finally, OS.remove temp file that name. And I don't think we want the exception right now. We can still do the exception, but then we have to do finally at the end.

We could say after the exception, the except part, move the temp file from the system. I mean, as we are done with it, just delete, delete the temp file from memory. As we are done with it, just delete it.

Since we are done with it, we are now done with it. And then you end. I suppose we could keep the except part.

A little context here isn't bad. Yeah, we have room too. Good.

All right. Finally. And then this, then we end it all.

Gosh. So difficult. A lot of stuff.

Wow. You can hang with this. If you've hung with it this far, congrats.

Nice. Good job. Seriously.

I mean, who's even listening to this? If you're listening to this, you stuck with it a long time. But you're getting their payoff now, right? Okay. Run the app and test it on a meal image.

Run the app. Server 03. Should load the page.

Click choose file. Browse for food image. We have provided some, right? You have a folder full.

Add more that you like. When the image appears, click analyze to submit it to the AI for analysis. Oh, well, we get it.

After delay, the AI's should appear. Be patient. If there are no errors in the console, it's just thinking.

We don't have a spinner to indicate progress. So just give it a little time. It could take up to 30 seconds.

It could be as fast as five seconds. We did not do a little spinner gif. Oh, well.

So that ought to work. Let's quit the server. Let's see if it goes.

Okay. Por favor. Let's do it.

Tell me about this Thanksgiving dinner. Classic. Oh, undefined.

Okay. We got the name and it got the calories, but it's undefined on the description. So I probably have the description name wrong.

Okay. Let's fix that. Meal description.

What are we referring, what are we trying to call it in the JS? Oh, it's meal description. And we should also say calories. Label that.

One more time. So I had the wrong name. The wrong property name in the JavaScript.

There. Boom. Quinoa stuffed avocado boats.

Indulge in creamy avocado halves. Generously filled with fluffy quinoa. See a savory description.

Cherry tomatoes. Fresh herbs and a sprinkle of seeds for delightful crunch. Refreshing and nutritious dish.

Perfect for any meal. Yeah. I have to agree.

Those avocado boats look pretty good. And locale. Oh, there you go.

We already have one. I'm going to switch. I like the avocado boats.

Well, maybe do two. Why not have two? Right. Just to show the AI can handle whatever.

Vegetarian, meat, you name it. All right. And then finishing up with our final code.

There it is. And that completes lesson 14. We have an image analyzer working using Fetch, Python, JS Fetch, Python Flask, and of course, star of the show, the OpenAI API model GPT-4.0.

Brian McClain

Brian is an experienced instructor, curriculum developer, and professional web developer, who in recent years has served as Director for a coding bootcamp in New York. Brian joined Noble Desktop in 2022 and is a lead instructor for HTML & CSS, JavaScript, and Python for Data Science. He also developed Noble's cutting-edge Python for AI course. Prior to that, he taught Python Data Science and Machine Learning as an Adjunct Professor of Computer Science at Westchester County College.

More articles by Brian McClain

How to Learn Python

Master Python with hands-on training. Python is a popular object-oriented programming language used for data science, machine learning, and web development. 

Yelp Facebook LinkedIn YouTube Twitter Instagram