Integrating Image Data with OpenAI API

Explain how to send uploaded image data from JavaScript to Flask, encode it as Base64, and submit it with a text prompt to the OpenAI API to receive and display the JSON-formatted AI answer on the webpage.

Discover how to seamlessly send image data to the OpenAI API using Flask, JavaScript, and Base64 encoding. This article walks you through updating your server-side Python code and JavaScript to handle image analysis and JSON responses.

Key Insights

  • Demonstrates encoding an uploaded image with Base64 in Python to safely transmit data to the OpenAI API, converting binary image data into a text-based string suitable for JSON communication.
  • Highlights the importance of updating JavaScript fetch calls to handle JSON responses from Flask, specifically returning three key properties—meal name, description, and total calories—and displaying them on the webpage.
  • Explains necessary server-side adjustments, including importing dependencies (OpenAI, JSON, Base64, OS) and instantiating the OpenAI client to process and analyze images effectively.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

Hi, welcome back to this lesson 14 of Python for AI apps with Flask, JavaScript and the OpenAI API. My name is Brian McLean, thanks for coming back. So in this lesson what we're going to do is send our image data, which we know we've got an image that we uploaded and we've sent the image to Flask and we got some jibberishy string back indicating that it was able to make a temp file URL for the image, so we know the image made its way over to Flask.

In that lesson though, lesson 13, we did not actually send the image to the OpenAI API because that involves a rather sophisticated prompt where we have to specify JSON and we have to get back JSON, handle the response, send that back to fetch, to output to the web page, and there are a bunch of extra steps we have to take. We have to import some more dependencies and we also have to encode the image data as something called base64. So that's why we broke that whole thing into extra steps.

So in this lesson we're going to do all that stuff I just said, which probably doesn't make any sense, which is why we're going to now dive into the book and the code. So this is the after here. We don't have a spinner, we don't have anything indicating progress when we hit analyze, and it does take a little while for the AI to answer.

Could take five seconds like just now, it could take 30 seconds. So there's the result. It gave us the name, description, and total calories, which is all we asked for.

We didn't ask for the macronutrients or the micronutrients. That's all for later. We have it working.

Python for Data Science Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes,  1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

So let's now rebuild this continuing from where we left off in the previous lesson. Lesson 14. We're sending image data to Flask and returning temp URL.

We're actually doing a lot more than that. We're sending image data to OpenAI, sending prompt an image to OpenAI API and getting answer and get AI answer. In this lesson we'll send our uploaded image data to Flask, which will make a temp file URL, which we did before in the previous lesson.

We're going to load in and encode that image, temp image, which we did, and send the image with text prompt to the OpenAI API. This is the new stuff, sending the image to the OpenAI API along with a prompt. We have to send two things to the OpenAI API, a text prompt and the image itself.

We're going to then return the result, which will be the AI's answer to JSON, answer to JS as JSON. It's going to be JSON that we send back. The JS will output the AI's answer to the web page.

To achieve all this, we need a lot more server code. The JS changes are few. The HTML is unchanged.

We change the server, the PY file a lot, the JS only a little bit, and the HTML not at all. Step one in the JS, update the second then to handle the AI response. In the JS, we have to do just two things.

We have to get the tags for displaying the meal info. That would be meal name, meal description, total calories. Then we need to update the second then in our fetch then then triple play to output the AI's response, which will be JSON, a parsed object with three properties.

We're going to tell the AI to answer us as JSON using these exact three properties. That's later. That's in the server code, which, as I said, involves a lot of changes.

Right now in the JS, we're just going to anticipate that we're getting back the expected JSON from the AI, from the server via the AI. Let's open up Meal Analyzer 02 HTML and save as Meal Analyzer 03. No changes.

The goal is to get AI-generated output answers, right, info to these three tags. Update the name of the JS file being imported. That's the only change.

So that's not really a change. Open Meal Analyzer 02 JS and save as Meal Analyzer 03 JS. This file will have a few changes as described.

02 becomes 03. In Meal Analyzer 03 JS, we're going to get the P tags for the description and calories info. We already have the meal name.

So let's just copy paste. We'll say description. I don't want to leave it at that.

That sounds like a string. Description P, as in P tag, is query selector, hashtag description. And calories P, hashtag calorie count.

Those being the IDs of those tags. In the second then down after the fetch then then, output the AI's property values for their respective tags. We're anticipating that we're getting back JSON, which we parse.

We don't have to change that line. We're just parsing. Same deal.

And then we'll have the result object with a meal name property, meal description, and total calories. And that's what we're outputting to its respective tags. We're not going to just, we're not outputting this file path at all.

That will be the result object dot meal name. And of course we have to make all this in the server later, these properties. Description, and that will be the description P. And we're going to output total calories to the calories P. That could be text content, but we'll just leave it at that.

And close the curly braces of the second then and the parentheses. Finishing off with this catch, which has not changed, nor has the parsing of the response object changed. That's the same too.

Posting the form data is the same. Prompting the user to upload an image if they didn't do so is the same. We know the function works.

Find function, run user, click to analyze. This is short in some of this. Yeah, all that stuff's the same.

No change other than grabbing these two tags. And then outputting to the three tags that we have grabbed the data coming back from the AI. So this is assuming that we have the server working.

Not much in the way of changes, as I said, on the JS file. So that takes us to step two. All right, step one is just update the second then.

Step two, in the user server file, import additional dependencies. So step two, you know, writ large is make the whole server work. But breaking it down, we're starting by, we will need additional dependencies.

From server meal analyzer 02, save as, call it 03. Import additional dependencies. We need open AI, JSON, Base64, and OS.

Open AI, of course, we've worked with this before. JSON is used by a route function to return existing JSON. Not to be confused with JSONify, which we have also used, which makes JSON from a dictionary.

Base64, that's new to us. That is for encoding image data into a string of alphanumeric characters for transmission across a network. In other words, it takes an entire image and turns it into a bunch of letters and numbers, slashes, and plus signs.

This gargantuan thing. But it makes it safe to transmit as text across a network. OS operating system is for accessing the system, which we do at the very end of the server script to delete the big image temp file from system memory.

So, here are our new imports. Import open AI. Import JSON.

Import Base64. Import OS. You can add all those comments.

You know what? Here, let's just do it one more time with the comments. We've got our app name. We've got our home route, although it needs to be 03 now.

We've got our upload route. We unpack the incoming image, make a temp file, and then we try saving the image temp file to the temp file, right? We're trying to make the temp file be the image. And then we're returning all this JSONify stuff.

This we're not going to do. That is not what we're doing here. So, open the image in binary format onto step 3 and encode it as Base64.

Lesson 14, step 3. In the try block, after declaring the image path variable, open the image. We use that with open before to open up to load the FAQ if you're calling the AI chat assistant. The second argument, besides the path that we're opening, this last time with the chat assistant was RB for read.

It was R for read. This is RB for read binary. We use binary mode because image files are stored as raw binary data, not as plain text.

And we're going to save the result as this alias variable, which we'll call foodimg, food image. So, we'll say with open image path RB as foodimg. Open the image, open the temp image as in binary mode.

Okay. And then we have to encode it as Base64. I've got info about Base64 encoding in here.

You can read more about it. On the Base64 object next, inside this with, we're going to call the B64 encode method, which we're going to pass the foodimg variable with the read method called on that, which is going to return the image. It's going to read or load the image, reads the entire image file as binary data.

And it's going to encode it as in B64 format, which is this big alphanumeric representation or string. And we'll save the result as Base64 img. So, we're going to say in here, Base64 img is what we'll call the result.

That's going to be a call to the Base64 method called Base64 encode. We're going to encode food image. Let's see if I got that right.

Yep. B64, B64 encode. The food image dot read.

We have to call the read method on the food image, which will actually load it. But what is Base64 encoding anyway? Base64 is a method of converting binary data, such as images, into a text-based string that only uses letters, numbers, plus signs and slashes. This is useful for embedding images in text-based formats like HTML, JSON, or databases without worrying about special characters causing issues.

A Base64 encoded image is much larger than the original file, but can be safely transmitted as text. So, we've got this Base64 thing. This Base64 img, we need to do one more thing to it.

We have to chain the decode method, passing it utf-8 onto the Base64 encode method. So, it's chaining, right? One method is done, chain another one, and that will convert the Base64 into a human-readable string of gargantuan string of the numbers and letters and slashes and plus signs. We have to add that.

Dot decode utf-8. And you might be thinking, how am I supposed to remember that particular move? And the answer is, of course, you're not. You just look it up when and if you ever need it.

Hopefully, when. Because this is a core move. If you ever want to do, if you ever want to submit images to the OpenAI API for analysis, you must have that move in there.

All right, the next step, we're going to write the text prompt now for sending the image to the OpenAI API. Write the text prompt for, write the text prompt to the OpenAI API. The text part of the OpenAI API prompt, because we need the image as well.

But we start with the text. All the APIs create method, which, as we recall, requires considerable drilling, right? It's got four Cs in a row, client.chat.completions.create. Takes all these arguments, the model, the response format, the messages list of dictionaries, right? So, and that will create method, will make the call, and it'll come back with an answer, hopefully, which will be stored as the response object. So, we're going to specify our trusty GPT-4 model, and we are also going to specify that we want JSON coming back.

And then we will get into the messages list, which takes these dictionaries of role content. A little more complex this time than before with the AI chat assistant. So, we took out the return statement, and instead we're going to send prompt with image to OpenAI response and save response.

That's going to be client.chat.completions. .create. And that's a big, that's, create is method, takes a bunch. Oh, we have to define our, yikes, let's go up. We got to go up.

We have to, there's a bunch of stuff we have to do. We have to, we have to instantiate, instantiate an OpenAI client and pass it our API key. That's a huge step.

So, we got to do that. All right. All that's step three.

This will become step four, step five, and so on. A bunch of steps. We're going to just do this in one to back up a little bit.

Grab. No. Yeah.

I'm going to just because let's just, I don't, I don't have the API key memorized, right? I need to copy it anyway. So, that the API key as a variable. Nope.

That's for sure. We need that. Not getting anywhere without the API key.

That's fake key, obviously, you know, all that. Instantiate. OpenAI client.

Passing it the API key. Just one little step. If you look at how much time it takes to fix one little step, you can appreciate how much time it took to do this little book.

All right. So, yep. That's what we need.

Brian McClain

Brian is an experienced instructor, curriculum developer, and professional web developer, who in recent years has served as Director for a coding bootcamp in New York. Brian joined Noble Desktop in 2022 and is a lead instructor for HTML & CSS, JavaScript, and Python for Data Science. He also developed Noble's cutting-edge Python for AI course. Prior to that, he taught Python Data Science and Machine Learning as an Adjunct Professor of Computer Science at Westchester County College.

More articles by Brian McClain

How to Learn Python

Master Python with hands-on training. Python is a popular object-oriented programming language used for data science, machine learning, and web development. 

Yelp Facebook LinkedIn YouTube Twitter Instagram