Integrating Image Data with OpenAI API

Discover how to seamlessly send image data to the OpenAI API using Flask, JavaScript, and Base64 encoding. This article walks you through updating your server-side Python code and JavaScript to handle image analysis and JSON responses.

Key Insights

Demonstrates encoding an uploaded image with Base64 in Python to safely transmit data to the OpenAI API, converting binary image data into a text-based string suitable for JSON communication.
Highlights the importance of updating JavaScript fetch calls to handle JSON responses from Flask, specifically returning three key properties—meal name, description, and total calories—and displaying them on the webpage.
Explains necessary server-side adjustments, including importing dependencies (OpenAI, JSON, Base64, OS) and instantiating the OpenAI client to process and analyze images effectively.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

Hi, welcome back to this Lesson 14 of Python for AI apps with Flask, JavaScript, and the OpenAI API. My name is Brian McLean, thanks for coming back. So in this lesson, what we're going to do is send our image data—we know we've got an image that we uploaded, and we've sent the image to Flask, and we got some jibberishy string back indicating that it was able to make a temp file URL for the image. We know the image made its way over to Flask.

In that lesson, though—Lesson 13—we did not actually send the image to the OpenAI API because that involves a rather sophisticated prompt where we have to specify JSON, and we have to get back JSON, handle the response, send that back to fetch to output to the webpage, and there are many extra steps we have to take. We have to import some more dependencies, and we also have to encode the image data as something called Base64. So that's why we broke that whole thing into extra steps.

So in this lesson, we're going to do all that stuff I just said—which probably doesn't make any sense, which is why we're going to now dive into the book and the code. So this is the after here. We don't have a spinner, we don't have anything indicating progress when we hit Analyze, and it does take a little while for the AI to answer.

Could take five seconds like just now; it could take 30 seconds. So there's the result. It gave us the name, description, and total calories, which is all we asked for.

We didn't ask for the macronutrients or the micronutrients. That's all for later. We have it working.

Python for Data Science Bootcamp: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes, 1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

So let's now rebuild this, continuing from where we left off in the previous lesson—Lesson 14. We're sending image data to Flask and returning a temp URL.

We're actually doing a lot more than that. We're sending image data to OpenAI—sending a prompt and image to the OpenAI API and getting an answer. In this lesson, we'll send our uploaded image data to Flask, which will make a temp file URL, which we did before in the previous lesson.

We're going to load in and encode that image—temp image, which we did—and send the image with a text prompt to the OpenAI API. This is the new stuff: sending the image to the OpenAI API along with a prompt. We have to send two things to the OpenAI API: a text prompt and the image itself.

We're going to then return the result, which will be the AI's answer, to JSON—answer to JS as JSON. It's going to be JSON that we send back. The JS will output the AI's answer to the webpage.

To achieve all this, we need a lot more server code. The JS changes are few. The HTML is unchanged.

We change the server—the PY file—a lot, the JS only a little bit, and the HTML not at all. Step one in the JS: update the second then to handle the AI response. In the JS, we have to do just two things.

We have to get the tags for displaying the meal info. That would be meal name, meal description, total calories. Then we need to update the second then in our fetch-then triple play to output the AI's response, which will be JSON—a parsed object with three properties.

We're going to tell the AI to answer us as JSON using these exact three properties. That's later—that's in the server code, which, as I said, involves a lot of changes.

Right now in the JS, we're just going to anticipate that we're getting back the expected JSON from the AI—from the server via the AI. Let's open up Meal Analyzer 02 HTML and Save As Meal Analyzer 03. No changes.

The goal is to get AI-generated output answers—right—info to these three tags. Update the name of the JS file being imported. That's the only change.

So that's not really a change. Open Meal Analyzer 02 JS and Save As Meal Analyzer 03 JS. This file will have a few changes as described.

02 becomes 03. In Meal Analyzer 03 JS, we're going to get the P tags for the description and calories info. We already have the meal name.

So let's just copy-paste. We'll say description. I don't want to leave it at that.

That sounds like a string. Description P, as in P tag, is query selector hashtag description. And Calories P, hashtag calorie count.

Those being the IDs of those tags. In the second then down after the fetch then, output the AI's property values for their respective tags. We're anticipating that we're getting back JSON, which we parse.

We don't have to change that line. We're just parsing. Same deal.

And then we'll have the result object with a meal name property, meal description, and total calories. And that's what we're outputting to its respective tags. We're not going to just—we're not outputting this file path at all.

That will be the resultObject.mealName. And of course, we have to make all this in the server later—these properties. Description, and that will be the Description P. And we're going to output total calories to the Calories P. That could be text content, but we'll just leave it at that.

And close the curly braces of the second then and the parentheses. Finishing off with this catch, which has not changed, nor has the parsing of the response object changed. That's the same too.

Posting the form data is the same. Prompting the user to upload an image if they didn't do so is the same. We know the function works.

Find function, run user click to Analyze. This is short in some of this. Yeah, all that stuff's the same.

No change other than grabbing these two tags. And then outputting to the three tags that we have grabbed the data coming back from the AI. So this is assuming that we have the server working.

Not much in the way of changes, as I said, on the JS file. So that takes us to Step Two. All right, Step One is just update the second then.

Step Two: In the user server file, import additional dependencies. So Step Two, writ large, is make the whole server work. But breaking it down, we're starting by—we will need additional dependencies.

From server meal analyzer 02, Save As, call it 03. Import additional dependencies. We need OpenAI, JSON, Base64, and OS.

OpenAI, of course—we've worked with this before. JSON is used by a route function to return existing JSON—not to be confused with jsonify, which we have also used, which makes JSON from a dictionary.

Base64—that's new to us. That is for encoding image data into a string of alphanumeric characters for transmission across a network. In other words, it takes an entire image and turns it into many letters and numbers, slashes, and plus signs.

This gargantuan thing. But it makes it safe to transmit as text across a network. OS—Operating System—is for accessing the system, which we do at the very end of the server script to delete the big image temp file from system memory.

So, here are our new imports. Import openai. Import json.

Import base64. import os. You can add all those comments.

You know what? Here, let's just do it one more time with the comments. We've got our app name. We've got our home route, although it needs to be 03 now.

We've got our upload route. We unpack the incoming image, make a temp file, and then we try saving the image temp file to the temp file, right? We're trying to make the temp file be the image. And then we're returning all this jsonify stuff.

This we're not going to do. That is not what we're doing here. So, open the image in binary format onto Step 3 and encode it as Base64.

Lesson 14, Step 3. In the try block, after declaring the image path variable, open the image. We used that with open before to load the FAQ, if you’re recalling the AI Chat Assistant. The second argument, besides the path that we're opening—this last time with the Chat Assistant—was "rb" for read.

It was "r" for read. This is "rb" for read binary. We use binary mode because image files are stored as raw binary data, not as plain text.

And we're going to save the result as this alias variable, which we'll call `foodimg` (food image). We'll say: `with open(image_path, "rb") as foodimg:`. Open the image—open the temp image—in binary mode.

Okay. And then we have to encode it as Base64. I've got info about Base64 encoding in here.

You can read more about it. On the Base64 object next, inside this with, we're going to call the `b64encode` method, which we're going to pass the `foodimg` variable with the `read` method called on it, which is going to return the image. It’s going to read or load the image—reads the entire image file as binary data.

And it's going to encode it as in Base64 format, which is this big alphanumeric representation or string. And we'll save the result as `base64_img`. We're going to say in here: `base64_img = base64.b64encode(foodimg.read())`.

We have to call the `read()` method on the `foodimg`, which will actually load it. But what is Base64 encoding anyway? Base64 is a method of converting binary data such as images, into a text-based string that only uses letters, numbers, plus signs, and slashes. This is useful for embedding images in text-based formats like HTML, JSON, or databases without worrying about special characters causing issues.

A Base64-encoded image is much larger than the original file, but it can be safely transmitted as text. We've got this Base64 thing. This `base64_img`, we need to do one more thing to it.

We have to chain the `decode()` method, passing it `"utf-8"`, onto the Base64 encode method. So it's chaining, right? One method is done, chain another one, and that will convert the Base64 into a human-readable string—a gargantuan string of the numbers and letters and slashes and plus signs. We have to add that.

`.decode("utf-8")`. And you might be thinking, how am I supposed to remember that particular move? And the answer is, of course, you're not. You just look it up when and if you ever need it.

Hopefully when. Because this is a core move. If you ever want to submit images to the OpenAI API for analysis, you must have that move in there.

All right, the next step—we're going to write the text prompt now for sending the image to the OpenAI API. Write the text prompt to the OpenAI API. The text part of the OpenAI API prompt—because we need the image as well.

But we start with the text. All the API's `create()` method, which, as we recall, requires considerable drilling, right? It's got four Cs in a row: `client.chat.completions.create`. Takes all these arguments—the model, the response format, the messages list of dictionaries, right? And that `create()` method will make the call, and it'll come back with an answer, hopefully, which will be stored as the `response` object. We're going to specify our trusty GPT-4 model, and we are also going to specify that we want JSON coming back.

And then we will get into the messages list, which takes these dictionaries of `role` and `content`. A little more complex this time than before with the AI Chat Assistant. So, we took out the return statement, and instead we're going to send the prompt with the image to OpenAI and save the response.

That’s going to be `client.chat.completions.create`. And that’s a big method—`create()`—takes a bunch. Oh, we have to define our—yikes—let's go up. We’ve got to go up.

We have to—there's many things we have to do. We have to instantiate an OpenAI client and pass it our API key. That’s a huge step.

So, we’ve got to do that. All right. All that’s Step 3.

This will become Step 4, Step 5, and so on. A bunch of steps. We’re going to just do this in one—to back up a little bit.

Grab. No. Yeah.

I’m going to—just because—let’s just—I don’t—I don’t have the API key memorized, right? I need to copy it anyway. So, that’s the API key as a variable. Nope.

That’s for sure. We need that. Not getting anywhere without the API key.

That’s a fake key, obviously—you know all that. Instantiate the OpenAI client.

Passing it the API key. Just one little step. If you look at how much time it takes to fix one little step, you can appreciate how much time it took to do this little book.

All right. So, yep. That’s what we need.

Key Insights

Brian McClain

How to Learn Python