Advanced Data Analysis with ChatGPT

Upload files into ChatGPT to analyze data and generate visualizations using plain language instructions.

Unlock the power of data analysis seamlessly using ChatGPT's advanced data analysis capabilities. Learn how to effortlessly derive insights, visualize trends, and manage sensitive company data securely.

Key Insights

  • Perform detailed analyses by uploading Excel, CSV, or PDF files into ChatGPT, enabling functions like revenue calculations, annual growth comparisons, and data visualizations without needing in-depth Python coding knowledge.
  • Safeguard sensitive business data by disabling the training function in free and paid individual ChatGPT accounts; Teams and Enterprise accounts have this feature disabled by default to maintain data privacy and security.
  • Leverage Python-generated charts and interactive visualizations directly within ChatGPT, offering the flexibility to customize charts (e.g., adjusting colors, adding data labels) and export results as downloadable CSV files or images.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

This is a lesson preview only. For the full lesson, purchase the course here.

So, we do have a question about this from Terrence in the chat. So, one of the things you can do on your paid plan, so we're going to get into some other paid features here, is what they call advanced data analysis. They could have just called it, they could have called it just data analysis, but they put on advanced to make it sound better.

Like, did you ever have basic data analysis? No, they just went right to advanced. It's just, it's data analysis. So, you can, for example, upload an Excel file, or a PDF, or some file that contains something like text, whatever, whatever your input is.

You could also copy paste numbers in there, and then you can ask it to analyze it somehow, maybe create some visualizations. Now, if you're doing this, generally, you're doing this on business type data, a lot of times. So, just remember, if you're on the free or the paid plus plan, the individual's plans, I know we earlier went in and we disabled the training.

But if you're giving it company information, make sure you disable training, because you don't want to be giving away company secrets or anything to ChatGBT that they might know. Let's say you're uploading how much your company made. If somebody then asked later how much your company made, which is not public knowledge, maybe ChatGBT could learn how much your company made, even though it was never told to the public.

That might not be good. When your company wants to go public and have an IPO, and then they can go back and find out how much you were making, and they're like, oh, you were lying or something. Just make sure your training is turned off.

AI Classes: Live & Hands-on, In NYC or Online, Learn From Experts, Free Retake, Small Class Sizes, 1-on-1 Bonus Training. Named a Top Bootcamp by Forbes, Fortune, & Time Out. Noble Desktop. Learn More.

AI Classes & Bootcamps

  • Live & Hands-on
  • In NYC or Online
  • Learn From Experts
  • Free Retake
  • Small Class Sizes
  • 1-on-1 Bonus Training

Named a Top Bootcamp by Forbes, Fortune & Time Out

Learn More

Team and enterprise default to it off. Free and the plus, we got to turn it off, which you probably did, but just keep in mind, each account, if you have, let's say, a company account versus your own account, each account is different. So, make sure you turn those off for all of your accounts if you have multiple accounts.

So, this does have to be done with either 4.0 or the older 4.0 still did this. Before we got 4.0, this did work in the regular GPT-4, that old legacy model. So, we need to be able to upload a file or copy and paste files in.

But here, if we have an Excel file, I can upload this. Now, let me just give you an idea of what this would be like. So, here is Global Superstore here.

Let's look at this Excel file here. So, here I've got some orders, actually quite a lot of orders. In there, I've got the order date, the ship date.

Actually, let me open this up in Excel and zoom in. So, this is a lot of company data, 8.5 megs, so it's not huge. It's also not too small.

We've got some data in there. So, maybe you're not good at knowing how to do math, to do formulas. Maybe you're not good at doing data analysis, or maybe even if you know how to do it, maybe you just don't have the time.

So, we've got order dates, ship dates, so I know when it was ordered, when it was shipped. Ship modes, I know the kind of shipping, different segments, consumer, corporate. I know where they live, like states, countries, different marketing regions, like sales regions, I should say.

Categories, products. Then I know the sales amounts. I know how many were sold, the profit, how much shipping costs.

So, there's quite a bit of information here that I could ask questions on. Now, if you don't know what you have in there, you don't know what questions you can ask. We also have some stuff about returns, so we know which ones were returned.

People, these are for our different sales regions, so we know who was the salesperson in the various regions. But maybe I'm interested most in the orders, because I want to know something about our customers. So, I could ask things like, what was the total revenue per year? So, we could go, we could add up everything and do it per year.

So, I'm going to go to my paid account here and do a new one. We'll just use the regular 4.0, and I'll upload from my computer this global superstore file. I'll say, what was the total revenue per year? I do not have a column called revenue.

I didn't tell it where to look for the year. Notice I'm just using plain English language to ask, and let's see if it can figure out how to do this analysis. So, we give it a moment.

It shows you, which is nice, it shows you the different worksheets, so you can see a preview of this. So, you can see there's order date, ship date. And over here, we got the sales amounts.

And you would think it would calculate sales to figure out our revenue. And then based on per year. Well, per year, are we considering the order date for that? Are we considering the ship date for that? Because there's different dates.

So, let me provide the revenue in a table format. So, there was an issue making it into a table, but here's our total revenue. I'll calculate them and display them.

Now, I might be curious, what did it use for this? This little button here, view analysis. This is showing you the Python. Now, you do not have to know Python code, but you can look in here and see order date.

So, obviously it used order dates. And it grabbed the year from the order dates. So, not even understanding the code, because I could just see order dates, I know that they're using the order date, not the ship dates.

I could have said, calculate the revenue by year using the ship dates. But since I didn't, I guess they figured the order date is better than the ship dates, right? So, I could go back and say, use the ship dates instead, if that's what I wanted. Maybe I didn't think about that ahead of time.

And notice I don't have to repeat the whole thing again, because context in this chat, it understands what we're talking about. And see now, different revenue numbers. And if I look at the analysis, I can see they're using the ship date.

I don't take its word for truth. I always look at the stuff that it's generating and just double check. Because this gets tricky, because if you're making business decisions based on this, what if ChatGPT was wrong? That would be bad to make your decisions based on that.

So, this is where being able to at least look over some Python code to just kind of verify that it makes sense. You know, like here, they say they extract the year, right? So, they're kind of telling you what they're doing. You don't have to know how to write this stuff, but at least you look there and see, like, okay, they're using the ship year from sales, and they're summing it up.

Like, a lot of times this code, if you read it, will kind of make sense, even though you don't know how to write it. Look for the names, make sure they're using the right columns. Now, as it gets more complex, again, it might get harder to understand, but you want to kind of look over it and see, does this make sense? And I call it the sniff test.

Does it smell right? Like, do the numbers seem to make sense? Or does something smell a little fishy? Because if you look at something, you're like, wait a minute, we didn't make that much money. Like, you want to start investigating and seeing, like, wait, that seems too low, or that seems too high. Like, make sure that it does make sense, and try to double check it that way.

But it's pretty cool that I could upload that, and then they started doing this. Now, I don't want to use the ship date. And I'll say, go back to using the order date.

Sorry, I wanted to switch back, because I don't want to use that. Meanwhile, I want to think about other things. Like, for example, what's the year-over-year growth? Because it might not just be good to see the numbers, I want to see what my growth is.

What's the year-over-year growth? Notice I didn't say of revenue, because it understands that that's what we're talking about here. It understands the context. Okay, so now they put it into a chart.

Now they had the revenue. They don't include 2020, because there was no prior year. We can't know the growth of the prior year, if there was no prior year.

But then they put in the different revenues, which match. And then they have the year-over-year growth, 18%. So on.

Oh, and actually, this is pretty cool. I wonder if that's new. I don't remember them letting me refer to a cell here.

Or a column. Oh, actually, oh, this is cool. Like, this is new.

Like I said, every time I teach this class, there's something new, something different. So, like, here, I want to say, make this two decimal. That's the model.

Actually, no. Yeah, two decimal places. For that column.

Because that was way too many decimal places. Much better. And this, make this dollar.

I'm a lazy typer. So I want to see if chat can understand what I'm saying. So it should put a dollar sign.

And two decimal places. Right? So for this, it's somewhat different from the generative in that it stands with the exact number and it's not changing the number or reintegrating the number. Right.

So in theory, even though this is generative, it should, because math has an answer, it should, at least if it's doing its job, consistently give you the same answer. So all of you, if you're doing this, should, in theory, be getting the same exact answers. Even if it was presented in text versus a table, even if the way it's presented is different, you should get the same exact numbers.

Because it should, if it gets it right, always do the same math, and math does not change. Unlike language, which can, math shouldn't. I can't guarantee you foolproof stuff here, but it should be pretty consistent.

It's still doing the same generative stuff, but truth is truth, and so numbers don't lie, as they say. If it's doing the right thing, but understood what I said, in theory, that this should always give us the same answer. In theory, at least.

I can't guarantee that. But it should. You should be able to consistently get this.

Also, there's a download table button here, that if I click download table, that will download a CSV file, and CSV files you can open up in Excel. It's a comma-separated value file. Not numbers.

Put this up with Excel. Just CSV files are something that are a standard thing that you can open up in any app. If you want to get that into Excel, you can download this CSV file.

That opens up in Excel all the time. What else could I do? I could also say create a bar chart for that. Because maybe I want to turn this into a visual.

So I could say create a bar chart for that. And let it generate an image for me. Because Python has libraries that can generate graphics.

So behind the scenes, it writes Python to do all this. Now it's just turning those numbers into a chart. I might want to have data labels on these things.

Because I might want the numbers. So I'll say add data labels. Because otherwise I don't know the numbers.

I can see 2022 is higher than 2023, but I don't see the numbers on it. So those things are called data labels. So there it added the data labels.

Make the data label bold. So I'm going to make the data labels bold. See how you could run through your messages pretty quick, depending on what you're doing? Make them larger.

Now, if you think about what you want to do, you can try to cut down on the number of messages you use. I could have said make them bold and larger. But I forgot, so I used another message for that.

Change the blue to orange. And I could say to make the column labels bold or bigger. I could say make all the text larger.

All different things you could do to change the visuals. And you could download the chart as an image. So that's an image.

The interactive part. If I make it interactive. So this is a static, but if I make it interactive, now I can hover over things.

Yeah. What is this little button here? Oh, they even give me a little color picker. And I can even input, like, if I have a specific number, I can put in a specific number.

But that's if I'm in interactive mode. If I'm in the image mode, it writes the Python behind the scenes. And aren't you glad you didn't have to write the Python code? But if you did want it, you could actually copy the code, and you could go, and now you can have some Python code that they wrote to generate that.

So, yeah. So a lot of data scientists, data analysts do things like Python, and this is writing the code to do that. And you can even go full screen here, go bigger.

I'll make it small again. So because it's using Python, any sort of charts that Python can create, which are pretty exhaustive, you can do all sorts of different charts and graphs. If you want to see what's possible.

So they're using NumPy or something for this. But if you look at types of Python data visualization, types of data visualization Python. So lots of different, lots of different ones that are out there.

Tree maps, bar charts, scatter plots. So basically you need to say, yeah, just scatter plot on this column. Yeah, exactly.

Yeah, exactly. So there's lots of different, Python's very, very capable. That's one of the reasons why it's very popular with data scientists, data analytics.

Does it do maps? Yes. Yeah. I believe so.

I haven't tried it. I haven't tried it specifically, but in theory, at least, I would think that because I thought I can do it. It looks like the tragedy.

Should be able to do that, but I haven't said I haven't tried doing maps with it. Yeah. Terence, did you have any questions because you were asking about analysis? Do you have any questions about this? Yeah.

Perfect. Yes. Yeah, yeah, yeah.

So yeah, you can say like, does anything stand out in this data set. You can ask more specific things. Right.

So, to identify notable trends or anomalies, I'll analyze the following aspects growth trends, revenue levels, potential outliers. So. In 2022, the year shows the highest year over year growth rate at 27.2, which is higher than the 18.5 of 2021 and slightly above this one.

This could indicate an event or strategy that significantly boosted revenue that year. Consistent growth. The data reflects steady growth across the years, which suggests a well managed or expanding business.

There are no negative growth years highlighting positive performance trends. The highest total revenue was this suggests either peak operational capacity, increased market share or favorable market conditions. If you'd like, I can dig deeper into things like category segment or region.

Yes. Look into the. Categories.

You can also ask it to do things like. Clean up the data. Because sometimes you have data that has things like duplicates.

You can ask it to make changes. You can even say, like, please clean up the data. Look for anomalies.

Now, that's where it gets into. You can also ask it to. How many hours are in strawberry? Like, it might catch things that might not catch everything.

Like, for example, if sometimes in a column, you have us, USA, United States, United States. It might not be able to get every single one of those, but maybe you can do some data cleansing for those things. In that case, when it cleanses the data, it would then give you an updated Excel file, for example.

Or it just gives you a CSV file to download that you can open back up in Excel. Yeah. And.

I like in any way where it's the cells where it's changed things. Because this sounds really useful. But once you've got 20,000 roads and you want to be sure that it's.

Yeah. So. So as far as knowing what it changed.

Remember before, when we ask it, what did you change? Like, how many spelling errors do I trust it? I don't trust it to tell me. So I would have to use something else. Like, for example, if you have a CSV file, that's plain text.

You can run that through through some sort of app that will compare two files. And you can compare the before and after. So I would take like an original CSV file of the first data.

And then the fixed CSV file. And I would use like a code editor or something that will check files, compare files. Even if you just look for like online file comparison or something.

And those things will just look for differences. And then you can go through those. That's a part where I just don't trust to tell me those things.

Theoretically, it could. It just doesn't seem to be very that it starts to hallucinate with those kind of things. Yeah.

So here they're saying a quick analysis of these like technology, furniture and office supplies. Technology is consistently the highest revenue generator across all the years with a sharp increase, especially from those years. Noticeable growth over the years.

A steady increase, but a slightly lower growth rate compared to the other ones. And again, you can be more specific if there's something specifically that you want. For example, what was the country with the lowest revenue? What country had the highest revenue? The Gambia.

Made one sale. Never even heard of that country. What was the one with the highest or what? Were the countries.

The. Top. Top five countries with the highest revenue.

And they're actually listing the revenue. Which is nice. It's so different to do things this way to just talk.

It's like you're you're talking to somebody that knows the revenue numbers. Versus doing this in Excel or Power BI or something. It's a bit different.

Yeah, it feels like this, this should be mathematically reproducible every time you do this. Yeah. Yeah.

And, you know, if you're looking here, you know, they're doing. So doing five, you know, there's sorting. Yeah.

You know, so they figured out the country revenue and they're sorting by it. So, again, even if I don't understand all the code that I could write it from scratch. It does look like that should be correct because essentially that's what it does.

It generates the code to do the calculation. And Python, assuming they write the right Python code that Python code should generate the correct answers. Because that's doing math like.

Python is not generative AI. Python is computer code. It just knows Python, so it can do this analysis because it knows Python.

And you can run it without you having to install it. And then even suggest things like what category grew the fastest. Sure.

What category grew the fastest? Technology. So they might even suggest things to talk about or to ask questions about. Visualize revenue trends by category, compare revenue growth by region.

Let's do that. Also, this could be done from a PDF. So let's say you get a company's sales report, like annual report and you put it in there.

It doesn't have to be an Excel file. Could be plain text, CSV, Excel file. PDF.

So here's revenue growth. Year over year growth. And create these tables for you.

Expand them if you want to see more. Like you're in Excel. With the data analysis person right here.

Switch between those two tables to stay generated. And going back to my mutual question, which is, so if you buy a team's business account and double check that the training thing is off, your Excel spreadsheet in theory is not shared. Right.

As long as training is off on any accounts, paid or free, because we can always disable training. On business accounts, it's automatically turned off. We shouldn't have to do anything on those.

Then, in theory, as long as your account does not get hacked or anything, then your data is safe. The model is not being trained on your data. It's only accessible to you.

It should not be stuff that other people can see. Unless somehow they hack into your account. I mean, that's always an option, but I mean, they can also hack into your computer.

And if it's digital, there's some way that they can get it in theory. But, yeah. So.

Yeah, so data analysis.

photo of Dan Rodney

Dan Rodney

Dan Rodney has been a designer and web developer for over 20 years. He creates coursework for Noble Desktop and teaches classes. In his spare time Dan also writes scripts for InDesign (Make Book JacketProper Fraction Pro, and more). Dan teaches just about anything web, video, or print related: HTML, CSS, JavaScript, Figma, Adobe XD, After Effects, Premiere Pro, Photoshop, Illustrator, InDesign, and more.

More articles by Dan Rodney
Yelp Facebook LinkedIn YouTube Twitter Instagram