Discover how ChatGPT can streamline your database queries, improve data analysis accuracy, and help you better understand SQL concepts. Learn to leverage AI effectively by providing clear, explicit instructions and verifying results carefully.
Key Insights
- Provide ChatGPT with explicit instructions to generate accurate SQL queries, specifying database systems (e.g., Postgres) clearly to avoid ambiguity and incorrect assumptions.
- Regularly verify ChatGPT-generated queries by examining the logic, ensuring correct table joins and calculations, such as summing price times quantity based on user locations rather than shipping addresses.
- Use ChatGPT as a learning tool to clarify SQL functionalities, such as understanding the purpose of the NULLIF function in queries to avoid errors like division by zero.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Okay, let's talk through the first challenge. Where did our revenue in 2021 come from? How much were we making in each state? So the first thing I need to do is upload a screenshot of my database structure, which I already have a screenshot of this. So I'm going to upload that and I'm going to ask it.
Now, let's say you're a little too generic in the way that you ask. If I say to calculate the sales by state for 2021, the ChatGPT can do what they call advanced data analysis, meaning that if you upload files in like a CSV file, an Excel file or something, it can actually do the analysis for you. See, I didn't tell it that I wanted a SQL query.
If I do want a SQL query, I have to be explicit about that. So I need to change this around. I need to do something a little different.
And I need to say something like write a Postgres query. If I said SQL, I run the risk of it guessing SQL Server or some other sort of flavor of SQL. Always be specific.
Postgres, Oracle, SQL Server, whatever kind of flavor of SQL you want. So write a Postgres query that calculates the sales by state. So it's only the same thing, but I'm being more specific that I wanted to write a query for that.
Okay. I already uploaded the image of my database schema. So it knows what my tables are, my columns are.
So it joins the things together here. And looking at this a little closer here. So it's joining orders to line items and it's doing by order ship date.
So O is for orders. So ship state is for where orders were shipped to. But I actually don't want it based on where orders were shipped to.
I want it based on where people are from. See in my schema here, I've got users and where users live, as opposed to where orders were shipped to. So they did kind of take the easiest route and said, oh, let's just join orders to line items and use where orders were shipped to.
So I'm going to say, please use where users live, not where orders were shipped to. So sometimes there's some back and forth, and this is where you have to be very careful to look at what ChatGPT is writing for you. So in this case, it is taking our orders, joining it to users, joining to line items.
And so now it is considering where users live, what state they're living in, which is good. So that's what I want. And it talks through what it's doing here.
Now, I can't just assume this is good. I do have to read through it, copy it, go check out how the query runs. So I'll run this on here and look that I am getting my states.
I am seeing the sales across states. It is ordered by total sales. So the states with the highest sales are at the top, least sales down here at the bottom.
And seeing that we have 39 rows, I see that we have users from 39 out of the 50 states that we had had sales to in this year. So looking through this, it is adding up the price times quantity. Okay.
So that's good. If we look at our schema here, we do need to factor in the price and the quantity. So it is doing that calculation correct, summing all of that stuff up based on where users live, not based on where orders were shipped to.
So it is grabbing the year from when the order was placed, making sure it's the correct year and grouping by the user state. So all this code looks correct. The result set looks good.
Sniff test, everything smells good. So this looks like it's a good query. And in this case, took a little bit of back and forth, but once you get used to these kinds of issues, you can know that based on past problems, you have to be specific enough.
Like I have to be specific that I want a Postgres query, not that I want it to do the calculations, but I want it to write a query so that I can do the calculations in my database. And just be specific, these kinds of things. Don't just assume it's going to get everything right.
Be as specific as you can, and you'll get it closer to being in just kind of a one-shot query. But if you don't get it the first time, just add clarification and go back and forth and then verify that ChatGPT is getting what you want. Now, if I want to go a little bit further here, kind of going back on how much are our sales growing or shrinking? Like what's my year over year revenue doing? Comparing the current year that we were looking at, which is this 2021 year here, to the previous year that we're looking from.
So from 2021, going back to 2020, what was the year over year growth? So one way I could do that is I could just add a column. If I don't want to just do a total year's revenue, one thing is I could say, add a column to show the year over year growth from the previous year. And we can see that.
So it's going to calculate 2021, calculate sales for 2020. And here it's using a CTE here and getting the year, getting the total sales. Then it's doing this here where it's getting 2021, 2020.
And let's see, let's copy this and paste it into here just so we can maybe select some stuff here and see. So like, let's say, for example, it's a little smaller here. Where is sales pivot here? So this creates a table called sales pivot, and we're getting everything from, so we're getting certain things from there, ordering by this column here.
And let's run this just to see, first of all, it doesn't run. We've got all of our states. We've got our sales for 2021, sales for 2020.
You can see the percentage growth here compared to 2021 to 2020. Here we only made $50, but then, you know, over 3000. So that was a huge increase there.
Pretty small sales though. And so here, going from here to there increase, some of these will be negative here. We went down in sales here in some of these states.
So you can go through and you want to make sure that you kind of go through piece by piece. If it doesn't do a good enough job explaining how it works, you can say, please give me a longer explanation. If you don't know part of what it's doing, you can ask more details like, what does the sales pivot do? Now it does say transforms yearly sales into columns for easier comparison, but you could say, explain the sales pivots in greater detail.
All right. If there's parts you don't understand, if you can't follow the logic of what it's doing, ask it to explain, and then it'll go through and give you some examples. All right.
You can follow through. So this is how you can learn SQL as well. If there's certain things that it's doing, certain approaches that it can take, try to understand, get into the mindset of what is it doing and ask questions.
This can be a back and forth conversation. It is a chat. So don't be afraid to go in there and follow up with more detailed questions.
Now, if in the original spirit here, if we were just thinking year over year revenue, I might not want to be doing that based on states. So I think what I would do is I think I would create a new chat because I want a totally different kind of query. I don't want to do it by state.
So I'd re-upload from my computer, the tables. And I would say, I wanted to write a Postgres query that does the year over year growth. That shows the year over year growth from 2020 to 2022.
And let's see what that does. All right. So I'm going to copy that and I'm going to create a new query here, paste it in and run the query.
Here we can see our, now I just said growth. Notice how it says orders. So they're thinking total orders.
They are counting. Now, when you count, you count the number of orders, right? We're in the orders table. So counting would count orders.
I was thinking more total sales. So this is some of the back and forth. Instead, well, actually I'll say, instead of the number of orders, please calculate the amount of sales instead.
And I could have said revenue, but I was kind of trying to be a little vague here to see if it could figure it out. But of course, be specific. If you want revenue, say revenue, or if you want to have it in dollars, say dollars.
So let's copy this and go check this out and run this. Aha. Now we're talking sales, not the number of orders.
And so now we can see that my sales were this and they're up to this. We can see the year over year percentage growth. Of course, you want to do the sniff test.
Am I smelling anything wrong? So, you know, you do want to go through, make sure everything makes sense. If there's parts that you don't understand, like let's say no, if you're like, Hmm, what does null if do? Maybe you've never seen null if. So you can say, what does null if do? And again, it might've explained it.
You know, I didn't read through all of the response there just because I'm going through this with you here. So I'm being kind of brief, but it goes through and talks about it. So it returns null if an expression equals another expression.
So why is it used in this case? So they they're doing it to prevent division by zero. So without that, you can get an error. So you don't want to divide by zero there.
And so they're avoiding that error. All right. So, so you can learn about what this thing does, why they are doing it, you know, why it's useful.
And so this can be a great way to have ChatGPT help you do the hard queries to learn, but always verify, always read through the code, try to learn from that. And so sometimes ChatGPT can help you get unstuck. Sometimes ChatGPT can help you with the really hard queries.
But you know, it's not always perfect. This did actually a very good job of writing these queries. As you get really complex, sometimes it's not going to be good enough.
So it's still important to be able to know how SQL works, to be able to read through the queries. Sometimes, of course, of course, to be able to write them yourself. ChatGPT can't always be the answer every single time.
But also, if you don't know anything about how to look and read through the queries, you can't verify the accuracy of these queries. And I don't care how fast you are getting a response from ChatGPT, if ultimately the query that you're running produces a bad result. And it's really up to us to understand our queries, whether we're writing them or whether we're asking ChatGPT to write them, we need to be able to validate that these things are correct and that people can trust the results that we're giving them.
But ChatGPT can be a great ally and a great help to you in trying to write your queries, to challenge through things, to ask you questions, and it can be a great digital assistant for you.