Discover how AI tools like ChatGPT can elevate your SQL coding skills, from enhancing query optimization to clarifying complex concepts. Learn practical tips and precautions to effectively integrate AI into your SQL workflow.
Key Insights
- Leverage AI to clarify SQL concepts and rewrite queries; for instance, converting Postgres-specific syntax into SQL Server-compatible code with optimized functions like using the YEAR function instead of DATEPART.
- Utilize AI to analyze and optimize existing queries, such as replacing subqueries with Common Table Expressions (CTEs) or employing Postgres-specific "DISTINCT ON" clauses for performance improvements.
- Remain cautious when integrating AI-generated SQL solutions; always test and validate the output thoroughly to ensure correctness, as tools like ChatGPT can occasionally produce inaccurate or contextually inappropriate recommendations.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
So how can you use AI like ChatGPT to help with your SQL code? Number one, you can have it explain things to you. You can learn and better understand SQL with the help of AI. Number two, you can have AI rewrite queries.
Let's say you know Postgres and you need to rewrite a query in SQL Server. You can have it rewrite your query from one flavor of SQL to another. Three, you can have AI explain queries that you don't understand.
If somebody gives you a query or if you see an example on a website somewhere, you can have AI explain it to you. Number four, you can optimize your queries. If you want them to run faster, make them easier to read, AI can help with that.
Number five, you can actually have AI help you rewrite or write your queries from scratch. And lastly, I'm going to talk about things to watch out for when using AI. Let's get started.
First of all, you can simply ask ChatGPT, what are ways you can help me write SQL code? And it will go through and explain some things. It can help writing queries, debugging and fixing SQL errors. If you have some code and you can't figure out what's wrong with it, ask ChatGPT.
It can help with database design, query optimization, writing store procedures if you're using SQL Server, helps with functions on different flavors of SQL, working with specific databases. If you're converting from one to another, it can help you with syntax changes, different data analysis, data migration, writing code, automating SQL tasks. Some of these you might not be doing if you're just writing queries, but you can also ask it for other things like, say, I don't have permission to change the database.
What can you help with in writing queries only? So writing select statements, working with queries, doing aggregations, filtering, searching subqueries and CTEs or common table expressions, performance optimizations. If you're not sure of even what it can do, you can ask ChatGPT and it can even tell you what it's capable of doing. Now let's see some specific examples.
Let's say you're struggling to understand a SQL concept, like maybe understanding SQL joins. You could ask it, explain the different types of SQL joins. So they're going to go through and explain inner join.
It returns only matching rows from both tables based on a specified condition. Non-matching rows are excluded. Let me give you an example here.
Use case, when you only need data that exists in both tables. Now left join, which is a left outer join, mostly just called left join by most people, but it is technically an outer join, returns all records from the left table and matching records from the right. So you get all of the ones from the left and only matching ones from the right.
If no match is found in the right table, no values are returned for the columns from the right table. So again, they give you an example and a use case. When you need all records from the left table, which would be this one, even though there's no match in the right table, and then they go through the right join, and I'm not going to read everything here, but it goes through all of the different joins, explaining the different types of joins, and there we go.
So it gives you an example. Sometimes when I've seen this, it also gives a table of differences. So let me try this again and just see if it gives an example this time.
We'll let it write its thing and see if it gives an example that it's done sometimes when I've asked it this question before. Let's see what it writes. I'm going to ask it to give me a short table comparing them.
Sometimes it's given this as part of its explanation, but it's not. I can always ask it. And there it gives a nice example of inner joins, left joins, right joins, and so on.
Does it include matching rows? Does it include non-matching rows? Does it include null values, use cases? So awesome for trying to better understand certain SQL features. Another example is that if you know how to do something in Postgres. So let's say you have some code that works in Postgres.
You know how to do date part. Especially dates are different in Postgres versus SQL Server. So let's say you want to rewrite this because now you have to convert over to SQL Server.
And you know how to do something in Postgres, but you're curious, how does this work in SQL Server? And so you ask it to rewrite some code for you, and it does it. So in this case, date part is slightly different in SQL Server. And they could have used the date part equivalent, but they actually went for the more optimized year function, which is a little less code to write.
And they actually added an alias here as well because order by date part here, Postgres automatically kind of gives it an alias called date part, but SQL Server does not. So they give it an alias, put in a nice little alias there. So they did a nice, good conversion to SQL Server.
And this would work perfectly in SQL Server. Let's say you have a query that you don't understand. You could take your code and you could say, what does this code do? If you get code from a coworker, or if you see something on a website, and you just don't know what it does, ask chat2pt, what does this do? So it says this SQL query retrieves the highest value Jeopardy question for each show.
The breakdown, it does the inner query first, which is good because SQL does inner queries first. So it starts by explaining inner query, because without understanding that, you don't know how that's being passed to the rest of the query. So this inner query is called m for the max value.
And so this inner query called m, which is this code right here, groups the Jeopardy table by show number, grouping by show number, each unique Jeopardy episode. So each show number is for a unique Jeopardy episode. It finds the maximum value, the highest value question for each show number.
That is what that does. Then it takes a look at the outer query. So that comes next, the outer query.
And it joins the Jeopardy table. We're joining the Jeopardy table here, J, with the result of the inner query right here. The join condition on this, so the on part there, ensures that only rows where the question value matches the highest value for that show number are selected.
And then it orders the results by show number, ensuring that the results are in chronological order. What data does it return? It selects all the columns. So it gives you all the columns from the Jeopardy table.
Only questions that have the highest value for their respective show number. So it's not showing you all the rows, just some of the rows. Potential issues.
If multiple questions in the same show have the highest value, all questions will be returned. That's actually what we wanted to do. So that's not an issue for us, but I can see how they might be.
If they don't know the intent of what we're trying to do, they might raise that question. If value is stored as a string like this, the max value function may not work correctly unless converted to a numeric format. First, I didn't tell it what my data types were, so it's nice that they're thinking about that, but that's not actually an issue with our database.
And then it asks if we'd like to do anything else. But that is a nice explanation that goes through kind of step by step, so we understand what that code is doing. Now, let's say you have a query that's not running very fast, and you think that maybe there's a way that you could run it, make it faster, optimizing the code.
You could say, is there a way I can make this query faster? In this case, yes, your query can be optimized in several ways to improve performance. Here are some. We could use a CTE instead of a subquery.
Common table expressions can sometimes make queries more readable and optimize execution in some databases. So that may or may not work, but they're saying that that's one possible thing. To ensure proper indexing, that's something that you may or not be able to do depending on your permissions with the database.
Now, we are using Postgres, so there is another way that we could do this using a distinct on clause. And so if you're using Postgres, which in this case we are, there is a more efficient alternative, and they provide a way using a CTE here. This is one way that you could do it.
And the other one they're giving you, that if you can actually change the database, which you might not be able to do, there might be some indexing you can do. That would not be something you'd want to do unless you understand how to do those kind of database optimizations. But that's something that maybe you could check with database administrator on.
If you are doing Postgres, you could do this with distinct on. So this is a different way to do this. And let's say you're like, hey, you know, I've never done distinct on.
I'm just kind of curious. Like, tell me more about this distinct on. Tell me more about distinct on.
Now, in this case, this is just for Postgres. This doesn't work in all flavors of SQL, but this is where you can learn more, where maybe ChatGPT explains how to do something and you're like, hey, I've never seen that code. What is that? And you want to understand more about it so you can learn how to use it.
And so you can go through and kind of read the explanation, see how it works. And this can be, you know, they give a nice comparison of different methods, which are the different ones, showing kind of how simple it is, the performance, different notes. So this can be a way that if ChatGPT gives you some examples of code and you're like, hey, I don't understand what this one part is, you can ask it to learn more about it.
And when you're reading this, don't just have ChatGPT, just don't think of it, you're outsourcing your work completely. Try to do this so that you understand SQL better. So you're actually learning SQL code with the help of ChatGPT.
So one of the limitations here is that it is only in Postgres. It doesn't work in things like SQL Server, if you were doing that. And so, you know, something to think about, but these are ways that you can learn more using ChatGPT.
So it can help you to optimize your queries, maybe make them perform better, be more readable for other developers. And then you can also help yourself learn SQL code by asking ChatGPT questions about what it's doing. So you can better understand this.
As far as learning more about SQL code, let's just give another example here. Maybe you want to learn more about subqueries and CTEs or common table expressions. So you can have it explain the difference between those.
They're both used to structure complex queries by breaking them into smaller, more manageable parts. They differ in syntax, readability, performance, and use cases. So a subquery, which is a nested or inner query, is embedding another query inside of a query.
So they give an example of a subquery like this. They're saying that it's used inside of select from and where clauses. They're saying it can be more difficult to read and debug in complex scenarios, and you can't reuse it.
So once you have it there, you can't really reuse it. Now with a common table expression, it's a named temporary result defined within the width. So here it's like we're taking the subquery and we're naming it here.
So they put it with the width part of this. So the CTE, which is this in this case, does this, and then the main query retrieves the data from the CTE. So you say with this thing, and then it pulls all of that from the CTE.
So CTEs are defined using width so they can improve readability. So you kind of know what they are first before you get to this, rather than having like a subquery in here, for example. And one of the benefits of the CTE is that they can be referenced multiple times within the query, rather than running that subquery multiple times.
If you do need to rerun it, of course, rerunning a subquery multiple times wouldn't be as fast, whereas running this once and then referencing it multiple times within a query could enhance performance if you need to run that multiple times, potentially. And when it gets more complex, having those with statements beforehand can increase the legibility of this when you have something more complex. So they give different scenarios in terms of readability.
Subqueries are a little less readable, especially when nested inside of each other. CTEs are more readable, kind of structured syntax. Performance may be reevaluated multiple times if you need to do them multiple times.
If you only need to do them once, that's not an issue. So they can perform faster with a CTE if you use it multiple times. That depends.
So it just depends, you know, simple queries that have a one-time use, subqueries are okay, but the more complex they get, if you want to better organize them or if you need to call something multiple times, then a CTE could be better. Now, let's say you have some code where you are using a subquery and you want to convert it into a CTE. So you could ask it and say, convert this subquery code into a CTE and it will rewrite your code for you.
So it takes this subquery here and moves it up here into a CTE with a name. So it's saying, with this code known as this, and then where we do the join, instead of putting the subquery, it does the join and it uses the name. So it kind of saves this result.
It's kind of like a name subquery, think of it that way. And so it does the rewriting. So this part is a little bit easier to read because you can read the names and you don't have to have the subquery part in there.
Functionally, these are going to work basically the same, but just the legibility of it, the readability of it can be different. And we would want to make sure that this code runs, of course. If I copy that and go back to my code editor here, go back to DBeaver, and I would want to run this just to make sure it works.
And yes, so this code does work. And that's always something important that you want to make sure that your code does function. Now, let's say you want to get help writing a query from scratch.
What's really going to help ChatGPT is having it understand your tables, your table names, your column names. So let's say I'm in DBeaver, for example, and I double click on tables so that I can see my database diagram. Here I can see my schema, my connection of my tables.
Here I can see the table names. I can see the column names, that there is relationships between these. And I've already taken a screenshot of this and saved it to my desktop.
And I'm going to upload that into ChatGPT so it can learn my table names and my column names and understand the relationships of those tables. So here I saved that screenshot of that area so that I don't have to write in an explanation of my table names and the column names. If for some reason you can't take a screenshot, you could always write in to say these are my table names, these are my column names, these are the relationships.
But if you can take a screenshot, it's going to be a lot easier. And then I'm going to write in plain English what I want ChatGPT to do. So I'm going to say the attached image shows my tables and their relationships.
If you can't have that, that's where you'd have to write out what those things are. Write a query that shows how many orders based on where users live, putting states with the highest number of orders at the top. I'm using plain English to describe what I want ChatGPT to do.
And let's see what query it comes up with. So here it is doing a count, which is an aggregate function, doing a group by, so it's grouping by the user's state, which is correct based on where users live. And so it's grouping by states, showing people the states, counting the number of orders, so it had to join users to orders.
It joined on the user ID, which is present in both, so all of that is correct. It's ordering by the total orders, which is this column right here, in descending, which will put the highest number at the top. All of this code looks good, but let's just copy it and test it out just to make sure.
It's always important to test and make sure. So here I can run this code. And indeed, I've got all of my states.
And I've got them ordered properly. So that code looks like it did exactly what I wanted to do. This is one of the real benefits of ChatGPT, is that if you're having difficulty translating plain English into SQL code, maybe you understand what the basic features of SQL does, but you're having problems trying to translate plain English into SQL, this can be one of the things that can help you start to get better at that.
But don't just copy and paste the code and not look at it. Always make sure you're looking at the code, trying to understand what it does, and trying to start to become familiar with the features of SQL and verifying that the code is actually correct. Use this as an opportunity to not just work faster, but also to learn SQL code and understand kind of how to start to think this way.
So to end, I want to make sure we talk about things to watch out for when using AI. First of all, ChatGPT can be wrong. It is very good, and I am often amazed at just how amazing it can be in certain cases.
But then other times I'm amazed at how it just doesn't understand things sometimes. So it can be great. It can also not work so well sometimes.
It can be a mixture. So don't just trust the output that it gives you. Always verify.
I call it the sniff test. Does it smell right? Does something smell off? Does this make sense? So look at the output, and you've got to double-check things. So first of all, you want to make sure it runs properly, like actually test your code, the outputs.
Don't just say, oh, it looks right, and then not run it, and give it to somebody. At least test it. So you've got to make sure you test it.
And don't just test to make sure it runs. Make sure that you look at the answer, the result that it gives you, and make sure that that's the correct answer, the way you expect it to be, that it didn't miss doing anything. It can be fast to use ChatGPT sometimes.
But if you're wrong, or if ChatGPT in this case is wrong, it doesn't matter how fast you are. People are just going to know you as untrustworthy if you're giving them the wrong answer. So focus on making sure that it's correct, and we still have to know enough about SQL code, or at least enough to analyze it, and look at it, and say, okay, this all not only looks correct, but you want to go piece by piece, and make sure that the code is correct, it's doing the right stuff, and that the results are what you would expect when it comes to the output of it.
So double-check the code, make sure it runs, make sure it gives you the output that you expect it to do, and you can't just completely outsource to ChatGPT. But it can be a huge assistant to you. It can help you to understand things better.
It can help you to troubleshoot. It can get you unstuck if you don't know how to do something. It can sometimes just help you to translate plain English into SQL queries.
It can be an immense help. But if you didn't know anything about SQL code, if you didn't know how to troubleshoot things, if you didn't know how to verify what it's doing, it would limit how you can use it. So the more you understand how SQL code works, that you can verify it, make sure it's correct, that's going to be better for you.
So these are just things that you want to make sure that you think about and use AI responsibly, use it to help you, use it to help you understand things, and it can be a great assistant when used properly.