Explore the world of SQL and relational database management systems, their commercial and open-source variants, and the nuances that make each unique yet fundamentally the same.
Relational Databases
Imagine looking at the customer purchase history of a company and all of the data one can put into a database. If we were to look at things such as the item price and description or a buyer’s address, we’ll often see that data repeats itself (one person can buy multiple things, many people can buy one thing), and not only does it repeat itself, but it does so in an organized function. Instead of looking at all the data as a whole, we would logically split the data into parts, such as the store data, the customer identification data, the product data and the purchased data.
All these parts are related, and we can thus link these relations with identifiers known as keys. To access data across tables, we would then employ the JOIN keyword. Instead of having one table of data, we create a schema of multiple tables, and this overall system is known as a Relational Database Management System (RDBMS). Luckily, every RDBMS uses SQL. Every database that is not relational is classified as a NoSQL database.
Commercial
Oracle is the largest commercial RDBMS, and sets the standard for commercial use. Other companies such as Teradata are similar commercial solutions with their own licensing schemes. No matter what the stance is with licensing, one can use SQL to manage data.
Microsoft SQL Server is notable for its use of a Windows-based file system, but their software and hardware philosophy have changed to slowly adopt and integrate UNIX. As a result, much of the once-different SQL language is now mostly the same. MSSQL uses a variant of SQL called T-SQL, which is short for Transact-SQL, and we will explore some of the differences later in this article.
Open Source
You may have heard of PostgreSQL, MySQL, and SQLite, which are all open source variants of SQL. Internally, all have the functionality of a commercial RDBMS, but do not have the licensing costs. The management and troubleshooting of these databases rests squarely on the client side. A common misconception is that an open source database is open to hacking, but all of these databases have strong security and encryption of your data. All use SQL.
So what’s different?
As each of these variants vie for your attention, it’s worth noting that most of the SQL is the same across the board, with minor differences. As choices arise, there is no advantage to having one company write a different SQL keyword, as that would create confusion across the market.
When transitioning from one variant of SQL to the next, it’s important to note these key differences:
Date/Time Objects and Functions
Handling dates varies across SQL, and this changes with each version of a specific SQL package. Dates have their own type, known as a datetime, and functions that convert, use, shorten, or parse datetimes vary across the board. To learn the proper syntax for each version of SQL, it is recommended to look at the documentation for that SQL, as a WHERE clause may not return your desired data.
JOINs
When using the JOIN keyword, we link our data based off of the keys we link using the ON keyword, such as ‘customer.id = cid’. When we perform this link, we create an inner join of the data that is common on both of the tables. However, if we wanted to add the full data of one table or the other, we would use a LEFT, RIGHT, or (FULL) OUTER join. The FULL keyword is not consistent across SQL, and should one perform the task of joining the tables, one should attempt the query both ways to figure out whether or not to add FULL to the join language.
Creating/Adding Data
Creating new tables and adding data is usually done outside of the database, through the website with which the database is joined or with other means such as a POS system. Usually only database administrators have access to altering data, so it is their responsibility to learn the differences across RDBMS.
Limit vs. Top
The query:
SELECT *
FROM table1
LIMIT 5;
In most SQL variants is equivalent to:
SELECT TOP 5
FROM table1;
In T-SQL (MS SQL). Both select the first 5 entries from the desired query. TOP has the added functionality of using the percent keyword to find the TOP n PERCENT of results in a query.
Pivoting
Lastly, as Pivot Tables are a champion of Excel, Pivoting is a function on both MSSQL and Oracle servers using the PIVOT. For SQL variants that do not have this keyword, using CASE in the select statement can achieve the same results.
Conclusion
While there may seem like many differences with SQL, knowing the fundamental language can harness most of the RDBMS functionality.