Master the essentials of pandas data slicing with loc and iloc, two powerful indexing tools that simplify extracting specific data subsets. Learn the differences between their indexing approaches and when to apply each method effectively.
Key Insights
- The dataset analyzed contains 157 rows, each representing a unique car model, with 16 columns detailing information such as manufacturer, model name, and sales figures in thousands.
- The pandas method iloc uses integer-based indexing, which is inclusive at the start but exclusive at the end, requiring numeric references (e.g., rows 10–12 accessed as indices 10:13).
- In contrast, loc provides a more human-readable indexing method, inclusive at both endpoints, allowing direct column name referencing such as "manufacturer" through "sales in thousands."
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
Let's take a look at this cars data. We have 157 rows, and each row represents one car model, with a lot of information about that car model: manufacturer, model name, sales in thousands, etc.
It's got 16 different columns, meaning 16 pieces of information about each one of these. And we're not going to go over what each one represents. We'll touch on a few of them during this course.
Now, we can grab slices of this data. And it's a very common thing to do, particularly for grabbing certain columns, but grabbing certain rows as well. And there are two main tools for doing this.
They're called loc, for location, and iloc, for integer location, otherwise known as index location. Now, the index that iloc uses is the number up here. And when you output the DataFrame in a Jupyter Notebook, it will show you the first five rows and the last five.
And the index starts counting at zero. That's an old-school computer way of doing things, and most programming languages start counting at zero.
We have zero through four, the first five, and the last five, 152 to 156. And because we started counting at zero, the last index is one less than the number of rows: 157 rows, so the last one is number 156.
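Here's a minimal sketch of that zero-based counting. The real cars file isn't shown here, so this builds a hypothetical stand-in DataFrame with 157 rows and a made-up model column, just to check that the last index is one less than the row count.

```python
import pandas as pd

# Hypothetical stand-in for the cars data: 157 rows with the default
# integer index. The "model" values are invented for illustration.
cars = pd.DataFrame({"model": [f"model_{i}" for i in range(157)]})

n_rows = len(cars)           # number of rows
first_index = cars.index[0]  # counting starts at zero
last_index = cars.index[-1]  # one less than the number of rows

print(n_rows, first_index, last_index)
```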
We have a couple of different ways that we can access different slices, different bits of this overall data frame. The first, iloc, is based on numbers, so on the zero to 156.
And each of these columns will also get a number. This is number zero, this is column one, this is column two, and so on. So here's how we can access specific slices of it.
Let's say that I want to grab rows 10, 11, and 12, meaning index 10, 11, and 12, and I want the first three columns, manufacturer, model, and sales in thousands, which are indexes zero, one, and two. Let's do it with iloc first.
We write cars.iloc, and we use square brackets as if it's an array, which it is, although it's a two-dimensional array.
First, we give the rows that we want. We want indexes 10 to 12, inclusive: 10, 11, and 12. The way that slicing works when it's index-based is we give the starting index, a colon, and then not quite the ending one, but one past it: 10:13.
The reason is that the start is inclusive, meaning start at 10, but the end is exclusive. It's up to but not including index 13.
So those are the row numbers we want. Then, after a comma, we can additionally put in the column numbers that we'd like. The columns we want are manufacturer, model, and sales in thousands, indexes 0, 1, and 2, so we say 0 up to but not including 3, written 0:3. And if we evaluate that as the last value in the cell, we get just what we're looking for: rows 10, 11, and 12, and columns manufacturer, model, and sales in thousands.
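The iloc call described above can be sketched like this. The DataFrame is a hypothetical stand-in for the cars data, and the column names (manufacturer, model, sales_in_thousands, price_in_thousands) and their values are assumptions, since the real file's exact column names aren't shown.

```python
import pandas as pd

# Hypothetical stand-in for the cars data: 157 rows, 4 columns.
# Column names and values are assumed for illustration only.
cars = pd.DataFrame({
    "manufacturer": [f"maker_{i}" for i in range(157)],
    "model": [f"model_{i}" for i in range(157)],
    "sales_in_thousands": [float(i) for i in range(157)],
    "price_in_thousands": [20.0 + i for i in range(157)],
})

# Rows 10 up to but not including 13, columns 0 up to but not including 3.
subset = cars.iloc[10:13, 0:3]
print(subset)
```

Both slices follow the same rule: the start is included and the end is not, so 10:13 gives three rows and 0:3 gives three columns.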
Now, iloc is very good for certain uses, and we'll walk through what those are. But it's hard to read. It can be a little easier to read once you get used to an index-based system.
Once you're thinking in indexes, thinking, okay, from this number up to but not including this number, then it's easy. But it's very easy to get that wrong. We also have a different one called loc.
loc is location, but label-based rather than position-based, which makes it more human-readable. So here's how we could do this with loc. We want the exact same result.
When I execute this, if nothing changes, great, we did it right. We'll say cars.loc this time. And this time, because loc is meant to be human-readable, the indexes for the rows are inclusive on both sides.
So it's from 10 to 12, including 12, written 10:12. Remember, iloc is from 10 up to but not including 13. This is sort of what you would assume as a human, not a computer.
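To see the two endpoint rules side by side, here's a small sketch that slices rows only. The DataFrame is a hypothetical stand-in with a made-up model column. It relies on the default integer index, where the row labels happen to match the row positions.

```python
import pandas as pd

# Hypothetical stand-in for the cars data with the default integer index.
cars = pd.DataFrame({"model": [f"model_{i}" for i in range(157)]})

by_position = cars.iloc[10:13]  # end is exclusive: positions 10, 11, 12
by_label = cars.loc[10:12]      # end is inclusive: labels 10, 11, 12

rows_match = by_position.equals(by_label)
print(rows_match)
```

If the rows had other labels, such as names, loc would slice by those labels and the numbers used with iloc would no longer line up with them.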
And so that's what loc is: sort of the more human-readable version of iloc. It's not just that, right? There are differences in the abilities of each, and using numbers unlocks certain abilities.
But it is definitely more human-readable in the end. So that's a key difference. And since we're not integer-based anymore, we can give the names for the columns.
The rows don't currently have names. If they did, we would put those names in there instead of 10 to 12. But they don't.
They're just row numbers. But the columns definitely have names, manufacturer, model, and sales in thousands. We can say, in quotes, a string, manufacturer, but we should spell it right, which includes ... Includes spelling it right at all.
Manufacturer. That's a tough name. We need to make sure that case is correct as well.
A name with a capital letter is not the same column name as one without. We go from manufacturer up to and including, because it's loc, sales in thousands. And again, it's up to and including because that's the way humans would think about it.
If you said, give me columns manufacturer to sales in thousands, you wouldn't think, oh, he's saying manufacturer up to, but not including sales in thousands. We should get the exact same result when I run this. Yep, no change.
Perfect. That's what we wanted. These two will produce the exact same result.
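Here's a sketch of that comparison, again on a hypothetical stand-in DataFrame. The column names, including sales_in_thousands with underscores, are assumptions, so swap in the real names from your data. It also checks that a column label with the wrong case raises a KeyError.

```python
import pandas as pd

# Hypothetical stand-in for the cars data: 157 rows, 4 columns.
# Column names and values are assumed for illustration only.
cars = pd.DataFrame({
    "manufacturer": [f"maker_{i}" for i in range(157)],
    "model": [f"model_{i}" for i in range(157)],
    "sales_in_thousands": [float(i) for i in range(157)],
    "price_in_thousands": [20.0 + i for i in range(157)],
})

# Position-based: ends are exclusive.
with_iloc = cars.iloc[10:13, 0:3]

# Label-based: ends are inclusive, and columns are given by name.
with_loc = cars.loc[10:12, "manufacturer":"sales_in_thousands"]

same = with_loc.equals(with_iloc)
print(same)

# Column labels are case-sensitive: "Manufacturer" is not "manufacturer".
try:
    cars.loc[10:12, "Manufacturer"]
    case_matters = False
except KeyError:
    case_matters = True
print(case_matters)
```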
And those are the key differences between them. Let's give you a challenge in the next video.