Data Slicing Techniques in Pandas with LOC and ILOC

Master the essentials of pandas data slicing with loc and iloc, two powerful indexing tools that simplify extracting specific data subsets. Learn the differences between their indexing approaches and when to apply each method effectively.

Key Insights

The dataset analyzed contains 157 rows, each representing a unique car model, with 16 columns detailing information such as manufacturer, model name, and sales figures in thousands.
The pandas indexer iloc uses integer positions, and its slices include the start but exclude the end, so rows 10–12 are accessed as 10:13.
In contrast, loc is label-based and more human-readable: its slices are inclusive at both endpoints, and it lets you reference columns by name, such as "Manufacturer" through "Sales in Thousands."


Let's take a look at this cars dataset. It has 157 rows, and each row represents one car model, with a lot of information about that model: manufacturer, model name, sales in thousands, and so on.

It has 16 different columns, meaning 16 pieces of information about each car model. We're not going to go over what each one represents, but we'll touch on a few of them during this course.

We can grab slices of this data. It's a very common thing to do, particularly for selecting certain columns, but also for selecting certain rows. There are two main tools for doing this.

They're called `loc`, for location, and `iloc`, for integer location (also called index location). The index that `iloc` uses is the row number shown at the left of the output. When you display a DataFrame this long in Jupyter Notebook, it shows only the first five rows and the last five.

The index starts counting at zero. That's a long-standing convention in computing, and most programming languages, Python included, start counting at zero.


We see indexes 0 through 4, the first five, and the last five, 152 through 156. Because we started counting at zero, the last index is one less than the number of rows: with 157 rows, the last one is number 156.
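To see the zero-based numbering for yourself, here is a minimal sketch. It builds a hypothetical stand-in for the cars data (157 rows of placeholder values, since the real dataset isn't reproduced here) and inspects its default index with `len()`, `head()`, and `tail()`.

```python
import pandas as pd

# Hypothetical stand-in for the cars data: 157 rows with pandas' default
# integer index. The values are placeholders, not the real dataset.
cars = pd.DataFrame({
    "Manufacturer": [f"Maker {i}" for i in range(157)],
    "Model": [f"Model {i}" for i in range(157)],
})

print(len(cars))                   # 157 rows
print(cars.head().index.tolist())  # [0, 1, 2, 3, 4]: the first five
print(cars.tail().index.tolist())  # [152, 153, 154, 155, 156]: the last five
print(cars.index[-1])              # 156, one less than the row count
```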

We have a couple of different ways to access different slices of this overall DataFrame. The first, `iloc`, is based on the row numbers 0 through 156.

Each column also gets a number: the first column is number 0, the second is column 1, the third is column 2, and so on. Here's how we can use those numbers to access specific slices of the data.
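A short sketch of column positions, assuming the first three columns are named `Manufacturer`, `Model`, and `Sales in Thousands` (assumed stand-ins; the exact names in the real file may differ). `Index.get_loc` translates a column name into its integer position.

```python
import pandas as pd

# Hypothetical one-row frame with assumed column names.
cars = pd.DataFrame(
    [["Maker A", "Model A", 16.9]],
    columns=["Manufacturer", "Model", "Sales in Thousands"],
)

# Columns are numbered from zero, left to right.
for position, name in enumerate(cars.columns):
    print(position, name)

# Look up the integer position of a column by its name.
print(cars.columns.get_loc("Sales in Thousands"))  # 2
```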

Let's say I want to grab rows 10, 11, and 12 (index 10, 11, and 12), and I want to select the first three columns, manufacturer, model, and sales in thousands, which are indexes 0, 1, and 2. Let's do it with `iloc` first. I'm going to add a comment that says "using iloc."

We access `cars.iloc` and use square brackets, as if indexing into an array, in this case a two-dimensional one.

First, we specify the rows we want: indexes 10 to 12, inclusive, so rows 10, 11, and 12. When slicing is index-based, we give a starting index, a colon, and then not the ending index but one past it: `10:13`.

The reason is that the start is inclusive, meaning we start at 10, but the end is exclusive: we go up to but not including index 13.
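The same start-inclusive, end-exclusive rule applies to ordinary Python list slicing, which may make `iloc` easier to remember. This sketch uses a hypothetical 20-row frame rather than the real data.

```python
import pandas as pd

# Plain Python slicing: start included, end excluded.
numbers = list(range(20))
print(numbers[10:13])  # [10, 11, 12]

# Hypothetical 20-row frame with the default integer index.
cars = pd.DataFrame({"Model": [f"Model {i}" for i in range(20)]})

# iloc follows the same rule: positions 10 up to, not including, 13.
rows = cars.iloc[10:13]
print(rows.index.tolist())  # [10, 11, 12]
```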

Those are the row numbers we want. Next, after a comma, we put in the column numbers we'd like. The columns we want are manufacturer, model, and sales in thousands, which are indexes 0, 1, and 2. To get 0, 1, and 2, we say 0 up to but not including 3, giving `cars.iloc[10:13, 0:3]`.

If we evaluate that as the last expression in the cell, we get exactly what we're looking for: row numbers 10, 11, and 12, and the columns manufacturer, model, and sales in thousands.
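Putting the row and column slices together looks roughly like this. The frame is a hypothetical stand-in; the extra `Price` column is invented here only so that `0:3` visibly leaves something out.

```python
import pandas as pd

n = 20
# Hypothetical stand-in for the cars data; "Price" is an assumed extra column.
cars = pd.DataFrame({
    "Manufacturer": [f"Maker {i}" for i in range(n)],
    "Model": [f"Model {i}" for i in range(n)],
    "Sales in Thousands": [float(i) for i in range(n)],
    "Price": [20.0 + i for i in range(n)],
})

# using iloc: rows 10-12 (13 excluded), columns 0-2 (3 excluded)
subset = cars.iloc[10:13, 0:3]
print(subset)
```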

`iloc` is very good for certain uses, and we'll walk through what those are. But it is hard to read. Thinking in indexes, from this number up to but not including that number, gets easier once you're used to it, but it's still very easy to get wrong. So we also have another tool, called `loc`.

`loc` stands for location, but it's not position-based; it's label-based, which makes it more human-readable. Here's how we could do this with `loc`, aiming for exactly the same result.

When I run this, if the output doesn't change, we did it right. This time we write `cars.loc`. Because `loc` is meant to be human-readable, the row range is inclusive on both ends.

So it's from 10 to 12, including 12, whereas with `iloc` it was from 10 up to but not including 13. This is what you would assume as a human, not a computer.
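A minimal comparison of the two row slices, assuming a hypothetical frame with pandas' default integer index, so that each row's label equals its position.

```python
import pandas as pd

# Hypothetical 20-row frame; default index means label == position here.
cars = pd.DataFrame({"Model": [f"Model {i}" for i in range(20)]})

by_label = cars.loc[10:12]      # labels 10 through 12, end included
by_position = cars.iloc[10:13]  # positions 10 up to, not including, 13

print(by_label.index.tolist())       # [10, 11, 12]
print(by_label.equals(by_position))  # True
```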

So `loc` is, roughly, the more human-readable counterpart to `iloc`. But that's not the whole story: the two have different capabilities, and working with integer positions unlocks certain abilities of its own.

Still, `loc` is definitely more human-readable overall, and that's a key difference. And since `loc` isn't based on integer positions, we can give the names of the columns.

The rows don't currently have names. If they did, we would put those names in there instead of 10 to 12. But they don't.
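For comparison, here is a sketch of what row slicing looks like when rows do have names. The model names used as row labels and the sales figures are made up for illustration. `loc` takes the names and stays inclusive at both ends, while `iloc` keeps using positions.

```python
import pandas as pd

# Hypothetical frame whose rows are labeled by (made-up) model names.
cars = pd.DataFrame(
    {"Sales in Thousands": [16.9, 39.4, 14.1, 8.6, 20.4]},
    index=["Model A", "Model B", "Model C", "Model D", "Model E"],
)

# With named rows, loc slices by name, inclusive at both ends.
print(cars.loc["Model B":"Model D"])

# iloc ignores the names and uses positions, end excluded.
print(cars.iloc[1:4])
```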

In this dataset, the rows are just numbered: their labels are the default integer index, 0 through 156, so each row's label matches its position. But the columns definitely have names: manufacturer, model, and sales in thousands. We can write a column name as a string in quotes, "Manufacturer", but we need to spell it exactly right, and that includes getting the case right.

Column names are case-sensitive: "manufacturer" with a lowercase m is not the same column name as "Manufacturer" with a capital M. So we select from "Manufacturer" up to and including, because it's `loc`, "Sales in Thousands". Again, it's up to and including because that's the way humans would think about it.
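Case sensitivity is easy to check directly. This sketch uses a hypothetical one-row frame with assumed column names; asking `loc` for a label that doesn't exist raises a `KeyError`.

```python
import pandas as pd

# Hypothetical frame with assumed column names.
cars = pd.DataFrame(
    [["Maker A", "Model A", 16.9]],
    columns=["Manufacturer", "Model", "Sales in Thousands"],
)

print("Manufacturer" in cars.columns)  # True
print("manufacturer" in cars.columns)  # False: labels are case-sensitive

raised = False
try:
    cars.loc[:, "manufacturer"]
except KeyError:
    raised = True  # the lowercase label does not match "Manufacturer"
print(raised)  # True
```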

If you said, "Give me columns Manufacturer to Sales in Thousands," you wouldn't think, "Oh, he means Manufacturer up to, but not including, Sales in Thousands." We should get the exact same result when I run this. Yep, no change.

Perfect. That's what we wanted. These two will produce the exact same result.
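One way to confirm that the two selections really match is `DataFrame.equals`, which compares the values and dtypes of two frames. This sketch uses a hypothetical stand-in frame, so the column names and the extra `Price` column are assumptions.

```python
import pandas as pd

n = 20
# Hypothetical stand-in for the cars data; "Price" is an assumed extra column.
cars = pd.DataFrame({
    "Manufacturer": [f"Maker {i}" for i in range(n)],
    "Model": [f"Model {i}" for i in range(n)],
    "Sales in Thousands": [float(i) for i in range(n)],
    "Price": [20.0 + i for i in range(n)],
})

# using iloc: positions, end excluded
using_iloc = cars.iloc[10:13, 0:3]
# using loc: labels, end included
using_loc = cars.loc[10:12, "Manufacturer":"Sales in Thousands"]

print(using_iloc.equals(using_loc))  # True
```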

And those are the key differences between them. Let's give you a challenge in the next video.


Colin Jaffe
