Pandas offers versatile alternatives to traditional indexing methods with at
and iat
, simplifying data editing tasks. Learn how to effectively filter and manipulate strings within datasets using partial matching and replacements.
Key Insights
- Explore
at
andiat
as effective alternatives to the traditionalloc
andiloc
indexing methods in Pandas, allowing users to modify DataFrame values accurately and efficiently based on row and column locations. - Use the Pandas
.str.contains()
method for partial string matching, improving search results by capturing variations like "seafood salad" or "shrimp salad," and utilize the parametercase=False
to ignore case sensitivity and broaden the search scope. - Leverage the
.str.replace()
method to perform efficient string replacements within DataFrame columns, correcting errors or changing terminology easily while ensuring best practices by using direct selection with.loc
to avoid common Pandas warnings.
Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.
This is a lesson preview only. For the full lesson, purchase the course here.
Now there's an alternative to lock and ilock called at and at and iat. So one more time we're going to change the name of the salad, but we're going to use iat. fooddf.at which is analogous to lock.
We're going to go into row eight, excuse me, at is analogous to, yeah, at is analogous to lock. We're going to come into row eight, column, item, and set it equal to seafood salad. There you go.
Seafood salad. Now iat, the i being integer is analogous to ilock. So let's go back to shrimp salad and we'll say fooddf.iat, which works just like ilock, eight comma zero, right, the integer location of that first item column, setting it equal to shrimp salad, and it's back to shrimp salad.
So that is an alternative to lock and ilock, at and iat. Just due to personal preference and long habit, I prefer lock and ilock, and I think you see lock and ilock more often in documentation tutorials when you're googling stuff. So equals.
Now what if you're looking for a string match? I want every single result. I'm not using greater than or equal to here. I'm going, I'm not using greater than or less than or greater than or equal to or less than or equal to like we did for the numeric filters when we were getting the price greater than 15 and so on, right? Let's go look at that example.
When you're doing a filter, that Boolean comparison is the current price. It's going to iterate through and it's going to take every price and see if it's greater than or equal to 15, and if it is, if that return is true, it's going to accumulate it in the result. But we're not using double equal, right? We could, looking for something that's exactly 500 calories or exactly $20 or something.
Obviously, you get less results when you do an exact match. We're going to be doing it. We're going to do a double equal comparison, but we're going to look for salads.
Now, if you look in our output, we have one, two, three, four, five salads. Granted, there are different kinds of salads, but there's still five items with the word salad in it. So let's get all the salad rows versus via an exact match.
We're going to make a new DF. Let's call it salad DF. That's going to be your food DF and our Boolean comparison is going to be the food DF item equal to salad.
And let's run and see how many salads we get. And we get zero salads. And the reason we got zero salads is because even though you have five salads, none of the salads are exactly called salad.
They're all something salad. So plain old salad doesn't catch any of them because you're going for an exact match. So what we really need to do and what you would typically do when you're looking for a string, looking by a string match, is you do a partial match with string contains.
So you don't just double equal. You say straw. There's a string property of the column and you go into the string and then you look for contains, call the contains method and feed in the word that you're looking for that contains.
It's kind of like an SQL query with like where you're, you know, when you search for Google, you know, when you Google search something like bell curve, you're not saying show me the web pages that only have the word bell curve exactly. No, it contains bell curve. Otherwise search would be useless.
Filter would be useless, ineffective. We just want this, the word, the results, the items that contain the word salad. We're going to say to get all the salads, use straw contains salad and to make it case sensitive or that is not case sensitive because programming is case sensitive.
If you write salad is all under case or all lowercase, unless you put exactly the way it's written, you're not going to get any results unless you turn off the case sensitivity with optional extra argument case false. We're going to say salad df again equals food df. We're going to do a little Boolean comparison.
Food df dot straw dot contains. It's actually food df item. We want to go into that specific column dot straw dot contains and then we'll say salad.
And now that we're saying we don't need an exact match, how many salads are we going to get? And unfortunately we get no salads whatsoever. And the reason for that is because in the data set, in the data frame, salad is capitalized in all cases. Whereas we're searching for lowercase salads.
So so you don't have to care about the case sensitivity. We're going to pass in this extra set, this extra parameter case equals false. Run it again and now you are good with your five salads.
And we've already got our big kahuna burger. Let's take that out. So how many burgers do we have? Let's go back to the original.
You've got one burger, two burgers, three. Hamburger, bison burger, and big kahuna burger. So let's get all the burgers.
Challenge. Get all the burgers into a new data frame called burgers df. Pause the recording and come back when you're ready for me to show you the solution.
Hopefully you'll get it on your own. It's pretty much like salads except you're looking for burger. All right.
Here's the solution. We're going to say burgers df equals food df. And we're going to pass in our little Boolean comparison food df on the item column.
We're going to say string contains burger. And we're careful to say case equals false. And there you go.
There's your three burgers. Now what about string replace? This is another method you can call on strings. We have string contains.
So let's do just a little thing. Let's deliberately mess up the spelling of something. We're going to make a, we're going to set a column equal to itself, but with the string replace method called on it.
You remember string replace. Find cat, replace with dog, find O-R, O-U-R for British spelling, replace with O-R. Let's do a little review of the normal string replace.
Review of regular Python string.replace. And this being the data frame, you know, the pandas version, a little bit different, but kind of similar, right? Just replace. We'll say statement cats are better than dogs. No, we don't want to say that.
Your cats are the best. That's what we want to do. Okay.
Cats are the best. Cats rule. We're going to change that to dogs.
We'll say statement equals statement dot replace. We're going to replace cats for just the word cat. Even what if it's not, doesn't have an S with dog.
And then we'll print the statement. There you go. So that's a straight up simple Python regular string method called replace, which is built in here to pandas.
So what we want to do is we're going to find salad and just mess up the spelling and then see if we can fix it again. So, and we call that on the column. We'll say food df item, set the item column equal to itself inside Boolean comparison of the item dot struh dot replace.
So you're in the item on every single row and you're going to look for salad and you're going to replace it with ensalada. How about case false. What doesn't it like? Not found.
Okay. Food df. Oh, let's see.
Food df. Okay. We're going to the food df.
We're going to the item. We're going to the string and we're replacing the word salad with something else. We're saying case is false.
Oh, you don't wrap it in here because it's just one column to one column. Whoa. Set a copy on a slice.
Okay. So here's what it wants. It doesn't like that.
It works. Ensalada. Okay.
All the salads are ensaladas now, but it doesn't really like it. Now we're going to fix it back how it was. We're going to go ensalada back to salad.
We'll go capital S and we don't want this error message. We're going to say food df lock and we're going to say all rows comma this column. I think that would satisfy it.
All rows that column. There it is. That worked.
Back to salad. We did a direct selection without iLock. Just targeted the column and it didn't really like that.
It warned and said to use lock. So fine. Lock being we want.
Colon all rows comma item column. Set equal to itself with string replace. Ensalada changing to the word salad.
So that should probably be lock colon boom. Lock colon boom. There we go.
And we'll replace it with ensalada.