String Splitting in Python Programming

Demonstrate how to split strings into words or characters, and remove file extensions using Python's split() and list indexing methods.

Master the string.split method in Python to efficiently separate strings into lists of words or characters. Learn practical techniques for handling separators, file extensions, and DataFrame filtering.

Key Insights

  • Understand the functionality of Python's string.split method, which splits strings into lists on whitespace by default or on user-defined delimiters like hyphens or dots.
  • Learn how to remove file extensions effectively by splitting filenames on the period character, isolating the base filename, and updating list elements accordingly.
  • Explore practical tips for filtering pandas DataFrames in Python, such as using the tilde (~) operator to invert selection and create a filtered dataset containing only single-word items.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

OK, string.split. You may recall that in lesson four we covered loops and string methods. Did we get into string.split there? No, we have not split yet.

We just did string replace. OK, so let's do an example. We're going to split.

We have a little example here that has nothing to do with the DataFrame. We just want to see the mechanics of splitting a string and what that even means. So string.split splits a string into a list of individual words, splitting on the spaces.

You might think you can also split a single word into letters by passing an empty pair of quotes into the split method; we'll try that in a moment. So here we have an item called Big Kahuna Burger. And what we want to do is split it up into a list.

Let's call the string item string. We'll say item list equals item string.split, then we'll print the list.
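Here is a minimal sketch of that first split. The names item_str and words_list are assumed spellings of the variable names spoken in the lesson.

```python
# Split a string into a list of words.
# With no argument, split() breaks the string on whitespace.
item_str = "Big Kahuna Burger"
words_list = item_str.split()
print(words_list)  # ['Big', 'Kahuna', 'Burger']
```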

And it gives you a list of individual words. You can also split a word into letters, an individual word. Or the whole thing into letters.

Actually, let's call this words list instead. Next: split a string into letters. We already have our item string.

Now we're going to say chars list (chars rather than letters, because the string has spaces, so not everything is a letter) is going to be the original item string, split with the delimiter, as it's called, being an empty pair of quotes. And now we get an error: empty separator. Yeah, OK, it doesn't accept an empty string as the delimiter.
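The error can be reproduced with a short sketch. The name chars_list is an assumed spelling of the spoken name, and the try/except is only there so the example runs to the end and shows the message.

```python
item_str = "Big Kahuna Burger"
error_message = None

try:
    # An empty string is not a valid delimiter for split().
    chars_list = item_str.split("")
except ValueError as err:
    error_message = str(err)

print(error_message)  # empty separator
```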

So let's do this on words list at index 1, which is the word Kahuna, right? Your words list has Big, Kahuna, Burger at indexes 0, 1, 2. If you print words list 1, you get Kahuna. OK, so we're going to take that one word, Kahuna, and split it into letters.

Same error: empty separator. OK, so you don't feed it anything, you just run split with no argument. There you go.

Yeah, you don't need to feed it anything. Oh, but with no argument it splits on whitespace, so a single word comes back as a list of one item.

Ah, right. You know what? We don't even need split for this. You just pass the string to the list function.

It's not even a split move. Watch: no split, just list. There, that's what we're after.

Yeah, you don't even split it. If you want to take an individual word and knock it down into characters, you'd say chars list equals list of the word: calling list on a string returns a list of the individual chars of that string.

Splitting on a hyphen.
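As a rough sketch of the difference between split and list on a single word (variable names are assumed spellings of the spoken ones):

```python
words_list = "Big Kahuna Burger".split()
word = words_list[1]        # 'Kahuna'

# split() with no argument finds no whitespace, so we get a one-item list.
one_item = word.split()
print(one_item)             # ['Kahuna']

# list() walks the string character by character.
chars_list = list(word)
print(chars_list)           # ['K', 'a', 'h', 'u', 'n', 'a']

# On the whole string, the spaces come back as characters too.
all_chars = list("Big Kahuna Burger")
print(len(all_chars))       # 17
```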

Now, what if you want to split and it's not on a space? I want to get big, kahuna, and burger as their own separate items in a list, but I don't have a space to split on. When you call split with no argument, string.split splits on spaces.

It splits a string into a list of individual words. That is splitting on the spaces, right? Picture the ax coming down on each space, resulting in a list of individual words. But what if it's technically all one word because there are no spaces, but you still want to split it up into the individual words that you can see here? Well, then the ax has to come down on the hyphen.

We're going to say words list again is going to be pic.split, and now we're going to split on the dash. Print your words list. Now, let's say you don't split on the dash; you don't split on anything.

What are you going to get? You're going to get big kahuna burger JPEG by itself, which is not what you want, right? Like you get a list, all right, but it's a list of one item. So what we really want is split on the dash. So the ax comes down on the dash and look at the difference.
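A small sketch of the two cases side by side. The exact value of pic is not shown in the transcript, so the hyphenated filename below is an assumption.

```python
pic = "big-kahuna-burger.jpeg"  # assumed value; the lesson's exact string is not shown

# No delimiter: there is no whitespace, so the whole string is one item.
no_delimiter = pic.split()
print(no_delimiter)   # ['big-kahuna-burger.jpeg']

# Hyphen delimiter: the ax comes down on each dash.
words_list = pic.split("-")
print(words_list)     # ['big', 'kahuna', 'burger.jpeg']
```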

There you go: big, kahuna, and burger.JPEG. And we could then come in and split that on the dot if we like. OK, challenge. We've got our words list down to this: big, kahuna, burger.JPEG.

How can we drop the JPEG and just have burger, so we have big, kahuna, burger as the three items of the list? Now, you could call str.replace on that last item too. You could replace .JPEG with nothing, but that's not so versatile, because what if it's a PNG or a GIF or an SVG, or an IPYNB for that matter? So better would be to split on the dot into a little list of two items and then drop the second item.

So to do that, we're going to say words list negative 1, the last item, dot split on the dot. And we'll save the result. We'll just call it word with file extension.

Let's print that. That should be its own little mini list here: burger and JPEG. Yep.
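To see why replace is less versatile, here is a quick comparison. The filenames are hypothetical examples, not from the lesson's data.

```python
filenames = ["burger.jpeg", "burger.png", "burger.gif"]  # hypothetical examples

# replace() only removes the one extension we hard-coded.
replaced = [name.replace(".jpeg", "") for name in filenames]
print(replaced)      # ['burger', 'burger.png', 'burger.gif']

# Splitting on the dot and keeping the first piece works for any extension.
split_on_dot = [name.split(".")[0] for name in filenames]
print(split_on_dot)  # ['burger', 'burger', 'burger']
```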

See what we did here? Words list negative 1 is burger.JPEG, and we're calling split on that string and splitting on the dot, and that's going to return us a list of two items. Then we could pop the extension off of that. Well, we don't have to pop.

We can just extract the item. We can say word no file extension is going to equal word with file extension at index 0. That would be just burger at that point, right? Because we've got our little list of two items, burger JPEG, and we're saying just go into that list and get us the first item and save that. And then we go into the words list where we still have burger JPEG, right? Let's make sure we still have burger JPEG.

Yep, and now we're going to replace the last item with this word no file extension. We'll say words list negative 1, the last item, is going to be the new word no file extension. If you then print words list, the item at negative 1 is the word with no file extension, and there you go, boom.
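Put together, the steps above might look like this. The value of pic is assumed, and the variable names are assumed spellings of the spoken ones.

```python
pic = "big-kahuna-burger.jpeg"  # assumed value
words_list = pic.split("-")     # ['big', 'kahuna', 'burger.jpeg']

# Split the last item on the dot into a two-item mini list.
word_with_file_extension = words_list[-1].split(".")
print(word_with_file_extension)  # ['burger', 'jpeg']

# Keep only the first item of the mini list (the base name).
word_no_file_extension = word_with_file_extension[0]

# Overwrite the last item of the words list with the base name.
words_list[-1] = word_no_file_extension
print(words_list)  # ['big', 'kahuna', 'burger']
```

Note that index 0 keeps only the text before the first dot, so this sketch assumes the filename contains a single dot.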

We managed to take that file extension, knock it down like so. I mean, there are a little bit easier ways perhaps. Not really, actually.

I mean, it depends what you're trying to do. If you're trying to get them all into a list without the file extension, that's pretty much it. Splitting.

Next challenge: we're going to make a new DataFrame called one word DF that contains only items of one word. One word DF equals food DF, filtered on item, that's str, contains, space. Actually, we want not contains.

We put a tilde at the front there. That tilde is the not operator for a Boolean filter on a DataFrame. We're saying: the string does not contain a space.

If we took the tilde out, we'd just have our multi-word items again, right? Which is not what we want. Tilde, oh, right, it goes in here. It goes inside the filter brackets, in front of the condition.
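A minimal sketch of that filter, assuming pandas is installed. The lesson's food DataFrame is not shown, so the rows below are made up for illustration. Only the names food_df, item, and one_word_df follow the transcript.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's food DataFrame.
food_df = pd.DataFrame(
    {"item": ["Big Kahuna Burger", "Fries", "Milkshake", "Onion Rings"]}
)

# Boolean Series: True where the item contains a space (more than one word).
has_space = food_df["item"].str.contains(" ", regex=False)

# The tilde inverts the Boolean Series, keeping only single-word items.
one_word_df = food_df[~has_space]
print(one_word_df["item"].tolist())  # ['Fries', 'Milkshake']
```

Removing the tilde would keep the multi-word rows instead.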

Okay, good.

Brian McClain

Brian is an experienced instructor, curriculum developer, and professional web developer, who in recent years has served as Director for a coding bootcamp in New York. Brian joined Noble Desktop in 2022 and is a lead instructor for HTML & CSS, JavaScript, and Python for Data Science. He also developed Noble's cutting-edge Python for AI course. Prior to that, he taught Python Data Science and Machine Learning as an Adjunct Professor of Computer Science at Westchester County College.
