Bar Charts and Data Sorting with Matplotlib

Create horizontal bar charts from grouped pandas DataFrames; annotate bars using loops and enumerate in Matplotlib.

Dive into data visualization using Matplotlib and learn how to create clear, readable bar charts. Discover practical looping techniques in Python to effectively label and enhance your visuals.

Key Insights

  • Utilize Matplotlib's barh function to create horizontal bar charts, making category labels easier to read compared to vertical bars.
  • Implement enumerate in Python loops to access both index positions and items within a list, enabling precise labeling of data visualization elements.
  • Enhance readability and aesthetics of charts by adjusting properties such as color, bar order, text alignment, and axis limits in Matplotlib visualizations.

Okay, time for a little data visualization with Matplotlib. Charts show data in an XY coordinate system. You have bar charts, where the bars can sit side by side or stacked, and run vertically or horizontally.

You can get the bars to run sideways or straight up. The Y-axis is typically for your numeric values and your X-axis is for categories or, in the case of a line chart, a time series showing the progression of time. So what we're gonna do is make a bar chart from the students DataFrame and that group-by data, just bars.

How long the bars are will represent how many items are in each category. We're gonna use plt.bar, where we feed in X and Y values to make vertical bars, and plt.barh, where we feed in the same two sets of values to make horizontal bars. Now it'd be better to make horizontal bars, because bars have labels and it's hard to read longer labels like these if they're underneath vertical bars, right? But if the bars are sideways, the labels read just like this. You would have the names exactly as you see them here.

Instead of numbers or in addition to numbers, you just have a bar, nice and long with a number after it perhaps. Okay, so we're gonna get the count column into a list. That is going to be our Y values.

We'll say counts_list equals edu_group_df, the count column, and listify that, because it'll be a Series otherwise. Which is okay, we can work with a Series on this. There, there's your values.

All right, what do we wanna do with them? Well, those are gonna supply the data into the chart to set the bar sizes. Now, for the chart, we also need the names of these indices.

They're not columns, they're the index. We wanna get a list of these seven. We're gonna say edu_levels_list.

We're gonna listify. What are we gonna listify here? We're going to listify not a column, but the index, right? 'Cause these values are not a column, they're an index. An index could just be numbers, you know, zero through five or six, but here it holds the names.

Let's just see what we got. There you go, it works. Now we've got our two sets of data, the labels of the bars and the numbers to set the length of the bars.
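Here's a rough sketch of those two listify moves. The DataFrame below is a made-up stand-in for the grouped data (the names edu_group_df, counts_list and edu_list follow the lesson as spoken, and the counts are invented for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the grouped DataFrame:
# category names live in the index, with a single "count" column
edu_group_df = pd.DataFrame(
    {"count": [220, 210, 190, 175, 120, 60]},
    index=[
        "some college",
        "associate's degree",
        "high school",
        "some high school",
        "bachelor's degree",
        "master's degree",
    ],
)

# A column is a Series; tolist() turns it into a plain Python list
counts_list = edu_group_df["count"].tolist()

# The labels are not a column, so we listify the index instead
edu_list = edu_group_df.index.tolist()

print(edu_list)
print(counts_list)
```

The counts set the bar lengths and the index labels name the bars.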

Okay, so we need to learn a little move. We know a little bit about looping, right? So I'm gonna show you something on how to do a loop. We have these categories, right? Let's say this edu_levels_list.

Let's say for edu_level, we'll just say edu, in, let's just call it edu_list. What's up? edu_list is not defined.

Why, who says? Oh, sure, edu group. Okay, so we're gonna loop this edu_list. For edu in edu_list, let's just print the edu every time.

All right, there you go. So why do we care? Later, we're gonna need to loop our bars to label them with numbers. But what if you wanted to print the index of each item with it? So you want, like, zero, some college; one, associate's degree; and so on.

Well, this loop doesn't have access to the index of the items, but it could. We could say we want the index also, but now we have to wrap the list in the for loop in the enumerate function, which gives access to the index as well as the item. A regular for loop just gives access to the item, right? When you have a regular for loop, all you have access to is the item, but we want the index as well.

Why, you'll see why later, but for now, just theoretically, what if we want the index, okay? Now we could print the index and the item. That's enumerate for you. It unlocks the index and you have to list them out like so with the index first.
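As a small sketch of the difference, assuming a shortened edu_list with made-up contents:

```python
# Shortened, illustrative list of category names
edu_list = ["some college", "associate's degree", "high school"]

# A regular for loop only hands you the item
for edu in edu_list:
    print(edu)

# Wrapping the list in enumerate() hands you (index, item) pairs,
# and the index is listed first
pairs = []
for index, edu in enumerate(edu_list):
    print(index, edu)
    pairs.append((index, edu))
```

The second loop prints the position next to each name, starting from zero.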

So for index, item in enumerate of the list. The syntax is: for index, item in enumerate of some list or iterable. So let's try that with fruits.

We'll say fruits, just to have one more example to make sure we get this concept. We're going to print the fruits with a number. We'll say for i, fruit in fruits, print i comma fruit.

So what up? Too many values. Oh, right, too many values to unpack, because I didn't do enumerate. You've got to have enumerate if you want two values.
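A sketch of that mistake and its fix, with numbering from one thrown in. The fruits list here is an assumption pieced together from the fruits named in the lesson:

```python
fruits = ["apple", "banana", "orange", "peach"]  # assumed list for illustration

# Without enumerate, Python tries to unpack each string into two names.
# "apple" has five characters, so there are too many values and it fails.
caught = False
try:
    for i, fruit in fruits:
        print(i, fruit)
except ValueError as err:
    caught = True
    print("ValueError:", err)

# With enumerate, each pass yields an (index, item) pair
numbered = []
for i, fruit in enumerate(fruits):
    numbered.append(f"{i + 1} {fruit}")  # i + 1 starts the numbering at one

# enumerate can also apply the offset itself through its start argument
numbered_alt = [f"{n} {fruit}" for n, fruit in enumerate(fruits, start=1)]

print(numbered)
```

Both ways of numbering give the same result; the start argument just saves you the plus one.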

There we go. And let's say you wanted to start numbering from one: you can print i plus one, okay. So, a challenge, and this is hard.

Make smoothies out of pairs of consecutive fruits. This requires the current index plus the next item. So use this enumerate, where you're grabbing the index, and make pairs of consecutive fruits, such as an apple-banana smoothie.

Just another bit of practice to see if we can work with the index while we're looping as well as the item. And we need that for what we want to do with our bar chart that we're making. Pause, come back when you're ready.

Okay, here we go. We're going to print. Let's actually throw the results in its own list called smoothies.

We're going to loop. Let's print smoothies when we're all done. We'll pprint it.

I don't think we've brought in pprint. Did we bring in pprint? No, let's do it. We'll say import pprint as pp.

All right. We're making smoothies. In other words, hyphenated consecutive fruits.

Consecutive fruits. So it would be like this apple, banana. We're not going to print.

We're going to take smoothies.append and we can append using that. We haven't used this in a little while: the string formatting with the f, the f-string. We'll say we want the current fruit, hyphen, fruits at i, which is another way of saying fruit.

Run, and there you go. But it made self-pairs. We want the next item.

We're going to say fruits at i plus one. And then it throws an error because the list index is out of range. Because when you get to the last fruit, you don't have another item afterwards.

You can't do an i plus one when you get to the end. So what we're going to do is say if i is less than len of fruits minus one. That way we don't try to make a smoothie with a non-existent fruit off the edge.

Orange peach, right? Peach being the last one. So there's a condition. You don't append if you're already at peach.
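Put together, the smoothie loop might look something like this. It's a sketch that assumes a four-item fruits list ending in peach:

```python
import pprint as pp

fruits = ["apple", "banana", "orange", "peach"]  # assumed list for illustration
smoothies = []

for i, fruit in enumerate(fruits):
    # Only pair up when there is a next fruit; at the last index
    # fruits[i + 1] would raise an IndexError
    if i < len(fruits) - 1:
        smoothies.append(f"{fruit}-{fruits[i + 1]}")

pp.pprint(smoothies)
```

With four fruits you get three smoothies, because the last fruit has nothing after it to pair with.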

You make your last smoothie when you're at orange. Okay, so now that we have a sense of using enumerate to unlock the index, let's get on to making a bar chart. We're going to say color.

Actually, you know what? We just want one color. We're going to use dodgerblue as our color. You don't make different colors when it's the same kind of data.

So I was playing around with that. plt.barh, we want horizontal bars, as we saw, right? barh. We're going to feed in the two sets of values. The first, of course, is going to be the categories.

The edu_list. And we want to have the counts as the second set of values. Typically, your Y value is your counts, be it sales or stock prices or whatever, but with barh the counts set how long each bar runs sideways.

Run. And there's your bar chart, look at that. You want the big bar at the top, we can flip that.

You can reverse these lists. We can say edu_list.reverse(), counts_list.reverse(), and run that.
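A minimal sketch of the chart at this point, using made-up counts. Note that plt.barh takes the bar positions (here the category names) first and the bar widths second, and it draws the first item at the bottom, which is why reversing puts the biggest bar on top:

```python
import matplotlib.pyplot as plt

# Illustrative data, biggest count first (the numbers are invented)
edu_list = [
    "some college",
    "associate's degree",
    "high school",
    "some high school",
    "bachelor's degree",
    "master's degree",
]
counts_list = [220, 210, 190, 175, 120, 60]

# list.reverse() flips a list in place, so do it once, and to both lists
edu_list.reverse()
counts_list.reverse()

# Categories go up the side, counts set the length of each bar
bars = plt.barh(edu_list, counts_list)
```

Calling plt.show() afterwards displays the chart.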

And now we're going the way we want, okay. Now, we did that little sidebar move, that little side exercise with enumerate, because we didn't get to it in lesson four when we first looked at loops and we need it now.

I tried to teach us as much stuff as we could so that when we get into the data science we'd be well-equipped, but there's inevitably extra stuff that comes up. Here we saw enumerate. Now, why do we need enumerate? Why would we need the index, and to loop what? We're going to use the index to loop the counts, the numbers that represent the bar sizes.

And as we go, we're going to output the number next to the bar. We're going to say for i, count in enumerate of counts_list, let's print i comma count. There you go, there are your counts by index.

That's not really what we want to do. What we want to do is say, okay, plt.text, we're going to label the bars with text. And plt.text takes an X and a Y position and then the text to display.

So you go into your X and Y coordinate system, the X running along the bottom and the Y going vertically. And you come in and say, okay, X, Y, like an X, Y point, dunk, on the chart. And then you set text at that spot.

So what would our X position be for every count? It would be the count. We're going to say plt.text. Where do we want to lay text down? At the current count, so that it's next to the bar.

And where do we want to lay it down on the Y axis? We want to lay it down on the index, from zero to six. And what do we want to say? We want to output the count. Run. There you go.

Now let's get a little breathing room here. We're going to say count plus five to get the labels off the bars. That's fine.

The plus five moved the labels off the bars, but we need to widen the chart. We're going to use something called the X limit. We're going to set the xlim, the horizontal limit, to be wider than the default.

Because by default, these charts aren't any bigger than they need to be to fit their data. So the xlim will go from zero to, instead of 225 or something, 250. We still start at zero, but we're going to 250.

We're widening the X. And why is it flipping? Oh, because I'm reversing every time, right? Remember, we're reversing. We should move these out of here. Yeah, it was flipping every time because we keep reversing.
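The labeling loop and the wider X limit could be sketched like this. The data is invented and already in bottom-to-top order, so no reversing happens here:

```python
import matplotlib.pyplot as plt

# Illustrative data, already ordered smallest to biggest (bottom to top)
edu_list = [
    "master's degree",
    "bachelor's degree",
    "some high school",
    "high school",
    "associate's degree",
    "some college",
]
counts_list = [60, 120, 175, 190, 210, 220]

plt.barh(edu_list, counts_list)

labels = []
for i, count in enumerate(counts_list):
    # x is the count plus a little breathing room, y is the bar's index,
    # and the text itself is the count
    label = plt.text(count + 5, i, str(count))
    labels.append(label)

# Widen the horizontal axis so the labels have room past the longest bar
plt.xlim(0, 250)
```

The index from enumerate works as the Y position because the bars sit at positions zero, one, two and so on up the axis.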

Let's just reverse the one time. There we go. Now let's set the color.

We'll say comma color, dodgerblue. It's kind of like the current blue, a little different. There you go.

And let's do plt.title. Charts should have a title. And the Y axis should be the number of students. Excuse me, the X; it's the X label.

There you go, number of students. We don't need to put anything on the left side. That's pretty obvious, especially when it says it in the title.

Like we don't need to label the Y axis. It's already all labels. We can change the color of the title and the X label.

Why don't we do that also? There you go. Number of students, title. And there's also ticks.

These labels here on the left, the edu labels, those are actually called the ticks. And they're the Y ticks. So let's say plt.yticks, color equals, we'll do coral for them too.

There you go. I don't know if that's a great color or anything, but, or we could say Dodger blue for that one. Like, even that's colors though, right? Or we could say gray.

Yeah, let's do a 555 or something. There you go. Maybe a 237, let's see, what will that look like? Okay, and when you're all done, okay, let's not print out the counts here.
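Roughly, the styling calls could look like the sketch below. The title wording is made up, the data is invented, and the gray "#555555" is an assumption about what the spoken "555" stands for:

```python
import matplotlib.pyplot as plt

# Illustrative data in bottom-to-top order
edu_list = ["master's degree", "bachelor's degree", "high school", "some college"]
counts_list = [60, 120, 190, 220]

# One color for every bar, since it's all the same kind of data
plt.barh(edu_list, counts_list, color="dodgerblue")

# Title and X label; the Y axis is left unlabeled because the ticks say it all
plt.title("Students by Parental Education Level", color="coral")  # hypothetical title
plt.xlabel("Number of Students", color="coral")

# The category names on the left are the Y ticks; recolor them gray
plt.yticks(color="#555555")
```

plt.show() would go last, once all the styling is in place.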

And plt.show is kind of the last thing you wanna do. It means that you're done with this particular plot. And if you label the Y, it'll say parental education category, but we don't really need to see that, right? So let's just turn that off.

That's just to show you, you could label the Y. And that's pretty much the end of that. This is what you wanna camp out on and work on and do. This is our very first visualization.

Takes a while, you gotta learn your core programming, then you gotta learn Numpy and Pandas. Notice I just keep hitting you with more and more and more and more stuff. And now we only finally can get into visualization.

We can go on and just go on forever with this stuff. Let's, yeah, there it is. Notice the text is a little bit up, riding a bit high.

We could add another argument. We could say va, vertical alignment, center. And let's watch the text move down.
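A sketch of that extra argument, again with invented data. By default the text's baseline sits on the Y position, which is why it rides high; va="center" centers it on the bar instead:

```python
import matplotlib.pyplot as plt

# Illustrative data in bottom-to-top order
edu_list = ["master's degree", "bachelor's degree", "high school"]
counts_list = [60, 120, 190]

plt.barh(edu_list, counts_list, color="dodgerblue")

# va="center" lines each label up with the middle of its bar
labels = [
    plt.text(count + 5, i, str(count), va="center")
    for i, count in enumerate(counts_list)
]

plt.xlim(0, 250)
```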

There it is, it's down centered now. All right, last move, then we're done. Sorting by secondary and tertiary categories.

So let's sort by the math score and then go on to sort by the reading score. So let's say we wanna sort all the students. We'll say students_df.sort_values, by, we're gonna say average, ascending False.

We wanna see the heavy hitters at the top. We'll go in descending order. And we just wanna see the top 40, slice that off.

We don't even need a new df, let's just run it. Okay, so there's your average. You got a few people with 100 average.

Now after that, the question is, do you want a secondary sort? We've got two people tied with 99. Would you like to rank them? Because right now we don't really know how they're ranked. They're ranked, we just don't know how, right? They have a tie score.

Here's a couple more with a tie score. Here's three in a row with a tie score. Why is one ranked above the other? Is it alphabetized? What is it? Now we could say we prioritize the math score and we could say, okay, rank by average.

And then if there's a tie, rank by math. So you could feed in a second sort field column. So you'll start sorting by average, but in the event of a tie, you'll use math as a tiebreaker.

So watch the order change a little bit. There you go, the highest math. So in the event of a tie, here's your tie with your 97.67, the three-way tie.

If you take out math and look at that three-way tie, 97.67, the highest math is actually at the bottom. We're not prioritizing the math. Now prioritize the math.

And there you go. Beyond that, you probably wouldn't really wanna sort on a third level. So just secondary sorting, really.
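A small sketch of the tiebreaker sort. The students, scores and column names below are invented for illustration, with three of them set up to tie on average:

```python
import pandas as pd

# Hypothetical mini version of the students DataFrame
students_df = pd.DataFrame(
    {
        "name": ["Ana", "Ben", "Cal", "Dee", "Eli"],
        "math": [95, 100, 98, 90, 99],
        "reading": [100, 96, 99, 95, 100],
        "writing": [98, 97, 96, 94, 100],
    }
)
students_df["average"] = (
    students_df[["math", "reading", "writing"]].mean(axis=1).round(2)
)

# Sort by average first; where averages tie, math breaks the tie.
# ascending=False applies to both columns, so the highest come first.
top_students = students_df.sort_values(
    ["average", "math"], ascending=False
).head(40)

print(top_students)
```

Ana, Ben and Cal all average 97.67, so math decides their order: Ben, then Cal, then Ana.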

All right, that's the end of this lesson. Glad you stuck with it. Hope you enjoyed it.

Thank you very much. We are done with lesson eight. Pandas, Matplotlib, and CSV.

Brian McClain

Brian is an experienced instructor, curriculum developer, and professional web developer, who in recent years has served as Director for a coding bootcamp in New York. Brian joined Noble Desktop in 2022 and is a lead instructor for HTML & CSS, JavaScript, and Python for Data Science. He also developed Noble's cutting-edge Python for AI course. Prior to that, he taught Python Data Science and Machine Learning as an Adjunct Professor of Computer Science at Westchester County College.
