Transforming Data with Pivoting

Demonstrate pivoting dataframes and plotting population trends using Pandas and Matplotlib.

Learn how to reshape your data effortlessly using the pivot method, turning column values into distinct columns and reshaping rows into indices. Discover practical examples of pivoted data visualizations, creating insightful line and bar charts to illustrate population trends clearly and effectively.

Key Insights

  • Utilize the pivot method on a DataFrame to transform data structure, converting column values into new column headers and reshaping data into distinct rows and columns, as demonstrated with a dataset of 18 years and more than 200 countries.
  • Explore techniques to refine pivoted data for clearer insights, such as selecting subsets of countries (e.g., Western European countries including France, Germany, and Italy) and visualizing their population trends with line and bar charts.
  • Create comparative side-by-side and stacked bar charts to effectively showcase population growth and trends for specific country groups, including visual customizations like adjusting figure size, bar colors, legend positions, and tick orientations.

All right, next up, pivoting the data. Pivoting takes the values in a column and makes whole column names out of them, among other manipulations. Here's exactly how it works.

The pivot method is called on a DataFrame, and it reshapes it: the values of one column become the row labels and the values of another become the column names. You set the result equal to a new DataFrame and pass in three arguments: index, columns, and values. The index is the column that will become the row index names.

So for index, you pass it a column, and the values of that column become the indices, the named indices, just like we've got country. And what we want for named indices are the years.

We want to make a data set of 18 different rows, one per year. Then columns, the second argument, takes the column that has the values you want to use for the column names, and that's going to be the countries. Now, you've got 200-something countries.

So suddenly you're going to have 18 rows and 200-something columns. That's pivoting. Each country gets its own column.

That's a major restructuring of the data. And then the values at the intersection of year and country, that will be the population. So again, well, we only need the three columns.

We're going to say pop_piv_df, and that's going to come off the original data set, the original pop DataFrame, because that's still got the 4,195 rows. But we've actually already got our three-column version, right? What do we call our three-column thing? The pop three-column DataFrame. Okay, so we've already got that.

So from the pop three-column DataFrame, we're going to call pivot, and the result will be pop_piv_df. We need these three arguments, so just follow along: index, columns, values.

The index is going to be the year; those are the row names. Index is the column that will become the row index names. So your new DataFrame, pop_piv_df, is going to have one row per unique year, and there are 18 unique years.

For the columns of the new DataFrame, this pivoted thing we're making, we pass the column that contains the values we'll use for column names. In the resulting pop_piv_df, there'll be one column per country.

So over 200 columns. And then the values that go in the cells themselves will be whatever's in the population boxes. We're going to print the shape of this and then there's only 18 rows, so let's print it all.
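As a minimal sketch of the pivot call just described, here it is on a tiny made-up table. The name pop_long_df, the column names year, country, and population, and all of the numbers are assumptions for illustration only; the lesson's real data has 18 years and 235 countries.

```python
import pandas as pd

# Hypothetical long-format data: one row per (year, country) pair.
# The values are invented placeholders, in millions.
pop_long_df = pd.DataFrame({
    "year": [1955, 1955, 1955, 1960, 1960, 1960],
    "country": ["Italy", "France", "Germany", "Italy", "France", "Germany"],
    "population": [48.0, 43.0, 70.0, 50.0, 46.0, 73.0],
})

# index   -> the column whose values become the row labels
# columns -> the column whose values become the new column names
# values  -> the column whose values fill the cells
pop_piv_df = pop_long_df.pivot(index="year", columns="country", values="population")

print(pop_piv_df.shape)  # (2, 3): one row per year, one column per country
print(pop_piv_df)
```

Note that the country columns come out in alphabetical order even though the input rows were not sorted that way.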

Whoa, snap. The DataFrame has nothing called that. I spelled pivot wrong.

Okay, there we go. All right, 18 × 235, as prophesied. All these countries, with the population of each one. So for 1955, there's every country.

For 1960, there's every country. It's doing the little dot-dot-dot and skipping over the middle columns. And they're alphabetized.

That's kind of nice. We didn't even ask it to do that, really, right? To alphabetize the countries. All right, now we can go into this and shave it down, like we did on our original dataset, where we had 14 columns.

There, we only needed three columns: year, country, population. Here, we have 235 countries. We only want five countries, let's say, for charting purposes.

So let's do a Western Euro chart. We'll say west_euro_df is going to equal pop_piv_df.

But we only want these five countries. We'll get the shape of that. And it's still 18 rows.

We haven't changed that, but now we've only got five of the 235 countries. And what might we want to do with that? Well, we're going to plot it as lines through time, each country its own line, right? To get that to work: df.plot with kind equals line makes a line for each column.

We're using the pivoted data, where the index is named, right, by year. The index values go along the x axis: the years, 1955, 1960, and so on. The values, the populations, go on the y axis.

We're going to plot the five countries as lines of population trends over time, because the x axis is going to be the index years. So that's time. We'll say west_euro_df.plot, kind equals line, exactly like the bold text up there says.

And run. There it is. Now we can do better.

Let's expand this thing. It's too stumpy; it's not wide enough. We'll say figsize, a new little move for us, which takes a tuple of a width and a height.

11 by 6 means almost twice as wide as it is tall, which is nice because it makes room. It makes room for the legend, which is automatically supplied, which is kind of cool. So there you see the population trends of these five Western European countries over time.

We'll do the plt.title. Actually, the title will be this one, and this will be the y label, which was missing.

Okay, great. And plt.show to get rid of that scrap of text at the top. There you go.
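The subset-then-plot steps above can be sketched roughly as follows. Everything in the data is a stand-in: the tiny pop_piv_df, its invented values, the choice of five countries (the lesson names only France, Germany, and Italy), and the title and label text are all assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical pivoted data: index = year, one column per country.
# The values are invented placeholders, in millions.
pop_piv_df = pd.DataFrame(
    {
        "China": [600.0, 660.0, 720.0],
        "France": [43.0, 46.0, 49.0],
        "Germany": [70.0, 73.0, 76.0],
        "Italy": [48.0, 50.0, 52.0],
        "Spain": [29.0, 30.0, 32.0],
        "United Kingdom": [51.0, 52.0, 54.0],
    },
    index=pd.Index([1955, 1960, 1965], name="year"),
)

# Keep only the five countries to chart (this particular list is an assumption).
west_euro_df = pop_piv_df[["France", "Germany", "Italy", "Spain", "United Kingdom"]]
print(west_euro_df.shape)  # (3, 5): same rows as before, fewer columns

# One line per column; the index (years) runs along the x axis.
ax = west_euro_df.plot(kind="line", figsize=(11, 6))
plt.title("Population of five Western European countries")
plt.ylabel("Population (millions)")
plt.show()
```

Passing a list of column names inside the brackets is what selects several columns at once; the row index is left untouched.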

That's it. Basic, cool, works. Onward.

Now let's make bars. We're gonna make bars comparing three countries side by side per year. So for every year, you're gonna get three bars and we'll use France, Germany, Italy for this.

We'll start with our west_euro_df and just shave it down to only three of the countries. Then we'll plot on that, kind equals bar. And we're going to want a figsize that's kind of wide.

We'll go 14, 4, so three and a half times as wide as it is tall. And run. All right, there you go.

Now we can move this legend. The most unobtrusive place would probably be the lower left. You have nine spots like a tic-tac-toe board.

We're going to say plt.legend, with loc set to lower left. There it is, kind of out of the way, nice.
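A rough sketch of the side-by-side bars, assuming a small stand-in west_euro_df with invented values (in millions) and one extra hypothetical country, Spain, so that there is something to slice away:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical stand-in for the pivoted Western Europe data.
west_euro_df = pd.DataFrame(
    {
        "France": [43.0, 46.0, 49.0],
        "Germany": [70.0, 73.0, 76.0],
        "Italy": [48.0, 50.0, 52.0],
        "Spain": [29.0, 30.0, 32.0],
    },
    index=pd.Index([1955, 1960, 1965], name="year"),
)

# Shave the data down to the three countries being compared.
three_df = west_euro_df[["France", "Germany", "Italy"]]

# kind="bar" draws one group of bars per row (year), one bar per column (country).
ax = three_df.plot(kind="bar", figsize=(14, 4))
plt.legend(loc="lower left")  # move the automatic legend out of the way
plt.show()
```

The loc argument accepts position strings such as "upper right", "center", and "lower left", which correspond to the tic-tac-toe spots mentioned above.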

And let's copy that, because what if we want to stack these bars? That's also possible. Now, you might think you'd have to do all kinds of manipulations to stack the bars.

Nope, we'll just say stacked=True. And now they're stacked. And if you don't fancy these colors, you can change them.

We can feed in our own little color list of a few hex colors, and we might want to set the y label with plt.ylabel: hundreds of millions of people. There you go.

And we just set the color, right? Whatever you like. You could even go in and put flags, images, or patterns on the bars. That's a little outside the scope of an intro course, but you can certainly set the colors.
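Here is a minimal sketch of the stacked version with a custom color list. The data values, the three hex colors, and the label text are all assumptions chosen for illustration, not the ones used in the lesson.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical pivoted data for three countries; invented values, in millions.
three_df = pd.DataFrame(
    {
        "France": [43.0, 46.0, 49.0],
        "Germany": [70.0, 73.0, 76.0],
        "Italy": [48.0, 50.0, 52.0],
    },
    index=pd.Index([1955, 1960, 1965], name="year"),
)

# stacked=True piles each year's bars on top of one another;
# color takes one entry per column, here as hex strings.
ax = three_df.plot(
    kind="bar",
    stacked=True,
    figsize=(14, 4),
    color=["#1f77b4", "#2ca02c", "#d62728"],
)
plt.ylabel("Population (millions)")
plt.legend(loc="lower left")
plt.show()
```

The only change from a side-by-side chart is the stacked argument; the data itself does not need to be reshaped.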

Let's do another DataFrame of just China and India, the two biggies. We'll say china_india_df equals our pop_piv_df, bracketing off, slicing off just these two. The shape will be 18 by 2, right? 18 rows, one per sampled year, and two countries.

There you go. There's our biggies. And we can get the tail, just the last few entries: the three most recent, or the five most recent.
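As a small sketch of that slice and the tail call, assuming a stand-in pop_piv_df with invented values (in billions) and only four years instead of the lesson's 18:

```python
import pandas as pd

# Hypothetical pivoted data; the values are invented placeholders, in billions.
pop_piv_df = pd.DataFrame(
    {
        "China": [0.60, 0.66, 0.72, 0.82],
        "France": [0.043, 0.046, 0.049, 0.051],
        "India": [0.41, 0.45, 0.50, 0.56],
    },
    index=pd.Index([1955, 1960, 1965, 1970], name="year"),
)

# Bracket off just the two columns of interest.
china_india_df = pop_piv_df[["China", "India"]]
print(china_india_df.shape)  # (4, 2) here; 18 by 2 with the full data

# tail(n) returns the last n rows, i.e. the most recent years.
print(china_india_df.tail(3))
```

Calling tail() with no argument returns the last five rows.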

Okay, let's plot that. We'll plot a line to start. We'll say china_india_df.plot.

Kind is going to be line. Run that, and there's your line. Now we'd like to stretch it out a little bit.

Add plt.show to get rid of that axes scrap of text here. We're going to go with a nice wide view. We'll say figsize.

It equals a tuple of width and height. How do you know what to put for the width and height? Just make the width way more than the height, if that's what you're trying to do.

There we go. And population in billions, right? plt.ylabel and plt.title. There you have it.

All right. Now side-by-side bars for that. Kind of like we did bars with the countries.

So the bar chart version of that is going to be very much like what we did here. So take this. And we're going to say china_india_df.

We don't need to slice. It's just China and India. Plot, kind bar, figsize.

We don't need to be that wide: 10, 4. We're definitely not stacking right now.

Colors don't change. The y label will be population in billions. The legend will go in the upper corner.

Just leave that alone. And the title, populations, like so. There you go.

Now we could also turn the years sideways, right? Like they are here. To do that, we use what are called the x ticks. We can say plt.xticks.

The labels are the thing you type like year or population billions. Those are the labels. But the little things next to the values with the little lines, little ticks, those are called the ticks.

We're gonna set the rotation of the ticks to zero. And there it is. And maybe knock the font size down a little bit.
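The tick adjustment can be sketched like this. The data values are invented (in billions), and the title text and the font size of 9 are assumptions; the lesson does not state the exact size used.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical China/India data; invented placeholder values, in billions.
china_india_df = pd.DataFrame(
    {
        "China": [0.60, 0.66, 0.72],
        "India": [0.41, 0.45, 0.50],
    },
    index=pd.Index([1955, 1960, 1965], name="year"),
)

ax = china_india_df.plot(kind="bar", figsize=(10, 4))
plt.ylabel("Population (billions)")
plt.title("Populations of China and India")

# By default pandas rotates bar-chart tick labels by 90 degrees;
# rotation=0 turns them upright, and fontsize shrinks them slightly.
plt.xticks(rotation=0, fontsize=9)
plt.show()
```

The same keyword arguments work with plt.yticks if the y axis tick labels need adjusting.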

There you go. Lovely. And if you wanted to stack that, you sure could.

Let's just stack it: make a copy, paste, and add one more property, stacked=True. And now they're stacked. So that's the combined population change over time.

I think it's more interesting side by side, though, because the data tells a story. You see India catching up to China, and if you go five more years into the future, I do believe India has already caught China.

Thank you.
