Creating a DataFrame with the Iris Dataset

Convert the Iris dataset into a pandas DataFrame, map numerical targets to species names, and add this as a new column.

Transform the Iris dataset into an intuitive DataFrame by mapping numeric target labels to meaningful flower species names. Learn to streamline this process using Pandas' apply method with both regular Python functions and lambda expressions.

Key Insights

  • Use Pandas to convert the Iris dataset into a structured DataFrame of 150 observations, with clearly labeled columns for sepal length, sepal width, petal length, and petal width.
  • Map numeric target indicators (0, 1, 2) in the original dataset to their corresponding flower species—setosa, versicolor, and virginica—improving readability by creating an additional "species" column.
  • Use Pandas' apply method with both a standard named Python function and a lambda function to translate the numeric target column into readable species names.

Note: These materials offer prospective students a preview of how our classes are structured. Students enrolled in this course will receive access to the full set of materials, including video lectures, project-based assignments, and instructor feedback.

Let's start making this into a DataFrame that we can work with. First, let's look at one other attribute, target_names: setosa, versicolor, and virginica. We'll use those shortly. We can also look at feature_names: sepal length, sepal width, petal length, and petal width.
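Here is one way to inspect those two attributes. It's a minimal sketch that assumes, as the attribute names suggest, that the dataset was loaded with scikit-learn's load_iris() into a variable we're calling iris_data (that variable name is our choice, not necessarily the one used in class):

```python
from sklearn.datasets import load_iris

# Assumption: the dataset comes from scikit-learn's load_iris(), which returns
# a Bunch object with .data, .target, .feature_names and .target_names.
iris_data = load_iris()

# The three species labels, in the order their numeric codes refer to (0, 1, 2)
print(iris_data.target_names)

# The four measurement names, which will become our column names
print(iris_data.feature_names)
```

Note that scikit-learn's feature names include units, for example "sepal length (cm)".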

So those are gonna be our column names, so that we can make this into a proper dataset. So let's do that. We'll create iris_df as a pandas DataFrame, where the data is iris_data.data and the column names are iris_data.feature_names. Then we can look at our DataFrame.
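A sketch of that step follows. It again assumes load_iris() from scikit-learn, and iris_df is simply our name for the resulting DataFrame:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()

# Each row of iris_data.data is an array of four measurements;
# feature_names supplies a label for each of those four positions.
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)

print(iris_df.head())
print(iris_df.shape)  # (rows, columns)
```

Passing the column names to the constructor labels the columns up front, so there is no need to rename anything afterward.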

All right, it's split the data into these four columns, since each row, remember, was an array of four items. And we've got our column names: sepal length, sepal width, petal length, petal width. There are 150 rows in total.

Next, we'll look at adding the target. Right now we don't know which species each of these flowers is: setosa, versicolor, or virginica. So before we actually add the target as a column, which is our goal here, we can look at iris_data.target. That's an array of zeros, ones, and twos.
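To confirm what that array holds, a short check like the one below counts each code. It's a sketch assuming the scikit-learn loader; np.unique with return_counts=True reports each distinct value along with how often it occurs:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()
targets = iris_data.target  # one integer code per flower

# Which codes appear, and how many flowers carry each one
codes, counts = np.unique(targets, return_counts=True)
print(codes, counts)

# The array is ordered by species: it starts with 0s and ends with 2s
print(targets[:5], targets[-5:])
```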

And again, those codes stand for setosa, versicolor, and virginica, our target names. Now we can get them onto our data. We add a new column called target to iris_df and set it equal to iris_data.target, that is, iris_df["target"] = iris_data.target. And now let's look at iris_df again.
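A minimal sketch of adding the column, using the same assumed variable names as above:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)

# Assigning an array of the same length as the DataFrame creates a new column,
# matched position by position with the existing rows.
iris_df["target"] = iris_data.target

print(iris_df.head())  # first rows have target 0
print(iris_df.tail())  # last rows have target 2
```

The assignment works because iris_data.target has exactly one entry per row (150); an array of a different length would raise an error.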


You can see it now has a target column that starts at zero and, at the tail end, is all twos. All right, this is all well and good, but I'm definitely gonna forget which one's zero, which one's one, and which one's two. I don't know about you, but I'm not gonna keep that in mind.

In fact, I don't know it now. So we're going to make a species column, and to do so, we need to go over this target column.

For each value, we translate it from the number zero, one, or two to a flower name. The flower names are in iris_data.target_names, an array of setosa, versicolor, and virginica at indices zero, one, and two. So we can use each target number as an index into iris_data.target_names to get the flower species.
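Since a target code is just a position in the target_names array, the whole translation comes down to indexing. A quick sketch, again assuming the scikit-learn loader:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# Each numeric code is an index into target_names
for target_number in (0, 1, 2):
    print(target_number, iris_data.target_names[target_number])
```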

So for every row where the target is zero, we'll look at index zero in target_names. If it's one, we'll look at index one in target_names.

And if it's two, we'll look at index two. To do that, we need to use pandas' apply method, which takes in a function.

Now we'll do this both with a named function and with a lambda, so you can see the different ways to do it. I prefer to start with a regular Python function.

This one takes in a target number and returns the flower name that the target number maps to. Inside the function, I'm making a flower_name variable that holds the entry in iris_data.target_names at that target number. So again, if the target number is zero, this will be the target_names array (setosa, versicolor, virginica) at index zero.

If the target number is one, it will be target_names at index one, and so on. We save that string as flower_name and return it. All right, now we can hand that function to pandas to run on every target.
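A sketch of that named function is below. The name get_flower_name follows the transcript; the loader and the iris_data variable are the same assumptions as before:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

def get_flower_name(target_number):
    """Translate a numeric target code (0, 1, or 2) into its species name."""
    flower_name = iris_data.target_names[target_number]
    return flower_name

# Calling it directly on a few codes before handing it to pandas
print(get_flower_name(0))
print(get_flower_name(2))
```

Testing the function on its own first is a cheap way to catch mistakes before applying it to all 150 rows.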

Right, so for the first row, pandas runs the function on that row's target number, gets back the flower name, and makes that the value of iris_df["species"] for that row. In other words, it's iris_df["target"], but with our get_flower_name function applied. Then, to double-check a few of these, let's use iris_df.sample(10) to get 10 random flowers.

To recap: we're defining our get_flower_name function, applying it to every target value, and saving the result as the species column. Let's try that.
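Putting those pieces together gives the sketch below. It builds on the same assumed names; the random_state argument to sample() is our addition, so that the 10 sampled rows stay the same from run to run:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
iris_df["target"] = iris_data.target

def get_flower_name(target_number):
    """Translate a numeric target code (0, 1, or 2) into its species name."""
    return iris_data.target_names[target_number]

# Series.apply calls get_flower_name once per value in the target column
# and collects the returned strings into a new Series.
iris_df["species"] = iris_df["target"].apply(get_flower_name)

# Spot-check 10 random rows; random_state makes the sample repeatable.
print(iris_df.sample(10, random_state=0)[["target", "species"]])
```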

There are some random ones. It applied that function to this target and got versicolor, applied it to this one and got setosa, and so on.

And here are some twos, which became virginica. Now we have a very human-readable species column. If you want to try it with a lambda, we could have skipped defining this function in the first place and written it inline instead.

Again, if you're pretty comfortable with Python lambdas, this is a good way to do it. Instead of defining the function and then applying it, we could have done it all in one line: iris_df["species"] = iris_df["target"].apply(lambda target_number: iris_data.target_names[target_number]). The argument to apply is an unnamed function, a lambda that takes in a target number and returns the entry in iris_data.target_names at that index.
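Written out as a self-contained sketch, the lambda version looks like this. It uses the same assumed loader and variable names, and no named helper function is needed:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
iris_df["target"] = iris_data.target

# The lambda does the same lookup as get_flower_name, just without a name
iris_df["species"] = iris_df["target"].apply(
    lambda target_number: iris_data.target_names[target_number]
)

print(iris_df[["target", "species"]].head())
```

A lambda suits a one-expression lookup like this one; once the logic needs several steps or its own tests, a named function is usually easier to read.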

Again, this does the same thing as the named function; it just does it more concisely, in one line. I made a typo the first time I ran it, but after fixing it and running it again, there we go.

Same result. Just depends on which style you prefer. But either way, we now have a human readable set of species.
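To check the "same result" claim, you could build the column both ways and compare them. In this sketch (same assumptions as above), Series.equals returns True only when both Series hold the same values in the same order with the same index:

```python
import pandas as pd
from sklearn.datasets import load_iris

iris_data = load_iris()
iris_df = pd.DataFrame(data=iris_data.data, columns=iris_data.feature_names)
iris_df["target"] = iris_data.target

def get_flower_name(target_number):
    """Translate a numeric target code (0, 1, or 2) into its species name."""
    return iris_data.target_names[target_number]

# Build the species column with the named function and with the lambda
named_version = iris_df["target"].apply(get_flower_name)
lambda_version = iris_df["target"].apply(
    lambda target_number: iris_data.target_names[target_number]
)

# Compare the two results element by element
print(named_version.equals(lambda_version))
```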

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
