Gain a clear understanding of the Iris dataset structure, which includes petal and sepal measurements across three distinct flower species. Learn how to organize this dataset into a more readable format for easier analysis.
Key Insights
- The Iris dataset provided by sklearn includes 150 samples, each with four measurements: petal length & width and sepal length & width.
- Species classification is indicated by numerical targets (0, 1, 2) corresponding respectively to the flower types setosa, versicolor, and virginica.
- The next step involves transforming this numerical array structure into a human-readable data frame for streamlined data manipulation and analysis.
Let's load the Iris data and see what we're working with. load_iris is a function that we're given by sklearn, and it gives us back a big dictionary-like object with lots of different properties that give us more information about the Iris data. Let's run this and check it out.
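As a minimal sketch of that loading step, assuming scikit-learn is installed: the variable name iris_data matches the name chosen later in this lesson, and the exact list of keys printed may differ between scikit-learn versions.

```python
from sklearn.datasets import load_iris

# load_iris() returns a dictionary-like object bundling the
# measurements, the species codes, and descriptive metadata.
iris_data = load_iris()

# List the properties that are available on the returned object.
print(list(iris_data.keys()))
```

The properties can be read either with dictionary syntax (iris_data["data"]) or as attributes (iris_data.data).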
So the very first thing is the data. The data property is an array of arrays. Each one of these is a row of sepal length and width, petal length and width.
And each of these is one of our flowers. One flower belonging to one of the three species. And there's 150 of them.
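One way to confirm the row and column counts, sketched under the assumption that the dataset is loaded into a variable named iris_data:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# 150 rows (flowers) by 4 columns (measurements).
print(iris_data.data.shape)  # (150, 4)

# The column order is recorded in feature_names.
print(iris_data.feature_names)

# Peek at the first three flowers.
print(iris_data.data[:3])
```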
When we get far enough down, here's the target. And it's an array of zeros, ones, and twos. There are 50 flowers of each.
Which ones are which? Zero, one, and two. There's also a target_names property, which is setosa, versicolor, and virginica, in order. Setosa is zero, versicolor is one, and virginica is two.
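The relationship between the numeric targets and the species names can be sketched like this. The names counts and species are illustrative only, and indexing target_names with the target array is just one possible way to turn codes into names:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

# Count how many flowers carry each code (0, 1, 2).
counts = np.bincount(iris_data.target)
print(counts)  # [50 50 50]

# Because target_names is in code order, the position is the code.
for code, name in enumerate(iris_data.target_names):
    print(code, name)

# Use each flower's code as an index to look up its species name.
species = iris_data.target_names[iris_data.target]
print(species[0], species[-1])
```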
And we can actually take advantage of the fact that these are in order a little later, when we want to give each one a human-readable name. Saving this data as iris_data seems good. And if we want to look at iris_data.data, it's that array of arrays, those 150 rows.
If we wanted to look at one of them, we could. If we wanted to look at a single feature in that row, say the second one, which is sepal width (the columns go sepal length, sepal width, petal length, petal width), it's 3.5. Again, you don't need to know a lot about the actual flowers to work with this data.
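A short sketch of looking up one flower and one measurement, assuming the first row is the one being inspected. The names first_flower and value are illustrative:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# The first flower: one row of four measurements.
first_flower = iris_data.data[0]
print(first_flower)  # [5.1 3.5 1.4 0.2]

# The second measurement in that row, labeled via feature_names.
value = iris_data.data[0][1]
print(iris_data.feature_names[1], value)  # sepal width (cm) 3.5
```

Checking feature_names alongside the value avoids having to memorize the column order.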
Okay, so that's our data. Now we're going to, in our next step, put this all together into a data frame and make it look human readable and easy to work with. Let's do that.