Exploring the Iris Dataset with Scikit-Learn

Gain a clear understanding of the Iris dataset structure, which includes petal and sepal measurements across three distinct flower species. Learn how to organize this dataset into a more readable format for easier analysis.

Key Insights

The Iris dataset provided by sklearn includes 150 samples, each with four measurements: petal length & width and sepal length & width.
Species classification is indicated by numerical targets (0, 1, 2) corresponding respectively to the flower types setosa, versicolor, and virginica.
The next step involves transforming this numerical array structure into a human-readable data frame for streamlined data manipulation and analysis.


Let's load the Iris data and see what we're working with. load_iris is a function that sklearn gives us, and it returns a big dictionary-like object with lots of properties that tell us more about the Iris data. Let's run this and check it out.
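As a minimal sketch of that first step (assuming scikit-learn is installed; the variable name iris_data is the one chosen later in this lesson), loading the data and listing its properties might look like this:

```python
from sklearn.datasets import load_iris

# load_iris() returns a Bunch: a dictionary-like object whose keys
# can also be read as attributes (iris_data["data"] or iris_data.data).
iris_data = load_iris()

print(type(iris_data).__name__)  # Bunch
print(list(iris_data.keys()))    # the exact key list varies by sklearn version
```

The keys used in the rest of this lesson are data, target, and target_names.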

So the very first thing is the data. The data property is an array of arrays. Each one of these is a row of four measurements: sepal length and width, then petal length and width.

And each of these rows is one of our flowers, one flower belonging to one of the three species. And there are 150 of them.
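To check those numbers rather than count rows by eye, a short sketch like the following prints the array's shape and the column order. The feature_names property is not mentioned in the video, but it is part of what load_iris returns:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# 150 rows (flowers) by 4 columns (measurements)
print(iris_data.data.shape)  # (150, 4)

# Column order: sepal length, sepal width, petal length, petal width (all in cm)
print(iris_data.feature_names)

# Peek at the first three flowers instead of printing all 150
print(iris_data.data[:3])
```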

When we get far enough down, here's the target. It's an array of zeros, ones, and twos, with 50 flowers of each.

Which ones are which? There's also a target names property, which is setosa, versicolor, and virginica, in order. Setosa is zero, versicolor is one, and virginica is two.
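One way to confirm the pairing between codes and names, and the 50 flowers per species, is sketched below. It uses NumPy's bincount to tally the codes, which is an illustrative choice and not something shown in the video:

```python
import numpy as np
from sklearn.datasets import load_iris

iris_data = load_iris()

# Count how many times each code (0, 1, 2) appears in the target array
counts = np.bincount(iris_data.target)

# The position of each name in target_names is its numeric code
for code, name in enumerate(iris_data.target_names):
    print(code, name, counts[code])
```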

And we can actually take advantage of the fact that these are in order a little later, when we want to give each one a human-readable name. We can save this data in a variable; iris_data seems good. And if we want to look at iris_data.data, it's that array of arrays, those 150 rows.
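Because the names are stored in code order, each numeric target can be turned into a name by using it as an index into target_names. The sketch below is one possible way to do that with NumPy indexing; the variable name species is hypothetical, and the lesson's own approach may differ:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# Index the names array with the whole target array:
# every 0 becomes "setosa", every 1 "versicolor", every 2 "virginica".
species = iris_data.target_names[iris_data.target]

print(species[0], species[50], species[100])
```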

If we wanted to look at one of them, we could. If we wanted to look at a single feature in that row, we could do that too. I don't have the order memorized, but I think this one is sepal width, and it's 3.5. Again, you don't need to know a lot about the actual flowers to work with this data.
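The transcript does not say which row and column were indexed, so the sketch below assumes the first flower (row 0) and the second measurement (position 1). That combination does give 3.5, and looking up the same position in feature_names shows which measurement it is:

```python
from sklearn.datasets import load_iris

iris_data = load_iris()

# One flower: the first row of the data array
first_flower = iris_data.data[0]
print(first_flower)  # [5.1 3.5 1.4 0.2]

# One feature of that flower (position 1 is an assumption, see above)
feature_index = 1
print(iris_data.feature_names[feature_index], first_flower[feature_index])
```

Looking up the name this way avoids having to memorize the column order.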

Okay, so that's our data. In our next step, we're going to put this all together into a data frame and make it human-readable and easy to work with. Let's do that.

Colin Jaffe

Colin Jaffe is a programmer, writer, and teacher with a passion for creative code, customizable computing environments, and simple puns. He loves teaching code, from the fundamentals of algorithmic thinking to the business logic and user flow of application building—he particularly enjoys teaching JavaScript, Python, API design, and front-end frameworks.

Colin has taught code to a diverse group of students since learning to code himself, including young men of color at All-Star Code, elementary school kids at The Coding Space, and marginalized groups at Pursuit. He also works as an instructor for Noble Desktop, where he teaches classes in the Full-Stack Web Development Certificate and the Data Science & AI Certificate.

Colin lives in Brooklyn with his wife, two kids, and many intricate board games.
