In this series of posts, we'll cover various applications of statistics in Python. This first post talks about calculating the mean using Python.
Over the next three posts, we will tackle the mean, median, and mode. We will discuss the motivation behind each one, how to calculate it, and how easy it is to compute in Python. All three are measures of the central tendency of a data set, but each is calculated differently and has its own pros and cons.
Basics of Mean
You have probably learned about these numerical measures before, but you may have been too young to appreciate their real-world applications. This post covers the mean, more commonly referred to as the average. The mean is used as a “summary” measurement because it adds up all the data values and divides the total by the number of data points. It is used in all walks of life because it is an efficient way to understand data. Think of an investment fund that has 25 current investments. If you want to know whether the fund is doing well but do not have time to analyze each investment, the mean is a quick way to see whether, “on average,” the fund is performing well.
However, the main drawback of the mean is that it is heavily affected by outliers. For example, if the fund has made poor decisions on 24 of its investments but has one extremely lucrative investment, that single investment will skew the mean and make it seem as if the fund has invested well across the board, even though it has not.
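To make the outlier problem concrete, here is a minimal sketch with made-up numbers. The returns below are hypothetical and chosen only for illustration: 24 investments that each lost 2% and one that gained 150%.

```python
# Hypothetical percentage returns for a fund with 25 investments:
# 24 small losses and one extremely lucrative outlier (illustrative values only).
returns = [-2] * 24 + [150]

# Mean = sum of the values divided by the number of values.
mean_return = sum(returns) / len(returns)

# Count how many investments actually lost money.
losing = sum(1 for r in returns if r < 0)

print(f"Losing investments: {losing} of {len(returns)}")  # Losing investments: 24 of 25
print(f"Mean return: {mean_return:.2f}%")                 # Mean return: 4.08%
```

Even though 24 of the 25 investments lost money, the mean return comes out positive because the single large gain outweighs all the small losses.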
Finding the Mean: Tutorial
So how do we find the mean? Mathematically, it takes only two steps: add up all the numbers, then divide that sum by how many data points there are. In math terms, the mean is simply the sum divided by the count (remember these two terms). It is good to know the math and understand where the number comes from, but Python does the heavy lifting for us with two short built-in functions. The steps below show how easy it is to find the mean in Python.
- Step 1: Create a variable named test_scores and populate it with a list of individual test scores.
- Step 2: Use the len function to count how many data points are in test_scores, and use the sum function to add up all the scores in test_scores.
- Step 3: Create a variable named count and set it equal to 12 (the value returned by the len function), and create a variable named sum_scores and set it equal to 1024 (the value returned by the sum function).
- Step 4: Divide the sum by the count to find the mean, then use the print function to show the result to the user.
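The four steps above can be sketched as follows. The individual scores are hypothetical, picked only so that they match the count of 12 and the sum of 1024 mentioned in Step 3. As a small variation, the sketch assigns the results of len and sum directly to the variables rather than typing in the numbers by hand.

```python
import statistics

# Step 1: a list of individual test scores
# (hypothetical values chosen so that the count is 12 and the sum is 1024).
test_scores = [90, 85, 78, 92, 88, 76, 95, 81, 84, 89, 79, 87]

# Steps 2 and 3: count the data points and add them up,
# storing the results in count and sum_scores.
count = len(test_scores)       # 12
sum_scores = sum(test_scores)  # 1024

# Step 4: the mean is the sum divided by the count.
mean_score = sum_scores / count
print(f"Mean test score: {mean_score:.2f}")  # Mean test score: 85.33

# The standard library's statistics.mean gives the same result in one call.
print(f"statistics.mean: {statistics.mean(test_scores):.2f}")  # statistics.mean: 85.33
```

Storing the output of len and sum in variables means the calculation stays correct if you later add or remove scores from the list.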
Note: If you are new to Python or need to brush up on some skills, check out our Python classes, which are offered in person or live online.