In the previous exercise you calculated the mean weight of the Cambridge crew and then the mean weight of the Oxford crew. You surely noticed that writing the code to calculate Oxford's mean was pretty easy since it used the same procedure as the code to calculate Cambridge's mean.

The process for calculating the mean is:

Put the data in a list.
Get the sum of the list.
Count how many items are in the list. (Its "length".)
Divide the sum by the count.
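The four steps above can be sketched directly in Python. This is only a minimal sketch: the variable names (data, total, count, average) are illustrative choices, and the three numbers are a small test list rather than real crew weights. The print() call uses parentheses so the sketch should run under either Python 2 or Python 3.

```python
# Step 1: put the data in a list.
data = [2.0, 2.0, 5.0]

# Step 2: get the sum of the list.
total = sum(data)

# Step 3: count how many items are in the list (its "length").
count = len(data)

# Step 4: divide the sum by the count.
average = total / count

print(average)  # prints 3.0
```

Every time you need another mean, you would have to repeat all four steps with a new list, which is the repetition a function removes.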

When working with statistics, it's common to find yourself repeating a calculation many times. Statisticians don't like to repeat themselves, so to save time you can put the calculation into a function. Defining a function is like writing down a recipe, such as this one for serving ice cream:

Get ice cream
Get cone
Scoop ice cream into cone
Serve

Functions are also very useful for statistics because many of the advanced analysis tools are built on much simpler tools, such as mean, sum, and the number of data points in a dataset.
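As one illustration of a bigger tool built from smaller ones, here is a hedged sketch of a variance function that relies only on sum(), len(), and a small mean() helper. The variance() function and its variable names are assumptions made for this example; variance is not part of this exercise and is shown only to make the point about building on simple pieces.

```python
def mean(data):
    # Sum divided by count. float() guards against whole-number
    # division in Python 2 when the list holds only integers.
    return float(sum(data)) / len(data)

def variance(data):
    # Population variance: the mean of the squared distances
    # between each data point and the mean of the data.
    center = mean(data)
    squared_distances = []
    for x in data:
        squared_distances.append((x - center) ** 2)
    return mean(squared_distances)

print(variance([2.0, 2.0, 5.0]))  # prints 2.0
```

Notice that variance() calls mean() twice instead of repeating the sum-and-divide steps itself.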

Your first functions

Before you can use your own function, you must define the function. Defining a function is similar to how dictionaries define words.

What is the meaning of a particular word? It means this, that, or something else.
What is the meaning of a particular function? It's defined as doing this, that, or something else.
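To make the difference between defining a function and using it concrete, here is a minimal sketch. The double() function and its parameter name, number, are hypothetical and exist only for this illustration.

```python
# Defining: this tells Python what double() means, but runs nothing yet.
def double(number):
    return number * 2

# Using (calling): Python looks up the definition and runs it.
print(double(4))     # prints 8
print(double(10.5))  # prints 21.0
```

The definition is written once, and the function can then be called as many times as needed with different inputs.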

You will now define one function, calling it mean(), and then use mean() to calculate the average weight of each rowing crew. Type this code:

# Let's create a mean() function

def mean(myData):
    """
    Input: a list of numerical data. We'll call it 'myData'
    Output: a number representing the average of the numbers in myData
    Remember that sum() and len() are pre-made functions that
    Python gives us. You will learn about other pre-made functions later.
    """
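    # float() keeps Python 2 from doing whole-number division
    # when myData contains only integers.
    return float(sum(myData)) / len(myData)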

# Let's try our function with a simple test.
print "I know the average of 2.0, 2.0, and 5.0 to be 3.0."
print "My function says the average is:"
print mean( [2.0, 2.0, 5.0] )
print

# Let's calculate the rowing means. This time, we will put the lists of
# numbers into variables so we don't get confused as to which
# list of numbers belongs to which team.
cambridgeWeights = [188.5, 183, 194.5, 185, 214, 203.5, 186, 178.5, 109]
oxfordWeights = [186, 184.5, 204, 184.5, 195.5, 202.5, 174, 183, 109.5]

print "The average weight for Cambridge is:"
print mean(cambridgeWeights), "pounds"
print "and the average weight for Oxford is:"
print mean(oxfordWeights), "pounds"
print
print "This is a difference of:"
print mean(cambridgeWeights) - mean(oxfordWeights), "pounds"

Save the program as my-mean-function.py.

What you should see

After clicking Run, you should get:

I know the average of 2.0, 2.0, and 5.0 to be 3.0.
My function says the average is:
3.0

The average weight for Cambridge is:
182.444444444 pounds
and the average weight for Oxford is:
180.388888889 pounds

This is a difference of:
2.05555555556 pounds

By making your own function, you just saved yourself a lot of work, since you didn't have to calculate the averages by hand.

Statisticians are lazy, so they make the computer do as much work as possible. This means there is more time to get ice cream. Hooray for functions.

Study Drills

What do you think those sets of three double quotes (""") do? Hint: They do something similar to what # does.
Go outside and measure five things of the same type, then write a program that calculates the average of the five measurements using the mean() function above. For example, find five trees and attempt to hug them. What fraction of a hug is each tree? A small tree may be 0.25 hugs, an established tree may be 1.0 hug, and a big old tree may be 1.5 or more hugs. Be sure to perform the average calculation by hand as well, just to double-check your work.

Learn Stats in 10,000 Hours by Jonathan B. Miller is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License.

Exercise 7: Functions
