Mathematical and Statistical Methods

Sum

Suppose our dataset consists of a student's scores on each item of an exam, which can be represented as a simple one-dimensional array. The sum function (here, NumPy's np.sum) gives the student's total score on the exam. The example code below uses the scores of a student on a 10-item exam, where each item is worth 5 points.

import numpy as np

# 1D array: the student's score on each of the 10 items (5 points each)
studentScores = [2, 4, 5, 3, 2, 4, 4, 4, 5, 1]

print("\nTotal grade:", np.sum(studentScores), "out of 50.")

Output:

Total grade: 34 out of 50.
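The same function scales to several students if each student's scores are stored as one row of a two-dimensional array. The sketch below assumes a hypothetical second student (the `classScores` name and its second row are illustrative, not part of the original data) and uses the `axis` argument of `np.sum` to total each row separately.

```python
import numpy as np

# Row 0 is the student from the example above; row 1 is a hypothetical second student.
classScores = np.array([
    [2, 4, 5, 3, 2, 4, 4, 4, 5, 1],
    [5, 5, 4, 4, 3, 5, 2, 4, 5, 3],
])

# axis=1 sums across the items in each row, giving one total per student.
totals = np.sum(classScores, axis=1)

# With no axis argument, np.sum adds every element in the array.
grandTotal = np.sum(classScores)

for student, total in enumerate(totals, start=1):
    print(f"Student {student}: {total} out of 50")
print("Combined total:", grandTotal)
```

Summing with axis=0 instead would give the class total for each exam item, which can help spot items that most students found difficult.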

Arithmetic Mean Function

Consider two sample datasets of tomato plant heights (measured in centimeters) below.

Control group: 86, 109, 96, 79, 90
Treatment group: 163, 156, 135, 178, 144

The arithmetic mean of each dataset, equal to the sum of the data divided by the number of data points, gives the average height of the tomato plants in that group. In a study, comparing the average height of the first group (92 cm) with that of the second group (155.2 cm) may suggest to researchers that tomato plants treated with a specific concentration of a natural growth enhancer (the second dataset) grow into better, healthier plants in terms of height.

import statistics

# Heights (cm) of tomato plants without and with the growth enhancer
controlGroup = [86, 109, 96, 79, 90]
treatmentGroup = [163, 156, 135, 178, 144]

ave1 = statistics.mean(controlGroup)
ave2 = statistics.mean(treatmentGroup)

print("The average height of tomato plants in the control group is:", ave1)
print("The average height of tomato plants in the treatment group is:", ave2)

Output:

The average height of tomato plants in the control group is: 92
The average height of tomato plants in the treatment group is: 155.2
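Because the mean is the sum of the data divided by the number of data points, it can be checked by hand. The sketch below computes it directly with sum() and len() and compares the result with statistics.mean and with NumPy's np.mean (the latter is an extra not used in the original example).

```python
import statistics
import numpy as np

controlGroup = [86, 109, 96, 79, 90]
treatmentGroup = [163, 156, 135, 178, 144]

# Mean = (sum of the data) / (number of data points)
manual1 = sum(controlGroup) / len(controlGroup)        # 460 / 5
manual2 = sum(treatmentGroup) / len(treatmentGroup)    # 776 / 5

# statistics.mean and np.mean should agree with the hand calculation
print("Control group:", manual1, statistics.mean(controlGroup), np.mean(controlGroup))
print("Treatment group:", manual2, statistics.mean(treatmentGroup), np.mean(treatmentGroup))

# Difference between the two group averages, in centimeters
difference = manual2 - manual1
print(f"Difference in average height: {difference:.1f} cm")
```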

Std, Var: Standard Deviation and Variance

Consider two sample datasets of children's weights (measured in pounds) below.

First dataset: 12, 18, 31, 39, 20
Second dataset: 14, 56, 23, 17, 10

In the first dataset, the data range from 12 to 39, with a mean of 24 and a sample standard deviation of about 10.8. The second dataset has the same mean of 24, but its data range from 10 to 56 and its sample standard deviation is about 18.5. Comparing the two standard deviations shows that the values in the first dataset are clustered more tightly around the mean, that is, less dispersed, than the values in the second dataset.

import statistics

weights1 = [12, 18, 31, 39, 20]
weights2 = [14, 56, 23, 17, 10]

# statistics.stdev computes the sample standard deviation (divides by n - 1)
print("Standard deviation of the first dataset is", statistics.stdev(weights1))
print("Standard deviation of the second dataset is", statistics.stdev(weights2))

Output:

Standard deviation of the first dataset is 10.8397416943394
Standard deviation of the second dataset is 18.506755523321747
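The heading also mentions variance, which is the square of the standard deviation. statistics.stdev and statistics.variance compute the sample versions (dividing the sum of squared deviations by n - 1), while statistics.pstdev and statistics.pvariance divide by n. NumPy's np.std defaults to the population formula (ddof=0), so ddof=1 is needed to match statistics.stdev. A short comparison on the first weight dataset, offered as a sketch:

```python
import statistics
import numpy as np

weights1 = [12, 18, 31, 39, 20]

sampleVar = statistics.variance(weights1)   # sum of squared deviations / (n - 1)
sampleStd = statistics.stdev(weights1)      # square root of the sample variance
popVar = statistics.pvariance(weights1)     # sum of squared deviations / n
popStd = statistics.pstdev(weights1)        # square root of the population variance

print("Sample variance:", sampleVar)
print("Sample standard deviation:", sampleStd)
print("Population variance:", popVar)
print("Population standard deviation:", popStd)

# np.std divides by n by default (ddof=0); ddof=1 gives the sample version.
print("np.std, ddof=0:", np.std(weights1))
print("np.std, ddof=1:", np.std(weights1, ddof=1))
```

Because the sample formula divides by a smaller number, it gives a larger value than the population formula for the same data; the sample version is the usual choice when the data are a sample from a larger population.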

Min, Max: Minimum and Maximum

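Consider two different datasets of children's weights (measured in pounds) below.

Children above two years old: 13, 21, 31, 40, 33, 49, 53, 68, 89, 105
Children below two years old: 16, 15, 19, 22, 18, 18, 40, 68, 77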

For the first dataset, it is easy to see that the minimum and maximum are 13 and 105, respectively. The min and max functions help identify values that may not belong in the data. Suppose a study focuses on children over two years old. On seeing a minimum of 13 pounds, the researcher may suspect that an infant was unknowingly included in the study. After investigating, the researcher can decide how to handle such data before proceeding with the analysis. The maximum is useful in the reverse case: in a study of children under two years old, an unusually high maximum may point to an older child in the data.

weights1 = [13, 21, 31, 40, 33, 49, 53, 68, 89, 105]   # children above two years old
weights2 = [16, 15, 19, 22, 18, 18, 40, 68, 77]        # children below two years old

# Built-in min() and max() return the smallest and largest values in a list
print("The minimum weight of children above two years old is", min(weights1), "pounds.")
print("The maximum weight of children below two years old is", max(weights2), "pounds.")

Output:

The minimum weight of children above two years old is 13 pounds.
The maximum weight of children below two years old is 77 pounds.
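One way to act on a suspicious minimum is to compare every value against a plausibility cutoff and set aside anything below it for review. The 20-pound cutoff and the MIN_PLAUSIBLE_WEIGHT name below are hypothetical, chosen only for illustration; they are not a clinical threshold.

```python
weights1 = [13, 21, 31, 40, 33, 49, 53, 68, 89, 105]

# Hypothetical cutoff for illustration only; a real study would justify its own value.
MIN_PLAUSIBLE_WEIGHT = 20

# Split the data into values to review and values to keep.
flagged = [w for w in weights1 if w < MIN_PLAUSIBLE_WEIGHT]
kept = [w for w in weights1 if w >= MIN_PLAUSIBLE_WEIGHT]

print("Minimum before review:", min(weights1))
print("Flagged for review:", flagged)
print("Minimum after setting flagged values aside:", min(kept))
```

Setting values aside rather than deleting them keeps the original data intact, so the researcher can restore them if the investigation shows they are valid.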

Argmin and Argmax

Consider arbitrary oscillating functions on the interval [0, L], here f_n(x) = x(L - x) sin(2πx/λ_n) with λ_n = 2L/n. The code below builds a two-dimensional array holding the values of these functions for L = 1 on a grid of N + 1 = 101 points (rows, with N = 100) for n = 1, 2, ⋯, 5 (columns). In this case, argmax(axis=0) and argmin(axis=0) are used to find the row index of the maximum and minimum in each column, respectively.

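import numpy as np
import matplotlib.pyplot as plt

N = 100  # number of grid intervals (N + 1 grid points)
L = 1    # length of the interval [0, L]

def f(i, n):
    # i: row (grid point) index; n: column index, so column n holds f_{n+1}
    x = i * L / N
    lam = 2 * L / (n + 1)
    return x * (L - x) * np.sin(2 * np.pi * x / lam)

# fromfunction calls f with arrays of row and column indices
a = np.fromfunction(f, (N + 1, 5))

# Row index of the minimum and maximum in each column
min_i = a.argmin(axis=0)
max_i = a.argmax(axis=0)

plt.plot(a, c='k')
plt.plot(min_i, a[min_i, np.arange(5)], 'v', c='k', markersize=10)
plt.plot(max_i, a[max_i, np.arange(5)], '^', c='k', markersize=10)
plt.xlabel(r'$x$')
plt.ylabel(r'$f_n(x)$')
plt.show()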

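Output: a plot of f_n(x) for n = 1, ⋯, 5, with downward triangles marking the minimum and upward triangles marking the maximum of each function.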
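argmin and argmax return row indices, not x values. The minimal sketch below rebuilds the same array without plotting, then converts the indices back to positions on [0, L] using x = i · L / N. It also shows that, when several entries tie, these functions return the first matching index.

```python
import numpy as np

N = 100
L = 1

def f(i, n):
    # Same function as above: column n holds f_{n+1}(x) on the grid x = i * L / N.
    x = i * L / N
    lam = 2 * L / (n + 1)
    return x * (L - x) * np.sin(2 * np.pi * x / lam)

a = np.fromfunction(f, (N + 1, 5))

max_i = a.argmax(axis=0)
min_i = a.argmin(axis=0)

# Convert row indices back to x coordinates.
x_max = max_i * L / N
x_min = min_i * L / N

for col in range(5):
    print(f"n={col + 1}: max at x={x_max[col]:.2f}, min at x={x_min[col]:.2f}")

# With ties, argmax/argmin return the first occurrence.
print(np.argmax([1, 3, 3]))  # 1
```

For n = 1 the function is x(1 - x) sin(πx), which is symmetric about x = 0.5, so its maximum falls in the middle row of the grid.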

Cumsum Functions

The cumulative sum, also called a running total, can be used to calculate an account balance. Whenever a new transaction (a deposit or a withdrawal) occurs, the cumulative sum is updated and the current balance is displayed. Shown below is a sample balance table.

In the table above, we see that the first transaction occurred on 1 December 2020 (2020-12-01): an inflow of $5,000, so the starting balance was $5,000. On 3 December 2020, the client withdrew $50, so the balance decreased to $4,950, and so on. In this manner, the cumulative sum calculates the current account balance: it is the sum of all the transactions associated with the account so far. With each new transaction, the balance is updated; that is, the cumulative sum is recalculated.

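import numpy as np

# Deposits are positive, withdrawals negative
transactions = [5000, -50, -125, -185, -142, -350, -560, -80, -15]

# Running total of the transactions, i.e. the balance after each one
currBalance = np.cumsum(transactions)

for amount in currBalance:
    print("Current balance is", amount)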

Output:

Current balance is 5000
Current balance is 4950
Current balance is 4825
Current balance is 4640
Current balance is 4498
Current balance is 4148
Current balance is 3588
Current balance is 3508
Current balance is 3493
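Each running total is the previous balance plus the newest transaction, i.e. b_k = b_(k-1) + t_k with b_1 = t_1. The sketch below writes that recurrence as an explicit loop, repeats it with Python's built-in itertools.accumulate, and checks that both agree with np.cumsum on the transactions above.

```python
from itertools import accumulate
import numpy as np

transactions = [5000, -50, -125, -185, -142, -350, -560, -80, -15]

# Explicit recurrence: each balance = previous balance + current transaction.
balances = []
running = 0
for t in transactions:
    running += t
    balances.append(running)

# itertools.accumulate adds consecutive items by default.
viaAccumulate = list(accumulate(transactions))

# np.cumsum returns a NumPy array; tolist() converts it to plain Python ints.
viaCumsum = np.cumsum(transactions).tolist()

print(balances == viaAccumulate == viaCumsum)   # True
print("Final balance:", balances[-1])           # 3493
```

The explicit loop makes the "recalculate on each new transaction" idea visible, while np.cumsum is the more convenient choice when the transactions are already in an array.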

References

Argmax and argmin. (n.d.). Retrieved September 15, 2021, from https://scipython.com/book/chapter-6-numpy/examples/argmax-and-argmin/.

Ilic, M. (2021, March 19). 7 real-life situations when you need a running total and how to compute it in SQL. LearnSQL.com. Retrieved September 15, 2021, from https://learnsql.com/blog/running-total-sql/.

NEDARC – The National EMSC Data Analysis Resource Center, U. of U. (n.d.). Standard deviation. NEDARC. Retrieved September 15, 2021, from https://www.nedarc.org/statisticalhelp/basicStatistics/standardDeviation.html.


By Hanna Robinson