Sum
Suppose our dataset consists of a student's scores on each item of an exam, which can be represented by a simple one-dimensional array. The sum function gives the student's total score on the exam. The example code below uses the scores of one student on a 10-item exam, where each item is worth 5 points.
import numpy as np

# 1D array: one score per exam item
studentScores = [2, 4, 5, 3, 2, 4, 4, 4, 5, 1]
print("\nTotal grade :", np.sum(studentScores), "out of 50.")
Output
Total grade : 34 out of 50.
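The same function extends to a two-dimensional array when there is more than one student. The sketch below is illustrative only: the second and third rows of scores are made-up values, and the name classScores is an assumption rather than part of the original example. It shows how the axis argument of np.sum selects whether totals are taken per row or per column.

```python
import numpy as np

# Hypothetical scores: each row is one student, each column one exam item.
# Only the first row comes from the example above; the rest are made up.
classScores = np.array([
    [2, 4, 5, 3, 2, 4, 4, 4, 5, 1],
    [5, 5, 4, 3, 5, 4, 2, 5, 5, 3],
    [1, 3, 2, 4, 0, 5, 3, 2, 4, 2],
])

# axis=1 sums across each row: one total per student
totalPerStudent = np.sum(classScores, axis=1)

# axis=0 sums down each column: points earned on each item across the class
totalPerItem = np.sum(classScores, axis=0)

print("Total per student:", totalPerStudent)
print("Total per item:", totalPerItem)
```

Calling np.sum(classScores) with no axis argument would add up every entry in the array.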
Arithmetic Mean Function
Consider the two sample datasets of tomato plant heights (measured in centimeters) below.
86 | 109 | 96 | 79 | 90 |
163 | 156 | 135 | 178 | 144 |
The arithmetic mean of each dataset, which is the sum of the data divided by the number of data points, gives the average height of the tomato plants in that group. Comparing the average height of the first group (92) with that of the second group (155.2) may, in some study, suggest to the researchers that tomato plants treated with a specific concentration of a natural growth enhancer (the second dataset) grow taller than untreated plants.
import statistics

controlGroup = [86, 109, 96, 79, 90]
treatmentGroup = [163, 156, 135, 178, 144]
ave1 = statistics.mean(controlGroup)
ave2 = statistics.mean(treatmentGroup)
print("The average height of tomato plants in the control group is :", ave1)
print("The average height of tomato plants in the treatment group is :", ave2)
Output
The average height of tomato plants in the control group is : 92
The average height of tomato plants in the treatment group is : 155.2
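As a quick check of the definition (sum of the data divided by the number of data points), the mean can also be computed by hand and compared with statistics.mean. This is a minimal sketch; the names manualMean1, manualMean2, and difference are introduced here for illustration only.

```python
import math
import statistics

controlGroup = [86, 109, 96, 79, 90]
treatmentGroup = [163, 156, 135, 178, 144]

# Mean computed by hand: sum of the values divided by how many there are
manualMean1 = sum(controlGroup) / len(controlGroup)
manualMean2 = sum(treatmentGroup) / len(treatmentGroup)

# Both approaches should agree, allowing for floating-point rounding
print(math.isclose(manualMean1, statistics.mean(controlGroup)))
print(math.isclose(manualMean2, statistics.mean(treatmentGroup)))

# Difference between the two group averages, in centimeters
difference = manualMean2 - manualMean1
print("Difference between group means:", round(difference, 1))
```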
Std, Var: Standard Deviation and Variance
Consider the two sample datasets of children's weights (measured in pounds) below.
12 | 18 | 31 | 39 | 20 |
14 | 56 | 23 | 17 | 10 |
In the first dataset above, the weights range from 12 to 39, with a mean of 24 and a standard deviation of 10.8. The second dataset has the same mean of 24, but its weights range from 10 to 56 and its standard deviation is 18.5. Comparing the two standard deviations shows that the values in the first dataset are clustered more tightly around the mean and are less dispersed than those in the second dataset.
import statistics

weights1 = [12, 18, 31, 39, 20]
weights2 = [14, 56, 23, 17, 10]
print("Standard deviation of the first dataset is", statistics.stdev(weights1))
print("Standard deviation of the second dataset is", statistics.stdev(weights2))
Output:
Standard deviation of the first dataset is 10.8397416943394
Standard deviation of the second dataset is 18.506755523321747
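The variance is the square of the standard deviation. The sketch below, which reuses the same two datasets, computes the sample variance with statistics.variance and checks that its square root matches statistics.stdev. The variable names var1, var2, and pvar1 are introduced here for illustration.

```python
import math
import statistics

weights1 = [12, 18, 31, 39, 20]
weights2 = [14, 56, 23, 17, 10]

# Sample variance: squared deviations from the mean, divided by n - 1
var1 = statistics.variance(weights1)
var2 = statistics.variance(weights2)
print("Variance of the first dataset is", var1)
print("Variance of the second dataset is", var2)

# The sample standard deviation is the square root of the sample variance
print(math.isclose(math.sqrt(var1), statistics.stdev(weights1)))

# The population variance divides by n instead of n - 1
pvar1 = statistics.pvariance(weights1)
print("Population variance of the first dataset is", pvar1)
```

Note that NumPy's np.std and np.var compute the population versions by default (ddof=0), so they reproduce the figures above only when called with ddof=1.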
Min, Max: Minimum and Maximum
Consider the children's weights (measured in pounds) below.
13 | 21 | 31 | 40 | 33 |
49 | 53 | 68 | 89 | 105 |
Taken together, the minimum and maximum of these data are 13 and 105, respectively. The min and max functions help in spotting and filtering out unwanted data. Suppose a study is focused on children over two years old. On seeing a minimum of 13 pounds, the researcher may suspect that an infant has unknowingly been included in the study. After investigating, the researcher can decide what to do with that record before proceeding with the analysis. The maximum can be used in the same way in the reverse situation, for example in a study of children under two years old.
weights1 = [13, 21, 31, 40, 33, 49, 53, 68, 89, 105]
weights2 = [16, 15, 19, 22, 18, 18, 40, 68, 77]
print("The minimum weight of children above two years old is", min(weights1), "pounds.")
print("The maximum weight of children below two years old is", max(weights2), "pounds.")
Output:
The minimum weight of children above two years old is 13 pounds.
The maximum weight of children below two years old is 77 pounds.
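Once a suspicious minimum has been spotted, one possible next step is to flag or filter the records that fall below a cutoff. The sketch below assumes a cutoff of 20 pounds purely for illustration; MIN_EXPECTED_WEIGHT is a hypothetical value, not one taken from a growth chart or from the study described above.

```python
weights1 = [13, 21, 31, 40, 33, 49, 53, 68, 89, 105]

# Hypothetical cutoff (an assumption for illustration only):
# weights below this value are flagged for review
MIN_EXPECTED_WEIGHT = 20

lowest = min(weights1)
if lowest < MIN_EXPECTED_WEIGHT:
    print("Lowest weight", lowest, "is below the cutoff; review this record.")

# Keep only the records at or above the cutoff
filtered = [w for w in weights1 if w >= MIN_EXPECTED_WEIGHT]
print("Minimum after filtering:", min(filtered))
print("Range after filtering:", max(filtered) - min(filtered))
```

Whether such a record should be dropped, corrected, or kept is a judgment for the researcher; the code only makes the candidate easy to find.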
Argmin and Argmax
Consider a family of oscillating functions on the interval [0, L]. The code below defines a two-dimensional array holding the values of these functions for L=1 on a grid of N+1=101 points (rows) for n=1,2,⋯,5 (columns). In this case, argmax(axis=0) and argmin(axis=0) are used to find the row index of the maximum and minimum in each column, respectively.
import numpy as np
import pylab

N = 100
L = 1

def f(i, n):
    # i is the grid (row) index and n the column index, both supplied by np.fromfunction
    x = i * L / N
    lam = 2*L/(n+1)
    return x * (L-x) * np.sin(2*np.pi*x/lam)

a = np.fromfunction(f, (N+1, 5))
# Row index of the minimum and maximum in each column
min_i = a.argmin(axis=0)
max_i = a.argmax(axis=0)
pylab.plot(a, c='k')
pylab.plot(min_i, a[min_i, np.arange(5)], 'v', c='k', markersize=10)
pylab.plot(max_i, a[max_i, np.arange(5)], '^', c='k', markersize=10)
pylab.xlabel(r'$x$')
pylab.ylabel(r'$f_n(x)$')
pylab.show()
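Output: a plot of the five curves against the grid index, with a downward-pointing triangle marking the minimum of each curve and an upward-pointing triangle marking the maximum.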
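For a smaller illustration of the same idea, the sketch below applies argmax and argmin to the one-dimensional exam scores from the Sum section. The names bestItem and worstItem are introduced here for illustration.

```python
import numpy as np

studentScores = np.array([2, 4, 5, 3, 2, 4, 4, 4, 5, 1])

# argmax/argmin return the index of the extreme value, not the value itself.
# When the extreme value occurs more than once, the first index is returned.
bestItem = np.argmax(studentScores)
worstItem = np.argmin(studentScores)

print("Highest score", studentScores[bestItem], "at index", bestItem)
print("Lowest score", studentScores[worstItem], "at index", worstItem)
```

Indices are zero-based, so index 2 refers to the third exam item.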
Cumsum Functions
A cumulative sum, also called a running total, may be used when calculating a balance. Whenever a new transaction (a deposit or a withdrawal) is recorded, the cumulative sum is refreshed and the current balance is displayed. Shown below is a sample balance table.
In the table above, the first transaction, on 01 Dec 2020, is an inflow of $5,000, so the starting balance is $5,000. On 03 Dec 2020, the client withdrew $50, which reduced the balance to $4,950, and so on. The current account balance is therefore the cumulative sum of all the transactions associated with that account; with each new transaction, the cumulative sum is recalculated and the balance is updated.
import numpy as np

transactions = [5000, -50, -125, -185, -142, -350, -560, -80, -15]
currBalance = np.cumsum(transactions)
for amount in currBalance:
    print("Current balance is", amount)
Output:
Current balance is 5000
Current balance is 4950
Current balance is 4825
Current balance is 4640
Current balance is 4498
Current balance is 4148
Current balance is 3588
Current balance is 3508
Current balance is 3493
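To make explicit what the cumulative sum does, the sketch below rebuilds the same running balance with a plain loop and compares it with itertools.accumulate from the standard library. The names balance and runningBalances are introduced here for illustration.

```python
from itertools import accumulate

transactions = [5000, -50, -125, -185, -142, -350, -560, -80, -15]

# Explicit loop: carry the balance forward one transaction at a time
balance = 0
runningBalances = []
for amount in transactions:
    balance += amount
    runningBalances.append(balance)

# itertools.accumulate yields the same running totals
assert runningBalances == list(accumulate(transactions))

print("Final balance:", runningBalances[-1])
```

Each entry depends only on the previous balance and the new transaction, which is why the balance can be updated incrementally as transactions arrive.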
References
Argmax and argmin. (n.d.). Retrieved September 15, 2021, from https://scipython.com/book/chapter-6-numpy/examples/argmax-and-argmin/.
Ilic, M. (2021, March 19). 7 real-life situations when you need a running total and how to compute it in SQL. LearnSQL.com. Retrieved September 15, 2021, from https://learnsql.com/blog/running-total-sql/.
NEDARC – The National EMSC Data Analysis Resource Center, U. of U. (n.d.). Standard deviation. NEDARC. Retrieved September 15, 2021, from https://www.nedarc.org/statisticalhelp/basicStatistics/standardDeviation.html.