My question is very similar to this question: "Reverse" statistics. That question, however, asks how to generate a random, normally distributed sample of arbitrary size that fits a given mean and standard deviation. Suppose instead that we know more than the mean and standard deviation: we also know the number of data points and the discrete scale the values fall on.
So I really have two questions. First, given that we know the
- mean
- standard deviation
- n
- discrete scale of 1 to 5 (i.e., values can only be 1, 2, 3, 4, or 5)
...is it possible to know the exact dataset? For example, if we know that there are 5 data points on a 1–5 Likert scale, and the mean is 4.40 and the standard deviation is 1.20, is it possible to figure out that the data set is {5, 5, 5, 5, 2} (order of values not being important)?
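As a quick sanity check of the example above, here is a minimal sketch that recomputes both statistics for the candidate data set. It assumes the reported standard deviation is the population one (dividing by n); that is an assumption, and the variable names are only illustrative.

```python
from statistics import mean, pstdev

# Candidate data set from the example (order does not matter).
candidate = [5, 5, 5, 5, 2]

# Assumption: the reported standard deviation is the population SD (divide by n).
candidate_mean = round(mean(candidate), 2)
candidate_sd = round(pstdev(candidate), 2)

print(candidate_mean, candidate_sd)
```

Under that assumption both values agree with the reported 4.40 and 1.20. With the sample standard deviation (dividing by n - 1) the same data would give roughly 1.34, so it matters which convention the reported figure uses.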
Second, is there a function already out there to automatically solve this problem?
Answer:
Since the data sets I am trying to identify are fairly small (n < 30), a friend of mine suggested using itertools.combinations_with_replacement() to generate all possible data sets and then writing a function that keeps only the ones matching my parameters.
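To give a sense of why brute force is feasible here, the sketch below shows what combinations_with_replacement() yields for a tiny case and how many unordered data sets exist for larger n. The helper count_data_sets is a hypothetical name introduced only for this illustration; it uses the standard "stars and bars" count C(n + k - 1, k - 1) for k scale values.

```python
from itertools import combinations_with_replacement
from math import comb

scale = [1, 2, 3, 4, 5]

# Every unordered data set of size 2: (1, 1), (1, 2), ..., (5, 5).
pairs = list(combinations_with_replacement(scale, 2))


def count_data_sets(n, k=len(scale)):
    """Number of unordered data sets of size n drawn from k scale values."""
    return comb(n + k - 1, k - 1)


print(len(pairs), count_data_sets(2))
print(count_data_sets(20), count_data_sets(29))
```

For a 5-point scale the count stays in the tens of thousands even at n = 29, so checking every candidate is cheap. The count grows quickly with a wider scale, so this approach may not carry over to, say, a 0 to 100 scale.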
Here's the final code.
from itertools import combinations_with_replacement
from statistics import pstdev


def find_DataSet(N, MU, SIGMA, discreteScale):
    """Find every data set whose rounded mean and standard deviation match.

    N: int, number of data points
    MU: float, reported mean (rounded to 2 decimal places)
    SIGMA: float, reported standard deviation (rounded to 2 decimal places)
    discreteScale: list of all possible values of a data point
    Returns a list of tuples, one per data set that matches the criteria.
    """
    container = []
    # Each combination with replacement is one unordered candidate data set.
    for dataSet in combinations_with_replacement(discreteScale, N):
        roundMu = round(sum(dataSet) / len(dataSet), 2)
        # pstdev is the population standard deviation (divides by N).
        roundSigma = round(pstdev(dataSet), 2)
        if roundMu == MU and roundSigma == SIGMA:
            container.append(dataSet)
    return container
Example Output:
result = find_DataSet(20, 4.50, 0.81, [1, 2, 3, 4, 5])
print(result)
[(2, 3, 4, 4, 4, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5), (3, 3, 3, 3, 4, 4, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5)]
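The two results above show that the data set is not always uniquely determined. A short check, sketched here with illustrative variable names, suggests why: both data sets have the same size, the same sum, and the same sum of squares, and those three quantities are all that the mean and the population standard deviation depend on.

```python
from statistics import pstdev

# The two data sets returned in the example output above.
first = (2, 3) + (4,) * 5 + (5,) * 13
second = (3,) * 4 + (4,) * 2 + (5,) * 14

for data_set in (first, second):
    total = sum(data_set)
    total_of_squares = sum(x * x for x in data_set)
    # Size, sum, sum of squares, and rounded population SD for each data set.
    print(len(data_set), total, total_of_squares, round(pstdev(data_set), 2))
```

Both lines print the same four numbers, so no function working from n, the mean, and the standard deviation alone could tell these two data sets apart.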