I've found the variance and the mean of the float columns of train_1 (['Age', 'RestingBP']). The code below prints just that, but I want the output to look like this:
variance of Age is 0.03595686694781368 mean of Age is: 0.4641151293344085
variance of RestingBP is 0.006797712487953147 mean of RestingBP is: 0.6499696048632219
How do I do that, and can I save the results as a set/list so that I can use them later in a formula?
I want to be able to tell which mean and variance belong to which column, so that later I can, for example, multiply the values.
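import statistics
from statistics import variance

numerical = [var for var in train_1.columns if train_1[var].dtype == 'float64']
for var in numerical:
    variance1 = variance(train_1[var])
    mean1 = statistics.mean(train_1[var])
    # {numerical} interpolates the whole list of column names, not the current column
    print(f"variance {numerical}", variance1)
    print("mean:", mean1)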
Answer:
One approach would be to build a list of results with one dictionary per column, holding the values you are interested in. You could do this with something like:
import statistics
from statistics import variance

numerical = [var for var in train_1.columns if train_1[var].dtype == 'float64']
# one dictionary per float column
results = [{'variance': variance(train_1[var]),
            'mean': statistics.mean(train_1[var])} for var in numerical]
for result in results:
    print(f'mean: {result["mean"]}')
    print(f'variance: {result["variance"]}')
Note that you could also do the float-column filtering inside this list comprehension, but the example keeps changes to your code to a minimum.
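Because the list above stores the values without the column name, it can be hard to tell later which mean and variance belong to which column. A minimal sketch of an alternative, assuming the same kind of `train_1` DataFrame (the toy data below is copied from the other answer), is a single dictionary keyed by column name; `stats` and `age_product` are hypothetical names chosen for illustration, and the multiplication stands in for whatever formula you actually need.

```python
import statistics

import pandas as pd

# Sample data (assumption): the same toy frame used in the other answer.
train_1 = pd.DataFrame({'Age': [30.0, 40.0, 20.0, 15.0],
                        'RestingBP': [60.0, 70.0, 50.0, 80.0]})

numerical = [var for var in train_1.columns if train_1[var].dtype == 'float64']

# One entry per float column, keyed by the column name.
stats = {var: {'variance': statistics.variance(train_1[var]),
               'mean': statistics.mean(train_1[var])}
         for var in numerical}

for column, calc in stats.items():
    print(f"variance of {column} is {calc['variance']}, mean of {column} is: {calc['mean']}")

# Look up a single column's values later, e.g. inside a formula (placeholder).
age_product = stats['Age']['variance'] * stats['Age']['mean']
print(age_product)
```

Keying by column name means a lookup such as `stats['RestingBP']['mean']` cannot silently pick up another column's value if the column order changes.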
Answer:
You already have everything you need. I only slightly changed the f-string used for printing and added a list that collects the results:
import pandas as pd
import statistics

train_1 = pd.DataFrame({'Age': [30.0, 40.0, 20.0, 15.0], 'RestingBP': [60.0, 70.0, 50.0, 80.0]})
numerical = [var for var in train_1.columns if train_1[var].dtype == 'float64']

lst_results = []
for var in numerical:
    variance = statistics.variance(train_1[var])
    mean = statistics.mean(train_1[var])
    lst_results.append((var, variance, mean))
    print(f"variance of {var} is: {variance} and mean of {var} is: {mean}")
print(f'{lst_results=}')
gives:
variance of Age is: 122.91666666666667 and mean of Age is: 26.25
variance of RestingBP is: 166.66666666666666 and mean of RestingBP is: 65.0
lst_results=[('Age', 122.91666666666667, 26.25), ('RestingBP', 166.66666666666666, 65.0)]
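Since each entry of `lst_results` is a `(column, variance, mean)` tuple, the values can be unpacked by position when you use them later. The sketch below rebuilds the list from the same toy data and multiplies each column's variance by its mean; the product is only a placeholder for your real formula, and `products` and `by_column` are hypothetical names.

```python
import statistics

import pandas as pd

# Sample data (assumption): same toy frame as in the answer above.
train_1 = pd.DataFrame({'Age': [30.0, 40.0, 20.0, 15.0],
                        'RestingBP': [60.0, 70.0, 50.0, 80.0]})
numerical = [var for var in train_1.columns if train_1[var].dtype == 'float64']

lst_results = [(var, statistics.variance(train_1[var]), statistics.mean(train_1[var]))
               for var in numerical]

# Unpack each tuple by position; the column name stays attached to its values.
products = {}
for column, variance, mean in lst_results:
    products[column] = variance * mean  # placeholder formula
    print(f"{column}: variance * mean = {products[column]}")

# A dict built from the tuples allows lookup by column name.
by_column = {column: (variance, mean) for column, variance, mean in lst_results}
age_variance, age_mean = by_column['Age']
```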
And if you want a dictionary for storing the results, along with nicer printing, here is a debugged and improved version of the other answer:
results = [{var: {'variance': statistics.variance(train_1[var]),
                  'mean': statistics.mean(train_1[var])}}
           for var in train_1.columns if train_1[var].dtype == 'float64']
for result in results:
    for column, calc in result.items():
        print(column)
        print(f'    mean: {calc["mean"]}')
        print(f'    variance: {calc["variance"]}')
print(f'{results=}')
giving:
Age
    mean: 26.25
    variance: 122.91666666666667
RestingBP
    mean: 65.0
    variance: 166.66666666666666
results=[{'Age': {'variance': 122.91666666666667, 'mean': 26.25}}, {'RestingBP': {'variance': 166.66666666666666, 'mean': 65.0}}]
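As an alternative that these answers don't use, pandas itself can compute both statistics for all float columns at once. A sketch, assuming `train_1` is a pandas DataFrame like the toy one above: `select_dtypes('float64')` keeps the float columns and `agg(['var', 'mean'])` returns a small DataFrame with one row per statistic and one column per input column. `DataFrame.var` uses `ddof=1` by default, so it is the sample variance, the same quantity `statistics.variance` returns. The names `summary`, `product` and `as_dict` are illustrative.

```python
import pandas as pd

# Sample data (assumption): same toy frame as above.
train_1 = pd.DataFrame({'Age': [30.0, 40.0, 20.0, 15.0],
                        'RestingBP': [60.0, 70.0, 50.0, 80.0]})

# Rows: 'var' and 'mean'; columns: the float64 columns of train_1.
summary = train_1.select_dtypes('float64').agg(['var', 'mean'])
print(summary)

# Look up one value by statistic and column name.
age_mean = summary.loc['mean', 'Age']

# Element-wise arithmetic keeps the column labels, e.g. variance * mean per column.
product = summary.loc['var'] * summary.loc['mean']
print(product)

# Plain nested dict if preferred: {'Age': {'var': ..., 'mean': ...}, ...}
as_dict = summary.to_dict()
```

Because the result keeps column labels, a formula over several columns can be written once as Series arithmetic instead of a loop.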