I am writing a program to compute the expectation value E(X), the expectation of X^2, E(X^2), and E(X - X_avg)^2. This is what I have so far:
# program : expectation value
import csv
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
import seaborn as sns
import matplotlib.pyplot as plt
import logging
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
# Step 1: read csv
probabilityCSV = open('probability.csv')
df = pd.read_csv(probabilityCSV)
logging.debug(df['X'])
logging.debug(df['P'])
logging.debug(type(df['X']))
logging.debug(type(df['P']))
# Step 2: convert dataframe to ndarray
# https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array
X = df['X'].to_numpy()
p = df['P'].to_numpy()
logging.debug(f'X={X}')
logging.debug(f'p={p}')
# Step 3: calculate E(X)
# https://www.statology.org/expected-value-in-python/
def expected_value(values, weights):
    return np.sum((np.dot(values,weights))) / np.sum(weights)
logging.debug('Step 3: calculate E(X)')
expectation = expected_value(X,p)
logging.debug(f'E(X)={expectation}')
# Step 4: calculate E(X^2)
logging.debug('Step 4: calculate E(X^2)')
# add normalize='index'
contingency_pct = pd.crosstab(df['Observed'],df['Expected'],normalize='index')
logging.debug(f'contingency_pct:{contingency_pct}')
# Step 5: calculate E(X - X_avg)^2
logging.debug('Step 5: calculate E(X - X_avg)^2')
The dataset that I am using is
X,P
8,1/8
12,1/6
16,3/8
20,1/4
24,1/12
I would appreciate your help. There is never enough time, thank you for yours. Thank you for your integrity. Thank you for your humility. Thank you for your presence.
V.R.
E. M. Gertis
Vivre sans temps mort. (Live without wasted time.)
Expected:
E(X) = 16, E(X^2) = 276, E(X - X_avg)^2 = 20
Actual:
Traceback (most recent call last):
  File "/Users/evangertis/development/PythonAutomation/Statistics/expectation.py", line 35, in <module>
    expectation = expected_value(X,p)
  File "/Users/evangertis/development/PythonAutomation/Statistics/expectation.py", line 32, in expected_value
    return np.sum((np.dot(values,weights))) / np.sum(weights)
  File "<__array_function__ internals>", line 5, in sum
  File "/usr/local/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2259, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/usr/local/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: cannot perform reduce with flexible type
CodePudding user response:
The problem is in step 1, so I took the liberty of rewriting it:
# Step 1.1: read the CSV, then convert the fraction strings in P to floats
df = pd.read_csv('probability.csv')
# split "1/8" into numerator and denominator columns, both cast to int
parts = df["P"].str.split("/", expand=True).astype(int)
df["P"] = parts[0] / parts[1]
df
Output:
X P
0 8 0.125000
1 12 0.166667
2 16 0.375000
3 20 0.250000
4 24 0.083333
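As a possible alternative to splitting on "/", the fractions could be parsed while the file is being read. The sketch below assumes the standard-library fractions.Fraction together with the converters argument of pd.read_csv; parse_probability and CSV_TEXT are illustrative names that do not appear in the original code, and the data is inlined so the snippet runs without probability.csv.

```python
from fractions import Fraction
import io

import pandas as pd

# Inline copy of the question's dataset, standing in for probability.csv.
CSV_TEXT = """X,P
8,1/8
12,1/6
16,3/8
20,1/4
24,1/12
"""


def parse_probability(text):
    """Turn a cell such as '1/8', '0.25' or '1' into a float (hypothetical helper)."""
    return float(Fraction(text.strip()))


# converters applies parse_probability to every raw string in column P.
df = pd.read_csv(io.StringIO(CSV_TEXT), converters={"P": parse_probability})

# The probabilities should add up to 1 (up to floating-point rounding).
p_total = float(df["P"].sum())
```

One reason to prefer this form is that it also accepts plain decimals or whole numbers in the P column, which the split-based version would not.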
The second step is right:
# Step 2: convert dataframe to ndarray
X = df['X'].to_numpy()
p = df['P'].to_numpy()
X, p
Output:
(array([ 8, 12, 16, 20, 24]),
array([0.125 , 0.16666667, 0.375 , 0.25 , 0.08333333]))
After this you correctly defined the function:
def expected_value(values, weights):
    return np.sum((np.dot(values,weights))) / np.sum(weights)
You can use this function to compute E(X), E(X^2) and E(X - X_avg)^2. In particular:
expected_value(X,p)
# returns E(X) = 16.0
expected_value(X**2, p)
# returns E(X^2) = 276.0
# center on the weighted mean E(X); X.mean() would be the unweighted average
expected_value((X - expected_value(X, p))**2, p)
# returns E(X - X_avg)^2 = 20.0
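For comparison, the same three quantities can be sketched with np.average, which takes a weights argument. This is only one way to write it, with the question's values typed in directly as arrays. The central moment should be taken about the weighted mean E(X) rather than the plain arithmetic mean X.mean(); for this dataset both happen to be 16, but in general they differ.

```python
import numpy as np

# Values and probabilities from the question's dataset.
X = np.array([8, 12, 16, 20, 24])
p = np.array([1 / 8, 1 / 6, 3 / 8, 1 / 4, 1 / 12])

mean = np.average(X, weights=p)                    # E(X)
second_moment = np.average(X**2, weights=p)        # E(X^2)
variance = np.average((X - mean) ** 2, weights=p)  # E[(X - E(X))^2]

# Sanity check via the identity Var(X) = E(X^2) - E(X)^2.
assert np.isclose(variance, second_moment - mean**2)

# Plain arithmetic mean, which ignores p entirely.
unweighted_mean = X.mean()
```

The identity check is a cheap way to catch a wrong mean in the third line: if the unweighted mean were used on data where it differs from E(X), the assertion would fail.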
The error occurred because your df["P"] column is a string column: values such as 1/8 are read from the CSV as text, not as numbers, so NumPy cannot sum them.
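A quick way to confirm this kind of problem before doing any arithmetic is to check whether each column is numeric. The sketch below uses pd.api.types.is_numeric_dtype on a two-row inline sample of the question's data; CSV_TEXT is an illustrative stand-in for probability.csv.

```python
import io

import pandas as pd

# Two rows of the question's data, inlined so the snippet is self-contained.
CSV_TEXT = "X,P\n8,1/8\n12,1/6\n"

df = pd.read_csv(io.StringIO(CSV_TEXT))

# X parses as integers, but "1/8" is not a number pandas recognizes.
x_is_numeric = pd.api.types.is_numeric_dtype(df["X"])
p_is_numeric = pd.api.types.is_numeric_dtype(df["P"])

# The first entry of P is still the raw text from the file.
first_p = df["P"].iloc[0]
```

Logging df.dtypes right after reading the file would surface the same information in the original script.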