How to calculate the expectation value for a given probability distribution

I am writing a program to compute the expectation value E(X), the expectation of X^2, E(X^2), and the variance E(X - X_avg)^2. This is what I have so far:

# program : expectation value
import csv
import pandas as pd
import numpy as np 
from scipy.stats import chi2_contingency

import seaborn as sns
import matplotlib.pyplot as plt
import logging 
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

# Step 1: read csv
probabilityCSV       = open('probability.csv')
df      = pd.read_csv(probabilityCSV) 
logging.debug(df['X'])
logging.debug(df['P'])
logging.debug(type(df['X']))
logging.debug(type(df['P']))

# Step 2: convert dataframe to ndarray
# https://stackoverflow.com/questions/13187778/convert-pandas-dataframe-to-numpy-array
X = df['X'].to_numpy()
p = df['P'].to_numpy()
logging.debug(f'X={X}')
logging.debug(f'p={p}')

# Step 3: calculate E(X)
# https://www.statology.org/expected-value-in-python/
def expected_value(values, weights):
    return np.sum((np.dot(values,weights))) / np.sum(weights)

logging.debug('Step 3: calculate E(X)')
expectation = expected_value(X,p)
logging.debug(f'E(X)={expectation}')


# Step 4: calculate E(X^2)
logging.debug('Step 4: calculate E(X^2)')
# add normalize='index'
contingency_pct = pd.crosstab(df['Observed'],df['Expected'],normalize='index')
logging.debug(f'contingency_pct:{contingency_pct}')


# Step 5: calculate E(X - X_avg)^2
logging.debug('Step 5: calculate E(X - X_avg)^2')

The dataset that I am using is

X,P
8,1/8
12,1/6
16,3/8
20,1/4
24,1/12

I would appreciate your help.

E. M. Gertis

Expected:

E(X) = 16, E(X^2) = 276, E(X - X_avg)^2 = 20

Actual:

Traceback (most recent call last):
  File "/Users/evangertis/development/PythonAutomation/Statistics/expectation.py", line 35, in <module>
    expectation = expected_value(X,p)
  File "/Users/evangertis/development/PythonAutomation/Statistics/expectation.py", line 32, in expected_value
    return np.sum((np.dot(values,weights))) / np.sum(weights)
  File "<__array_function__ internals>", line 5, in sum
  File "/usr/local/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 2259, in sum
    return _wrapreduction(a, np.add, 'sum', axis, dtype, out, keepdims=keepdims,
  File "/usr/local/lib/python3.9/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
TypeError: cannot perform reduce with flexible type

CodePudding user response:

The problem is in step 1, so I took the liberty of rewriting it:

# Step 1.1: read the CSV and convert the fraction strings in P to floats
df = pd.read_csv('probability.csv')
# Split "1/8" into numerator and denominator columns, then divide
parts = df["P"].str.split("/", expand=True).astype(int)
df["P"] = parts[0] / parts[1]

df:

    X   P
0   8   0.125000
1   12  0.166667
2   16  0.375000
3   20  0.250000
4   24  0.083333
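
As an alternative to splitting on "/", the fraction strings could be parsed with Python's standard `fractions.Fraction`, which accepts text such as "1/8" directly. The following is only a minimal sketch: it assumes the CSV content shown in the question and uses an inline copy (`csv_text`, a name introduced here for illustration) instead of reading probability.csv from disk.

```python
from fractions import Fraction
from io import StringIO

import pandas as pd

# Inline copy of the question's probability.csv so the sketch runs on its own.
csv_text = "X,P\n8,1/8\n12,1/6\n16,3/8\n20,1/4\n24,1/12\n"
df = pd.read_csv(StringIO(csv_text))

# Fraction("1/8") parses the text exactly; float() converts it for NumPy use.
df["P"] = df["P"].map(lambda s: float(Fraction(s))).astype(float)

# For a valid probability distribution this should be very close to 1.
total = df["P"].sum()
```

Checking that the probabilities sum to roughly 1 is a cheap way to catch a mistyped fraction before computing any expectations.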

The second step is correct:

# Step 2: convert dataframe to ndarray
X = df['X'].to_numpy()
p = df['P'].to_numpy()

X, p:

(array([ 8, 12, 16, 20, 24]),
 array([0.125     , 0.16666667, 0.375     , 0.25      , 0.08333333]))

After this you correctly defined the function:

def expected_value(values, weights):
    return np.sum((np.dot(values,weights))) / np.sum(weights)
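
For comparison, NumPy's `np.average` takes a `weights` argument and computes the same quantity, sum(values * weights) / sum(weights). The sketch below hard-codes the arrays from step 2 so it runs on its own; `weighted_mean` is a name chosen here for illustration, not part of the original program.

```python
import numpy as np

# Values and probabilities from the question's dataset.
X = np.array([8, 12, 16, 20, 24])
p = np.array([1 / 8, 1 / 6, 3 / 8, 1 / 4, 1 / 12])

def weighted_mean(values, weights):
    # np.average divides the weighted sum by the sum of the weights,
    # so it also works if the weights are not normalized to 1.
    return np.average(values, weights=weights)

mean = weighted_mean(X, p)              # approximately 16
second_moment = weighted_mean(X**2, p)  # approximately 276
```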

You can use this function to compute E(X), E(X^2) and E(X - X_avg)^2. In particular:

expected_value(X,p)
# returns E(X) = 16.0

expected_value(X**2, p)
# returns E(X^2) = 276.0

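# X_avg must be the probability-weighted mean E(X), not X.mean();
# the unweighted X.mean() is also 16 here only by coincidence.
expected_value((X - expected_value(X, p))**2, p)
# returns E(X - X_avg)^2 = 20.0 (up to floating-point rounding)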
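
These three values can be cross-checked without floating-point rounding by using exact fractions from the standard library. The sketch below is independent of pandas and NumPy; the helper name `expect` is introduced here for illustration. It also compares the definition of the variance with the shortcut E(X^2) - E(X)^2.

```python
from fractions import Fraction

# Values and exact probabilities from the question's dataset.
X = [8, 12, 16, 20, 24]
p = [Fraction(1, 8), Fraction(1, 6), Fraction(3, 8), Fraction(1, 4), Fraction(1, 12)]

def expect(values, probs):
    # E[g(X)] = sum of g(x) * P(x); probs are assumed to sum to 1.
    return sum(v * q for v, q in zip(values, probs))

mean = expect(X, p)                                        # 16
second_moment = expect([x**2 for x in X], p)               # 276
var_definition = expect([(x - mean) ** 2 for x in X], p)   # 20
var_shortcut = second_moment - mean**2                     # 276 - 16^2 = 20
```

Both routes to the variance agree, which matches the expected results listed in the question.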

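The error occurred because the P column of your DataFrame holds strings such as "1/8" rather than numbers, so the NumPy arithmetic inside expected_value fails with the TypeError shown in the traceback.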
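
A quick way to spot this kind of problem before doing any arithmetic is to check whether each column is numeric. The sketch below assumes the CSV content from the question and reads it from an inline string (`csv_text`, a name introduced here) rather than from probability.csv.

```python
from io import StringIO

import pandas as pd

# Inline copy of the question's probability.csv so the sketch runs on its own.
csv_text = "X,P\n8,1/8\n12,1/6\n16,3/8\n20,1/4\n24,1/12\n"
df = pd.read_csv(StringIO(csv_text))

# X parses as integers, but "1/8" is not a number pandas recognizes,
# so P is kept as text until it is converted explicitly.
x_is_numeric = pd.api.types.is_numeric_dtype(df["X"])
p_is_numeric = pd.api.types.is_numeric_dtype(df["P"])
```

Inspecting `df.dtypes` gives the same information at a glance for every column.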