I am writing a Python program to calculate the chi-square value for a set of observed and expected frequencies. The program I have written so far is:
# Author: Evan Gertis
# Date : 10/25
# program : quantile decile calculator
import csv
import logging
import coloredlogs
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from matplotlib.ticker import PercentFormatter
from scipy.stats import chi2_contingency
# import seaborn as sns
coloredlogs.install() # install a handler on the root logger
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')
# Step 1: read csv
dicerollsCSV = open('dice_rolls.csv')
df = pd.read_csv(dicerollsCSV)
print(df.head())
# Implement steps from: https://predictivehacks.com/how-to-run-chi-square-test-in-python/
contigency= pd.crosstab(df['Observed'], df['Expected'])
contigency
# logging.debug(dicerollsDF['Observed'])
# logging.debug(dicerollsDF['Expected'])
# print(chisquare(dicerollsDF['Observed'],dicerollsDF['Expected']))
I am using https://predictivehacks.com/how-to-run-chi-square-test-in-python/ as a guide for completing this task. The specific dataset that I am using is:
Observed, Expected
15, 13.9
35, 27.8
49, 41.7
58, 55.6
65, 69.5
76, 83.4
72, 69.5
60, 55.6
35, 41.7
29, 27.8
6, 13.9
Expected: the chi-square value computed from the observed and expected frequencies
Actual:
Traceback (most recent call last):
File "/Users/evangertis/development/PythonAutomation/Statistics/chi_square.py", line 30, in <module>
contigency= pd.crosstab(df['Observed'], df['Expected'])
File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 3458, in __getitem__
indexer = self.columns.get_loc(key)
File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
raise KeyError(key) from err
KeyError: 'Expected'
Any help with this problem would be greatly appreciated. Thank you!
CodePudding user response:
I believe your DataFrame does not contain a column named 'Expected'.
You can reproduce the error with the code below.
import pandas as pd
df = pd.DataFrame(columns = ['a','b'], data=[[1,2],[2,2]])
df['Expected']
You will see that this raises the same KeyError as yours.
CodePudding user response:
The Expected column name has a leading space (the CSV header is "Observed, Expected"), so either use df[' Expected'] or correct your CSV.
You can also read a CSV into a pandas DataFrame just by passing the path, for example:
pd.read_csv('./test.csv')
To see the column names, run df.columns.
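As a minimal sketch of the fix above, the snippet below loads the data from the question with `skipinitialspace=True`, which makes `pd.read_csv` drop the whitespace after each comma so the column is named 'Expected' rather than ' Expected'. The inline `csv_text` string stands in for dice_rolls.csv, and computing the statistic directly as the sum of (O - E)^2 / E is an assumption about what the asker wants, not something taken from the linked guide.

```python
import io

import pandas as pd

# Stand-in for dice_rolls.csv, copied from the question.
# Note the space after each comma, including in the header row.
csv_text = """Observed, Expected
15, 13.9
35, 27.8
49, 41.7
58, 55.6
65, 69.5
76, 83.4
72, 69.5
60, 55.6
35, 41.7
29, 27.8
6, 13.9
"""

# skipinitialspace=True skips whitespace that follows the delimiter,
# so the header parses as 'Observed' and 'Expected'.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print(list(df.columns))

# Pearson chi-square statistic computed by hand: sum((O - E)^2 / E).
observed = df["Observed"]
expected = df["Expected"]
chi2 = float(((observed - expected) ** 2 / expected).sum())
print(chi2)
```

With a real file, the same call would take the path instead of the `io.StringIO` object, for example `pd.read_csv('dice_rolls.csv', skipinitialspace=True)`. If the CSV cannot be changed and the option is not wanted, `df.columns = df.columns.str.strip()` after loading is another way to remove the stray spaces.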