How to process a column from a dataframe in pandas

Time:10-28

I am writing a Python program to calculate the chi-square value for a set of observed and expected frequencies. Here is the program so far:

# Author: Evan Gertis
# Date  : 10/25
# program : quantile decile calculator
import logging
import coloredlogs

coloredlogs.install()  # install a handler on the root logger

import csv
import logging 
import coloredlogs
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.ticker import PercentFormatter
import pandas as pd
import numpy as np
from scipy.stats import chi2_contingency
# import seaborn as sns
import matplotlib.pyplot as plt
logging.basicConfig(level=logging.DEBUG, format='%(asctime)s - %(levelname)s - %(message)s')

# Step 1: read csv
dicerollsCSV       = open('dice_rolls.csv')
df      = pd.read_csv(dicerollsCSV) 
print(df.head())

# Implement steps from: https://predictivehacks.com/how-to-run-chi-square-test-in-python/
contigency= pd.crosstab(df['Observed'], df['Expected'])
contigency

# logging.debug(dicerollsDF['Observed'])
# logging.debug(dicerollsDF['Expected'])
# print(chisquare(dicerollsDF['Observed'],dicerollsDF['Expected']))

I am using https://predictivehacks.com/how-to-run-chi-square-test-in-python/ as a guide for completing this task. The specific dataset that I am using is

Observed, Expected
15, 13.9
35, 27.8
49, 41.7
58, 55.6
65, 69.5
76, 83.4
72, 69.5
60, 55.6
35, 41.7
29, 27.8
6, 13.9

Expected: chi-square value from the observed and expected frequencies

Actual

Traceback (most recent call last):
  File "/Users/evangertis/development/PythonAutomation/Statistics/chi_square.py", line 30, in <module>
    contigency= pd.crosstab(df['Observed'], df['Expected'])
  File "/usr/local/lib/python3.9/site-packages/pandas/core/frame.py", line 3458, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/usr/local/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'Expected'

Any help with this problem would be greatly appreciated. Thank you!

CodePudding user response:

I believe your DataFrame does not contain a column named exactly 'Expected'.

You can reproduce the error with the code below.

import pandas as pd
df = pd.DataFrame(columns = ['a','b'], data=[[1,2],[2,2]])
df['Expected']

This raises the same KeyError as the one in your traceback.
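To check which column names pandas actually parsed, it can help to print them as a list so that any stray whitespace shows up inside the quotes. This is a minimal sketch, assuming the file's header line is exactly the `Observed, Expected` shown in the question; the in-memory `csv_text` is a stand-in for dice_rolls.csv and holds only its first two data rows.

```python
import io

import pandas as pd

# Stand-in for dice_rolls.csv: the header and first two rows from the question.
csv_text = "Observed, Expected\n15, 13.9\n35, 27.8\n"

# Read with the default settings, as the original program does.
raw = pd.read_csv(io.StringIO(csv_text))

# Printing the names as a list makes leading or trailing spaces visible.
column_names = raw.columns.tolist()
print(column_names)

# Membership checks against the exact strings show which key would work.
has_plain_name = "Expected" in raw.columns
has_spaced_name = " Expected" in raw.columns
print(has_plain_name, has_spaced_name)
```

If the plain name is missing but a variant with extra whitespace is present, the KeyError comes from the header text rather than from the data.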

CodePudding user response:

The header line in your CSV is "Observed, Expected", so the second column name is parsed with a leading space. Use df[' Expected'] or correct your CSV. You can also read a CSV into a pandas DataFrame just by passing the path, for example pd.read_csv('./test.csv'). To see the column names, run df.columns.
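As a sketch of one possible fix, the snippet below reads the data with `skipinitialspace=True`, which tells `pd.read_csv` to drop spaces that follow a delimiter, and then computes the chi-square statistic directly as the sum of (observed - expected)^2 / expected. The in-memory `csv_text` is a stand-in for dice_rolls.csv, and the variable names are illustrative. The statistic is computed by hand rather than with `scipy.stats.chisquare` because recent SciPy versions check that the observed and expected totals agree, and here they are 500 and 500.4, so that call may raise a ValueError on this data.

```python
import io

import pandas as pd

# Stand-in for dice_rolls.csv, copied from the question.
csv_text = """Observed, Expected
15, 13.9
35, 27.8
49, 41.7
58, 55.6
65, 69.5
76, 83.4
72, 69.5
60, 55.6
35, 41.7
29, 27.8
6, 13.9
"""

# skipinitialspace=True drops the space after each comma, header included.
df = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)

# Defensive extra step: strip any remaining whitespace around column names.
df.columns = df.columns.str.strip()

observed = df["Observed"]
expected = df["Expected"]

# Pearson chi-square statistic for a goodness-of-fit comparison.
chi_square = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom for a plain goodness-of-fit test: categories minus one.
degrees_of_freedom = len(df) - 1

print(f"chi-square = {chi_square:.3f}, dof = {degrees_of_freedom}")
```

With a real file, the same call would take the path instead of the StringIO object, for example `pd.read_csv('dice_rolls.csv', skipinitialspace=True)`. Note that `pd.crosstab` followed by `chi2_contingency` is the usual route for a test of independence between two categorical variables, which is probably not what is wanted when the data are already observed and expected counts per category.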
