I have this very simple code that for some reason returns a key error when I select the second column to the last one. But for some reason the first one works as intended.
import csv
import pandas as pd
import sys
#terminal parameter:
csv_data = sys.argv[1]
df = pd.read_csv(csv_data)
print(df)
print("\n")
col_heading = str(input("Enter Column Heading: "))
value = str(input("Enter Lookup Value: "))
print(df.loc[df[col_heading] == value])
The CSV looks like this:
Last name, First name, SSN, Test1, Test2, Test3, Test4, Final, Grade
Alfalfa, Aloysius, 123-45-6789, 40.0, 90.0, 100.0, 83.0, 49.0, D-
Alfred, University, 123-12-1234, 41.0, 97.0, 96.0, 97.0, 48.0, D
Gerty, Gramma, 567-89-0123, 41.0, 80.0, 60.0, 40.0, 44.0, C
Android, Electric, 087-65-4321, 42.0, 23.0, 36.0, 45.0, 47.0, B-
Bumpkin, Fred, 456-78-9012, 43.0, 78.0, 88.0, 77.0, 45.0, A-
Rubble, Betty, 234-56-7890, 44.0, 90.0, 80.0, 90.0, 46.0, C-
Noshow, Cecil, 345-67-8901, 45.0, 11.0, -1.0, 4.0, 43.0, F
Buff, Bif, 632-79-9939, 46.0, 20.0, 30.0, 40.0, 50.0, B
Airpump, Andrew, 223-45-6789, 49.0 1.0, 90.0, 100.0, 83.0, A
Backus, Jim, 143-12-1234, 48.0, 1.0, 97.0, 96.0, 97.0, A
Carnivore, Art, 565-89-0123, 44.0, 1.0, 80.0, 60.0, 40.0, D
Dandy, Jim, 087-75-4321, 47.0, 1.0, 23.0, 36.0, 45.0, C
Elephant, Ima, 456-71-9012, 45.0, 1.0, 78.0, 88.0, 77.0, B-
Franklin, Benny, 234-56-2890, 50.0, 1.0, 90.0, 80.0, 90.0, B-
George, Boy, 345-67-3901, 40.0, 1.0, 11.0, -1.0, 4.0, B
Heffalump, Harvey, 632-79-9439, 30.0, 1.0, 20.0, 30.0, 40.0, C
This is the error message I get:
Traceback (most recent call last):
File "C:\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3621, in get_loc
return self._engine.get_loc(casted_key)
File "pandas\_libs\index.pyx", line 136, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\index.pyx", line 163, in pandas._libs.index.IndexEngine.get_loc
File "pandas\_libs\hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas\_libs\hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'First name'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "C:\Users\krbss\Documents\PlanIT Testing Automation Testing\Challenge 7.py", line 18, in <module>
print(df.loc[df[col_heading] == value])
File "C:\Python310\lib\site-packages\pandas\core\frame.py", line 3505, in __getitem__
indexer = self.columns.get_loc(key)
File "C:\Python310\lib\site-packages\pandas\core\indexes\base.py", line 3623, in get_loc
raise KeyError(key) from err
KeyError: 'First name'
Any guidance on this would be appreciated.
CodePudding user response:
The columns and values in your dataframe are polluted with leading spaces, which you can verify with these print statements (just after loading the dataframe):
print("Columns: ")
print(df.columns)
print("Example data: ")
print(df[" First name"].values)
Output:
Columns:
Index(['Last name', ' First name', ' SSN', ' Test1', ' Test2', ' Test3',
' Test4', ' Final', ' Grade'],
dtype='object')
Example data:
[' Aloysius' ' University' ' Gramma' ' Electric' ' Fred'
' Betty' ' Cecil' ' Bif' ' Andrew' ' Jim' ' Art'
' Jim' ' Ima' ' Benny' ' Boy' ' Harvey']
Note that the first column 'Last name' has no leading spaces whereas the others do, which explains why you can could access the first column but not the others.
To fix this, you could tell pd.read_csv to skip initial spaces after the delimiter (see skipinitialspaces https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html)
df = pd.read_csv(csv_data, skipinitialspace=True)
Now the columns and data do not have unnecessary spaces and your code should work as intended.
Columns:
Index(['Last name', 'First name', 'SSN', 'Test1', 'Test2', 'Test3', 'Test4',
'Final', 'Grade'],
dtype='object')
Example data:
['Aloysius' 'University' 'Gramma' 'Electric' 'Fred' 'Betty' 'Cecil' 'Bif'
'Andrew' 'Jim' 'Art' 'Jim' 'Ima' 'Benny' 'Boy' 'Harvey']
You could also explicitly remove spaces using something like df.columns = [col.strip() for col in df.columns]
, which will fix your column names but you'd have to do the same for your data, so I think in this case setting skipinitialspaces=True
is the cleanest solution.
CodePudding user response:
You need to remove the leading and trailing spaces. Assuming you dont want to remove the spaces in between (like 'Last name' changed to 'Lastname'). I would recommend removing both leading and trailing spaces by using strip. This should be done for column names and the data. You can do that like this
df.columns = df.columns.str.strip() #For column names
df.columns = [col.strip() for col in df.columns] #For data in each column
Also change your print statment in the end thus to handle both float and strings
if df[col_heading].dtype == 'float64':
print("Float my boat")
print(df.loc[df[col_heading] == float(value)])
else:
print(df.loc[df[col_heading] == value])
This should enable you to print matching rows for both strings and float type