How to fix 'KeyError' when reading a .csv file in Python?

I have the following code:

import pandas as pd
import matplotlib as mlp
import matplotlib.pyplot as plt
import csv


df = pd.read_csv (r'C:\Users\User\Desktop\Dataset2.csv', index_col=0) 
print (df)

dataframe1 = df.sort_values('ums', ascending = False)

fig = plt.figure(figsize=(20,5))

ax1 = fig.add_subplot(1,2,1)
ax2 = fig.add_subplot(1,2,2)

ax1.bar(dataframe1.index, dataframe1['ums'])
ax1.set_xticklabels(dataframe1.index, rotation=60, horizontalalignment = 'right', fontsize = '12')

ax1.set_title('Title', fontsize = '22')
ax1.set_ylabel('Text')

plt.show()

It should read the .csv file named "Dataset2", but every time I execute the code I get this error:

Exception has occurred: KeyError
'ums'
  File "C:\Users\User\Desktop\datafile2.py", line 10, in <module>
    dataframe1 = df.sort_values('ums', ascending = False)

My column in the .csv file has exactly the same name. Here is what the first lines of my file look like:

nr  port    country      ums
1   Port1   Australia    47.03
2   Port2   USA          37.47

What can I do to fix this? Any help is appreciated.

Answer:

I can't reproduce your issue.

This is my code; I just simplified your matplotlib code:

import pandas as pd
import matplotlib.pyplot as plt


df = pd.read_csv('Dataset2.csv')
print(df)

dataframe1 = df.sort_values('ums', ascending=False)
names = dataframe1['port']
values = dataframe1['ums']

plt.figure(figsize=(20, 5))
plt.plot(names, values)
plt.show()

Maybe it's because you're using a raw string when defining the dataset path?

Try removing the r and the index_col; by default, pandas read_csv treats the first row as the header:

df = pd.read_csv('Dataset2.csv')

This is the result from my code.
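One way to check whether the r prefix itself matters: a raw string and an ordinary string with doubled backslashes produce exactly the same path, so the prefix does not change which file read_csv opens. The sketch below only compares the two literals (the path is the one from the question; nothing is read from disk). Note that dropping the r while keeping single backslashes in a Windows path like this one is a SyntaxError, because \U starts a Unicode escape.

```python
# Path taken from the question; nothing is read from disk here.
raw_path = r'C:\Users\User\Desktop\Dataset2.csv'
# The same path written as a normal string, with each backslash escaped.
escaped_path = 'C:\\Users\\User\\Desktop\\Dataset2.csv'

# The r prefix only changes how the literal is parsed, not the resulting string.
print(raw_path == escaped_path)  # True
```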

Answer:

CSV files usually use a comma (,) to separate the fields, and that is also read_csv's default separator, so the content of your file should look like this:

nr,port,country,ums
1,Port1,Australia,47.03
2,Port2,USA,37.47
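A quick check of this, as a sketch: the sample rows from the question are rewritten with commas and read from an in-memory io.StringIO buffer, which stands in for Dataset2.csv (an assumption for the demo). With the default comma separator, read_csv finds the ums column and the sort from the question works.

```python
import io

import pandas as pd

# Sample rows from the question, rewritten with commas; io.StringIO stands in
# for the real Dataset2.csv file.
csv_text = """nr,port,country,ums
1,Port1,Australia,47.03
2,Port2,USA,37.47
"""

# index_col=0 turns the nr column into the index, as in the question's code.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.columns.tolist())  # ['port', 'country', 'ums']

# The sort from the question now finds the column.
dataframe1 = df.sort_values('ums', ascending=False)
print(dataframe1['port'].tolist())  # ['Port1', 'Port2']
```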

Or specify the separator explicitly, as @wwii suggested in a comment:

pd.read_csv(r'C:\Users\User\Desktop\Dataset2.csv', index_col=0, sep=r'\s+')
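To see why the original file triggers the KeyError, and that sep=r'\s+' fixes it, here is a small sketch. It assumes the file really is separated by runs of spaces or tabs, as the sample in the question suggests, and again uses io.StringIO in place of the real file. The first read uses the default comma separator, so each whole line becomes a single field and the header line becomes one long column name; with index_col=0, as in the question, that lone column would be moved into the index instead. Either way there is no column called ums.

```python
import io

import pandas as pd

# Sample from the question, assumed to be separated by runs of whitespace;
# io.StringIO stands in for the real Dataset2.csv file.
text = """nr  port    country      ums
1   Port1   Australia    47.03
2   Port2   USA          37.47
"""

# Default sep=',': there are no commas, so each line is a single field and the
# whole header line becomes one column name.
bad = pd.read_csv(io.StringIO(text))
print(bad.columns.tolist())

key_error_raised = False
try:
    bad.sort_values('ums', ascending=False)
except KeyError:
    key_error_raised = True
print(key_error_raised)  # True

# sep=r'\s+' splits on any run of spaces or tabs, so the separate columns appear.
good = pd.read_csv(io.StringIO(text), index_col=0, sep=r'\s+')
print(good.columns.tolist())  # ['port', 'country', 'ums']
print(good.sort_values('ums', ascending=False))
```

Printing df.columns.tolist() right after read_csv is a quick way to spot this kind of problem. One caveat: \s+ also splits values that contain spaces (a two-word country name, for example); if the file is actually tab-separated, sep='\t' avoids that.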

Answer:

Have you tried specifying the column with the by keyword, like this:

df.sort_values(by=['ums'], ascending=False)

If that doesn't work, try removing index_col=0 to see whether it makes a difference.
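For reference, a sketch showing that the positional form and by=['ums'] select the same sort key and give identical results. The data is the question's sample in comma-separated form, read from io.StringIO as a stand-in for the file. Because the two forms are equivalent, switching to by= will not by itself clear the KeyError; the column name has to appear in df.columns first.

```python
import io

import pandas as pd

# The question's sample rows in comma-separated form; io.StringIO stands in
# for the real Dataset2.csv.
csv_text = """nr,port,country,ums
1,Port1,Australia,47.03
2,Port2,USA,37.47
"""
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# A single column name and a one-element list select the same sort key.
by_name = df.sort_values('ums', ascending=False)
by_list = df.sort_values(by=['ums'], ascending=False)
print(by_name.equals(by_list))  # True

# A column that does not exist raises KeyError in both forms.
missing_errors = 0
for key in ('missing', ['missing']):
    try:
        df.sort_values(by=key)
    except KeyError:
        missing_errors += 1
print(missing_errors)  # 2
```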