Getting KeyError for pandas df column name that exists

I have

data_combined = pd.read_csv("/path/to/creole_data/data_combined.csv", sep=";", encoding='cp1252')

So, when I try to access these rows:

data_combined = data_combined[(data_combined["wals_code"]=="abk") &(data_combined["wals_code"]=="aco")]

I get a KeyError 'wals_code'. I then checked my list of col names with

print(data_combined.columns.tolist())

and saw the column name 'wals_code' in the list. Here are the first few items from the printout.

[',"wals_code","Order of subject, object and verb","Order of genitive and noun","Order of adjective and noun","Order of adposition and NP","Order of demonstrative and noun","Order of numeral and noun","Order of RC and noun","Order of degree word and adjective"]

Anyone have a clue what is wrong with my file?

Answer:

The problem is the delimiter you're using when reading the CSV file. With sep=';', you instruct read_csv to treat semicolons (;) as the separator between columns (cells and column headers), but your columns printout shows that your CSV file actually uses commas (,).

If you look carefully, you'll notice that your columns printout is actually a list containing one long string, not a list of individual strings representing the column names. Since no column is named exactly 'wals_code', the lookup raises a KeyError.
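A quick way to confirm this is to count the columns rather than eyeball the printed list. Here is a minimal sketch, assuming a small comma-delimited sample (csv_text is made up for illustration and stands in for your real file, which I can't see):

```python
import io
import pandas as pd

# Hypothetical comma-delimited sample standing in for data_combined.csv
csv_text = (
    ',"wals_code","Order of subject, object and verb"\n'
    '0,"abk","SOV"\n'
    '1,"aco","SVO"\n'
)

# Wrong separator: the text contains no semicolons,
# so every line is parsed as a single field
wrong = pd.read_csv(io.StringIO(csv_text), sep=";")

print(len(wrong.columns))            # number of columns pandas actually found
print(wrong.columns.tolist())        # the whole header line as one name
print("wals_code" in wrong.columns)  # exact-name membership check
```

If the count is 1 while you expect many columns, the separator is almost certainly wrong.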

So, use sep=',' instead of sep=';' (or just omit it entirely, since ',' is the default value for sep):

data_combined = pd.read_csv("/path/to/creole_data/data_combined.csv", encoding='cp1252')
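Once the columns parse correctly, note that your filter will probably still come back empty: a single row can't have wals_code equal to both "abk" and "aco", so combining the two conditions with & never matches. A minimal sketch of the corrected read plus an isin filter, again on made-up sample data (csv_text and the "xyz" row are assumptions for illustration):

```python
import io
import pandas as pd

# Hypothetical comma-delimited sample standing in for data_combined.csv
csv_text = (
    ',"wals_code","Order of subject, object and verb"\n'
    '0,"abk","SOV"\n'
    '1,"aco","SVO"\n'
    '2,"xyz","VSO"\n'
)

# Default separator is ",", so the header splits into separate columns
data_combined = pd.read_csv(io.StringIO(csv_text))
print(data_combined.columns.tolist())

# "&" requires both conditions to hold on the same row, which never happens here
both = data_combined[
    (data_combined["wals_code"] == "abk") & (data_combined["wals_code"] == "aco")
]

# isin keeps rows whose wals_code matches any of the listed values
subset = data_combined[data_combined["wals_code"].isin(["abk", "aco"])]
print(subset)
```

Using | between the two comparisons would work as well; isin is just shorter when the list of codes grows.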