Why doesn't my pd.DataFrame see the columns in my CSV?

I'm getting errors when I load a CSV file into a DataFrame and then try to do data analysis on it.

I want to create a simple plot using the columns as the data points, but accessing a column with df.{column name} isn't working.

The code:

import pandas as pd

#column_names = ['area', 'bedrooms', 'age', 'price', 'Unnamed']
df = pd.read_csv("testfile.csv")
print(df)
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]  # Remove the NaN column
print(df)

print(df.bedrooms.median())

The error:

'DataFrame' object has no attribute 'bedrooms'

The CSV file:

area, bedrooms, age, price,
2600, 3,        20,  550000,
3000, 4,        15,  565000,
3200, 0,        18,  610000,
3600, 3,        30,  595000,
4000, 5,        8,   760000,

CodePudding user response:

Your column names have leading whitespace:

>>> df.columns
Index(['area', ' bedrooms', ' age', ' price'], dtype='object')
                ^            ^       ^

You can remove it using .str.strip() (just like you'd do with a normal Series):

df.columns = df.columns.str.strip()

Output:

>>> print(df.bedrooms.median())
3.0
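
To check this end to end, here is a minimal sketch that reproduces the problem and the fix. It assumes the CSV content shown in the question and reads it from an in-memory string (io.StringIO stands in for testfile.csv so the snippet runs on its own); the names CSV_TEXT, raw_columns and clean_columns are only illustrative.

```python
import io

import pandas as pd

# Same content as the CSV in the question; StringIO stands in for testfile.csv.
CSV_TEXT = """area, bedrooms, age, price,
2600, 3,        20,  550000,
3000, 4,        15,  565000,
3200, 0,        18,  610000,
3600, 3,        30,  595000,
4000, 5,        8,   760000,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Drop the empty column created by the trailing comma on every line.
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

# Before the fix, three of the names still carry a leading space.
raw_columns = list(df.columns)

# Strip the whitespace from every column name.
df.columns = df.columns.str.strip()
clean_columns = list(df.columns)

# Attribute access now works.
median_bedrooms = df.bedrooms.median()
print(raw_columns)
print(clean_columns)
print(median_bedrooms)
```

Bracket access such as df[' bedrooms'] would also have worked before the fix, but stripping the names once is less error-prone than remembering the space everywhere.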

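You can also correct the CSV file itself by removing the whitespace before/after the commas in its header line. The trailing comma on each line still produces an empty Unnamed column, so keep the filter from your code: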

area,bedrooms,age,price,
2600, 3,        20,  550000,
3000, 4,        15,  565000,
3200, 0,        18,  610000,
3600, 3,        30,  595000,
4000, 5,        8,   760000,
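
If you would rather not edit the file or rename columns afterwards, a third option (a different technique from the two above, offered as a sketch) is to let read_csv skip the spaces after each delimiter with its skipinitialspace parameter. As before, io.StringIO and the name CSV_TEXT are assumptions standing in for testfile.csv.

```python
import io

import pandas as pd

# Original, uncorrected CSV content from the question.
CSV_TEXT = """area, bedrooms, age, price,
2600, 3,        20,  550000,
3000, 4,        15,  565000,
3200, 0,        18,  610000,
3600, 3,        30,  595000,
4000, 5,        8,   760000,
"""

# skipinitialspace=True ignores spaces that follow a delimiter,
# so the header names are parsed without the leading blank.
df = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# The trailing comma still yields an empty column, so drop it as before.
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

columns = list(df.columns)
median_bedrooms = df.bedrooms.median()
print(columns)
print(median_bedrooms)
```

Note that skipinitialspace only handles spaces after the delimiter; names with trailing spaces would still need .str.strip().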