I'm running into errors when I load a CSV file into a DataFrame and then try to analyze it.
I want to create a simple plot using the columns as the data points, but accessing a column with
df.{column name}
isn't working.
The code:
import pandas as pd
#column_names = ['area', 'bedrooms', 'age', 'price', 'Unnamed']
df = pd.read_csv("testfile.csv")
print(df)
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]  # Remove the NaN column
print(df)
print(df.bedrooms.median())
The error:
'DataFrame' object has no attribute 'bedrooms'
The CSV file:
area, bedrooms, age, price,
2600, 3, 20, 550000,
3000, 4, 15, 565000,
3200, 0, 18, 610000,
3600, 3, 30, 595000,
4000, 5, 8, 760000,
CodePudding user response:
Your column names have leading whitespace:
>>> df.columns
Index(['area', ' bedrooms', ' age', ' price'], dtype='object')
Note the leading space in ' bedrooms', ' age' and ' price'.
You can remove it using .str.strip() (just like you'd do with a normal Series):
df.columns = df.columns.str.strip()
Output:
>>> print(df.bedrooms.median())
3.0
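To try this without the file on disk, here is a minimal sketch that feeds the CSV from the question through io.StringIO. The CSV_TEXT and median_bedrooms names are only for illustration and are not part of the original code:

```python
import io

import pandas as pd

# Inline copy of the CSV from the question (CSV_TEXT is a hypothetical name),
# so the sketch runs without testfile.csv.
CSV_TEXT = """area, bedrooms, age, price,
2600, 3, 20, 550000,
3000, 4, 15, 565000,
3200, 0, 18, 610000,
3600, 3, 30, 595000,
4000, 5, 8, 760000,
"""

df = pd.read_csv(io.StringIO(CSV_TEXT))
# Drop the empty column produced by the trailing comma on every line.
df = df.loc[:, ~df.columns.str.contains('^Unnamed')]

# The names still carry their leading spaces at this point.
print(df.columns.tolist())

# Strip the whitespace so attribute access such as df.bedrooms works.
df.columns = df.columns.str.strip()
median_bedrooms = df.bedrooms.median()
print(median_bedrooms)  # 3.0
```

Printing df.columns.tolist() (or repr of each name) is a quick way to spot stray whitespace whenever attribute access fails on a column you can see in the printed frame.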
You can also correct the CSV file itself by removing the whitespace around the commas in the header:
area,bedrooms,age,price,
2600, 3, 20, 550000,
3000, 4, 15, 565000,
3200, 0, 18, 610000,
3600, 3, 30, 595000,
4000, 5, 8, 760000,
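If editing the file is not an option, another possibility is to let the parser skip the spaces while reading. This is a sketch of an alternative, not what the answer above does: it assumes the skipinitialspace parameter of pd.read_csv, and CSV_TEXT, df_clean and median_clean are illustrative names:

```python
import io

import pandas as pd

# Inline copy of the original (uncorrected) CSV; CSV_TEXT is a hypothetical name.
CSV_TEXT = """area, bedrooms, age, price,
2600, 3, 20, 550000,
3000, 4, 15, 565000,
3200, 0, 18, 610000,
3600, 3, 30, 595000,
4000, 5, 8, 760000,
"""

# skipinitialspace=True makes the parser ignore spaces that follow a delimiter,
# so the header names arrive without the leading space.
df_clean = pd.read_csv(io.StringIO(CSV_TEXT), skipinitialspace=True)

# The trailing comma still yields an empty "Unnamed" column, so drop it as before.
df_clean = df_clean.loc[:, ~df_clean.columns.str.contains('^Unnamed')]

print(df_clean.columns.tolist())
median_clean = df_clean.bedrooms.median()
print(median_clean)  # 3.0
```

Note that skipinitialspace only handles spaces after a comma. Spaces before a comma would still end up in the names, in which case df.columns.str.strip() remains the more general fix.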