I have a problem. I have a dataframe (please see below). That contains prices and dates. I want to show the avg price of every month. How could I do this? I tried that, but I got the following error KeyError: 'month'
. How could I plot a chart like below?
listing_id date price month year
0 1 2021-09-07 79.00 9 2021
1 2 2021-08-07 80.00 8 2021
2 3 2021-06-07 90.00 6 2021
3 4 2021-06-05 20.00 6 2021
d = {'listing_id': [1, 2, 3, 4],
'date': ['2021-09-07', '2021-08-07', '2021-06-07', '2021-06-05'],
'price': ['$79.00', '$80.00', '$90.00', '$20.00']}
df = pd.DataFrame(data=d)
df['price'] = df['price'].str.replace('$', '', regex=False)
df['date'] = pd.to_datetime(df['date'])
df['month'], df['year'] = df.date.dt.month, df.date.dt.year
print(df)
x = df['month'].unique()
y = df.groupby('date').avg()[['price']]
plt.plot(x,y)
plt.show()
KeyError: 'month'
CodePudding user response:
You should convert the price column to a numerical type so that its mean can be calculated, and then perform a groupby month on df
.
As @Trenton pointed out, you should make a bar plot instead of a line plot.
df['price'] = df['price'].astype('float')
df.groupby('month')['price'].mean().plot(kind='bar', rot=0)
CodePudding user response:
Not sure what follows is answer to your question. Anyway ...
- Added some data for another year. In the question 'the avg price of every month' is not enough precise on how that should be taken in account
- Plot a bar chart, following Trenton McKinney. But it would be more adequate to introduce 0 values for missing months.
import pandas as pd
d = {'listing_id': [1, 2, 3, 4, 5, 6],
'date': ['2022-09-07', '2022-08-07',
'2021-09-07', '2021-08-07', '2021-06-07', '2021-06-05'],
'price': ['$79.00', '$85.00','$79.00', '$80.00', '$90.00', '$20.00']}
df = pd.DataFrame(data=d)
df['price'] = df['price'].str.replace('$', '', regex=False).astype(float)
df['date'] = pd.to_datetime(df['date'])
(df.groupby([df.date.dt.year, df.date.dt.month])['price']
.mean()
.plot(kind = 'bar',
rot = 0,
xlabel = '(year, month)',
ylabel = 'price')
)