How do I add dummy values to my dataframe (is this how you say it?)

Time:02-10

A quick summary of what I'm trying to do: I'm retrieving sales data from a CSV file, putting it in a dataframe, and making a visualisation from it.

My issue is that for 2014 only November and December are present, while for 2015 all twelve months are present. So when I make the visualisation, I get an error about mismatched dimensions.

I tried to solve this by making new lists padded with 0s to indicate that there were no sales in the earlier months, but that clearly did not work.

(I'm new to making graphs with Python, and the graph I'm making is a line chart showing the sales in each month for both years.)

# Retrieve data from each year
month = ['January', 'February', 'March', 'April', 'May', 'June', 
         'July', 'August', 'September', 'October', 'November', 'December']

#2014
#only 2 months instead of 12 as shown above
#month2014 = ['November', 'December']
revenue2014 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
profits2014 = [0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

year2014 = df[df['order_year'] == 2014]
temp1 = year2014.groupby('order_month')['unitprice_in_usd'].sum().round(decimals = 2)
cost2014 = year2014.groupby('order_month')['unitcost_in_usd'].sum()
temp2 = (temp1 - cost2014).round(decimals = 2)
revenue2014.append(temp1)
profits2014.append(temp2)

#2015
year2015 = df[df['order_year'] == 2015]
revenue2015 = year2015.groupby('order_month')['unitprice_in_usd'].sum().round(decimals = 2)
cost2015 = year2015.groupby('order_month')['unitcost_in_usd'].sum()
profits2015 = (revenue2015 - cost2015).round(decimals = 2)
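A minimal reproduction of why the padded lists fall short, using made-up numbers in place of the real CSV data (the `temp1` values below are hypothetical). It shows that `list.append` adds the whole Series as a single element, so the list ends up with 11 entries instead of 12:

```python
import pandas as pd

# Hypothetical stand-in for temp1: 2014 revenue exists only for Nov and Dec.
temp1 = pd.Series([120.5, 340.0], index=['November', 'December'])

# The padding attempt: ten zeros, then append().
revenue2014 = [0] * 10
revenue2014.append(temp1)  # append() adds the whole Series as ONE element

print(len(revenue2014))  # 11, not 12
print(type(revenue2014[-1]).__name__)  # Series

# extend() adds the two values individually, giving 12 entries...
padded = [0] * 10
padded.extend(temp1.tolist())
print(len(padded))  # 12

# ...but it only lines up with `month` if temp1 is in calendar order.
# A groupby on month-name strings sorts the keys alphabetically
# ('December' before 'November'), so the last two slots could be swapped.
```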

If you want to see my code for making the graphs:

import matplotlib.pyplot as plt

fig, axes = plt.subplots(nrows = 1, ncols = 1, figsize=(15, 5))

axes.plot(month, revenue2014, c = 'r', label = '2014')
axes.plot(month, revenue2015, c = 'g', label = '2015')

axes.set_title('Revenue In Each Month From Each Year', fontsize = 20);

axes.set_xlabel('Months', fontsize = 15)
axes.set_ylabel('Revenue ($)', fontsize = 15)

axes.tick_params(axis = 'x', labelsize = 10)
axes.tick_params(axis = 'y', labelsize = 10)

axes.set_xlim(left = -1, right = 12)

axes.grid(c = 'r', alpha = .2, linestyle = '--')

axes.legend(loc = (1.02, 0), borderaxespad = 0, fontsize = 20)

fig.tight_layout()

plt.show()

I appreciate any and all suggestions :))

Answer:

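You can convert the order_month column to a categorical that contains all twelve months, before filtering by year and grouping. Then, when you aggregate with sum, every month appears in the result, and months with no sales get 0: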

df['order_month'] = pd.Categorical(df['order_month'], categories=month, ordered=True)
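A fuller sketch of the same idea, run on a small made-up DataFrame that reuses the question's column names. It assumes `order_month` holds full English month names such as 'November'; the `monthly_sum` helper and all the numbers are hypothetical. `observed=False` is passed to `groupby` so that categories with no rows are kept (the default for `observed` is changing in newer pandas versions, so it is spelled out here), and the sum of an empty group comes out as 0:

```python
import pandas as pd

month = ['January', 'February', 'March', 'April', 'May', 'June',
         'July', 'August', 'September', 'October', 'November', 'December']

# Toy stand-in for the CSV: 2014 only has November/December,
# 2015 has every month.
df = pd.DataFrame({
    'order_year': [2014, 2014] + [2015] * 12,
    'order_month': ['November', 'December'] + month,
    'unitprice_in_usd': [100.0, 200.0] + [float(i) for i in range(1, 13)],
    'unitcost_in_usd': [60.0, 150.0] + [0.5] * 12,
})

# The answer's step: every month becomes a category, in calendar order.
df['order_month'] = pd.Categorical(df['order_month'], categories=month, ordered=True)

def monthly_sum(frame, column):
    # Hypothetical helper: sums `column` per month, keeping empty months as 0.
    return frame.groupby('order_month', observed=False)[column].sum()

# Boolean filtering keeps the categorical dtype with all 12 categories.
year2014 = df[df['order_year'] == 2014]
year2015 = df[df['order_year'] == 2015]

revenue2014 = monthly_sum(year2014, 'unitprice_in_usd').round(2)
revenue2015 = monthly_sum(year2015, 'unitprice_in_usd').round(2)
profits2014 = (revenue2014 - monthly_sum(year2014, 'unitcost_in_usd')).round(2)

print(revenue2014.tolist())
```

Because the categories are ordered, both Series come out in calendar order with 12 entries each, so `axes.plot(month, revenue2014, ...)` and `axes.plot(month, revenue2015, ...)` from the question should no longer hit the dimension mismatch. Without the ordered categorical, grouping on month-name strings returns the months alphabetically, which would not match the `month` list on the x-axis. If `order_month` stores numbers 1 to 12 instead of names, the categories would be `list(range(1, 13))` rather than `month`.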