Home > Software design >  Make a new dataset from from average of subsets of another data set with pandas
Make a new dataset from from average of subsets of another data set with pandas

Time:03-20

I have a data set (CSV file) with monthly benzine prices in Norway from 1986 to 2022. I want to make a new data frame with an average benzine price per year. Furthermore, I managed to do it, but is there a more efficient way to do it? A link to the CSV file is given. Here is what I did,

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
bensin=pd.read_csv("C:/Users/.../BensinPris.csv",sep=',', delimiter=';')
bensinC=bensin.copy()
#Replace comma with dot in bensin price
bensinC.apply(lambda x: x.str.replace(',','.'))
bensinC= bensinC.stack().str.replace(',','.').unstack()
#Convert prices to floats from strings
bensinC['Priser (kr per liter)']=[float(x) for x in bensinC['Priser (kr per liter)']]
# Make a list of average bensin price each year

average=[]
yrs=np.arange(1986,2023,dtype='int')
for i in yrs:
    av=bensinC[[x[0:4]==str(i) for x in bensinC['ÅrMåned']]]['Priser (kr per liter)'].mean()
    average.append(np.round(av,2))
# Make a data frame for average bensin price
zipped = list(zip(yrs,average))
benisin_average=pd.DataFrame(zipped,columns=['Year','Average bensin price'])
benisin_average
# ploting
X=benisin_average['Year']
y=benisin_average['Average bensin price']
plt.plot(X[:],y[:])
plt.xlabel('Time(Year)')
plt.ylabel('Average bensin price (Norwegian krone/Liter)')
plt.show()

enter image description here

  • Related