I have a dataset (df3) with five columns x, y, r, g and b, although I only need to work with x, y and r. I want to find the average of all the consecutive rows in which the value of r is equal and store it in a database (df_final). To do this, I have generated a code that stores all the values in which r is equal to the one in previous row in a temporary database (df_inter), to later store the average of all the values in the final database (df_final). The code is this one:
for i in range(len(df3)):
if df3.iloc[i,3] == df3.iloc[i-1,3]:
df_inter = pd.DataFrame(columns=['x','y', 'r'])
df_inter.append(df3.iloc[i,1],df3.iloc[i,2],df3.iloc[i,3])
df_inter.to_csv(f'Resultados/df_inter.csv', index=False, sep=',')
else:
df_final.append(df_inter['x'].mean(),df_inter['y'].mean(),df_inter['r'].mean())
del [[df_inter]]
gc.collect()
df_inter=pd.DataFrame()
df_inter = pd.DataFrame(columns=['x','y', 'r'])
df_inter.append(df3.iloc[i,1],df3.iloc[i,2],df3.iloc[i,3])
df_final.to_csv(f'Resultados/df_final.csv', index=False, sep=',')
Nevertheless, when I execute the code I get this error message:
TypeError: cannot concatenate object of type '<class 'numpy.int64'>'; only Series and DataFrame objs are valid
I'm not sure what the problem is or even if there is a code more efficient for the purpose. Please, I would be grateful if you could help me. Thank you in advance.
Irene
CodePudding user response:
If you have some knowledge of SQL
, it can be intuitively done using sqldf
and pandas
:
import sqldf
import pandas as pd
df = pd.DataFrame({"class":[1,1,1,2,2,2,1,2,2,1],"value":[10,10,12,11,15,17,98,23,22,0]})
averages = sqldf.run("""
SELECT class,AVG(value)
FROM df
GROUP BY class
""")
The output being:
class AVG(value)
0 1 26.0
1 2 17.6
Is that what you are looking for ?
CodePudding user response:
You may want to append to the end of the dataframe using
df_inter = df_inter.append({'x':df3.iloc[i,1],'y':df3.iloc[i,2],'r':df3.iloc[i,3]}, ignore_index=True)