it's the situation: in a "for" loop i'm reading csv files as dataframe and add them to a main big df. if new data is not already in main df, it will be simply concat to main df. if new data has already a same value in main df, it has to sum with the pervious one.
an example:
main_df=
building, day, kw/h
1, 1, 50
1, 2, 55
2, 1, 30
2, 2, 40
new_df_1=
building, day, kw/h
3, 1, 55
3, 2, 58
new_df_2=
building, day, kw/h
2, 1, 15
2, 2, 19
2, 3, 14
new_df_2 will simply concat to main_df, but new_df_2 has to sum with existed data. So the desire answer will be:
building, day, kw/h
1, 1, 50
1, 2, 55
2, 1, 45
2, 2, 59
2, 3, 14
3, 1, 55
3, 2, 58
Question: 1. how can i check that, is there already same building and same day in main_df? 2. how can i sum the values of "KW/h" coulmns? i mean:
for csv in path_list:
read= pd.read_csv(csv)
if ( **already exist** ):
**sum with old values**
else:
main_df= pd.concat([main_df,read])
i was searching for some function like "merge" with indicator "on=" for automaticly find the right values and sum them. but i couldn't find any thing. Any idea would help.
CodePudding user response:
concat the three DF and then sum the kw/h on building and day
df=pd.concat([main_df, new_df1, new_df2])
df=df.groupby(['building','day'], as_index=False)['kw/h'].sum()
df
building day kw/h
0 1 1 50
1 1 2 55
2 2 1 45
3 2 2 59
4 2 3 14
5 3 1 55
6 3 2 58