I have two dataframes, just like below.
Dataframe1:
country | type | start_week | end_week |
---|---|---|---|
1 | a | 12 | 13 |
2 | b | 13 | 14 |
Dataframe2:
country | type | week | value |
---|---|---|---|
1 | a | 12 | 1000 |
1 | a | 13 | 900 |
1 | a | 14 | 800 |
2 | b | 12 | 1000 |
2 | b | 13 | 900 |
2 | b | 14 | 800 |
I want to add to the first dataframe column with the mean value from the second dataframe for key (country type) and between start_week and end_week.
I want desired output to look like the below:
country | type | start_week | end_week | avg |
---|---|---|---|---|
1 | a | 12 | 13 | 950 |
2 | b | 13 | 14 | 850 |
CodePudding user response:
here is one way :
combined = df1.merge(df2 , on =['country','type'])
combined = combined.loc[(combined.start_week <= combined.week) & (combined.week <= combined.end_week)]
output = combined.groupby(['country','type','start_week','end_week'])['value'].mean().reset_index()
output:
>>
country type start_week end_week value
0 1 a 12 13 950.0
1 2 b 13 14 850.0
CodePudding user response:
You can use pd.melt and comparison of numpy arrays.
# melt df1
melted_df1 = df1.melt(id_vars=['country','type'],value_name='week')[['country','type','week']]
# for loop to compare two dataframe arrays
result = []
for i in df2.values:
for j in melted_df1.values:
if (j == i[:3]).all():
result.append(i)
break
# Computing mean of the result dataframe
result_df = pd.DataFrame(result,columns=df2.columns).groupby('type').mean().reset_index()['value']
# Assigning result_df to df1
df1['avg'] = result_df
country type start_week end_week avg
0 1 a 12 13 950.0
1 2 b 13 14 850.0