Home > Blockchain >  Pandas: finding mean in the dataframe based on condition included in another
Pandas: finding mean in the dataframe based on condition included in another

Time:09-20

I have two dataframes, just like below.

Dataframe1:

country type start_week end_week
1 a 12 13
2 b 13 14

Dataframe2:

country type week value
1 a 12 1000
1 a 13 900
1 a 14 800
2 b 12 1000
2 b 13 900
2 b 14 800

I want to add to the first dataframe column with the mean value from the second dataframe for key (country type) and between start_week and end_week.

I want desired output to look like the below:

country type start_week end_week avg
1 a 12 13 950
2 b 13 14 850

CodePudding user response:

here is one way :

combined = df1.merge(df2 , on =['country','type'])
combined = combined.loc[(combined.start_week <= combined.week) & (combined.week <= combined.end_week)]
output = combined.groupby(['country','type','start_week','end_week'])['value'].mean().reset_index()

output:

>>
   country type  start_week  end_week  value
0        1    a          12        13  950.0
1        2    b          13        14  850.0

CodePudding user response:

You can use pd.melt and comparison of numpy arrays.

# melt df1
melted_df1 = df1.melt(id_vars=['country','type'],value_name='week')[['country','type','week']]

# for loop to compare two dataframe arrays
result = []
for i in df2.values:
    for j in melted_df1.values:
        if (j == i[:3]).all():
            result.append(i)
            break

# Computing mean of the result dataframe
result_df = pd.DataFrame(result,columns=df2.columns).groupby('type').mean().reset_index()['value']

# Assigning result_df to df1
df1['avg'] = result_df

   country type  start_week  end_week    avg
0        1    a          12        13  950.0
1        2    b          13        14  850.0
  • Related