I want this matrix as outcome:
int = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
"sales_year_data":['15','205','24','920','1310'],
"country_image_data":['2','5','-6','7','-1'],
}
df_inv = pd.DataFrame(int)
The data of colum "sales_year_data" in df_inv come df1=
sales_year_data = {
"country":['Spain','France','Germany','Belgium','Italy'],
"2014":['45','202','24','216','219'],
"2015":['15','55','214','2016','209'],
"2016":['615','2333','205','207','920'],
"2017":['1215','255','234','2116','101'],
"2018":['415','1320','214','2516','2019'],
"2019":['215','220','5614','416','1310'],
"2020":['205','202','44','296','2011'],
}
df1 = pd.DataFrame(sales_year_data)
As you can see in the column "sales_year_data" of df_inv, the number 15 is the intersection in df1 between year 2015 and Spain, the number 205 is in the intersection between Spain and 2020, 24 is in the intersection between Germany and 2014 and so on.
Data of colum "country_image_data" in df_inv comes from df2
country_change_data = {
"country":['Spain','Spain','Germany','Italy','Italy'],
"2014":['4','2','-6','6','9'],
"2015":['2','5','-5','2','3'],
"2016":['5','3','5','7','9'],
"2017":['8','7','5','6','1'],
"2018":['5','1','4','6','2'],
"2019":['1','2','4','6','-1'],
"2020":['5','2','4','6','2'],
}
df2 = pd.DataFrame(country_change_data)
As you can see in the column "country_change_data" of df_inv, the number 2 is the intersection in df2 between year 2015 and Spain, the number 5 is in the intersection between Spain and 2020, -6 is in the intersection between Germany and 2014 and so on.
If my original dataframe is:
inv = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
}
df0 = pd.DataFrame(inv)
How could I automate the search across various df1 and df2 in the intersections of interest for building df_inv departing prom df0?
CodePudding user response:
This does it.
sales_counters = {}
country_counters = {}
new_df_data = []
for row in df0.iloc:
c = row['country']
y = row['yeardum']
sales_idx = sales_counters[c] = sales_counters.get(c, -1) 1
country_idx = country_counters[c] = country_counters.get(c, -1) 1
d1 = df1[df1['country'] == c]
d2 = df2[df2['country'] == c]
sales_year = d1.iloc[min(sales_idx, d1.shape[0]-1)][y]
country_image = d2.iloc[min(country_idx, d2.shape[0]-1)][y]
new_df_data.append([sales_year, country_image])
df0 = pd.concat([df0, pd.DataFrame(new_df_data)], axis=1).rename({0: 'sales_year_data', 1: 'country_image_data'}, axis=1)
Test:
>>> df0
vendor country yeardum sales_year_data country_image_data
0 A Spain 2015 15 2
1 B Spain 2020 205 2
2 C Germany 2014 24 -6
3 D Italy 2016 920 7
4 E Italy 2019 1310 -1