Home > Blockchain >  Adding columns to a data frame with data coming from multiple dataframes
Adding columns to a data frame with data coming from multiple dataframes

Time:11-12

I want this matrix as outcome:

int = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
"sales_year_data":['15','205','24','920','1310'],
"country_image_data":['2','5','-6','7','-1'],
}

df_inv = pd.DataFrame(int)

The data of colum "sales_year_data" in df_inv come df1=

    sales_year_data = {
"country":['Spain','France','Germany','Belgium','Italy'],
"2014":['45','202','24','216','219'],
"2015":['15','55','214','2016','209'],
"2016":['615','2333','205','207','920'],
"2017":['1215','255','234','2116','101'],
"2018":['415','1320','214','2516','2019'],
"2019":['215','220','5614','416','1310'],
"2020":['205','202','44','296','2011'],
}

df1 = pd.DataFrame(sales_year_data)

As you can see in the column "sales_year_data" of df_inv, the number 15 is the intersection in df1 between year 2015 and Spain, the number 205 is in the intersection between Spain and 2020, 24 is in the intersection between Germany and 2014 and so on.

Data of colum "country_image_data" in df_inv comes from df2

    country_change_data = {
"country":['Spain','Spain','Germany','Italy','Italy'],
"2014":['4','2','-6','6','9'],
"2015":['2','5','-5','2','3'],
"2016":['5','3','5','7','9'],
"2017":['8','7','5','6','1'],
"2018":['5','1','4','6','2'],
"2019":['1','2','4','6','-1'],
"2020":['5','2','4','6','2'],
}
df2 = pd.DataFrame(country_change_data)

As you can see in the column "country_change_data" of df_inv, the number 2 is the intersection in df2 between year 2015 and Spain, the number 5 is in the intersection between Spain and 2020, -6 is in the intersection between Germany and 2014 and so on.

If my original dataframe is:

inv = {
"vendor":['A','B','C','D','E'],
"country":['Spain','Spain','Germany','Italy','Italy'],
"yeardum":['2015','2020','2014','2016','2019'],
}

df0 = pd.DataFrame(inv)

How could I automate the search across various df1 and df2 in the intersections of interest for building df_inv departing prom df0?

CodePudding user response:

This does it.

sales_counters = {}
country_counters = {}
new_df_data = []

for row in df0.iloc:
    c = row['country']
    y = row['yeardum']
    sales_idx = sales_counters[c] = sales_counters.get(c, -1)   1
    country_idx = country_counters[c] = country_counters.get(c, -1)   1

    d1 = df1[df1['country'] == c]
    d2 = df2[df2['country'] == c]
    sales_year = d1.iloc[min(sales_idx, d1.shape[0]-1)][y]
    country_image = d2.iloc[min(country_idx, d2.shape[0]-1)][y]

    new_df_data.append([sales_year, country_image])

df0 = pd.concat([df0, pd.DataFrame(new_df_data)], axis=1).rename({0: 'sales_year_data', 1: 'country_image_data'}, axis=1)

Test:

>>> df0
  vendor  country yeardum sales_year_data country_image_data
0      A    Spain    2015              15                  2
1      B    Spain    2020             205                  2
2      C  Germany    2014              24                 -6
3      D    Italy    2016             920                  7
4      E    Italy    2019            1310                 -1
  • Related