Home > Net >  map keywords value to add extra column to long dataframe
map keywords value to add extra column to long dataframe

Time:01-13

I have two dataframes:

`df_1_long = pd.DataFrame({
'company_name': ['Company A', 'Company B', 'Company C'],
'company_country': ['USA', 'Poland', 'Canada'],
'keyword': ['holding', 'services', 'source'],
'value': [1,0,1]
})`

and second df:

`df_training = pd.DataFrame({
 'holding': [1, 0, 0],
 'services': [0, 1, 0],
 'source': [0, 0, 1],
 'sector': ['Retail', 'Finance', 'Energy']
 })`

columns in df_training ['holding', 'services', 'source'] are the keywords in column 'keyword' in df_1_long. I would like to assign a sector to df_1_long - if 'keyword' in df_1_long value is 1 and in df_training value is 1 for the keyword in a column -> then assign a sector from df_training. The output should look like that:

`expected_output = pd.DataFrame({
'company_name': ['Company A', 'Company B', 'Company C'],
'company_country': ['USA', 'Poland', 'Canada'],
'keyword': ['holding', 'services', 'source'],
'value': [1,0,1],
'sector': ['Retail', 'no_sector', 'Energy']
})`

I tried this piece of code, but keep getting errors:

`merged_df = pd.merge(df_1_long, df_training, left_on='keyword',        right_on=df_training.columns[:-1])

`df_1_long['sector'] = merged_df['sector'].where(merged_df['value'] == 1, np.nan)``

CodePudding user response:

Use DataFrame.stack for reshape with filter 1 values and get dictionary from MultiIndex used for Series.map:

s = df_training.set_index('sector').stack().loc[lambda x: x.eq(1)].index.swaplevel()

m = df_1_long['value'].eq(1)
df_1_long['sector'] = np.where(m, df_1_long['keyword'].map(dict(s)), 'no_sector')

Another solution with DataFrame.melt, DataFrame.query and DataFrame.set_index:

d = df_training.melt('sector').query('value == 1').set_index('variable')['sector']


m = df_1_long['value'].eq(1)
df_1_long['sector'] = np.where(m, df_1_long['keyword'].map(d), 'no_sector')
print (df_1_long)
  company_name company_country   keyword  value     sector
0    Company A             USA   holding      1     Retail
1    Company B          Poland  services      0  no_sector
2    Company C          Canada    source      1     Energy

CodePudding user response:

You can use from_dummies to reshape and map those values:

mapper = dict(zip(pd.from_dummies(df_training.set_index('sector')).iloc[:, 0],
                  df_training['sector']))

df_1_long['sector'] = (df_1_long['keyword'].map(mapper)
                       .mask(df_1_long['value'].eq(0), 'no_sector')
                       )

Output:

  company_name company_country   keyword  value     sector
0    Company A             USA   holding      1     Retail
1    Company B          Poland  services      0  no_sector
2    Company C          Canada    source      1     Energy
  • Related