Create binary columns out of data nested in another dfs columns-CodePudding

This one is weird --

let's say I have a df like this:

user_id     city    state   network
123         austin  tx      att
113         houston tx      tmobile
343         miami   fl      att

and I have another df1 like this:

col1
'network': 'att'
'city': 'austin'
'state': 'tx'

I'm trying to build a final_df like this:

user_id     is_network_att      is_city_austin      is_state_tx
123         1                   1                   1
113         0                   0                   1
343         1                   0                   0

Easier to just show it - but a sentence to describe it: I'm trying to create conditional/true-false columns out of df1.col1 in a new final_df that use df column's data.

Strategies I'm tying:

-throw the df1 columns in a list or dictionary and loop through each element and then somehow loop through each row and incorporate and if statement for each row

-maybe make a makeshift column in df1 of the exact code that would create the column in final_df and somehow use the text in this columnd as code

update: here's a handful of the rows i'm trying to put in the dictionary

Here's a handful of rows in that I'm trying to put in a dictionary:
912      'organization': 'atlantic metro communications'
913          'isp_name': 'Atlantic Metro Communications'
915                       'location_name': 'martinez ca'
917                       'location_name': 'martinez ca'
918                       'location_name': 'martinez ca'
919                       'location_name': 'martinez ca'
920                     'isp_name': 'Hurricane Electric'
922                 'organization': 'hurricane electric'
923                 'organization': 'hurricane electric'
924                     'isp_name': 'Hurricane Electric'
925                           'count_users_per_ip': 28.0
926      'organization': 'atlantic metro communications'
927          'isp_name': 'Atlantic Metro Communications'
928                     'isp_name': 'Hurricane Electric'
929                 'organization': 'hurricane electric'
930                     'isp_name': 'Hurricane Electric'
931                 'organization': 'hurricane electric'
932                    'location_name': 'hermosillo son'
933      'organization': 'atlantic metro communications'
934          'isp_name': 'Atlantic Metro Communications'
935                             'location_state': ' son'
966                           'count_users_per_ip': 28.0
1057                       'count_users_per_device': 4.0
1218                           'count_ips_per_user': 3.0
1408                    'moderated_action': 'SOFT_BLOCK'
1418                    'moderated_action': 'SOFT_BLOCK'
1430                    'moderated_action': 'SOFT_BLOCK'
1438                    'moderated_action': 'SOFT_BLOCK'
1517                            'app_build': '405000004'
1605                            'app_build': '405000004'

CodePudding user response：

Assuming that df1 contains strings, you can first merge them and convert to dictionary, then use it as a reference for comparison with eq:

from ast import literal_eval

# or use a different method to create the dictionary
dic = literal_eval(f"{{{', '.join(df1['col1'])}}}")
# {'network': 'att', 'city': 'austin', 'state': 'tx'}

out = (df.drop(columns=list(dic))
         .join(df[list(dic)].eq(dic).astype(int)
               .rename(columns=lambda x: f'is_{x}_{dic[x]}')
              )
      )

Output:

   user_id  is_network_att  is_city_austin  is_state_tx
0      123               1               1            1
1      113               0               0            1
2      343               1               0            0

Reproducible input:

df = pd.DataFrame({'user_id': [123, 113, 343],
                   'city': ['austin', 'houston', 'miami'],
                   'state': ['tx', 'tx', 'fl'],
                   'network': ['att', 'tmobile', 'att']})

df1 = pd.DataFrame({'col1': ['"network": "att"', '"city": "austin"', '"state": "tx"']})