This one is weird --
let's say I have a df
like this:
user_id city state network
123 austin tx att
113 houston tx tmobile
343 miami fl att
and I have another df1
like this:
col1
'network': 'att'
'city': 'austin'
'state': 'tx'
I'm trying to build a final_df
like this:
user_id is_network_att is_city_austin is_state_tx
123 1 1 1
113 0 0 1
343 1 0 0
Easier to just show it - but a sentence to describe it:
I'm trying to create conditional/true-false columns out of df1.col1
in a new final_df
that use df
column's data.
Strategies I'm tying:
-throw the df1 columns in a list or dictionary and loop through each element and then somehow loop through each row and incorporate and if statement for each row
-maybe make a makeshift column in df1
of the exact code that would create the column in final_df
and somehow use the text in this columnd as code
update: here's a handful of the rows i'm trying to put in the dictionary
Here's a handful of rows in that I'm trying to put in a dictionary:
912 'organization': 'atlantic metro communications'
913 'isp_name': 'Atlantic Metro Communications'
915 'location_name': 'martinez ca'
917 'location_name': 'martinez ca'
918 'location_name': 'martinez ca'
919 'location_name': 'martinez ca'
920 'isp_name': 'Hurricane Electric'
922 'organization': 'hurricane electric'
923 'organization': 'hurricane electric'
924 'isp_name': 'Hurricane Electric'
925 'count_users_per_ip': 28.0
926 'organization': 'atlantic metro communications'
927 'isp_name': 'Atlantic Metro Communications'
928 'isp_name': 'Hurricane Electric'
929 'organization': 'hurricane electric'
930 'isp_name': 'Hurricane Electric'
931 'organization': 'hurricane electric'
932 'location_name': 'hermosillo son'
933 'organization': 'atlantic metro communications'
934 'isp_name': 'Atlantic Metro Communications'
935 'location_state': ' son'
966 'count_users_per_ip': 28.0
1057 'count_users_per_device': 4.0
1218 'count_ips_per_user': 3.0
1408 'moderated_action': 'SOFT_BLOCK'
1418 'moderated_action': 'SOFT_BLOCK'
1430 'moderated_action': 'SOFT_BLOCK'
1438 'moderated_action': 'SOFT_BLOCK'
1517 'app_build': '405000004'
1605 'app_build': '405000004'
CodePudding user response:
Assuming that df1
contains strings, you can first merge them and convert to dictionary, then use it as a reference for comparison with eq
:
from ast import literal_eval
# or use a different method to create the dictionary
dic = literal_eval(f"{{{', '.join(df1['col1'])}}}")
# {'network': 'att', 'city': 'austin', 'state': 'tx'}
out = (df.drop(columns=list(dic))
.join(df[list(dic)].eq(dic).astype(int)
.rename(columns=lambda x: f'is_{x}_{dic[x]}')
)
)
Output:
user_id is_network_att is_city_austin is_state_tx
0 123 1 1 1
1 113 0 0 1
2 343 1 0 0
Reproducible input:
df = pd.DataFrame({'user_id': [123, 113, 343],
'city': ['austin', 'houston', 'miami'],
'state': ['tx', 'tx', 'fl'],
'network': ['att', 'tmobile', 'att']})
df1 = pd.DataFrame({'col1': ['"network": "att"', '"city": "austin"', '"state": "tx"']})