I'm automatizing a task where will be necessary the column person_id, but i've these two df's.
df_1 = pd.DataFrame({'Code':['100', '101', '102', '103', '104'], 'person_id': ['4000', '4001', '4002', '4003', '4004']})
df_2 = pd.DataFrame ({'Code':['100', '101', '102', '103', '104'],'owner_1':['False', 'True', 'False', 'False','False'], 'owner_2':['True', 'False', 'False', 'False','False']})
What i need is confirm on what owner is true using columns "Code" and creating a third column with something like "Owner_n_id" using conditionals where i received back person_id column, but i'm a bit confused on how write this, all that i tried dont worked well.
I need my new df looks like this below
df_3 = pd.DataFrame ({'Code':['100', '101', '102', '103', '104'], 'owner_1_id':['False', '4001', 'False', 'False', 'False'],'owner_2_id':['4000', 'False', 'False', 'False','False']})
CodePudding user response:
TL;DR
If this was meant to be used for much more owners, you can create a loop
df_3 = df_1.merge(df_2, on = "Code")
NUM_OWNERS = 2
for owner in range(1, NUM_OWNERS 1):
indexing = df_3[f"owner_{owner}"] == "True"
df_3.loc[indexing, f"owner_{owner}"] = df_3.loc[indexing, "person_id"]
# finally drop the extra column
df_3.drop("person_id", axis = 1, inplace = True)
I will first merge 1 and 2 so the third is more consistent:
>>> df_3 = df_1.merge(df_2, on = "Code")
Code person_id owner_1 owner_2
0 100 4000 False True
1 101 4001 True False
2 102 4002 False False
3 103 4003 False False
4 104 4004 False False
Once we have this, we want to access where the owner_1 == "True"
:
owner_1_is_true = df_3.owner_1 == "True"
# and know we can update the value based in this indexing:
df_3.owner_1[owner_1_is_true] = df_3.person_id[owner_1_is_true]
# repeat for owner 2:
owner_2_is_true = df_3.owner_2 == "True"
df_3.owner_2[owner_2_is_true] = df_3.person_id[owner_2_is_true]
# drop the `person_id` we created on the merge
df_3.drop("person_id", axis = 1, inplace = True)
Result:
Code owner_1 owner_2
0 100 False 4000
1 101 4001 False
2 102 False False
3 103 False False
4 104 False False
If this was meant to be used for much more owners, you can create a loop
NUM_OWNERS = 2
for owner in range(1, NUM_OWNERS 1):
indexing = df_3[f"owner_{owner}"] == "True"
df_3.loc[indexing, f"owner_{owner}"] = df_3.loc[indexing, "person_id"]
# finally drop the extra column
df_3.drop("person_id", axis = 1, inplace = True)
CodePudding user response:
This code changes df_2 directly. You could copy df_2 into df_3 prior to proceeding. Note that using string values for False and True is inefficient in comparison to using boolean operators.
df_2['owner_1_id'] = 'False'
df_2.loc[df_2['owner_1'] == 'True', ['owner_1_id']] = df_2.merge(df_1)['person_id']
df_2['owner_2_id'] = 'False'
df_2.loc[df_2['owner_2'] == 'True', ['owner_2_id']] = df_2.merge(df_1)['person_id']