I'm automatizing a task where will be necessary the column person_id, but i've these two df's.

df_1 = pd.DataFrame({'Code':['100', '101', '102', '103', '104'], 'person_id': ['4000', '4001', '4002', '4003', '4004']})

df_2 = pd.DataFrame ({'Code':['100', '101', '102', '103', '104'],'owner_1':['False', 'True', 'False', 'False','False'], 'owner_2':['True', 'False', 'False', 'False','False']})

What i need is confirm on what owner is true using columns "Code" and creating a third column with something like "Owner_n_id" using conditionals where i received back person_id column, but i'm a bit confused on how write this, all that i tried dont worked well.

I need my new df looks like this below

df_3 = pd.DataFrame ({'Code':['100', '101', '102', '103', '104'], 'owner_1_id':['False', '4001', 'False', 'False', 'False'],'owner_2_id':['4000', 'False', 'False', 'False','False']})

CodePudding user response：

TL;DR

If this was meant to be used for much more owners, you can create a loop

df_3 = df_1.merge(df_2, on = "Code")
NUM_OWNERS = 2
for owner in range(1, NUM_OWNERS 1):
     indexing = df_3[f"owner_{owner}"] == "True"
     df_3.loc[indexing, f"owner_{owner}"] = df_3.loc[indexing, "person_id"]

# finally drop the extra column
df_3.drop("person_id", axis = 1, inplace = True)

I will first merge 1 and 2 so the third is more consistent:

>>> df_3 = df_1.merge(df_2, on = "Code")
  Code person_id owner_1 owner_2
0  100      4000   False    True
1  101      4001    True   False
2  102      4002   False   False
3  103      4003   False   False
4  104      4004   False   False

Once we have this, we want to access where the owner_1 == "True":

owner_1_is_true = df_3.owner_1 == "True"
# and know we can update the value based in this indexing:
df_3.owner_1[owner_1_is_true] = df_3.person_id[owner_1_is_true]

# repeat for owner 2:
owner_2_is_true = df_3.owner_2 == "True"
df_3.owner_2[owner_2_is_true] = df_3.person_id[owner_2_is_true]

# drop the `person_id` we created on the merge
df_3.drop("person_id", axis = 1, inplace = True)

Result:

  Code owner_1 owner_2
0  100   False    4000
1  101    4001   False
2  102   False   False
3  103   False   False
4  104   False   False

If this was meant to be used for much more owners, you can create a loop

NUM_OWNERS = 2
for owner in range(1, NUM_OWNERS 1):
     indexing = df_3[f"owner_{owner}"] == "True"
     df_3.loc[indexing, f"owner_{owner}"] = df_3.loc[indexing, "person_id"]

# finally drop the extra column
df_3.drop("person_id", axis = 1, inplace = True)

CodePudding user response：

This code changes df_2 directly. You could copy df_2 into df_3 prior to proceeding. Note that using string values for False and True is inefficient in comparison to using boolean operators.

df_2['owner_1_id'] = 'False'
df_2.loc[df_2['owner_1'] == 'True', ['owner_1_id']] = df_2.merge(df_1)['person_id']

df_2['owner_2_id'] = 'False'
df_2.loc[df_2['owner_2'] == 'True', ['owner_2_id']] = df_2.merge(df_1)['person_id']