I have a dataframe df1
:
df1 = pd.DataFrame([['a','Yes','abc def msg1'],
['b', 'No', 'ghi jkl msg2'],
['c','Yes','mno pqr msg3'],
['d', 'No', 'stu vwx msg4'],
['a', 'Yes', 'bcd efg msg5'],
['c','No','hij klm msg6'],
['a','No','nop qrs msg7'],
['b','No','tuv wxy msg8']],
columns=['unit_name','is_required','dummy_column'])
unit_name is_required dummy_column
a Yes abc def msg1
b No ghi jkl msg2
c Yes mno pqr msg3
d No stu vwx msg4
a Yes bcd efg msg5
c No hij klm msg6
a No nop qrs msg7
b No tuv wxy msg8
Whose rows having unit_name = 'a' and is_required='Yes' are used to derive another dataframe df2
:
dummy1 dummy2 msg_column value
abc def msg1 val1
bcd efg msg5 val2
Now I want to add the value column of df2
to df1
. The rows that don't have the value must contain '-'. So the expected output I want is:
unit_name is_required dummy_column value
a Yes abc def msg1 val1
b No ghi jkl msg2 -
c Yes mno pqr msg3 -
d No stu vwx msg4 -
a Yes bcd efg msg5 val2
c No hij klm msg6 -
a No nop qrs msg7 -
b No tuv wxy msg8 -
In order to do this, I tried the below line of code:
df1.loc[(df1.unit_name=='a') & (df1.is_required=='Yes'),'value'] = df2['value']
df1.fillna('-')
But I'm getting the result:
unit_name is_required dummy_column value
a Yes abc def msg1 val1
b No ghi jkl msg2 -
c Yes mno pqr msg3 -
d No stu vwx msg4 -
a Yes bcd efg msg5 -
c No hij klm msg6 -
a No nop qrs msg7 -
b No tuv wxy msg8 -
Now I understand that this is happening because while equating two columns, the index value of the LHS will be used to get the values from RHS.
How do I get the output I need? Any ideas are welcome. Thanks in advance!
CodePudding user response:
Problem is there is different indices in both DataFrames.
Possible solution if number of filtered rows is same like number rows in df2
:
print (((df1.unit_name=='a') & (df1.is_required=='Yes')).sum(), len(df2.index))
Then is possible use:
df1.loc[(df1.unit_name=='a') & (df1.is_required=='Yes'),'value'] = df2['value'].to_numpy()