Home > Back-end >  Setting the values of filtered rows of dataframe equal to the column of another dataframe
Setting the values of filtered rows of dataframe equal to the column of another dataframe

Time:10-27

I have a dataframe df1:

df1 = pd.DataFrame([['a','Yes','abc def msg1'],
                    ['b', 'No', 'ghi jkl msg2'],
                    ['c','Yes','mno pqr msg3'],
                    ['d', 'No', 'stu vwx msg4'],
                    ['a', 'Yes', 'bcd efg msg5'],
                    ['c','No','hij klm msg6'],
                    ['a','No','nop qrs msg7'],
                    ['b','No','tuv wxy msg8']],
                  columns=['unit_name','is_required','dummy_column'])

unit_name   is_required dummy_column
    a            Yes    abc def msg1
    b             No    ghi jkl msg2
    c            Yes    mno pqr msg3
    d             No    stu vwx msg4
    a            Yes    bcd efg msg5
    c             No    hij klm msg6
    a             No    nop qrs msg7
    b             No    tuv wxy msg8

Whose rows having unit_name = 'a' and is_required='Yes' are used to derive another dataframe df2:

dummy1  dummy2  msg_column  value
abc      def    msg1        val1
bcd      efg    msg5        val2

Now I want to add the value column of df2 to df1. The rows that don't have the value must contain '-'. So the expected output I want is:

unit_name   is_required dummy_column    value
    a            Yes    abc def msg1    val1
    b             No    ghi jkl msg2    -
    c            Yes    mno pqr msg3    -
    d             No    stu vwx msg4    -
    a            Yes    bcd efg msg5    val2
    c             No    hij klm msg6    -
    a             No    nop qrs msg7    -
    b             No    tuv wxy msg8    -

In order to do this, I tried the below line of code:

df1.loc[(df1.unit_name=='a') & (df1.is_required=='Yes'),'value'] = df2['value']
df1.fillna('-')

But I'm getting the result:

unit_name   is_required dummy_column    value
    a            Yes    abc def msg1    val1
    b             No    ghi jkl msg2    -
    c            Yes    mno pqr msg3    -
    d             No    stu vwx msg4    -
    a            Yes    bcd efg msg5    -
    c             No    hij klm msg6    -
    a             No    nop qrs msg7    -
    b             No    tuv wxy msg8    -

Now I understand that this is happening because while equating two columns, the index value of the LHS will be used to get the values from RHS.

How do I get the output I need? Any ideas are welcome. Thanks in advance!

CodePudding user response:

Problem is there is different indices in both DataFrames.

Possible solution if number of filtered rows is same like number rows in df2:

print (((df1.unit_name=='a') & (df1.is_required=='Yes')).sum(), len(df2.index))

Then is possible use:

df1.loc[(df1.unit_name=='a') & (df1.is_required=='Yes'),'value'] = df2['value'].to_numpy()
  • Related