Groupby a column and then compare two other columns and return a value in a different column

Time:11-12

I have a DataFrame similar to this:

import numpy as np
import pandas as pd

data = {'COMB': ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
        'FROM': ["MAA", "BLR", "DEL", "TRV", "HYD"],
        'TO':   ["BLR", "MAA", "MAA", "HYD", "TRV"]}
md = pd.DataFrame(data)
md

What I want to do is create another column based on this condition: within each COMB group, if the FROM of one row is equal to the TO of the next row, it should return "R"; otherwise it should return "O". (My expected final output was attached as an image.)

Can anyone help me do this in Python? I tried the following method, but it gives me an error:

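from dfply import *  # provides group_by, mutate, ungroup

md_merged = (md >>
             group_by('COMB') >>
             # np.where(...) is evaluated on the whole md (5 rows), not on each group
             mutate(TYPE=np.where(md['FROM'].isin(md['TO']), "R", "O")) >>
             ungroup)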

ValueError: Length of values does not match length of index

Please help.
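The error comes from np.where being evaluated on the whole md before the grouped mutate ever runs, so it produces one value per row of the full DataFrame while each group is smaller. A minimal plain-pandas sketch (no dfply, using the sample data above) makes the mismatch visible; the names whole_frame_type, group_sizes, and first_group are only illustrative:

```python
import numpy as np
import pandas as pd

md = pd.DataFrame({'COMB': ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
                   'FROM': ["MAA", "BLR", "DEL", "TRV", "HYD"],
                   'TO':   ["BLR", "MAA", "MAA", "HYD", "TRV"]})

# Evaluated once on the whole DataFrame: one value per row of md (5 values)
whole_frame_type = np.where(md['FROM'].isin(md['TO']), "R", "O")
print(len(whole_frame_type))          # 5

# Each COMB group has fewer rows than that
group_sizes = md.groupby('COMB').size()
print(group_sizes.to_dict())          # {'PNR1': 2, 'PNR11': 1, 'PNR2': 2}

# Assigning the 5-value array to a single 2-row group raises the same kind of ValueError
first_group = md[md['COMB'] == 'PNR1'].copy()
error_message = None
try:
    first_group['TYPE'] = whole_frame_type
except ValueError as err:
    error_message = str(err)
print(error_message)
```

The exact wording of the message varies slightly between pandas versions (newer ones include the two lengths).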

Answer:

This solution checks, within each COMB group, whether a row's FROM appears anywhere in that group's TO column, not only in the previous or next row.

You can use a custom lambda function in GroupBy.apply to build a boolean mask. To avoid a MultiIndex, add group_keys=False to DataFrame.groupby. Finally, set the new values with numpy.where:

# Within each COMB group: True where the row's FROM appears anywhere in that group's TO column
mask = md.groupby('COMB', group_keys=False).apply(lambda x: x['FROM'].isin(x['TO']))
md = md.assign(Type=np.where(mask, "R", "O"))
print(md)
    COMB FROM   TO Type
0   PNR1  MAA  BLR    R
1   PNR1  BLR  MAA    R
2  PNR11  DEL  MAA    O
3   PNR2  TRV  HYD    R
4   PNR2  HYD  TRV    R

This solution compares only the previous and next rows within each group:

Another idea is to use DataFrameGroupBy.shift, which should be faster than groupby.apply:

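# R if, within the same COMB group, the previous row's FROM equals this row's TO
# or the next row's TO equals this row's FROM
mask = (md.groupby('COMB')['FROM'].shift().eq(md['TO']) |
        md.groupby('COMB')['TO'].shift(-1).eq(md['FROM']))

md = md.assign(Type=np.where(mask, "R", "O"))
print(md)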
    COMB FROM   TO Type
0   PNR1  MAA  BLR    R
1   PNR1  BLR  MAA    R
2  PNR11  DEL  MAA    O
3   PNR2  TRV  HYD    R
4   PNR2  HYD  TRV    R
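To see what the two shifted columns contain, here is a small sketch assuming the sample md from the question (the names prev_from and next_to are illustrative). Within each COMB group, shift() moves values down one row and shift(-1) moves them up, so the first or last row of a group gets NaN. NaN never compares equal, which is why the single-row PNR11 group ends up as O.

```python
import pandas as pd

md = pd.DataFrame({'COMB': ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
                   'FROM': ["MAA", "BLR", "DEL", "TRV", "HYD"],
                   'TO':   ["BLR", "MAA", "MAA", "HYD", "TRV"]})

g = md.groupby('COMB')
prev_from = g['FROM'].shift()     # previous row's FROM in the same group (NaN for the first row)
next_to = g['TO'].shift(-1)       # next row's TO in the same group (NaN for the last row)

print(pd.DataFrame({'COMB': md['COMB'], 'FROM': md['FROM'], 'TO': md['TO'],
                    'prev_FROM': prev_from, 'next_TO': next_to}))

# Comparisons against NaN are False, so rows without a neighbour only match via the other side
mask = prev_from.eq(md['TO']) | next_to.eq(md['FROM'])
print(mask.tolist())              # [True, True, False, True, True]
```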

Answer:

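Compare consecutive values within each COMB group (each row's FROM against the previous row's TO, and the next row's FROM against this row's TO), then use np.where to set the Type. The code below worked for me: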

mask = md.groupby('COMB', group_keys=False).apply(lambda x: (x['FROM'] == x['TO'].shift()) | (x['FROM'].shift(-1) == x['TO']))
md['Type'] = np.where(mask, 'R', 'O')  # group_keys=False keeps md's row order in mask

   COMB FROM   TO Type
0   PNR1  MAA  BLR    R
1   PNR1  BLR  MAA    R
2  PNR11  DEL  MAA    O
3   PNR2  TRV  HYD    R
4   PNR2  HYD  TRV    R
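The two shift-based answers compare different pairs of columns: the last one checks each row's FROM against the previous row's TO (a connecting leg), while the earlier shift answer checks each row's FROM against the next row's TO (a return leg). On the sample data they agree, but a hypothetical three-leg booking PNR3 (A to B, B to C, C to A; invented purely for illustration) separates them. This sketch writes both checks with groupby shift for brevity; the connecting-leg mask evaluates the same comparisons as the lambda above.

```python
import numpy as np
import pandas as pd

# Hypothetical itinerary, not from the question: A -> B -> C -> A
trip = pd.DataFrame({'COMB': ["PNR3", "PNR3", "PNR3"],
                     'FROM': ["A", "B", "C"],
                     'TO':   ["B", "C", "A"]})
g = trip.groupby('COMB')

# Return-leg check: previous FROM == this TO, or next TO == this FROM
return_mask = g['FROM'].shift().eq(trip['TO']) | g['TO'].shift(-1).eq(trip['FROM'])

# Connecting-leg check: this FROM == previous TO, or next FROM == this TO
connect_mask = trip['FROM'].eq(g['TO'].shift()) | g['FROM'].shift(-1).eq(trip['TO'])

trip['Type_return'] = np.where(return_mask, 'R', 'O')
trip['Type_connect'] = np.where(connect_mask, 'R', 'O')
print(trip)
```

Here every leg connects to the next one, but no leg is the reverse of its neighbour, so the connecting-leg check marks all rows R while the return-leg check marks them all O.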