I have a DataFrame similar to this:
import numpy as np
import pandas as pd

data = {'COMB': ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
        'FROM': ["MAA", "BLR", "DEL", "TRV", "HYD"],
        'TO': ["BLR", "MAA", "MAA", "HYD", "TRV"]}
md = pd.DataFrame(data)
md
What I want to do is create another column based on this condition: if the FROM of one row is equal to the TO of the next row, the new column should contain "R"; otherwise it should contain "O". My final output should look like this.
Can anyone help me do this in Python? I tried the following method, but it gives me an error:
md_merged=(md>>
group_by('COMB')>>
mutate(TYPE=np.where(md['FROM'].isin(md['TO']),"R","O"))>>
ungroup)
ValueError: Length of values does not match length of index

Please help.
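A likely cause: Python evaluates the argument `np.where(md['FROM'].isin(md['TO']), "R", "O")` once, on all 5 rows of `md`, before `mutate` ever runs, and the grouped `mutate` then most likely tries to attach that 5-element array to each group (which has 1 or 2 rows). Even if the lengths matched, `md['TO']` covers every PNR, so `isin` would not be restricted to the current group. The sketch below reproduces the length mismatch with plain pandas only (no dfply); selecting the PNR1 group by hand is an illustrative stand-in for what the grouped pipeline does.

```python
import numpy as np
import pandas as pd

md = pd.DataFrame({'COMB': ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
                   'FROM': ["MAA", "BLR", "DEL", "TRV", "HYD"],
                   'TO': ["BLR", "MAA", "MAA", "HYD", "TRV"]})

# Evaluated once, on every row of md, not per group
values = np.where(md['FROM'].isin(md['TO']), "R", "O")

# One group (PNR1) has only 2 rows
group = md[md['COMB'] == "PNR1"]

try:
    group.assign(TYPE=values)  # 5 values for a 2-row frame
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(len(values), len(group))  # 5 2
print(error_message)
```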
Answer:
This solution compares all values within each group, not only the previous and next rows.
You can use a custom lambda function in GroupBy.apply to build a boolean mask. To avoid a MultiIndex in the result, add group_keys=False to DataFrame.groupby. Finally, set the new values with numpy.where:
import numpy as np

mask = md.groupby('COMB', group_keys=False).apply(lambda x: x['FROM'].isin(x['TO']))
md = md.assign(Type=np.where(mask,"R","O"))
print (md)
COMB FROM TO Type
0 PNR1 MAA BLR R
1 PNR1 BLR MAA R
2 PNR11 DEL MAA O
3 PNR2 TRV HYD R
4 PNR2 HYD TRV R
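If you want the same "anywhere in the group" rule without `groupby.apply`, one option (a different technique from the answer above, not part of it) is to check each `(COMB, FROM)` pair against the set of `(COMB, TO)` pairs with `MultiIndex.isin`. A minimal sketch on the question's data; the names `to_pairs` and `from_pairs` are illustrative:

```python
import numpy as np
import pandas as pd

md = pd.DataFrame({'COMB': ["PNR1", "PNR1", "PNR11", "PNR2", "PNR2"],
                   'FROM': ["MAA", "BLR", "DEL", "TRV", "HYD"],
                   'TO': ["BLR", "MAA", "MAA", "HYD", "TRV"]})

# Every (booking, destination) pair in the frame
to_pairs = pd.MultiIndex.from_frame(md[['COMB', 'TO']])
# Every (booking, origin) pair, one per row of md
from_pairs = pd.MultiIndex.from_frame(md[['COMB', 'FROM']])

# True where this row's origin is also a destination within the same COMB
mask = from_pairs.isin(to_pairs)  # plain boolean NumPy array, in md's row order
md = md.assign(Type=np.where(mask, "R", "O"))
print(md)
```

Because `isin` returns an array in the same row order as `md`, the result does not depend on the rows of one COMB being adjacent or sorted.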
This solution compares only the previous and next rows within each group:
Another idea is to use DataFrameGroupBy.shift, which should be faster than groupby.apply:
mask = (md.groupby('COMB')['FROM'].shift().eq(md['TO']) |
md.groupby('COMB')['TO'].shift(-1).eq(md['FROM']))
md = md.assign(Type=np.where(mask,"R","O"))
print (md)
COMB FROM TO Type
0 PNR1 MAA BLR R
1 PNR1 BLR MAA R
2 PNR11 DEL MAA O
3 PNR2 TRV HYD R
4 PNR2 HYD TRV R
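The two solutions above give the same output on the sample data, but they test different rules: the first marks a row R if its FROM appears anywhere among the TO values of the same COMB, while the shift-based one only looks at the adjacent rows. They can disagree once a booking has more than two legs. A minimal sketch on a hypothetical three-leg round trip `PNR3` (MAA -> BLR -> DEL -> MAA), made up for illustration; since it holds a single booking, the first rule reduces to a plain `isin` here:

```python
import numpy as np
import pandas as pd

# Hypothetical booking: MAA -> BLR -> DEL -> MAA
trip = pd.DataFrame({'COMB': ["PNR3", "PNR3", "PNR3"],
                     'FROM': ["MAA", "BLR", "DEL"],
                     'TO': ["BLR", "DEL", "MAA"]})

# Rule 1 (groupby.apply answer): FROM appears anywhere in this booking's TO values
any_rule = np.where(trip['FROM'].isin(trip['TO']), "R", "O").tolist()

# Rule 2 (shift answer): previous FROM equals this TO, or next TO equals this FROM
g = trip.groupby('COMB')
adjacent = g['FROM'].shift().eq(trip['TO']) | g['TO'].shift(-1).eq(trip['FROM'])
adjacent_rule = np.where(adjacent, "R", "O").tolist()

print(any_rule)       # ['R', 'R', 'R']
print(adjacent_rule)  # ['O', 'O', 'O']
```

Which result is "right" depends on what R is meant to capture, so it is worth checking against real multi-leg bookings before choosing one.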
Answer:
Compare consecutive values within each group and use np.where to set the Type. The code below worked for me:
mask = md.groupby('COMB', group_keys=False).apply(lambda x: (x['FROM'] == x['TO'].shift()) | (x['FROM'].shift(-1) == x['TO']))
md['Type'] = np.where(mask.reindex(md.index), 'R', 'O')
COMB FROM TO Type
0 PNR1 MAA BLR R
1 PNR1 BLR MAA R
2 PNR11 DEL MAA O
3 PNR2 TRV HYD R
4 PNR2 HYD TRV R
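Note that this answer checks a different condition from the shift-based answer: it marks a row R when the legs connect (the previous row's TO is this FROM, or the next row's FROM is this TO), whereas the shift-based answer, which matches the question's wording, looks for a leg whose TO is where the adjacent leg started. A one-way trip with a stopover separates the two. The sketch below rewrites this answer's condition with grouped `shift` instead of `apply` and runs both conditions on a hypothetical booking `PNR4` (MAA -> BLR -> DEL), made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical one-way booking with a stopover: MAA -> BLR -> DEL
trip = pd.DataFrame({'COMB': ["PNR4", "PNR4"],
                     'FROM': ["MAA", "BLR"],
                     'TO': ["BLR", "DEL"]})
g = trip.groupby('COMB')

# This answer's condition: previous TO equals this FROM, or next FROM equals this TO
chain = g['TO'].shift().eq(trip['FROM']) | g['FROM'].shift(-1).eq(trip['TO'])

# Shift-based answer's condition: previous FROM equals this TO, or next TO equals this FROM
round_trip = g['FROM'].shift().eq(trip['TO']) | g['TO'].shift(-1).eq(trip['FROM'])

chain_types = np.where(chain, "R", "O").tolist()
round_types = np.where(round_trip, "R", "O").tolist()
print(chain_types)  # ['R', 'R']
print(round_types)  # ['O', 'O']
```

On the question's sample data both conditions happen to produce the same column, because every booking there is a simple two-leg return trip.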