I have a pandas DataFrame whose 2nd, 3rd and 6th columns look like this:
| start | end | strand |
|---|---|---|
| 108286 | 108361 | + |
| 734546 | 734621 | - |
| 761233 | 761309 | + |
I'm trying to implement a conditional where, if `strand` is `+`, the value in `end` becomes the corresponding value in `start` + 1, and if `strand` is `-`, the value in `start` becomes the value in `end` - 1. The output should look like this:
| start | end | strand |
|---|---|---|
| 108286 | 108287 | + |
| 734620 | 734621 | - |
| 761233 | 761234 | + |
The pseudocode might look like this:

```python
if df["strand"] == "+":
    df["end"] = df["start"] + 1
else:
    df["start"] = df["end"] - 1
```
I imagine this might be best done with `loc`/`iloc` or `numpy.where`, but I can't seem to get it to work. As always, any help is appreciated!
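The pseudocode cannot run as written: `df["strand"] == "+"` is a whole boolean Series, and using it directly in an `if` makes pandas raise a `ValueError` (the truth value of a Series is ambiguous). For comparison, a literal row-by-row translation with `DataFrame.apply(..., axis=1)` might look like the sketch below. It rebuilds only the three columns shown above, `fix_row` is a hypothetical helper name, and, like the pseudocode, it treats every non-`+` row as `-`. It is usually much slower than a vectorized approach on large frames.

```python
import pandas as pd

# Only the three columns shown in the question
df = pd.DataFrame({
    "start": [108286, 734546, 761233],
    "end": [108361, 734621, 761309],
    "strand": ["+", "-", "+"],
})


def fix_row(row):
    """Apply the pseudocode's if/else to one row (hypothetical helper)."""
    row = row.copy()  # avoid mutating the row object pandas passes in
    if row["strand"] == "+":
        row["end"] = row["start"] + 1
    else:
        row["start"] = row["end"] - 1
    return row


# axis=1 calls fix_row once per row and reassembles the returned rows
df = df.apply(fix_row, axis=1)
print(df)
```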
Answer:
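You are correct, `loc` is the indexer you are looking for:

```python
# + strand: end becomes start + 1
df.loc[df.strand == '+', 'end'] = df.loc[df.strand == '+', 'start'] + 1
# - strand: start becomes end - 1
df.loc[df.strand == '-', 'start'] = df.loc[df.strand == '-', 'end'] - 1
```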
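A self-contained run of the same two `loc` assignments on the sample rows from the question (only the three relevant columns are rebuilt here, and the mask names `plus`/`minus` are just for readability). Each mask is computed from `strand`, which neither assignment modifies, and the two masks select disjoint rows, so the order of the two assignments does not matter.

```python
import pandas as pd

# Sample rows from the question (only the three relevant columns)
df = pd.DataFrame({
    "start": [108286, 734546, 761233],
    "end": [108361, 734621, 761309],
    "strand": ["+", "-", "+"],
})

plus = df["strand"] == "+"    # boolean mask: rows on the + strand
minus = df["strand"] == "-"   # boolean mask: rows on the - strand

# + strand: end becomes start + 1
df.loc[plus, "end"] = df.loc[plus, "start"] + 1
# - strand: start becomes end - 1
df.loc[minus, "start"] = df.loc[minus, "end"] - 1

print(df)
```

After the two assignments the frame should match the expected output in the question.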
Answer:
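You could also use `numpy.where`:

```python
import numpy as np

# Where strand is '-': (start, end) -> (end - 1, end); otherwise (start, end) -> (start, start + 1)
df[['start', 'end']] = np.where(df[['strand']] == '-', df[['end', 'end']] - [1, 0], df[['start', 'start']] + [0, 1])
```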
Note that this assumes `strand` can take only one of two values, `+` or `-`. If it can take other values, you can use `numpy.select` instead.
Output:

```
    start     end  strand
0  108286  108287       +
1  734620  734621       -
2  761233  761234       +
```
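The `numpy.select` route mentioned for the case where `strand` has more than two values could be sketched like this. It assumes a hypothetical third value `"."` whose rows should be left unchanged; that value and the fourth sample row are made up for illustration. Each `np.select` call pairs one choice with each condition, and here the original column is passed as `default` so rows matching no condition keep their value.

```python
import numpy as np
import pandas as pd

# Sample rows from the question plus a hypothetical fourth row with strand "."
df = pd.DataFrame({
    "start": [108286, 734546, 761233, 900000],
    "end": [108361, 734621, 761309, 900075],
    "strand": ["+", "-", "+", "."],
})

conditions = [df["strand"] == "+", df["strand"] == "-"]

# Compute both new columns from the original values before assigning either
new_start = np.select(conditions, [df["start"], df["end"] - 1], default=df["start"])
new_end = np.select(conditions, [df["start"] + 1, df["end"]], default=df["end"])

df["start"] = new_start
df["end"] = new_end
print(df)
```

Computing `new_start` and `new_end` first keeps the `end` update from seeing an already-modified `start` column (and vice versa).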