I have a table:
genome                  start  end  strand  etc
GUT_GENOME270877.fasta  98     396
GUT_GENOME270877.fasta  384    574  -
GUT_GENOME270877.fasta  593    984
GUT_GENOME270877.fasta  991    999  -
I'd like to make a new table with a coordinates column, which joins the start and end columns and looks like this:
genome                  start  end  strand  etc  coordinates
GUT_GENOME270877.fasta  98     396               98..396
GUT_GENOME270877.fasta  384    574  -            complement(384..574)
GUT_GENOME270877.fasta  593    984               593..984
GUT_GENOME270877.fasta  991    999  -            complement(991..999)
That is, if there's a - in the strand column, I'd like to do not just
df['coordinates'] = df['start'].astype(str) + '..' + df['end'].astype(str)
but to add brackets and complement, like this:
df['coordinates'] = 'complement(' + df['start'].astype(str) + '..' + df['end'].astype(str) + ')'
The only thing I'm missing is how to introduce the condition.
CodePudding user response:
You can use numpy.where:
import numpy as np

m = df['strand'].eq('-')
df['coordinates'] = (np.where(m, 'complement(', '')
                     + df['start'].astype(str) + '..' + df['end'].astype(str)
                     + np.where(m, ')', '')
                     )
Or boolean indexing:
m = df['strand'].eq('-')
df['coordinates'] = df['start'].astype(str) + '..' + df['end'].astype(str)
df.loc[m, 'coordinates'] = 'complement(' + df.loc[m, 'coordinates'] + ')'
Output:
                   genome  start  end  strand           coordinates
0  GUT_GENOME270877.fasta     98  396                       98..396
1  GUT_GENOME270877.fasta    384  574       -  complement(384..574)
2  GUT_GENOME270877.fasta    593  984                      593..984
3  GUT_GENOME270877.fasta    991  999       -  complement(991..999)
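For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question (assuming the empty strand cells are empty strings, which the thread does not state) and uses a slight variant of the numpy.where approach that builds the plain string once and picks the wrapped or unwrapped version per row. The names span and coords are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; empty strand cells are ASSUMED
# to be empty strings (they could equally be NaN when read from a file).
df = pd.DataFrame({
    "genome": ["GUT_GENOME270877.fasta"] * 4,
    "start": [98, 384, 593, 991],
    "end": [396, 574, 984, 999],
    "strand": ["", "-", "", "-"],
})

# Boolean mask: True where the feature is on the minus strand.
# Series.eq returns False for NaN, so missing strands count as "not minus".
m = df["strand"].eq("-")

# Plain "start..end" string for every row.
span = df["start"].astype(str) + ".." + df["end"].astype(str)

# Variant 1: numpy.where chooses the wrapped or the plain string per row.
df["coordinates"] = np.where(m, "complement(" + span + ")", span)

# Variant 2: boolean indexing on a copy, wrapping only the masked rows.
coords = span.copy()
coords.loc[m] = "complement(" + coords.loc[m] + ")"

print(df)
```

Both variants should give the same column; which one reads better is mostly a matter of taste.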