I am new to pandas. I am trying to find any of several substrings in a string, but I only need to check between a particular start and end position.
If one is present, I need to get its position and which substring matched. If none is present, print "No".
Example:
Words I need to search = hi/lo; positions to search = 7-10
Input | Output |
---|---|
HelloWorld | NA |
worldofhi | 7,hi |
worldoflove | 8,lo |
final=final.assign(Result=final.Sequence.str.find('hi'|'lo',7,10))
CodePudding user response:
Use str.replace:
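target = 'hi|love'
# boolean mask: rows whose string contains any of the targets
m = df['sequence'].str.contains(target)
# replace the whole string with "<1-based position>,<matched word>"
df.loc[m, 'output'] = (df.loc[m, 'sequence']
                       .str.replace(fr'.*({target}).*',
                                    lambda match: f'{match.start(1)+1},{match.group(1)}',
                                    regex=True)
                       )
df.loc[~m, 'output'] = 'NA'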
Output:
      sequence  output
0   HelloWorld      NA
1    worldofhi    8,hi
2  worldoflove  8,love
Used input:
sequence
0 HelloWorld
1 worldofhi
2 worldoflove
Checking only in the substring at positions 7 to 10 (0-based slice 7:10+1):
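target = 'hi|love'
# keep only characters 7..10 (0-based, inclusive) of each string
s = df['sequence'].str[7:10+1]
m = s.str.contains(target)
# add the slice offset (7) back, plus 1 for a 1-based position
df.loc[m, 'output'] = (s[m]
                       .str.replace(fr'.*({target}).*',
                                    lambda match: f'{match.start(1)+7+1},{match.group(1)}',
                                    regex=True)
                       )
df.loc[~m, 'output'] = 'NA'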
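An alternative sketch of the same windowed search, in case the slice-and-offset arithmetic is hard to follow. It relies on the `pos` and `endpos` arguments of a compiled pattern's `search` method from the standard `re` module, so the reported position is already relative to the full string. The helper name `locate` and the variables `start` and `stop` are made up for this example, and the window is assumed to be 0-based indices 7 to 10 inclusive, as in the slice above.

```python
import re

import pandas as pd

df = pd.DataFrame({"sequence": ["HelloWorld", "worldofhi", "worldoflove"]})

pattern = re.compile("hi|love")
start, stop = 7, 10 + 1  # assumed window: 0-based indices 7..10 inclusive


def locate(text):
    """Return '<1-based position>,<word>' for the first hit in the window, else 'NA'."""
    match = pattern.search(text, start, stop)
    if match is None:
        return "NA"
    return f"{match.start() + 1},{match.group()}"


df["output"] = df["sequence"].map(locate)
print(df)
```

One difference to be aware of: `search` reports the first match inside the window, whereas the greedy `.*` in the `str.replace` version ends up reporting the last one. For the three sample rows both give the same result.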