I have a dataframe that contains a collection of strings. These strings look something like this:
"oop9-hg78-op67_457y"
I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:
df['column'] = df['column'].str[0:'_']
I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!
CodePudding user response:
You can use .str.split and then take the first element of each resulting list with .str[0], or use .str.extract with a capture group:
df['column'] = df['column'].str.split('_').str[0]
# or
df['column'] = df['column'].str.extract('^([^_]*)_')
print(df)
           column
0  oop9-hg78-op67
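For comparison, here is a minimal, self-contained sketch of both approaches side by side. The sample frame, the second value without an underscore, and the column names by_split and by_extract are assumptions added for illustration. It shows one difference that may matter when matching against another data set: the split version leaves underscore-free values unchanged, while the extract pattern above requires an underscore and returns NaN without one.

```python
import pandas as pd

# Hypothetical sample data; the second value has no underscore.
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "abc-123"]})

# Split at the first underscore only (n=1) and keep the part before it.
df["by_split"] = df["column"].str.split("_", n=1).str[0]

# The pattern requires a trailing "_", so rows without one become NaN.
# expand=False returns a Series instead of a one-column DataFrame.
df["by_extract"] = df["column"].str.extract(r"^([^_]*)_", expand=False)

print(df)
```

If values without an underscore should pass through untouched, the split form is probably the safer choice.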
CodePudding user response:
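Adding to the solution provided above by @Ynjxsjmh, this could also be used if another option is needed:
df['column'] = df['column'].str.extract(r'^([^_]*)', expand=False)
The pattern passed to str.extract must contain a capture group. With expand=False the result is a Series rather than a one-column DataFrame.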
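As a rough illustration of what expand=False changes, the sketch below calls str.extract twice on a one-row frame. The pattern r"^([^_]*)" and the variable names are assumptions chosen for the example. str.extract raises an error if the pattern has no capture group, so the parentheses are required.

```python
import pandas as pd

# Hypothetical one-row frame using the string from the question.
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y"]})

# One capture group plus expand=False gives back a Series.
as_series = df["column"].str.extract(r"^([^_]*)", expand=False)

# The default (expand=True) gives back a DataFrame with one column.
as_frame = df["column"].str.extract(r"^([^_]*)")

print(type(as_series).__name__)
print(type(as_frame).__name__)
```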
CodePudding user response:
You can use str.extract:
df['column'] = df['column'].str.extract(r'(^[^_]+)')
Output (as separate column for clarity):
                column         column2
0  oop9-hg78-op67_457y  oop9-hg78-op67
Regex:
(       # start capturing group
^       # match start of string
[^_]+   # one or more non-underscore characters
)       # end capturing group
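To try the regex outside pandas, here is a small sketch using Python's built-in re module. The helper name prefix_before_underscore is made up for this example. Because the character class is followed by +, at least one non-underscore character is required, so a string that starts with an underscore produces no match.

```python
import re

# Same pattern as in the breakdown: start of string, then one or more
# characters that are not an underscore, all inside one capture group.
pattern = re.compile(r"(^[^_]+)")

def prefix_before_underscore(text):
    """Return the leading run of non-underscore characters, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(prefix_before_underscore("oop9-hg78-op67_457y"))
print(prefix_before_underscore("_457y"))
```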