Extract substring from left to a specific character for each row in a pandas dataframe?

I have a dataframe that contains a collection of strings. These strings look something like this:

"oop9-hg78-op67_457y"

I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:

df['column'] = df['column'].str[0:'_']

I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!

CodePudding user response：

You can use .str.split and take the first element of each resulting list with .str[0], or use .str.extract with a capture group:

df['column'] = df['column'].str.split('_').str[0]

# or

df['column'] = df['column'].str.extract('^([^_]*)_')

print(df)

           column
0  oop9-hg78-op67
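
The two options behave differently when a value contains no underscore, which may matter when matching against another data set. Here is a minimal sketch of that difference, assuming a made-up second row ("ab12-cd34") that is not part of the original data; the variable names are also only illustrative.

```python
import pandas as pd

# Hypothetical sample data: the second value has no underscore at all.
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "ab12-cd34"]})

# Split at most once on '_' and keep the piece to the left of it.
# A value without an underscore is returned unchanged.
left_by_split = df["column"].str.split("_", n=1).str[0]

# This pattern requires a literal underscore after the captured text,
# so a value without one does not match and becomes a missing value.
left_by_extract = df["column"].str.extract(r"^([^_]*)_", expand=False)

print(left_by_split.tolist())
print(left_by_extract.tolist())
```

If rows without an underscore should be kept as they are, the split version is probably the safer choice.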

CodePudding user response：

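Adding to the solution provided above by @Ynjxsjmh: if another option is needed, str.extract with expand=False returns a Series rather than a one-column DataFrame. Note that the pattern must contain a capture group; a bare '_' raises a ValueError.

df['column'] = df['column'].str.extract(r'^([^_]+)', expand=False)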

CodePudding user response：

You can use str.extract:

df['column'] = df['column'].str.extract(r'(^[^_]+)')

Output (as separate column for clarity):

                column         column2
0  oop9-hg78-op67_457y  oop9-hg78-op67

Regex:

(       # start capturing group
^       # match start of string
[^_]+   # one or more non-underscore characters
)       # end capturing group
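
To check the pattern outside pandas, it can be tried with Python's built-in re module, since str.extract takes regular Python regular expressions. This is just a sketch of the pattern described above (one or more non-underscore characters from the start of the string) written in verbose mode; the names pattern, match and prefix are only illustrative.

```python
import re

# The commented pattern, compiled in verbose mode so that whitespace
# and '#' comments inside the pattern are ignored by the regex engine.
pattern = re.compile(
    r"""
    (       # start capturing group
    ^       # match start of string
    [^_]+   # one or more non-underscore characters
    )       # end capturing group
    """,
    re.VERBOSE,
)

match = pattern.search("oop9-hg78-op67_457y")
prefix = match.group(1) if match else None
print(prefix)  # oop9-hg78-op67
```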