Extract substring from left to a specific character for each row in a pandas dataframe?

Time:05-21

I have a dataframe that contains a collection of strings. These strings look something like this:

"oop9-hg78-op67_457y"

I need to cut everything from the underscore to the end in order to match this data with another set. My attempt looked something like this:

df['column'] = df['column'].str[0:'_']

I've tried toying around with .find() in this statement but nothing seems to work. Anybody have any ideas? Any and all help would be greatly appreciated!

CodePudding user response:

You can use .str.split and take the first piece with .str[0], or use .str.extract with a capture group:

df['column'] = df['column'].str.split('_').str[0]

# or

df['column'] = df['column'].str.extract('^([^_]*)_')
print(df)

           column
0  oop9-hg78-op67
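
To make the difference between the two options concrete, here is a minimal sketch. The second row (a string with no underscore) is a made-up example that is not in the question; it is only there to show how each approach treats rows that lack the delimiter.

```python
import pandas as pd

# Hypothetical sample data: the second row has no underscore at all.
df = pd.DataFrame({"column": ["oop9-hg78-op67_457y", "no-underscore-here"]})

# split: keep the piece before the first underscore;
# a row without an underscore comes back unchanged.
by_split = df["column"].str.split("_").str[0]

# extract: this pattern requires an underscore after the captured text,
# so a row without one becomes NaN.
by_extract = df["column"].str.extract(r"^([^_]*)_", expand=False)

print(by_split.tolist())
print(by_extract.tolist())
```

If some rows may not contain an underscore, the split version is probably the safer choice, since it leaves those rows as they are instead of turning them into NaN.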

CodePudding user response:

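df['column'] = df['column'].str.extract('([^_]*)_', expand=False)

could also be used if another option is needed, adding to the solution provided above by @Ynjxsjmh. The pattern needs a capture group, otherwise .str.extract raises a ValueError; expand=False returns a Series instead of a one-column DataFrame.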

CodePudding user response:

You can use str.extract:

df['column'] = df['column'].str.extract(r'(^[^_]+)')

Output (as separate column for clarity):

                column         column2
0  oop9-hg78-op67_457y  oop9-hg78-op67

Regex:

(       # start capturing group
^       # match start of string
[^_]+   # one or more non-underscore characters
)       # end capturing group
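
The regex can be checked on its own, outside pandas, with Python's built-in re module. This is only a sketch: the helper name before_underscore is made up for illustration and is not part of pandas or of the answer above.

```python
import re

# Same pattern as in the answer: capture one or more
# non-underscore characters anchored at the start of the string.
pattern = re.compile(r"(^[^_]+)")

def before_underscore(text):
    """Return the text before the first underscore, or None if the
    string starts with an underscore (nothing to capture)."""
    match = pattern.search(text)
    return match.group(1) if match else None

print(before_underscore("oop9-hg78-op67_457y"))
```

Because the pattern does not require an underscore to be present, a string with no underscore is returned whole, while a string that starts with an underscore gives no match.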