Home > Software design >  How to extract string if delimiter exist in Python
How to extract string if delimiter exist in Python

Time:03-16

I have the following example of dataframe:

df = pd.DataFrame({'Path':['Main\Customer1\Project1\Project_LEAD', 'Main\Customer1\Project1\Project_PM','Main\Customer1\Project1\Project_QA', 'Main\Customer1\Project1\DEV',
                         'Main\Customer1\ProjectSAS\Project_LEAD', 'Main\Customer1\ProjectSAS\Project_PM','Main\Customer1\ProjectSAS\Project_SA', 'Main\Customer1\ProjectSAS\DEV']})

I would like to extract string after delimiter, which is _ in this case, but there are some rows that _ does not exist.

The result that I expect is

enter image description here

Could you please suggest?

CodePudding user response:

One option is to use extract, since your target strings are caps and are at the end of the string:

df.assign(Role =  df.Path.str.extract(r"([A-Z] )$"))

                                     Path  Role
0    Main\Customer1\Project1\Project_LEAD  LEAD
1      Main\Customer1\Project1\Project_PM    PM
2      Main\Customer1\Project1\Project_QA    QA
3             Main\Customer1\Project1\DEV   DEV
4  Main\Customer1\ProjectSAS\Project_LEAD  LEAD
5    Main\Customer1\ProjectSAS\Project_PM    PM
6    Main\Customer1\ProjectSAS\Project_SA    SA
7           Main\Customer1\ProjectSAS\DEV   DEV

CodePudding user response:

You can use str.split and keep the last item:

df['Role'] = df['Path'].str.split(r'[_\\]').str[-1]
print(df)

# Output
                                     Path  Role
0    Main\Customer1\Project1\Project_LEAD  LEAD
1      Main\Customer1\Project1\Project_PM    PM
2      Main\Customer1\Project1\Project_QA    QA
3             Main\Customer1\Project1\DEV   DEV
4  Main\Customer1\ProjectSAS\Project_LEAD  LEAD
5    Main\Customer1\ProjectSAS\Project_PM    PM
6    Main\Customer1\ProjectSAS\Project_SA    SA
7           Main\Customer1\ProjectSAS\DEV   DEV
  • Related