I have the following example of dataframe:
df = pd.DataFrame({'Path':['Main\Customer1\Project1\Project_LEAD', 'Main\Customer1\Project1\Project_PM','Main\Customer1\Project1\Project_QA', 'Main\Customer1\Project1\DEV',
'Main\Customer1\ProjectSAS\Project_LEAD', 'Main\Customer1\ProjectSAS\Project_PM','Main\Customer1\ProjectSAS\Project_SA', 'Main\Customer1\ProjectSAS\DEV']})
I would like to extract string after delimiter, which is _
in this case, but there are some rows that _
does not exist.
The result that I expect is
Could you please suggest?
CodePudding user response:
One option is to use extract
, since your target strings are caps and are at the end of the string:
df.assign(Role = df.Path.str.extract(r"([A-Z] )$"))
Path Role
0 Main\Customer1\Project1\Project_LEAD LEAD
1 Main\Customer1\Project1\Project_PM PM
2 Main\Customer1\Project1\Project_QA QA
3 Main\Customer1\Project1\DEV DEV
4 Main\Customer1\ProjectSAS\Project_LEAD LEAD
5 Main\Customer1\ProjectSAS\Project_PM PM
6 Main\Customer1\ProjectSAS\Project_SA SA
7 Main\Customer1\ProjectSAS\DEV DEV
CodePudding user response:
You can use str.split
and keep the last item:
df['Role'] = df['Path'].str.split(r'[_\\]').str[-1]
print(df)
# Output
Path Role
0 Main\Customer1\Project1\Project_LEAD LEAD
1 Main\Customer1\Project1\Project_PM PM
2 Main\Customer1\Project1\Project_QA QA
3 Main\Customer1\Project1\DEV DEV
4 Main\Customer1\ProjectSAS\Project_LEAD LEAD
5 Main\Customer1\ProjectSAS\Project_PM PM
6 Main\Customer1\ProjectSAS\Project_SA SA
7 Main\Customer1\ProjectSAS\DEV DEV