I want to return the text to the left of department of, school of to return pomona college rutgers college
I am trying to get it to work where I can pass in multile options. ie split on "school of" or "department of" I have tried the | OR operator in the line before_words_or but I am doing something wrong.
regexp_extract(institute_name,r'^(.*)department of|school of',1,1)
If there is no "school of" or "department of" in the institute_name it should return the institute_name
with t1 as
(
select 'pomona college department of chemistry' institute_name,
union all select 'rutgers college school of engineering chemical engineering'
)
select
regexp_extract(institute_name,r'^(.*)department of',1,1) before_words1,
regexp_extract(institute_name,r'^(.*)school of',1,1) before_words2,
regexp_extract(institute_name,r'^(.*)department of|school of',1,1) before_words_or
from t1;
Actual result
before_words1 | before_words2 | before_words_or |
---|---|---|
pomona college | pomona college | |
rutgers college |
Desired result
before_words1 | before_words2 | before_words_or |
---|---|---|
pomona college | pomona college department of chemistry' institute_name | pomona college |
rutgers college school of engineering chemical engineering | rutgers college | rutgers college |
CodePudding user response:
You can use
r'^(.*?)\s*(?:department|school)\s of'
Or, with a word boundary to make sure of
is matched as a whole word:
r'^(.*?)\s*(?:department|school)\s of\b'
See the regex demo.
Details:
^
- start of string(.*?)
- Capturing group: any zero or more chars other than line break chars as few as possible\s*
- zero or more whitespaces(?:department|school)
- a non-capturing group that matches eitherdepartment
or (|
)school
\s
- one or more whitespacesof
- anof
word\b
- a word boundary.