In German, feminine endings appear as ['/innen','/in','/Innen','/In','Innen','In','innen'].
I want to remove them from the strings in a list.
I have come up with the following:
rm_gender = ['/innen','/in','/Innen','/In','Innen','In','innen']
test_list = ['Softwareentwickler',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Softwareentwickler',
'Softwareentwickler',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Softwareentwickler',
'Softwareentwickler',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Hard-Softwareentwickler',
'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker',
'Hard-Softwareentwickler',
'Hard-Softwareentwickler',
'Hard-Softwareentwickler']
result = [vac if any([substring in vac for substring in ['-In',' In']]) else re.sub('|'.join(rm_gender),'',vac) if vac[:2] not in 'In' else 'In' + re.sub('|'.join(rm_gender),'',vac) for vac in test_list]
But it doesn't work, because there is a space in front of words like 'SoftwareentwicklerInnen'. How can I do this correctly with a regex?
Important: I want to keep the format of each string as it is and only remove the feminine ending (that is, I want to get back a corrected list of strings).
CodePudding user response:
Try this one:
import re
test_list = test_list[1].split(";") # split one of the multi-title strings (index 0 is just 'Softwareentwickler')
test_list.append("Informatikerin") # add one entry ending in "in" (I don't know if this is a correct word!)
pattern = re.compile("in(?:nen)?$", re.IGNORECASE)
[re.sub(pattern, "", x) for x in test_list]
OUTPUT
['Data Scientists', ' DWH-BI Consultants', ' Softwareentwickler', ' Informatiker', ' Statistiker', 'Informatiker']
FOLLOW UP
If you want to rebuild the string as it was, just rejoin with ";":
";".join([re.sub(pattern, "", x) for x in test_list])
OUTPUT
'Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker;Informatiker'
If the idea is to match all the words in each line:
pattern = re.compile(r"in(?:nen)?(?=[;.,: ]|$)", re.IGNORECASE)
re.sub(pattern, "", "You are a Softwareentwicklerinnen: that is as nice as Informatikerin")
re.sub(pattern, "", "You are a Softwareentwicklerinnen; that is as nice as Informatikerin")
OUTPUT
'You are a Softwareentwickler: that is as nice as Informatiker'
'You are a Softwareentwickler; that is as nice as Informatiker'
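Since the question asks to keep each string's format, the same lookahead idea can be applied to every string in the list directly, without splitting and rejoining. The following is only a sketch: the optional "/" in front of the ending is an added assumption (to cover forms like "Entwickler/innen"), and `sample` is a shortened, hypothetical stand-in for the question's `test_list`.

```python
import re

# Lookahead pattern as above, written as a raw string.
# Assumption: an optional "/" is allowed before the ending ("/in", "/innen").
pattern = re.compile(r"/?in(?:nen)?(?=[;.,: ]|$)", re.IGNORECASE)

# Shortened, hypothetical sample in the shape of the question's test_list.
sample = [
    "Softwareentwickler",
    "Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker",
    "Entwickler/innen; Berater/in",
]

# Substitute inside each full string, so separators and spacing stay untouched.
cleaned = [pattern.sub("", vac) for vac in sample]
for vac in cleaned:
    print(vac)
```

Note that this pattern only looks at what follows the ending, so a standalone word such as the preposition "in" followed by a space would be removed as well.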
CodePudding user response:
You could convert matches of the following regular expression to empty strings:
\/?[Ii](?:nnen|n)\b
This regex can be broken down as follows.
\/? # optionally match '/'
[Ii] # match 'I' or 'i'
(?:nnen|n) # match 'nnen' or 'n' (in that order)
\b # match a word boundary
The word boundary prevents matches inside words such as 'innenantenne'.
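As a minimal sketch of how this regular expression could be applied in Python (the variable names and the shortened `sample` list are assumptions for illustration, not part of the answer):

```python
import re

# The regex from above; "/" needs no escaping in Python, so "\/" becomes "/".
rm_gender = re.compile(r"/?[Ii](?:nnen|n)\b")

# Shortened, hypothetical sample in the shape of the question's test_list.
sample = [
    "Hard-Softwareentwickler",
    "Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker",
    "Innenantenne",
]

# Replace every match with an empty string; the rest of each string is kept as-is.
result = [rm_gender.sub("", vac) for vac in sample]
print(result)
```

The last entry stays unchanged because no word boundary follows "Innen" or "In" inside "Innenantenne".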
CodePudding user response:
You can use
rm_gender_regex = re.compile( r'(?:\b/|\B)i(?:nne)?n\b', re.I )
result = [rm_gender_regex.sub('', vac) for vac in test_list]
See the regex demo. Details:
(?:\b/|\B) - either a / that is preceded by a word char, or a position that is preceded by a word char
i - an i char
(?:nne)? - an optional nne substring
n - an n char
\b - a word boundary.
See the Python demo:
import re
test_list = ['Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Softwareentwickler', 'Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Softwareentwickler', 'Softwareentwickler', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientists; DWH-BI Consultants; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Hard-Softwareentwickler', 'Data Scientist; DWH-BI Consultant; SoftwareentwicklerInnen; InformatikerInnen; Statistiker', 'Hard-Softwareentwickler', 'Hard-Softwareentwickler', 'Hard-Softwareentwickler']
rm_gender_regex = re.compile( r'(?:\b/|\B)i(?:nne)?n\b', re.I )
result = [rm_gender_regex.sub('', vac) for vac in test_list]
for x in result:
    print(x)
Output:
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Softwareentwickler
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Softwareentwickler
Softwareentwickler
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientists; DWH-BI Consultants; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Hard-Softwareentwickler
Data Scientist; DWH-BI Consultant; Softwareentwickler; Informatiker; Statistiker
Hard-Softwareentwickler
Hard-Softwareentwickler
Hard-Softwareentwickler
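The `(?:\b/|\B)` prefix in the pattern above matters when a string contains the standalone word "in". The sketch below compares it with a simpler pattern that lacks this prefix; the names `attached_only` and `unconstrained` and the input `text` are hypothetical and chosen only for illustration.

```python
import re

# Pattern from the demo above: the ending must directly follow a word character,
# optionally separated from it by "/".
attached_only = re.compile(r"(?:\b/|\B)i(?:nne)?n\b", re.I)

# Simpler pattern without that constraint, for comparison only.
unconstrained = re.compile(r"/?i(?:nne)?n\b", re.I)

# Hypothetical input that contains the standalone preposition "in".
text = "EntwicklerInnen in Wien; Berater/in"

kept = attached_only.sub("", text)      # standalone "in" survives
stripped = unconstrained.sub("", text)  # standalone "in" is removed as well

print(repr(kept))
print(repr(stripped))
```

Both patterns still treat any word-final "in" as an ending, so a word that simply ends in "in" (for example "Berlin") would be shortened too; if such words can occur in the data, they would need to be excluded separately.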