I have a dataframe:
import pandas as pd

company = pd.DataFrame({'coid': [1,2,3],
'coname': ['BRIGHT SUNLtd','TrustCo. New Era','PteTreasury']})
I want to separate Ltd, Co., and Pte from the adjoining text in coname, so that the result looks like this:
coid coname
1 BRIGHT SUN Ltd
2 Trust Co. New Era
3 Pte Treasury
Answer:
You could use a replacement dictionary, which makes it easy to add more replacements later:
Code:
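import pandas as pd

company = pd.DataFrame({'coid': [1,2,3],
'coname': ['BRIGHT SUNLtd','TrustCo. New Era','PteTreasury']})
# With regex=True every key is a regular expression, so escape the dot in "Co."
replacements = {'Ltd':' Ltd',
r'Co\.':' Co.',
'Pte':'Pte '}
# Assign the result back: a chained inplace=True on a column does not update
# the DataFrame under pandas' Copy-on-Write mode
company["coname"] = company["coname"].replace(replacements, regex=True)
print(company)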
Output:
coid coname
0 1 BRIGHT SUN Ltd
1 2 Trust Co. New Era
2 3 Pte Treasury
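With regex=True each dictionary key is treated as a regular expression, so an unescaped 'Co.' also matches "Co" followed by any character. The sketch below uses plain `re` with `re.escape` to mimic applying the dictionary one key at a time (`apply_replacements` is a hypothetical helper, and "TrustCorp" is an invented example name), and shows what the unescaped key does:

```python
import re

# Same table as the dictionary answer; keys are treated as literal text here.
replacements = {"Ltd": " Ltd", "Co.": " Co.", "Pte": "Pte "}


def apply_replacements(text: str, table: dict) -> str:
    """Hypothetical helper: apply each key of `table` in order, escaping it first."""
    for old, new in table.items():
        # re.escape turns "Co." into r"Co\." so the dot only matches a real dot
        text = re.sub(re.escape(old), new, text)
    return text


names = ["BRIGHT SUNLtd", "TrustCo. New Era", "PteTreasury"]
print([apply_replacements(n, replacements) for n in names])
# ['BRIGHT SUN Ltd', 'Trust Co. New Era', 'Pte Treasury']

# Unescaped, "Co." is a regex where "." matches any character:
print(re.sub("Co.", " Co.", "TrustCorp"))  # Trust Co.p
print(apply_replacements("TrustCorp", replacements))  # TrustCorp
```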
Answer:
You can use a regex with lookarounds:
company['coname'] = company['coname'].str.replace(r'(?<=\S)(?=Ltd\b|Co\b)|(?<=Pte)(?=\S)', ' ', regex=True)
Output:
coid coname
0 1 BRIGHT SUN Ltd
1 2 Trust Co. New Era
2 3 Pte Treasury
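Every alternative in this pattern is zero-width, so the "replacement" is really an insertion of a space at the matched position. For ordinary object-dtype columns pandas applies the pattern with Python's `re` semantics, so a minimal check with `re.sub` on the sample names looks like this (the already-spaced name is an extra example, not from the question):

```python
import re

# The lookaround pattern from this answer: both alternatives match an empty
# position, so re.sub inserts " " there instead of replacing any characters.
pattern = r"(?<=\S)(?=Ltd\b|Co\b)|(?<=Pte)(?=\S)"

for name in ["BRIGHT SUNLtd", "TrustCo. New Era", "PteTreasury"]:
    print(repr(re.sub(pattern, " ", name)))
# 'BRIGHT SUN Ltd'
# 'Trust Co. New Era'
# 'Pte Treasury'

# Already-spaced names are left alone, because (?<=\S) and (?=\S) require a
# non-space character on the other side of the insertion point.
print(repr(re.sub(pattern, " ", "BRIGHT SUN Ltd")))  # 'BRIGHT SUN Ltd'
```

Unlike plain string replacement, running this pattern twice does not add a second space.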
To build the same pattern from lists of terms:
import re

space_before = ['Ltd', 'Co']
space_after = ['Pte']
# terms that need a space in front of them
before = "|".join(re.escape(x) + r"\b" for x in space_before)
# terms that need a space after them
after = '|'.join(f'(?<={re.escape(x)})' for x in space_after)
pattern = fr'(?<=\S)(?={before})|(?:{after})(?=\S)'
# '(?<=\\S)(?=Ltd\\b|Co\\b)|(?:(?<=Pte))(?=\\S)'
company['coname'] = company['coname'].str.replace(pattern, ' ', regex=True)
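Wrapped in a small function, the list-driven pattern can be compiled once and checked against the sample names. This is a sketch: `build_spacing_pattern` is a hypothetical name, and it assumes both term lists are non-empty (an empty `space_before` would produce `(?=)`, which matches everywhere).

```python
import re


def build_spacing_pattern(space_before, space_after):
    """Hypothetical helper: compile the lookaround pattern from two non-empty term lists."""
    # Terms that need a space in front of them ("SUNLtd" -> "SUN Ltd")
    before = "|".join(re.escape(t) + r"\b" for t in space_before)
    # Terms that need a space after them ("PteTreasury" -> "Pte Treasury");
    # each lookbehind is fixed-width on its own, so terms may differ in length
    after = "|".join(f"(?<={re.escape(t)})" for t in space_after)
    return re.compile(rf"(?<=\S)(?={before})|(?:{after})(?=\S)")


pattern = build_spacing_pattern(["Ltd", "Co"], ["Pte"])
print(pattern.pattern)  # (?<=\S)(?=Ltd\b|Co\b)|(?:(?<=Pte))(?=\S)

names = ["BRIGHT SUNLtd", "TrustCo. New Era", "PteTreasury"]
print([pattern.sub(" ", n) for n in names])
# ['BRIGHT SUN Ltd', 'Trust Co. New Era', 'Pte Treasury']
```

Since `Series.str.replace` accepts a compiled regex when regex=True, the result can be passed straight to company['coname'].str.replace(pattern, ' ', regex=True).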
Answer:
If there are only the three cases shown in your question, you can simply use .str.replace() with plain strings to insert a space before Ltd and Co. and after Pte. Passing regex=False makes 'Co.' match a literal dot.
For example:
company['coname'] = company['coname'].str.replace('Ltd', ' Ltd', regex=False)
company['coname'] = company['coname'].str.replace('Co.', ' Co.', regex=False)
company['coname'] = company['coname'].str.replace('Pte', 'Pte ', regex=False)
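A plain-Python sketch of the literal-replacement idea shows one caveat: it is not idempotent, so a name that already contains the space gets a second one. `add_spaces` is a hypothetical helper, and it inserts the space after "Pte" (an assumption, so it works for any name starting with Pte rather than only "Treasury"):

```python
def add_spaces(name: str) -> str:
    """Hypothetical helper: three chained literal replacements (space before Ltd/Co., after Pte)."""
    # Python's str.replace never interprets regex syntax, so "Co." needs no escaping
    return name.replace("Ltd", " Ltd").replace("Co.", " Co.").replace("Pte", "Pte ")


names = ["BRIGHT SUNLtd", "TrustCo. New Era", "PteTreasury"]
print([add_spaces(n) for n in names])
# ['BRIGHT SUN Ltd', 'Trust Co. New Era', 'Pte Treasury']

# Not idempotent: a name that is already spaced gets a second space
print(repr(add_spaces("BRIGHT SUN Ltd")))  # 'BRIGHT SUN  Ltd'
```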
Answer:
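A single regex can pad each match with a space on both sides; any doubled spaces are then collapsed and the ends stripped:
company['coname'] = (company['coname']
                     .str.replace(r'(?i)(Ltd|Co\.|Pte)', r' \1 ', regex=True)
                     .str.replace(r'\s+', ' ', regex=True)
                     .str.strip())
Regex explanation (padding pattern): https://regex101.com/r/c26fT6/1
Output: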
0 BRIGHT SUN Ltd
1 Trust Co. New Era
2 Pte Treasury
Name: coname, dtype: object
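Padding and then only stripping the ends leaves a double space inside "TrustCo. New Era", because "Co." is followed by its own padding space plus the original one. A minimal sketch in plain `re` compares strip-only with collapsing whitespace first (`pad_and_clean` is a hypothetical helper name):

```python
import re

# The answer's pattern: pad every match with a space on each side
pad = re.compile(r"(?i)(Ltd|Co\.|Pte)")


def pad_and_clean(name: str) -> str:
    """Pad matches, then collapse whitespace runs and strip the ends."""
    padded = pad.sub(r" \1 ", name)
    return re.sub(r"\s+", " ", padded).strip()


for name in ["BRIGHT SUNLtd", "TrustCo. New Era", "PteTreasury"]:
    strip_only = pad.sub(r" \1 ", name).strip()
    print(repr(strip_only), "->", repr(pad_and_clean(name)))
# 'BRIGHT SUN Ltd' -> 'BRIGHT SUN Ltd'
# 'Trust Co.  New Era' -> 'Trust Co. New Era'
# 'Pte Treasury' -> 'Pte Treasury'
```

Note that (?i) makes the match case-insensitive, so the pattern can also fire on those letters inside ordinary words.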