Separate Specific String in Dataframe

Time:12-06

I have a dataframe:

import pandas as pd

company = pd.DataFrame({'coid': [1, 2, 3],
                        'coname': ['BRIGHT SUNLtd', 'TrustCo. New Era', 'PteTreasury']})

I want to separate Ltd, Co., and Pte from the rest of the text in coname with a space, so that the result looks like this:

coid coname
1    BRIGHT SUN Ltd
2    Trust Co. New Era
3    Pte Treasury

CodePudding user response:

You could use a replacement dictionary, which makes it easy to add more replacements later:

Code:

company = pd.DataFrame({'coid': [1,2,3],
                        'coname': ['BRIGHT SUNLtd','TrustCo. New Era','PteTreasury']})

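# With regex=True the keys are regular expressions, so the dot in 'Co.' is escaped.
replacements = {'Ltd': ' Ltd',
                r'Co\.': ' Co.',
                'Pte': 'Pte '}

# Assign the result back instead of using inplace=True on a selected column.
company["coname"] = company["coname"].replace(replacements, regex=True)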

print(company)

Output:

   coid             coname
0     1     BRIGHT SUN Ltd
1     2  Trust Co. New Era
2     3       Pte Treasury
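
Because regex=True makes pandas treat every dictionary key as a regular expression, a key such as 'Co.' can match more than intended (the dot matches any character). A minimal sketch, assuming the terms are meant literally, that escapes the keys with re.escape before replacing; the names literal_replacements, escaped and probe are only illustrative:

```python
import re
import pandas as pd

company = pd.DataFrame({'coid': [1, 2, 3],
                        'coname': ['BRIGHT SUNLtd', 'TrustCo. New Era', 'PteTreasury']})

# Literal terms mapped to their spaced-out form (same mapping as in the answer).
literal_replacements = {'Ltd': ' Ltd', 'Co.': ' Co.', 'Pte': 'Pte '}

# Escape each key so characters such as '.' are matched literally.
escaped = {re.escape(term): spaced for term, spaced in literal_replacements.items()}

result = company['coname'].replace(escaped, regex=True)
print(result.tolist())

# A name that merely contains "Com" is left alone once the dot is escaped.
probe = pd.Series(['Acme Company']).replace(escaped, regex=True)
print(probe.tolist())
```

With this approach, adding a new term only means adding one entry to literal_replacements.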

CodePudding user response:

You can use a regex with lookarounds:

company['coname'] = company['coname'].str.replace(r'(?<=\S)(?=Ltd\b|Co\b)|(?<=Pte)(?=\S)', ' ', regex=True)

Output:

   coid             coname
0     1     BRIGHT SUN Ltd
1     2  Trust Co. New Era
2     3       Pte Treasury


To build the pattern from lists of terms:

import re

space_before = ['Ltd', 'Co']   # terms that need a space in front
space_after = ['Pte']          # terms that need a space behind

before = "|".join(re.escape(x) + r"\b" for x in space_before)
after = '|'.join(f'(?<={re.escape(x)})' for x in space_after)
pattern = fr'(?<=\S)(?={before})|(?:{after})(?=\S)'
# '(?<=\\S)(?=Ltd\\b|Co\\b)|(?:(?<=Pte))(?=\\S)'

company['coname'] = company['coname'].str.replace(pattern, ' ', regex=True)
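
To check the generated pattern before applying it to the column, it can be tried on plain strings with re.sub. A minimal sketch, assuming the same term lists as above; the helper name build_spacing_pattern is hypothetical and not part of pandas or re:

```python
import re

def build_spacing_pattern(space_before, space_after):
    """Build a regex that matches the empty positions where a space is missing."""
    # A space is needed before these terms when a non-space character precedes them.
    before = '|'.join(re.escape(term) + r'\b' for term in space_before)
    # A space is needed after these terms when a non-space character follows them.
    after = '|'.join(f'(?<={re.escape(term)})' for term in space_after)
    return fr'(?<=\S)(?={before})|(?:{after})(?=\S)'

pattern = build_spacing_pattern(['Ltd', 'Co'], ['Pte'])

# The last name is already spaced correctly and should stay unchanged.
names = ['BRIGHT SUNLtd', 'TrustCo. New Era', 'PteTreasury', 'Pte Treasury']
fixed = [re.sub(pattern, ' ', name) for name in names]
print(fixed)
```

The same pattern string can then be passed to company['coname'].str.replace(pattern, ' ', regex=True) as in the line above.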

CodePudding user response:

If the three cases shown in your question are the only ones, you can simply use .str.replace() to swap each term for a version with the extra space. Passing regex=False makes sure the dot in 'Co.' is treated as a literal character.

For example:

company['coname'] = company['coname'].str.replace('Ltd', ' Ltd', regex=False)
company['coname'] = company['coname'].str.replace('Co.', ' Co.', regex=False)
company['coname'] = company['coname'].str.replace('Pte', 'Pte ', regex=False)

CodePudding user response:

company['coname'] = (company['coname']
                     .str.replace(r'(?i)(Ltd|Co\.|Pte)', r' \1 ', regex=True)
                     .str.replace(r'\s+', ' ', regex=True)  # collapse the doubled spaces
                     .str.strip())

Regex description: https://regex101.com/r/c26fT6/1

Output:

0       BRIGHT SUN Ltd
1    Trust Co. New Era
2         Pte Treasury
Name: coname, dtype: object