Pandas Strip Characters Left and Right Using Wildcards-CodePudding

I"m trying to remove characters in a df column left and right of key words that keep repeating in rows of a large DF. There are leading numbers too that change so it's not so easy. My data looks like this:

128544  20210831  2200    882.2          342259.UNITED STATES.LAKERS.NBA.csv
128545  20210831  2300    918.9          342259.UNITED STATES.LAKERS.NBA.csv
128546  20210901     0      NaN          544257.UNITED STATES.CELTICS.NBA.csv
128547  20210901   100      NaN          544257.UNITED STATES.CELTICS.NBA.csv

I need the stripped character DF "df" to look like this:

        col1      col2    col3           col4
128544  20210831  2200    882.2          LAKERS
128545  20210831  2300    918.9          LAKERS
128546  20210901     0      NaN          CELTICS
128547  20210901   100      NaN          CELTICS

I've tried - df['col4'] = df.col4.str.replace(".NBA).*","")

and it gives me the right half but I cannot see how to strip the left side or do both at the same time.

Thank you for your help,

CodePudding user response：

Try with split

df['col4'] = df.col4.str.split('.').str[2]

CodePudding user response：

Or for a more precise match try with .str.extract since you’re looking into regular expressions:

>>> df['col4'] = df['col4'].str.extract('\d \.UNITED STATES\.(.*)\.NBA\.csv')[0]
>>> df
     col0      col1  col2   col3     col4
0  128544  20210831  2200  882.2   LAKERS
1  128545  20210831  2300  918.9   LAKERS
2  128546  20210901     0    NaN  CELTICS
3  128547  20210901   100    NaN  CELTICS