Filtering DataFrame rows by a number of a specified length in a column

Consider the following pandas DataFrame:

                NAME  NUM_OWNERS             NUM_DOCS       NUM_RESIDENTS
               Total   23900137              21028886         44571130.0   
        Macael-04062     366607                324413           727945.0   
               Spain    4283950               3642683          8464411.0   
      Badalona-08911       5829                  6250            15480.0   
      Vallecas-28031       5691                  5215            10358.0

I want to keep only the rows whose NAME contains a 5-digit number and replace the NAME value with that number.

Resulting DataFrame:

                NAME  NUM_OWNERS             NUM_DOCS       NUM_RESIDENTS
               04062     366607                324413           727945.0     
               08911       5829                  6250            15480.0   
               28031       5691                  5215            10358.0

Answer:

Try filtering the rows with str.contains, then split NAME on '-' and assign the last piece as the new value:

out = df[df.NAME.str.contains('-')].assign(NAME = lambda x : x['NAME'].str.split('-').str[-1])
Out[83]: 
    NAME  NUM_OWNERS  NUM_DOCS  NUM_RESIDENTS
1  04062      366607    324413       727945.0
3  08911        5829      6250        15480.0
4  28031        5691      5215        10358.0
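A self-contained version of the answer above, with the sample data rebuilt from the question's table so it can be run directly. This is a sketch: the DataFrame construction is an assumption based on the displayed values, and, like the original answer, it filters on the hyphen rather than on the five digits, so it relies on every hyphenated NAME ending in the code.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed values/dtypes)
df = pd.DataFrame({
    "NAME": ["Total", "Macael-04062", "Spain", "Badalona-08911", "Vallecas-28031"],
    "NUM_OWNERS": [23900137, 366607, 4283950, 5829, 5691],
    "NUM_DOCS": [21028886, 324413, 3642683, 6250, 5215],
    "NUM_RESIDENTS": [44571130.0, 727945.0, 8464411.0, 15480.0, 10358.0],
})

# Keep rows whose NAME contains a literal hyphen,
# then replace NAME with the text after the last hyphen (kept as a string, so "04062" keeps its zero)
out = df[df["NAME"].str.contains("-", regex=False)].assign(
    NAME=lambda x: x["NAME"].str.split("-").str[-1]
)
print(out)
```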

Answer:

df = df[df['NAME'].astype(str).str.contains(r'\d{5}')].assign(NAME=lambda x: x['NAME'].str.replace(r'[a-zA-Z]-?', '', regex=True))

This keeps the rows where NAME contains five consecutive digits, then removes the letters (and the hyphen that follows them) from NAME.
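A possible alternative, not taken from either answer: pull the code out directly with str.extract and drop the rows where nothing matched. The lookarounds in the pattern are an assumption about what "a 5-digit number" should mean here; they reject longer digit runs, which r'\d{5}' alone would accept. The sample DataFrame is again rebuilt from the question's table.

```python
import pandas as pd

# Sample data reconstructed from the table in the question (assumed values/dtypes)
df = pd.DataFrame({
    "NAME": ["Total", "Macael-04062", "Spain", "Badalona-08911", "Vallecas-28031"],
    "NUM_OWNERS": [23900137, 366607, 4283950, 5829, 5691],
    "NUM_DOCS": [21028886, 324413, 3642683, 6250, 5215],
    "NUM_RESIDENTS": [44571130.0, 727945.0, 8464411.0, 15480.0, 10358.0],
})

# Capture exactly five digits that are not part of a longer digit run;
# expand=False returns a Series, with NaN where there is no match
codes = df["NAME"].astype(str).str.extract(r"(?<!\d)(\d{5})(?!\d)", expand=False)

# Overwrite NAME with the extracted code, then drop rows that had no code
out = df.assign(NAME=codes).dropna(subset=["NAME"])
print(out)
```

Because the match is assigned back as a string, leading zeros such as in "04062" are preserved.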