Fill NAs values with combination of incremental number and column value-CodePudding

I have the following dataframe:

        UNIQUE_IDENTIFIER   COUNTRY_CODE
        
0          1                    CZ
1         nan                   CZ
2          2                    SK
3          4                    AE
4        nan                    DK
5        nan                    CZ
6        nan                    DK
7        nan                    ES

For all blank values in the "UNIQUE_IDENTIFIER" column, I would like to create a value that takes the "COUNTRY_CODE" and add incremental numbers (with a space in between the number and the Country Code) starting from 1 for each different country code. So the final dataframe would be this:

        UNIQUE_IDENTIFIER   COUNTRY_CODE
        
0          1                    CZ
1         CZ 1                  CZ
2          2                    SK
3          4                    AE
4        DK 1                   DK
5        CZ 2                   CZ
6        DK 2                   DK
7        ES 1                   ES

What would be the best way to do it?

CodePudding user response：

Use GroupBy.cumcount only for missing rows by UNIQUE_IDENTIFIER and add COUNTRY_CODE values with space separator:

m = df.UNIQUE_IDENTIFIER.isna()
s = df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
df.loc[m, 'UNIQUE_IDENTIFIER'] = df.loc[m, 'COUNTRY_CODE']   ' '   s
print (df)
  UNIQUE_IDENTIFIER COUNTRY_CODE
0               1.0           CZ
1              CZ 1           CZ
2               2.0           SK
3               4.0           AE
4              DK 1           DK
5              CZ 2           CZ
6              DK 2           DK
7              ES 1           ES

Or use Series.fillna for replace missing values:

s = df[df.UNIQUE_IDENTIFIER.isna()].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
df['UNIQUE_IDENTIFIER'] = df['UNIQUE_IDENTIFIER'].fillna(df['COUNTRY_CODE']   ' '   s)
print (df)
  UNIQUE_IDENTIFIER COUNTRY_CODE
0               1.0           CZ
1              CZ 1           CZ
2               2.0           SK
3               4.0           AE
4              DK 1           DK
5              CZ 2           CZ
6              DK 2           DK
7              ES 1           ES

Details:

print (df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str))
1    1
4    1
5    2
6    2
7    1
dtype: object

CodePudding user response：

You can set up an incremental count with GroupBy.cumcount, then add 1 and convert to string, and use it either to fillna (option #1) or to replace the values with boolean indexing (option #2):

s = df['COUNTRY_CODE'].where(df['UNIQUE_IDENTIFIER'].isna(), '')

df['UNIQUE_IDENTIFIER'] = (df['UNIQUE_IDENTIFIER']
                          .fillna(s ' ' s.groupby(s).cumcount()
                                         .add(1).astype(str))
                          )

or:

m = df['UNIQUE_IDENTIFIER'].isna()
s = df['COUNTRY_CODE'].where(m, '')
df.loc[m, 'UNIQUE_IDENTIFIER'] = s ' ' s.groupby(s).cumcount().add(1).astype(str)

output:

  UNIQUE_IDENTIFIER COUNTRY_CODE
0               1.0           CZ
1              CZ 1           CZ
2               2.0           SK
3               4.0           AE
4              DK 1           DK
5              CZ 2           CZ
6              DK 2           DK
7              ES 1           ES