I have the following dataframe:
UNIQUE_IDENTIFIER COUNTRY_CODE
0 1 CZ
1 nan CZ
2 2 SK
3 4 AE
4 nan DK
5 nan CZ
6 nan DK
7 nan ES
For all blank values in the "UNIQUE_IDENTIFIER" column, I would like to create a value that takes the "COUNTRY_CODE" and add incremental numbers (with a space in between the number and the Country Code) starting from 1 for each different country code. So the final dataframe would be this:
UNIQUE_IDENTIFIER COUNTRY_CODE
0 1 CZ
1 CZ 1 CZ
2 2 SK
3 4 AE
4 DK 1 DK
5 CZ 2 CZ
6 DK 2 DK
7 ES 1 ES
What would be the best way to do it?
CodePudding user response:
Use GroupBy.cumcount
only for missing rows by UNIQUE_IDENTIFIER
and add COUNTRY_CODE
values with space separator:
m = df.UNIQUE_IDENTIFIER.isna()
s = df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
df.loc[m, 'UNIQUE_IDENTIFIER'] = df.loc[m, 'COUNTRY_CODE'] ' ' s
print (df)
UNIQUE_IDENTIFIER COUNTRY_CODE
0 1.0 CZ
1 CZ 1 CZ
2 2.0 SK
3 4.0 AE
4 DK 1 DK
5 CZ 2 CZ
6 DK 2 DK
7 ES 1 ES
Or use Series.fillna
for replace missing values:
s = df[df.UNIQUE_IDENTIFIER.isna()].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
df['UNIQUE_IDENTIFIER'] = df['UNIQUE_IDENTIFIER'].fillna(df['COUNTRY_CODE'] ' ' s)
print (df)
UNIQUE_IDENTIFIER COUNTRY_CODE
0 1.0 CZ
1 CZ 1 CZ
2 2.0 SK
3 4.0 AE
4 DK 1 DK
5 CZ 2 CZ
6 DK 2 DK
7 ES 1 ES
Details:
print (df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str))
1 1
4 1
5 2
6 2
7 1
dtype: object
CodePudding user response:
You can set up an incremental count with GroupBy.cumcount
, then add 1 and convert to string, and use it either to fillna
(option #1) or to replace the values with boolean indexing (option #2):
s = df['COUNTRY_CODE'].where(df['UNIQUE_IDENTIFIER'].isna(), '')
df['UNIQUE_IDENTIFIER'] = (df['UNIQUE_IDENTIFIER']
.fillna(s ' ' s.groupby(s).cumcount()
.add(1).astype(str))
)
or:
m = df['UNIQUE_IDENTIFIER'].isna()
s = df['COUNTRY_CODE'].where(m, '')
df.loc[m, 'UNIQUE_IDENTIFIER'] = s ' ' s.groupby(s).cumcount().add(1).astype(str)
output:
UNIQUE_IDENTIFIER COUNTRY_CODE
0 1.0 CZ
1 CZ 1 CZ
2 2.0 SK
3 4.0 AE
4 DK 1 DK
5 CZ 2 CZ
6 DK 2 DK
7 ES 1 ES