Home > database >  Replace Only Integer Values with a sequence of consecutive numbers in a column
Replace Only Integer Values with a sequence of consecutive numbers in a column

Time:10-21

How do I replace only the integer values in the ID column with a sequence of consecutive numbers? I'd like any non-integer or NaN cells skipped.

Current df:

       ID   AMOUNT
        1   0.00
     test   5.00
test test   0.00
     test   0.00
        1   0.00
       xx   304.95
     x xx   304.95
        1   0.00
        1   0.00
    xxxxx   0.00
        1   0.00
      xxx   0.00
    xx xx   0.00
        1   0.00

Desired Outcome:

       ID   AMOUNT
        1   0.00
     test   5.00
test test   0.00
     test   0.00
        2   0.00
       xx   304.95
     x xx   304.95
        3   0.00
        4   0.00
    xxxxx   0.00
        5   0.00
      xxx   0.00
    xx xx   0.00
        6   0.00

I tried making a new column using np.arange(len(df)) and then replacing the ID values with that, but it's not giving me the expected outcome.

Thank you!

CodePudding user response:

You can use:

df['ID'] = (pd
   .to_numeric(df['ID'], errors='coerce')  # convert to numeric
   .cumsum()                               # increment numbers
   .convert_dtypes().astype(object)        # back to integer
   .fillna(df['ID'])                       # restore non-numeric
)

Alternative using slicing and updating:

s = pd.to_numeric(df['ID'], errors='coerce')
df['ID'].update(s[s.notna()].cumsum().astype(int).astype(object))

output:

           ID  AMOUNT
0           1    0.00
1        test    5.00
2   test test    0.00
3        test    0.00
4           2    0.00
5          xx  304.95
6        x xx  304.95
7           3    0.00
8           4    0.00
9       xxxxx    0.00
10          5    0.00
11        xxx    0.00
12      xx xx    0.00
13          6    0.00

CodePudding user response:

Solution 1

Identify numeric values with regex then create a range counter and use boolean indexing to update the values

m = df['ID'].str.match('\d ', na=False)
df.loc[m , 'ID'] = range(1, m.sum()   1)

Solution 2

Identify numeric values with pandas builtin function then create a range counter and use boolean indexing to update the values

m = pd.to_numeric(df['ID'], errors='coerce').notna()
df.loc[m , 'ID'] = range(1, m.sum()   1)

Result

           ID  AMOUNT
0           1    0.00
1        test    5.00
2   test test    0.00
3        test    0.00
4           2    0.00
5          xx  304.95
6        x xx  304.95
7           3    0.00
8           4    0.00
9       xxxxx    0.00
10          5    0.00
11        xxx    0.00
12      xx xx    0.00
13          6    0.00

CodePudding user response:

If you can iterate over the ID-column this can be done easily via pythons isinstance(object, class) function.

count = 0
for index, value in enumerate(df['ID']): # Iterate over the column
    if isinstance(value, int): # Check if this is an integer
        df['ID'][index] = count # Replace integer
        count  = 1
        pass
    pass
  • Related