How do I replace only the integer values in the ID column with a sequence of consecutive numbers? I'd like any non-integer or NaN cells skipped.
Current df:
ID AMOUNT
1 0.00
test 5.00
test test 0.00
test 0.00
1 0.00
xx 304.95
x xx 304.95
1 0.00
1 0.00
xxxxx 0.00
1 0.00
xxx 0.00
xx xx 0.00
1 0.00
Desired Outcome:
ID AMOUNT
1 0.00
test 5.00
test test 0.00
test 0.00
2 0.00
xx 304.95
x xx 304.95
3 0.00
4 0.00
xxxxx 0.00
5 0.00
xxx 0.00
xx xx 0.00
6 0.00
I tried making a new column using np.arange(len(df)) and then replacing the ID values with that, but it's not giving me the expected outcome.
Thank you!
CodePudding user response:
You can use:
df['ID'] = (pd
.to_numeric(df['ID'], errors='coerce') # convert to numeric
.cumsum() # increment numbers
.convert_dtypes().astype(object) # back to integer
.fillna(df['ID']) # restore non-numeric
)
Alternative using slicing and updating:
s = pd.to_numeric(df['ID'], errors='coerce')
df['ID'].update(s[s.notna()].cumsum().astype(int).astype(object))
output:
ID AMOUNT
0 1 0.00
1 test 5.00
2 test test 0.00
3 test 0.00
4 2 0.00
5 xx 304.95
6 x xx 304.95
7 3 0.00
8 4 0.00
9 xxxxx 0.00
10 5 0.00
11 xxx 0.00
12 xx xx 0.00
13 6 0.00
CodePudding user response:
Solution 1
Identify numeric values with regex then create a range counter and use boolean indexing to update the values
m = df['ID'].str.match('\d ', na=False)
df.loc[m , 'ID'] = range(1, m.sum() 1)
Solution 2
Identify numeric values with pandas builtin function then create a range counter and use boolean indexing to update the values
m = pd.to_numeric(df['ID'], errors='coerce').notna()
df.loc[m , 'ID'] = range(1, m.sum() 1)
Result
ID AMOUNT
0 1 0.00
1 test 5.00
2 test test 0.00
3 test 0.00
4 2 0.00
5 xx 304.95
6 x xx 304.95
7 3 0.00
8 4 0.00
9 xxxxx 0.00
10 5 0.00
11 xxx 0.00
12 xx xx 0.00
13 6 0.00
CodePudding user response:
If you can iterate over the ID-column this can be done easily via pythons isinstance(object, class)
function.
count = 0
for index, value in enumerate(df['ID']): # Iterate over the column
if isinstance(value, int): # Check if this is an integer
df['ID'][index] = count # Replace integer
count = 1
pass
pass