I am appending rows to a DataFrame that has an id column whose values need to be unique with respect to the ids already in the DataFrame.
For new rows I want to assign an id.
Here is example starting data (rows 2 and 3 are newly added, with a placeholder id of 0):
| | name | id |
|---|---|---|
| 0 | A | 65 |
| 1 | F | 33 |
| 2 | H | 0 |
| 3 | V | 0 |
There is no order to the data, and the previously assigned ids may not be sequential. I just want to start at the highest existing id and increment from there, giving:
| | name | id |
|---|---|---|
| 0 | A | 65 |
| 1 | F | 33 |
| 2 | H | 66 |
| 3 | V | 67 |
I have implemented that as follows:
max_id = df[id_field].max()
for i in df.index:
    if df.at[i, id_field] == 0:
        max_id += 1
        df.at[i, id_field] = max_id
I had to do it like this because I could not update max_id inside a comprehension. This doesn't feel very Pythonic. Can I do this in a more elegant way, for example with lambdas or comprehensions?
Thanks
CodePudding user response:
In your case we can use groupby with cumcount to get the increment and cummax to get the max:
x = df.id.ne(0).cumsum()
df.loc[df.id==0,'id'] = df.id.cummax() + x.groupby(x).cumcount()
df
Out[561]:
  name  id
0    A  65
1    F  33
2    H  66
3    V  67
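To make the pieces of this answer easier to follow, here is a small sketch that prints each intermediate Series for the example data. The names `offset`, `running_max` and `steps` are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "F", "H", "V"], "id": [65, 33, 0, 0]})

# Group label: goes up by one at every row that already has an id
x = df["id"].ne(0).cumsum()
# Position of each row inside its group (0 for the row that opens the group)
offset = x.groupby(x).cumcount()
# Largest id seen so far, reading the frame top to bottom
running_max = df["id"].cummax()

# Side-by-side view of the intermediate values
steps = pd.DataFrame(
    {"id": df["id"], "x": x, "offset": offset, "running_max": running_max}
)
print(steps)

# Only the placeholder rows receive running_max + offset
df.loc[df["id"] == 0, "id"] = running_max + offset
print(df)
```

Note that cummax is a running maximum, so this relies on the placeholder rows coming after the row holding the largest existing id, as they do when new rows are appended at the end.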
CodePudding user response:
Here is an idea using factorize:
# identify values to replace
m = df['id'].eq(0)
# compute a factor and add the max + 1
df.loc[m, 'id'] = df.loc[m, 'name'].factorize()[0] + df['id'].max() + 1
Or using cumsum:
# identify values to replace
m = df['id'].eq(0)
# replace by max + cumsum of boolean
df.loc[m, 'id'] = df['id'].max() + m.cumsum()
output:
  name  id
0    A  65
1    F  33
2    H  66
3    V  67
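If this needs to run every time rows are appended, the cumsum variant could be wrapped in a small helper. This is only a sketch: the function name `assign_new_ids` and its `id_field` and `placeholder` parameters are hypothetical, and it assumes the placeholder value (0 here) is never a real id.

```python
import pandas as pd


def assign_new_ids(df, id_field="id", placeholder=0):
    """Return a copy of df where placeholder ids continue from the current max."""
    out = df.copy()
    # Rows that still need an id
    m = out[id_field].eq(placeholder)
    # 1, 2, 3, ... for the placeholder rows, added to the current maximum
    out.loc[m, id_field] = out[id_field].max() + m.cumsum()[m]
    return out


df = pd.DataFrame({"name": ["A", "F", "H", "V"], "id": [65, 33, 0, 0]})
result = assign_new_ids(df)
print(result)
```

Because it uses the global maximum rather than a running one, the position of the placeholder rows in the frame should not matter.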
CodePudding user response:
The walrus operator will allow you to update max_id inside a comprehension:
import pandas as pd
df = pd.DataFrame({'name':["A","F","H","V"], 'id':[65,33,0,0]})
print(df)
max_id = df['id'].max()
df.loc[df['id'] == 0, 'id'] = [(max_id := max_id + 1) for _ in range(sum(df['id']==0))]
print(df)
Output:
  name  id
0    A  65
1    F  33
2    H   0
3    V   0
  name  id
0    A  65
1    F  33
2    H  66
3    V  67
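To see the mechanism in isolation, here is a minimal sketch of the walrus operator carrying state through a list comprehension, without pandas. It needs Python 3.8 or later, and the helper name `next_ids` is made up for illustration.

```python
def next_ids(start, n):
    """Return n consecutive ids after start, plus the last id handed out."""
    current = start
    # The assignment expression rebinds `current` in the enclosing function
    # scope on every iteration, so each element is one higher than the last
    ids = [(current := current + 1) for _ in range(n)]
    return ids, current


ids, last = next_ids(65, 2)
print(ids, last)
```

The comprehension is evaluated only for its side effect on the counter, so some readers may find a plain range or the cumsum approach easier to read.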