Can I use a list comprehension to conditionally update values in a pandas DataFrame column?

Time:04-02

I am appending rows to a DataFrame that has an id column, and each id needs to be unique with respect to the values already in the DataFrame.

For new rows I want to add an id.

Here is some example starting data (rows 2 and 3 are newly added):

name id
0 A 65
1 F 33
2 H 0
3 V 0

There is no order to the data, and the previously assigned ids may not be sequential. I just want to start at the highest existing id and increment from there:

name id
0 A 65
1 F 33
2 H 66
3 V 67

I have implemented that as follows

    max_id = df[id_field].max()

    for i in df.index:
        if df.at[i, id_field] == 0:
            max_id += 1
            df.at[i, id_field] = max_id

I had to do it like this because I could not update max_id inside a comprehension. This doesn't feel very Pythonic. Can I do this in a more elegant way, with lambdas or comprehensions? Thanks.

CodePudding user response:

In your case, you can use groupby with cumcount to get the increment and cummax to get the running maximum:

x = df.id.ne(0).cumsum()
df.loc[df.id==0,'id'] = df.id.cummax() + x.groupby(x).cumcount()
df
Out[561]: 
  name  id
0    A  65
1    F  33
2    H  66
3    V  67
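
To make the answer above easier to follow, here is a minimal sketch that prints the three intermediate Series it combines. The variable names offset, running_max and steps are only illustrative and are not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"name": ["A", "F", "H", "V"], "id": [65, 33, 0, 0]})

# Running count of non-zero ids: rows with id == 0 keep the label
# of the last non-zero row above them.
x = df["id"].ne(0).cumsum()

# Position of each row inside its label group (0 for the first row).
offset = x.groupby(x).cumcount()

# Largest id seen so far, row by row.
running_max = df["id"].cummax()

# Side-by-side view of the pieces that get added together.
steps = pd.DataFrame({"x": x, "offset": offset, "running_max": running_max})
print(steps)
```

Because this relies on the running maximum rather than the overall maximum, the result depends on row order: an existing id that is larger than everything above a new row, but sits below it, is not taken into account.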

CodePudding user response:

Here is an idea using factorize:

# identify values to replace
m = df['id'].eq(0)
# compute a factor and add the max + 1
df.loc[m, 'id'] = df.loc[m, 'name'].factorize()[0] + df['id'].max() + 1
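
As a small illustration of what factorize returns, the sketch below uses made-up names (new_names is hypothetical, not from the question). Codes start at 0 and follow the order of first appearance, so a repeated name gets the same code.

```python
import pandas as pd

# Hypothetical names of newly added rows, with "H" repeated on purpose.
new_names = pd.Series(["H", "V", "H"])

# factorize returns (integer codes, unique values).
codes, uniques = new_names.factorize()
print(codes)          # one integer code per row
print(list(uniques))  # distinct names in order of first appearance
```

With the factorize approach, new rows that share a name would therefore receive the same id, which may or may not be what you want.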

Or using cumsum:

# identify values to replace
m = df['id'].eq(0)
# replace by max + cumsum of boolean
df.loc[m, 'id'] = df['id'].max() + m.cumsum()

output:

  name  id
0    A  65
1    F  33
2    H  66
3    V  67
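
The cumsum variant can be wrapped in a small helper. This is only a sketch: the function name fill_missing_ids and its parameters id_field and placeholder are assumptions for illustration, and the example data interleaves new and existing rows to show that the position of the new rows does not matter.

```python
import pandas as pd

def fill_missing_ids(df, id_field="id", placeholder=0):
    """Return a copy of df with placeholder ids replaced by max+1, max+2, ..."""
    out = df.copy()
    # Boolean mask of the rows that still need an id.
    m = out[id_field].eq(placeholder)
    # The cumulative sum of the mask numbers those rows 1, 2, 3, ...;
    # the assignment aligns on the index, so only masked rows are written.
    out.loc[m, id_field] = out[id_field].max() + m.cumsum()
    return out

df = pd.DataFrame({"name": ["A", "H", "F", "V"], "id": [65, 0, 33, 0]})
result = fill_missing_ids(df)
print(result)
```

Working on a copy leaves the input DataFrame unchanged; drop the copy if you prefer to update in place.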

CodePudding user response:

The walrus operator will allow you to update max_id inside a comprehension:

import pandas as pd
df = pd.DataFrame({'name':["A","F","H","V"], 'id':[65,33,0,0]})
print(df)

max_id = df['id'].max()
df.loc[df['id'] == 0, 'id'] = [(max_id := max_id + 1) for _ in range(sum(df['id']==0))]
print(df)

Output:

  name  id
0    A  65
1    F  33
2    H   0
3    V   0
  name  id
0    A  65
1    F  33
2    H  66
3    V  67
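
The walrus operator requires Python 3.8 or later. As an alternative sketch, itertools.count can carry the running id, so nothing has to be reassigned inside the comprehension. The name counter is only illustrative.

```python
from itertools import count

import pandas as pd

df = pd.DataFrame({"name": ["A", "F", "H", "V"], "id": [65, 33, 0, 0]})

# Iterator that yields max + 1, max + 2, ... each time next() is called.
counter = count(int(df["id"].max()) + 1)

# Keep existing ids and draw a fresh one for every row whose id is 0.
df["id"] = [next(counter) if v == 0 else v for v in df["id"]]
print(df)
```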