If-else logic to set value of dataframe column

Time:12-04

I have data in a dataframe (df) that resembles the structure below

ID Sessions
1234 400
5678 200
9101112 199
13141516 0

I want to create a new column (new_col) in the dataframe that ranks each row by its Sessions value, except that rows with 0 Sessions should be excluded from the ranking and given a value of 0.

I have attempted applying the lambda below, but this is not correct:

df['new_col'] = df['Sessions'].apply(lambda x: 0 if x == 0 else df['Sessions'].rank(ascending=True, pct=True))

Sample desired output:

ID Sessions new_col
1234 400 1.000000
5678 200 0.999987
9101112 199 0.999974
13141516 0 0

CodePudding user response:

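Something like this? Rank only the non-zero rows, then fill the rows that were left out (they come back as NaN) with 0:

df['new_col'] = df.loc[df.Sessions > 0, 'Sessions'].rank(ascending=True, pct=True)
df['new_col'] = df['new_col'].fillna(0)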

or

import numpy as np
df['new_col'] = df['Sessions'].replace(0, np.nan).rank(pct=True).fillna(0)
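
To check that the two approaches agree, here is a minimal sketch run on the four sample rows from the question. The column names rank_a and rank_b are made up for the comparison. Note that ranking a filtered series and assigning it back aligns on the index, so the excluded rows are NaN until they are filled.

```python
import numpy as np
import pandas as pd

# Sample rows from the question
df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

# Approach 1: rank only rows with Sessions > 0, then fill the excluded rows with 0
df["rank_a"] = df.loc[df["Sessions"] > 0, "Sessions"].rank(ascending=True, pct=True)
df["rank_a"] = df["rank_a"].fillna(0)

# Approach 2: hide the zeros as NaN so rank() skips them, then put 0 back
df["rank_b"] = df["Sessions"].replace(0, np.nan).rank(pct=True).fillna(0)

print(df)
```

With pct=True the rank is divided by the number of non-missing values, so on such a small sample the non-zero rows are spaced in thirds rather than looking like the values in the question, which presumably come from a much larger dataframe.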

CodePudding user response:

If you want to avoid modifying the dataframe in place, assign is your friend: it returns a new dataframe with the extra column. Try this.

df = df.assign(new_col=lambda d: (
    d["Sessions"]        # grab the series
    .replace(0, np.nan)  # replace the 0s with NaNs so rank() skips them
    .rank(pct=True)      # rank as percentages
    .fillna(0)           # fill the zeros back in
    )
)

Also, this way you will be able to neatly wrap this pipe in a function.
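
For instance, a wrapper could look roughly like this. The function name add_nonzero_pct_rank and its source/target parameters are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd

def add_nonzero_pct_rank(frame, source="Sessions", target="new_col"):
    """Return a new dataframe with a percentile rank of `source`; zeros get 0."""
    ranked = (
        frame[source]
        .replace(0, np.nan)  # hide zeros so rank() ignores them
        .rank(pct=True)      # percentile rank of the remaining values
        .fillna(0)           # zero rows get 0 instead of a rank
    )
    # assign returns a copy, so the input frame is left untouched
    return frame.assign(**{target: ranked})

# Sample rows from the question
df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

result = df.pipe(add_nonzero_pct_rank)
print(result)
```

Because the function takes and returns a dataframe, it can be chained with other steps through pipe.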
