If-else logic to set value of dataframe column

I have data in a dataframe (df) that resembles the structure below

ID	Sessions
1234	400
5678	200
9101112	199
13141516	0

I want to create a new column (new_col) in the dataframe that ranks each row by its Sessions value, except that rows with 0 Sessions should be left out of the ranking and set to 0.

I have attempted applying the lambda below, but this is not correct:

df['new_col'] = df['Sessions'].apply(lambda x: 0 if x == 0 else df['Sessions'].rank(ascending=True, pct=True))

Sample desired output:

ID	Sessions	new_col
1234	400	1.000000
5678	200	0.999987
9101112	199	0.999974
13141516	0	0

CodePudding user response：

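Something like this? Either of these should work:

# Rank only the non-zero rows; rows with 0 Sessions come back as NaN, so fill them with 0
df['new_col'] = df.loc[df.Sessions > 0, 'Sessions'].rank(ascending=True, pct=True)
df['new_col'] = df['new_col'].fillna(0)

# Or mask the zeros as NaN, rank, then restore the 0s (needs: import numpy as np)
df['new_col'] = df['Sessions'].replace(0, np.nan).rank(pct=True).fillna(0)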
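
To see how the two approaches treat the zero row, here is a minimal sketch on the four sample rows from the question. The column name ranked_nonzero is made up for illustration. Ranking only the filtered rows gives NaN for the zero row through index alignment until it is filled, while masking and then calling fillna(0) gives 0 directly.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

# Approach 1: rank only rows with Sessions > 0.
# Assigning back aligns on the index, so the excluded row becomes NaN.
df["ranked_nonzero"] = df.loc[df["Sessions"] > 0, "Sessions"].rank(ascending=True, pct=True)

# Approach 2: turn zeros into NaN (rank skips NaN by default), then restore 0.
df["new_col"] = df["Sessions"].replace(0, np.nan).rank(pct=True).fillna(0)

print(df)
```

With only three non-zero rows the percentile ranks are thirds; the values in the question's desired output presumably come from a much larger dataframe.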

CodePudding user response：

If you want to avoid chained-assignment problems when slicing, assign is your friend. Try this:

import numpy as np

df = df.assign(new_col=lambda d: (
    d["Sessions"]          # grab the series
    .replace(0, np.nan)    # replace the 0s with NaNs so they are not ranked
    .rank(pct=True)        # rank as percentages
    .fillna(0)             # fill the zeros back in
    )
)

Also, this way you will be able to neatly wrap this pipe in a function.
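
As a rough sketch of that idea, the chain could be wrapped like this. The function name add_session_rank and its source/target parameters are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def add_session_rank(frame: pd.DataFrame,
                     source: str = "Sessions",
                     target: str = "new_col") -> pd.DataFrame:
    """Return a copy of `frame` with a percentile rank of `source`; zeros stay 0."""
    ranked = (
        frame[source]
        .replace(0, np.nan)  # hide zeros from the ranking
        .rank(pct=True)      # percentile rank of the remaining values
        .fillna(0)           # put 0 back for the hidden rows
    )
    # assign returns a new dataframe, so the input is left untouched
    return frame.assign(**{target: ranked})


# Sample data copied from the question
df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

result = df.pipe(add_session_rank)
print(result)
```

Because the function takes and returns a dataframe, it slots into a longer df.pipe(...) chain alongside other steps.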