I have data in a dataframe (df) that resembles the structure below:
ID | Sessions |
---|---|
1234 | 400 |
5678 | 200 |
9101112 | 199 |
13141516 | 0 |
I want to create a new column (new_col) in the dataframe that ranks each row by its Sessions value, except that rows with 0 Sessions should be left out of the ranking and given a value of 0.
I have attempted applying the lambda below, but this is not correct:
df['new_col'] = df['Sessions'].apply(lambda x: 0 if x == 0 else df['Sessions'].rank(ascending=True, pct=True))
Sample desired output:
ID | Sessions | new_col |
---|---|---|
1234 | 400 | 1.000000 |
5678 | 200 | 0.999987 |
9101112 | 199 | 0.999974 |
13141516 | 0 | 0 |
CodePudding user response:
Something like this?
df['new_col'] = df.loc[df.Sessions > 0, 'Sessions'].rank(ascending=True, pct=True).reindex(df.index, fill_value=0)
or
df['new_col'] = df['Sessions'].replace(0, np.nan).rank(pct=True).fillna(0)
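As a quick check, here is a minimal sketch of the replace-and-rank approach run on the four sample rows from the question. The dataframe is rebuilt by hand, so the values are only those shown in the question's table. Because pct=True divides by the number of non-NaN values, the three non-zero rows get 1/3, 2/3 and 1, which will differ from the question's desired output if that came from a larger dataset.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question's table.
df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

# Turn zeros into NaN so rank() skips them, rank as a fraction of the
# non-NaN rows, then put 0 back for the rows that were skipped.
df["new_col"] = df["Sessions"].replace(0, np.nan).rank(pct=True).fillna(0)

print(df)
```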
CodePudding user response:
If you want safe slicing, assign is your friend: it returns a new dataframe instead of modifying df in place. Try this:
df = df.assign(new_col=lambda d: (
    d["Sessions"]          # grab the series
    .replace(0, np.nan)    # replace the 0s with NaNs so rank() skips them
    .rank(pct=True)        # rank as percentages
    .fillna(0)             # fill zeros back in
    )
)
Also, this way you will be able to neatly wrap this pipe in a function.
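One way that wrapping could look is sketched below. The function name add_nonzero_pct_rank and its source/target parameters are made up for illustration and are not part of pandas; only replace, rank, fillna, assign and pipe are real pandas methods.

```python
import numpy as np
import pandas as pd


def add_nonzero_pct_rank(frame, source="Sessions", target="new_col"):
    """Return a copy of frame with a percentile rank of `source` that ignores zeros.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    ranked = (
        frame[source]
        .replace(0, np.nan)  # zeros become NaN so rank() skips them
        .rank(pct=True)      # fraction of non-NaN rows at or below each value
        .fillna(0)           # skipped rows get 0
    )
    # assign() returns a new dataframe and leaves the input untouched.
    return frame.assign(**{target: ranked})


df = pd.DataFrame({
    "ID": [1234, 5678, 9101112, 13141516],
    "Sessions": [400, 200, 199, 0],
})

result = df.pipe(add_nonzero_pct_rank)
print(result)
```

Since the function takes and returns a dataframe, it can be chained with other steps via df.pipe(...), and the original df keeps its columns unchanged.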