Pandas step function with rank

I am trying to rank a column with the following function:

f(x) = if x = 0 then y = 0, else if x < 0 then y = 0.5, else y = rank(x)

Any ideas on how I can achieve this?

CodePudding user response：

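You can use basic boolean indexing on the rank Series. Assign through the Series itself rather than through df["y"][mask], which is chained assignment: pandas warns about it, and it has no effect under Copy-on-Write.

import pandas as pd

df = pd.DataFrame({"x": [2, 3, 1, -1, 0]})
y = df["x"].rank()
y[df["x"] == 0] = 0    # zeros map to 0
y[df["x"] < 0] = .5    # negatives map to 0.5
df["y"] = y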

or loc

df["y"] = df["x"].rank()
df.loc[df["x"] == 0, "y"] = 0
df.loc[df["x"] < 0, "y"] = .5

or multiple .where conditions

df["y"] = df["x"].where(df["x"] == 0, df["x"].rank().where(df["x"] > 0, .5))
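As a quick sanity check, the .loc and .where variants above can be run side by side on the same sample frame. This is only a sketch; the names via_loc and via_where are introduced here for illustration and are not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({"x": [2, 3, 1, -1, 0]})
ranks = df["x"].rank()

# Variant 1: start from the ranks and overwrite the special cases with .loc.
via_loc = df.assign(y=ranks)
via_loc.loc[via_loc["x"] == 0, "y"] = 0
via_loc.loc[via_loc["x"] < 0, "y"] = 0.5

# Variant 2: nested .where, keeping 0 where x == 0, ranks where x > 0, else 0.5.
via_where = df["x"].where(df["x"] == 0, ranks.where(df["x"] > 0, 0.5))

# Compare values only; the two results may differ in name.
assert via_loc["y"].tolist() == via_where.tolist()
print(via_loc)
```

On this sample both variants give y = 4.0, 5.0, 3.0, 0.5, 0.0. Note that the ranks are computed over all rows, so the negative and zero entries still occupy ranks 1 and 2 before they are overwritten.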

CodePudding user response：

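So you say that you already have your ranks (with x being a data frame and col being the column name). Keep a copy of the original values first, because ranking only the positive rows turns every other row into NaN:

s = x[col].copy()
x[col] = s[s > 0].rank(pct=True, method='average')
x[col] = x[col].fillna(0)

Patch to include your other conditions, testing against the saved original values (after fillna, the column itself no longer contains any negatives):

x[col] = np.where(s < 0, 0.5, x[col])
x[col] = np.where(s == 0, 0, x[col])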

There should be no overwrite problems (NaN converted to 0 aside) because i > 0, i == 0, and i < 0 are mutually exclusive for any real number i.
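The mutual-exclusivity argument only holds when all three masks are computed from the original, untouched values. A minimal sketch of that idea follows, assuming a hypothetical column named "score" and made-up sample data; it checks that the masks partition the rows and that the order of the two patches does not matter.

```python
import numpy as np
import pandas as pd

x = pd.DataFrame({"score": [-2.0, 0.0, 1.0, 3.0]})  # hypothetical sample data
col = "score"

orig = x[col].copy()  # untouched values, used for every mask
pos, zero, neg = orig > 0, orig == 0, orig < 0

# Every non-NaN value falls into exactly one of the three masks.
assert ((pos.astype(int) + zero.astype(int) + neg.astype(int)) == 1).all()

# Rank only the positive rows; the other rows become NaN after reindexing.
out = orig[pos].rank(pct=True, method="average").reindex(orig.index)

# The masks never overlap, so applying the patches in either order agrees.
a = np.where(zero, 0.0, np.where(neg, 0.5, out))
b = np.where(neg, 0.5, np.where(zero, 0.0, out))
assert np.array_equal(a, b)

x[col] = a
print(x)
```

Here the positive values 1.0 and 3.0 are ranked among themselves, so they become 0.5 and 1.0, while -2.0 becomes 0.5 and 0.0 stays 0.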

You could combine all the conditions into a single expression with something like this:

s = df['score'].copy()
df['score'] = np.where(
    s > 0, s.rank(pct=True, method='average'),
    np.where(
        s < 0, 0.5,
        0)
)
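If the nesting gets hard to read, np.select expresses the same piecewise rule as a flat list of conditions and choices. The sketch below is one possible variant, not the answer's own code; step_rank is a hypothetical helper name. Like the nested np.where version, it ranks over all values rather than over the positive rows only.

```python
import numpy as np
import pandas as pd


def step_rank(s: pd.Series) -> pd.Series:
    """Return 0 for zeros, 0.5 for negatives, percentile rank for positives."""
    ranks = s.rank(pct=True, method="average")  # ranked over all values
    # Conditions are checked in order; rows matching none get the default.
    values = np.select([s > 0, s < 0], [ranks, 0.5], default=0.0)
    return pd.Series(values, index=s.index, name=s.name)


df = pd.DataFrame({"score": [2, 3, 1, -1, 0]})
df["score"] = step_rank(df["score"])
print(df)
```

To rank the positive values only among themselves, replace the ranks line with a rank over s[s > 0] reindexed to s.index; which of the two is wanted depends on the question's definition of rank(x).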

CodePudding user response：

Here is one way to do what your question asks:

df['y'] = (df.x < 0) * 0.5 + (df.x > 0) * df.x.rank()

For example:

import pandas as pd
df = pd.DataFrame({'x' : [-2, -1, 0, 0, 1, 2, 3, 4]})
df['y'] = (df.x < 0) * 0.5 + (df.x > 0) * df.x.rank()
print(df)

Output: