I am trying to rank a column with the following function:
f(x) = if x=0, then y=0 else if x<0 then y=0.5 else y=rank(x)
Any ideas on how I can achieve this?
CodePudding user response:
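You can use basic indexing (chained assignment like this can emit a warning, and it does not update df when pandas copy-on-write is enabled, so the loc version is the safer choice)
import pandas as pd
df = pd.DataFrame({"x": [2, 3, 1, -1, 0]})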
df["y"] = df["x"].rank()
df["y"][df["x"] == 0] = 0
df["y"][df["x"] < 0] = .5
or loc
df["y"] = df["x"].rank()
df.loc[df["x"] == 0, "y"] = 0
df.loc[df["x"] < 0, "y"] = .5
or multiple .where conditions
df["y"] = df["x"].where(df["x"] == 0, df["x"].rank().where(df["x"] > 0, .5))
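As a quick sanity check, here is a minimal sketch comparing the mask-assignment idea with the nested .where version on the sample frame above. The names ranks, y_mask and y_where are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"x": [2, 3, 1, -1, 0]})
ranks = df["x"].rank()  # ranks over the whole column: -1 -> 1, 0 -> 2, 1 -> 3, ...

# Mask-assignment variant: start from the ranks, then overwrite the special cases
y_mask = ranks.copy()
y_mask[df["x"] == 0] = 0
y_mask[df["x"] < 0] = 0.5

# Nested .where variant: keep x where it is 0, otherwise rank if positive, else 0.5
y_where = df["x"].where(df["x"] == 0, ranks.where(df["x"] > 0, 0.5))

print(y_mask.tolist())
print(y_where.tolist())
```

Both lists should come out the same, with the ranks computed over the whole column (so the smallest positive value here gets rank 3, not 1).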
CodePudding user response:
So you say that you have your ranks already (with x being a data frame and col being the column name):
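import numpy as np
s = x[col].copy()  # keep the original values for the later conditions
x[col] = s[s > 0].rank(pct=True, method='average')
x = x.fillna(0)
Patch to include your other conditions (tested against the original values in s, because the negative entries of x[col] have already been replaced by 0 at this point):
x[col] = np.where(s < 0, 0.5, x[col])
x[col] = np.where(s == 0, 0, x[col])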
There should be no overwrite problems (nan converted to 0 aside) because i > 0, i == 0, and i < 0 are all mutually exclusive for real numbers i.
You could combine all of the conditions into a single expression with something like this:
s = df['score'].copy()
df['score'] = np.where(
    s > 0, s.rank(pct=True, method='average'),
    np.where(
        s < 0, 0.5,
        0)
)
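If the nested np.where gets hard to read, np.select is one possible alternative (a swapped-in technique, not what the answer above uses). A minimal sketch, assuming a column named score as above and a hypothetical output column named ranked:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"score": [-2, -1, 0, 0, 1, 2, 3, 4]})
s = df["score"]

# Conditions are checked in order; rows matching none of them get the default
df["ranked"] = np.select(
    [(s > 0).to_numpy(), (s < 0).to_numpy()],
    [s.rank(pct=True, method="average").to_numpy(), 0.5],
    default=0.0,
)
print(df)
```

Here the percentile rank is computed over the whole column, the same as in the nested np.where version, so positive values are ranked relative to all rows rather than only the positive ones.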
CodePudding user response:
Here is one way to do what your question asks:
df['y'] = (df.x < 0) * 0.5 + (df.x > 0) * df.x.rank()
For example:
import pandas as pd
df = pd.DataFrame({'x' : [-2, -1, 0, 0, 1, 2, 3, 4]})
df['y'] = (df.x < 0) * 0.5 + (df.x > 0) * df.x.rank()
print(df)
Output:
   x    y
0 -2  0.5
1 -1  0.5
2  0  0.0
3  0  0.0
4  1  5.0
5  2  6.0
6  3  7.0
7  4  8.0
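The one-liner works because boolean masks behave like 0 and 1 in arithmetic, so each term is switched on only for its own rows and zeros contribute nothing. A small sketch that splits the expression into its two terms (the names neg and pos are only illustrative):

```python
import pandas as pd

x = pd.Series([-2, 0, 3])

neg = (x < 0) * 0.5        # 0.5 where x is negative, 0.0 elsewhere
pos = (x > 0) * x.rank()   # the rank where x is positive, 0.0 elsewhere
y = neg + pos              # rows with x == 0 get 0.0 from both terms

print(y.tolist())
```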