I have a dataframe with a column distances holding integer values between 1 and 3500. I want to assign a weight from (0.25, 0.5, 1, 2) to each sample based on its distance value.

| Distances | weights |
| --------- | ------- |
| >= 3000 | 0.25 |
| >= 2000 and < 3000 | 0.5 |
| >= 1000 and < 2000 | 1 |
| < 1000 | 2 |

For the dataframe below,

| sample | distances |
| ------ | --------- |
| First | 3234 |
| Second | 465 |
| Third | 1200 |

the weights should be {0.25, 2, 1}. What is a good way to do this?
CodePudding user response:
Assuming the dataframe is called df, one can use a list comprehension, as follows:
df['weights'] = [0.25 if x >= 3000 else 0.5 if x >= 2000 else 1 if x >= 1000 else 2 for x in df['distances']]
[Out]:
sample distances weights
0 First 3234 0.25
1 Second 465 2.00
2 Third 1200 1.00
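If the loop becomes slow on a large dataframe, the same thresholds can probably be expressed in vectorized form. The following is only a sketch using `numpy.select`, rebuilding the sample data from the question. The variable names `d`, `conditions`, and `choices` are my own and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "sample": ["First", "Second", "Third"],
    "distances": [3234, 465, 1200],
})

d = df["distances"]

# np.select picks the choice for the first condition that is True,
# so each later condition only needs its lower bound.
conditions = [d >= 3000, d >= 2000, d >= 1000]
choices = [0.25, 0.5, 1.0]

# Rows matching none of the conditions (distance < 1000) get the default.
df["weights"] = np.select(conditions, choices, default=2.0)
```

Because the conditions are evaluated in order, the upper bounds (`< 3000`, `< 2000`) are implied by the earlier entries and do not need to be spelled out.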
CodePudding user response:
How about creating a mapping using a Series with an IntervalIndex? You can then cut your distances into appropriate bins as defined by the intervals, and map those to the respective weights:
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sample": ["First", "Second", "Third"],
    "distances": [3234, 465, 1200]
})
# Each interval is closed on the left: [lower, upper)
mapping = {
    pd.Interval(3000, np.inf, closed="left"): 0.25,
    pd.Interval(2000, 3000, closed="left"): 0.5,
    pd.Interval(1000, 2000, closed="left"): 1.0,
    pd.Interval(-np.inf, 1000, closed="left"): 2.0,
}
# Series of weights indexed by the intervals
series = pd.Series(data=mapping.values(), index=mapping.keys())
# Bin each distance into its interval, then look up the weight
df["weight"] = pd.cut(df["distances"], series.index).map(series)
# sample distances weight
# 0 First 3234 0.25
# 1 Second 465 2.00
# 2 Third 1200 1.00
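A shorter variant of the same binning idea might pass the bin edges and labels directly to `pd.cut`, skipping the intermediate Series. This is a sketch rather than a drop-in replacement: the names `bins`, `labels`, and `binned` are assumptions of mine, and `right=False` is used so that each bin is closed on the left, matching the `>=` / `<` rules in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "sample": ["First", "Second", "Third"],
    "distances": [3234, 465, 1200],
})

# Bin edges in ascending order; with right=False each bin is [low, high)
bins = [-np.inf, 1000, 2000, 3000, np.inf]
# One label per bin, in the same order as the bins
labels = [2.0, 1.0, 0.5, 0.25]

binned = pd.cut(df["distances"], bins=bins, labels=labels, right=False)

# pd.cut returns a categorical column; cast to float so the weights
# can be used in arithmetic
df["weights"] = binned.astype(float)
```

The cast at the end matters if the weights are later multiplied or summed, since a categorical column does not support arithmetic directly.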