How to map numerical values in a pandas dataframe to a discrete set?

I have a dataframe with a column named distances that holds integer values between 1 and 3500. I want to assign a weight from (0.25, 0.5, 1, 2) to each sample based on its distance value.

| Distances            | weights |
| ---------            | ------- |
| >= 3000              | 0.25    |
| >= 2000 and < 3000   | 0.5     |
| >= 1000 and < 2000   | 1       |
| < 1000               | 2       |

For the dataframe below,

| sample | distances |
| ------ | --------- |
| First  | 3234      |
| Second | 465       |
| Third  | 1200      |

the weights should be 0.25, 2, and 1, respectively. What is a good way to do this?

CodePudding user response：

Assuming the dataframe is called df, one option is a list comprehension with chained conditional expressions:

# The conditions are checked in order, so each branch only needs its lower bound.
df['weights'] = [
    0.25 if x >= 3000 else 0.5 if x >= 2000 else 1 if x >= 1000 else 2
    for x in df['distances']
]

[Out]:

   sample  distances  weights
0   First       3234     0.25
1  Second        465     2.00
2   Third       1200     1.00
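
For larger frames, the same thresholds could probably be applied without a Python-level loop. Below is a minimal sketch using numpy.select, which takes the first matching condition for each element. The helper name weights_from_distances is made up for illustration and is not part of the answer above.

```python
import numpy as np
import pandas as pd


def weights_from_distances(distances: pd.Series) -> pd.Series:
    """Map distances to weights using the thresholds from the question."""
    d = distances.to_numpy()
    # np.select checks the conditions in order and uses the first match,
    # so each condition only needs to state its lower bound.
    conditions = [d >= 3000, d >= 2000, d >= 1000]
    choices = [0.25, 0.5, 1.0]
    # Anything below 1000 falls through to the default weight of 2.
    values = np.select(conditions, choices, default=2.0)
    return pd.Series(values, index=distances.index, name="weights")


df = pd.DataFrame({
    "sample": ["First", "Second", "Third"],
    "distances": [3234, 465, 1200],
})
df["weights"] = weights_from_distances(df["distances"])
```

The conditions and choices lists mirror the rows of the table in the question, so changing a threshold means editing one entry in each list.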

CodePudding user response：

How about creating a mapping using a Series with an IntervalIndex? You can then cut your distances into appropriate bins as defined by the intervals, and map those to the respective weights:

import numpy as np
import pandas as pd

df = pd.DataFrame({
    "sample": ["First", "Second", "Third"],
    "distances": [3234, 465, 1200]
})

# closed="left" means each interval includes its lower bound and excludes its upper bound.
mapping = {
    pd.Interval(3000,    np.inf, closed="left"): 0.25,
    pd.Interval(2000,    3000,   closed="left"): 0.5,
    pd.Interval(1000,    2000,   closed="left"): 1.0,
    pd.Interval(-np.inf, 1000,   closed="left"): 2.0,
}

series = pd.Series(data=mapping.values(), index=mapping.keys())

df["weight"] = pd.cut(df["distances"], series.index).map(series)

#    sample  distances weight
# 0   First       3234   0.25
# 1  Second        465   2.00
# 2   Third       1200   1.00
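
A more compact variant of the same binning idea, as a sketch: pd.cut also accepts a plain list of bin edges together with one label per bin, so the weights can be passed directly as labels. The function name assign_weights is hypothetical, and the conversion to float at the end is an assumption about what is wanted downstream (pd.cut itself returns a categorical).

```python
import numpy as np
import pandas as pd


def assign_weights(distances: pd.Series) -> pd.Series:
    """Bin distances with pd.cut and use the weights as bin labels."""
    # Bin edges in ascending order; right=False makes every bin [low, high).
    edges = [-np.inf, 1000, 2000, 3000, np.inf]
    # One label per bin, in the same order as the bins.
    labels = [2.0, 1.0, 0.5, 0.25]
    binned = pd.cut(distances, bins=edges, right=False, labels=labels)
    # pd.cut returns a categorical; convert it to plain floats.
    return binned.astype(float)


df = pd.DataFrame({
    "sample": ["First", "Second", "Third"],
    "distances": [3234, 465, 1200],
})
df["weight"] = assign_weights(df["distances"])
```

Using right=False matches the ">= lower and < upper" wording of the table in the question; with the default right=True, a distance of exactly 1000 would land in the lower bin instead.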