Conditional decrementation on a pandas series / dataframe

How do I change the value inside a series/dataframe like so:

labels = df_known["labels"] # get the "labels" column as a Series

for label in labels:
    for c in classes_to_remove:
        if label > c:
            label -= 1 # doesn't actually change the label in the series, just the local variable

The intent is to decrement every label by the number of removed classes that are smaller than that label.

For example, if classes_to_remove = [1, 3] and labels = [0, 2, 4], I would decrement 4 twice because it is bigger than both 3 and 1, decrement 2 just once because it is only bigger than 1, and keep 0 unchanged. In the end, labels = [0, 1, 2].

Edit:

Example: classes_to_remove = [2]

The dataframe:

labels
0        0
1        0
2        0
3        0
4        0
        ..
20596    6
20597    6
20598    6
20599    6
20600    6
Name: labels, Length: 15497, dtype: int64

np.unique(labels)
array([0, 1, 3, 4, 5, 6, 7], dtype=int64) # notice 2 is missing

expected dataframe:

np.unique(labels)
array([0, 1, 2, 3, 4, 5, 6], dtype=int64)

CodePudding user response:

We can use np.greater.outer to compare the labels with classes_to_remove, sum the resulting boolean mask along axis=1, and subtract that sum from the labels column to get the result:

labels -= np.greater.outer([*labels], classes_to_remove).sum(1)
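As a self-contained sketch of the one-liner above, the snippet below rebuilds the question's toy data and writes the result back to the column explicitly. The explicit assignment is an assumption on my part rather than part of the original answer: whether an in-place `-=` on a column pulled out of a dataframe also changes the dataframe can depend on the pandas version and its copy-on-write settings, so assigning back avoids relying on that.

```python
import numpy as np
import pandas as pd

# Toy data from the question; df_known and classes_to_remove reuse the names used there.
df_known = pd.DataFrame({"labels": [0, 2, 4]})
classes_to_remove = [1, 3]

labels = df_known["labels"]

# For each label, count how many removed classes are strictly smaller than it.
shift = np.greater.outer(labels.to_numpy(), classes_to_remove).sum(axis=1)

# Assign the shifted labels back to the column instead of mutating in place.
df_known["labels"] = labels - shift

print(df_known["labels"].tolist())  # [0, 1, 2]
```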

Details:

Here np.greater.outer compares each label with every number in classes_to_remove. With labels = [0, 2, 4] and classes_to_remove = [1, 3] from the question:

>>> np.greater.outer([*labels], classes_to_remove)

array([[False, False],
       [ True, False],
       [ True,  True]])
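The same comparison table can also be produced with plain NumPy broadcasting, which may make it easier to see where the two-dimensional shape comes from. This is only an illustrative sketch using the question's example values; `mask` is a name introduced here.

```python
import numpy as np

labels = np.array([0, 2, 4])
classes_to_remove = np.array([1, 3])

# labels[:, None] has shape (3, 1); comparing it with an array of shape (2,)
# broadcasts to shape (3, 2): one row per label, one column per removed class.
mask = labels[:, None] > classes_to_remove

print(mask)
```

Each row answers "is this label greater than each removed class?", which is the table np.greater.outer returns.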

Now we sum the boolean mask obtained in the previous step along axis=1:

>>> np.greater.outer([*labels], classes_to_remove).sum(1)

array([0, 1, 2])

Subtract the calculated sum from labels to get the result:

>>> labels - np.greater.outer([*labels], classes_to_remove).sum(1)

0    0
1    1
2    2
Name: labels, dtype: int64
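The outer comparison builds a len(labels) x len(classes_to_remove) boolean array, which can get large when both inputs are long. A possible alternative, not part of the answer above, is np.searchsorted on the sorted removed classes: with side="left" it returns the number of removed classes strictly below each label without materialising the full table. The sketch below applies it to the unique labels from the question's edit; `removed`, `shift` and `relabeled` are names introduced here.

```python
import numpy as np
import pandas as pd

# Unique labels from the question's edit, where class 2 has been removed.
labels = pd.Series([0, 1, 3, 4, 5, 6, 7], name="labels")
classes_to_remove = [2]

# searchsorted requires the array being searched to be sorted.
removed = np.sort(classes_to_remove)

# side="left" gives, for each label, the count of removed classes strictly below it.
shift = np.searchsorted(removed, labels.to_numpy(), side="left")

relabeled = labels - shift
print(relabeled.tolist())  # [0, 1, 2, 3, 4, 5, 6]
```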