How do I change the value inside a series/dataframe like so:
labels = df_known["labels"]  # get the labels column (a Series)
for label in labels:
    for c in classes_to_remove:
        if label > c:
            label -= 1  # doesn't actually change the label in the series, just the local variable
The goal is to decrement every label by the number of removed classes that are smaller than that label.
For example, if classes_to_remove = [1, 3] and labels = [0, 2, 4], I would decrement 4 twice, because it's bigger than both 3 and 1, decrement 2 just once, as it is only bigger than 1, and keep 0 unchanged. In the end labels = [0, 1, 2].
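To pin down the rule described above, here is a minimal pure-Python sketch. The function name shift_labels is hypothetical and only for illustration; it works on plain lists rather than on the Series itself.

```python
def shift_labels(labels, classes_to_remove):
    """Decrement each label by the number of removed classes smaller than it."""
    # sum() over booleans counts how many removed classes the label exceeds
    return [label - sum(label > c for c in classes_to_remove) for label in labels]


print(shift_labels([0, 2, 4], [1, 3]))  # [0, 1, 2]
```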
Edit:
Example:
classes_to_remove = [2]
The dataframe:
labels
0 0
1 0
2 0
3 0
4 0
..
20596 6
20597 6
20598 6
20599 6
20600 6
Name: labels, Length: 15497, dtype: int64
np.unique(labels)
array([0, 1, 3, 4, 5, 6, 7], dtype=int64) # notice 2 is missing
expected dataframe:
np.unique(labels)
array([0, 1, 2, 3, 4, 5, 6], dtype=int64)
Answer:
We can use np.greater.outer to compare the labels with classes_to_remove, then sum the resulting boolean mask along axis=1 and subtract this sum from the labels column to get the result:
labels -= np.greater.outer([*labels], classes_to_remove).sum(1)
Details:
- Here np.greater.outer is used to compare each label to every number in classes_to_remove
>>> np.greater.outer([*labels], classes_to_remove)
array([[False, False],
       [ True, False],
       [ True,  True]])
- Now, we sum the boolean mask obtained in the previous step along axis=1
>>> np.greater.outer([*labels], classes_to_remove).sum(1)
array([0, 1, 2])
- Subtract the calculated sum from labels to get the result
>>> labels - np.greater.outer([*labels], classes_to_remove).sum(1)
0 0
1 1
2 2
Name: labels, dtype: int64
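Putting the steps together, here is a self-contained sketch run against the unique labels from the question's edit. The small df_known built here is a hypothetical stand-in for the real dataframe, and assigning the result back to the column (instead of using -= on the extracted Series) is an assumption made so the update lands in the dataframe whether or not the Series is a copy.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's df_known: class 2 has been removed
df_known = pd.DataFrame({"labels": [0, 1, 3, 4, 5, 6, 7]})
classes_to_remove = [2]

labels = df_known["labels"].to_numpy()

# Boolean matrix of shape (n_labels, n_removed): True where label > removed class
mask = np.greater.outer(labels, classes_to_remove)

# Count the smaller removed classes per label and write the result back to the column
df_known["labels"] = labels - mask.sum(axis=1)

print(np.unique(df_known["labels"]))  # [0 1 2 3 4 5 6]
```

The intermediate mask has one row per label and one column per removed class, so memory grows with the product of the two lengths; for a handful of removed classes this should be negligible.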