I have a dataframe (a very large one) that looks as follows:
id | class_number | a_1 | a_2 | a_3 | a_4 |
---|---|---|---|---|---|
0 | 1 | 1 | 0 | 0 | 1 |
1 | 1 | 1 | 1 | 0 | 1 |
2 | 1 | 1 | 1 | 1 | 1 |
3 | 1 | 1 | 0 | 2 | 1 |
4 | 1 | 1 | 2 | 0 | 3 |
For the sake of completeness, here is a screenshot containing a larger cutout of this dataframe:
How can we replace all ones (all values 1) within the columns a_1 to a_1000, each with a random value other than 0, 1 and 2?
What I tried so far works but does not seem elegant:
import random
cols = ["a_" + str(i) for i in range(1, 1000 + 1)]
for col in cols:
    # replace each 1 with a random integer from 3 to 19, leave other values untouched
    df[col] = df[col].apply(lambda x: random.choice(range(3, 20)) if x == 1 else x)
df.head()
I would be grateful for any hint on how to implement this in a more straightforward manner.
Note that df[cols].apply(...) does not work, since it raises "ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all()."
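For context, the error most likely arises because DataFrame.apply passes each column to the function as a whole Series, so `x == 1` is a boolean Series that `if` cannot reduce to a single True/False. The following is only a minimal sketch on a made-up two-column frame (the frame, the placeholder value 99 and the names `rng`, `error_message` and `result` are assumptions for illustration); it reproduces the error and then shows one possible column-wise rewrite using Series.where:

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real dataframe (illustrative data only).
df = pd.DataFrame({"a_1": [1, 1, 1], "a_2": [0, 1, 2]})
cols = ["a_1", "a_2"]

# DataFrame.apply hands each column to the function as a Series,
# so `x == 1` is a boolean Series and `if` cannot turn it into one bool.
try:
    df[cols].apply(lambda x: 99 if x == 1 else x)
    error_message = None
except ValueError as exc:
    error_message = str(exc)

# Column-wise alternative: keep values that are not 1 and take the
# replacements from an array of random integers in [3, 20).
rng = np.random.default_rng(0)
result = df[cols].apply(lambda s: s.where(s != 1, rng.integers(3, 20, len(s))))
```

Here a random integer is drawn for every cell of a column, but Series.where only uses the draws at positions where the original value is 1.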
CodePudding user response:
IIUC, you can use:
cols = df.filter(like='a_').columns
# mask only the a_* columns so the random array has the same shape as the masked frame
df[cols] = df[cols].mask(df[cols].eq(1),
                         np.random.randint(3, 1000, (df.shape[0], len(cols))))
reproducible example:
import numpy as np
import pandas as pd
np.random.seed(0)
df = pd.DataFrame(np.random.randint(0, 10, (10, 10)),
                  columns=[f'a_{i+1}' for i in range(10)])
output:
   a_1  a_2  a_3  a_4  a_5  a_6  a_7  a_8  a_9  a_10
0    5    0    3    3    7    9    3    5    2     4
1    7    6    8    8  957    6    7    7    8   380
2    5    9    8    9    4    3    0    3    5     0
3    2    3    8  785    3    3    3    7    0    89
4    9    9    0    4    7    3    2    7    2     0
5    0    4    5    5    6    8    4  592    4     9
6    8  773  518    7    9    9    3    6    7     2
7    0    3    5    9    4    4    6    4    4     3
8    4    4    8    4    3    7    5    5    0   846
9    5    9    3    0    5    0   28    2    4     2
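Since the dataframe is described as very large, a possible variant is to draw random numbers only for the cells that actually equal 1, instead of generating a full-size random array. This is only a sketch under assumptions: it uses NumPy's Generator API (np.random.default_rng) rather than the seeded global functions above, and the names `rng`, `original`, `values` and `is_one` are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 10, (10, 10)),
                  columns=[f"a_{i + 1}" for i in range(10)])
original = df.copy()  # kept only to compare before and after

cols = df.filter(like="a_").columns

# Work on a copy of the underlying array so the assignment is explicit.
values = df[cols].to_numpy(copy=True)
is_one = values == 1

# Draw exactly as many random integers in [3, 1000) as there are ones.
values[is_one] = rng.integers(3, 1000, size=int(is_one.sum()))
df[cols] = values
```

The upper bound 1000 mirrors the snippet above; any range that excludes 0, 1 and 2 should serve the same purpose.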