I have one column (X) that contains duplicated values: several rows share the same value, and those rows are consecutive. To test an issue, I need to replace the values in that column with random ones, so I tried:
np.random.seed(RSEED)
df["X"] = np.random.randint(100, 500, df.shape[0])
But this is not enough, because I need to keep the sequences. That is, I want to group the rows by their original value, draw one new random number per group, and assign that number to every row in the group, for every distinct value in the original column. E.g.:
X | new X (randomized) |
---|---|
210 | 500 |
210 | 500 |
. | . |
. | . |
340 | 100 |
340 | 100 |
. | . |
. | . |
I started looking for something built into pandas. I can group with pandas.DataFrame.groupby, but I couldn't find anything like a pandas.DataFrame.random that can be applied per group.
CodePudding user response:
A simple approach is to use groupby and transform, which broadcasts one random integer to every row of each group (assign the result back to df['X'] to overwrite the column):
df.groupby('X')['X'].transform(lambda _: np.random.randint(100, 500))
0 137
1 137
2 .
3 .
4 335
5 335
Name: X, dtype: int64
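Two details of the approach above are worth keeping in mind: the upper bound of np.random.randint is exclusive, so 500 itself is never drawn, and two different groups can receive the same number by chance. If that matters for the test, one possible variant is to draw the replacements without repetition and map them onto the original values. This is only a sketch: the helper name randomize_per_group, its parameters, the sample data, and the value of RSEED are assumptions made for illustration, not part of the question.

```python
import numpy as np
import pandas as pd

RSEED = 42  # assumed value; the question does not show what RSEED is


def randomize_per_group(values: pd.Series, low: int = 100, high: int = 500,
                        seed: int = RSEED) -> pd.Series:
    """Replace each distinct value with one random integer from [low, high)."""
    rng = np.random.default_rng(seed)
    originals = values.unique()
    # replace=False ensures two different original values never share a new value.
    # It raises ValueError if there are more distinct values than integers in range.
    replacements = rng.choice(np.arange(low, high), size=len(originals), replace=False)
    # Map every row to the replacement drawn for its original value.
    return values.map(dict(zip(originals, replacements)))


# Hypothetical sample data shaped like the table in the question
df = pd.DataFrame({"X": [210, 210, 210, 340, 340, 125]})
df["new X"] = randomize_per_group(df["X"])
print(df)
```

Because the generator is seeded inside the function, calling it again with the same seed reproduces the same mapping, which may be convenient when the randomized column is used in a repeatable test.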