I have a dataset containing a thousand zeros, five hundred ones, and so on. I want to change the first 400 zeros to 0.3 and the next 600 zeros to 0.6. Then I want to change the first 200 ones to 1.4 and the next 300 ones to 1.8, and so on.
In other words, I want to replace each integer value with fractional values, where a given list of frequencies says how many occurrences get each new value.
Example:
Input:
Dataset: 0,0,0,0,0,1,1,1,1,1,1
Frequency: [4,1] for 0 and [3,3] for 1
New values: [0.2,0.8] for 0 and [1.2,1.4] for 1
Output: 0.2,0.2,0.2,0.2,0.8,1.2,1.2,1.2,1.4,1.4,1.4
CodePudding user response:
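Assuming your data points are sorted, a simple solution would be:
import numpy as np
import pandas as pd

df = pd.DataFrame({'col':[0,0,0,0,0,1,1,1,1,1,1]})
# One row per distinct value: how many occurrences get each new value
frequencies = [
    [4, 1],
    [3, 3],
]
# Replacement values, in the same layout as frequencies
new_values = [
    [0.2, 0.8],
    [1.2, 1.4],
]
# Flatten the counts, then repeat each replacement's index by its count
a = np.concatenate(frequencies)
df['new_col'] = np.concatenate(new_values)[np.arange(len(a)).repeat(a)]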
Output:
>>> df
    col  new_col
0     0      0.2
1     0      0.2
2     0      0.2
3     0      0.2
4     0      0.8
5     1      1.2
6     1      1.2
7     1      1.2
8     1      1.4
9     1      1.4
10    1      1.4
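
If the data is not sorted, the same idea can be applied per value instead of over the whole column. The following is only a sketch under some assumptions: the helper name remap_by_frequency is hypothetical, and frequencies and new_values are assumed to be dicts keyed by the original integer value rather than the nested lists used above. It fills the occurrences of each value in order of appearance and raises an error if the counts do not add up to the number of occurrences.

```python
import numpy as np
import pandas as pd


def remap_by_frequency(col, frequencies, new_values):
    """Replace occurrences of each value, in order of appearance.

    Hypothetical helper: frequencies and new_values are dicts keyed by
    the original value, e.g. {0: [4, 1]} and {0: [0.2, 0.8]}.
    """
    values = col.to_numpy()
    # Float copy, so values without an entry in frequencies stay unchanged
    out = values.astype(float)
    for value, counts in frequencies.items():
        mask = values == value
        # Expand e.g. [0.2, 0.8] with counts [4, 1] to [0.2, 0.2, 0.2, 0.2, 0.8]
        replacement = np.repeat(new_values[value], counts)
        if replacement.size != mask.sum():
            raise ValueError(
                f"counts for {value!r} sum to {replacement.size}, "
                f"but the column has {mask.sum()} occurrences"
            )
        # Boolean-mask assignment fills the matching positions in order
        out[mask] = replacement
    return pd.Series(out, index=col.index)


# Unsorted variant of the example data
df = pd.DataFrame({'col': [1, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]})
frequencies = {0: [4, 1], 1: [3, 3]}
new_values = {0: [0.2, 0.8], 1: [1.2, 1.4]}
df['new_col'] = remap_by_frequency(df['col'], frequencies, new_values)
print(df['new_col'].tolist())
# [1.2, 0.2, 0.2, 1.2, 0.2, 0.2, 1.2, 0.8, 1.4, 1.4, 1.4]
```

For sorted input this gives the same result as the one-liner above; the loop runs once per distinct value, so it should stay cheap when there are only a few distinct integers.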