How to divide an array into several sections?

Time:08-07

I have an array of length approximately 12000, something like array([0.3, 0.6, 0.3, 0.5, 0.1, 0.9, 0.4...]). I also have a dataframe column with values like 2, 3, 7, 3, 2, 7.... The column has 48 rows, and its values sum to 36.

I want to split the array into consecutive sections whose lengths are proportional to the column's values. For example, the first value in the column (= 2) gets a section of 12000*(2/36) elements (maybe [0.3, 0.6, 0.3]), the second value (= 3) gets a section of 12000*(3/36) elements that starts where the first one ends (something like [0.5, 0.1, 0.9, 0.4]), and so on.

CodePudding user response:

import numpy as np
import pandas as pd


# mock some data
a = np.random.random(12000)
df = pd.DataFrame({'col': np.random.randint(1, 5, 48)})

# cumulative share of the array covered after each row, scaled to positions in a
weights = df.col.to_numpy()
indices = (len(a) * weights / weights.sum()).cumsum()
# prepend 0 and round to integer slice boundaries
indices = np.concatenate(([0], indices)).round().astype(int)
# each pair of consecutive boundaries delimits one row's slice
res = [a[s:e] for s, e in zip(indices[:-1], indices[1:])]

# some tests
target_lens = len(a) * weights / weights.sum()
realized_lens = np.array([len(sl) for sl in res])
# rounding both boundaries shifts each slice length by at most one element
assert np.all(np.abs(realized_lens - target_lens) <= 1 + 1e-9)
# the slices are consecutive and together cover the whole array
assert np.array_equal(np.concatenate(res), a)
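
If you prefer not to write the slicing loop yourself, the same idea can probably be expressed with np.split, which takes the interior cut points directly. This is only a sketch: split_by_weights is a hypothetical helper name, and the tiny 12-element array and the weights [1, 2, 3] are made up for illustration.

```python
import numpy as np


def split_by_weights(values, weights):
    """Split values into consecutive chunks with lengths proportional to weights.

    Hypothetical helper, not part of NumPy or pandas.
    """
    values = np.asarray(values)
    weights = np.asarray(weights, dtype=float)
    # Position in values where each chunk ends (the last one equals len(values)).
    bounds = np.cumsum(weights) / weights.sum() * len(values)
    # np.split wants only the interior cut points, so drop the final boundary.
    cuts = np.round(bounds[:-1]).astype(int)
    return np.split(values, cuts)


# Made-up example: 12 elements shared out in the ratio 1 : 2 : 3.
chunks = split_by_weights(np.arange(12), [1, 2, 3])
print([len(c) for c in chunks])
```

With the dataframe from the question, the call would be split_by_weights(a, df.col.to_numpy()). Chunk lengths can still be off by one element from the exact proportion, because the cut points are rounded to whole indices.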