I have a dataframe that I'd like to sample repeatedly, with replacement. Every time I sample the df, I would like to increase the sample size (n) by one, up to N.
For example:
id | value_1 | value_2 |
---|---|---|
a | 5 | 10 |
b | 10 | 30 |
c | 6 | 8 |
d | 9 | 12 |
would result in something like:
id's | sum_of_value_1 | sum_of_value_2 |
---|---|---|
b | 10 | 30 |
a,c | 11 (5+6) | 18 (10+8) |
b,a,d | 24 (10+5+9) | 52 (30+10+12) |
I can do this with a for loop, but I can't figure out how to add the summation and the append into the query:
for n in range(200):
    print(df_groups.sample(n))
CodePudding user response:
You can use pandas.DataFrame.aggregate to sum all columns, then use pandas.concat to append each resulting single-row DataFrame to a DataFrame that serves as an accumulator of samples.
Maybe something like this:
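import pandas as pd
acc = df_groups.sample(1).aggregate('sum').to_frame().T
for n in range(2, df_groups.shape[0]):
    row = df_groups.sample(n).aggregate('sum').to_frame().T
    # pd.concat returns a new object, so assign it back to the accumulator
    acc = pd.concat([acc, row], ignore_index=True)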
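A self-contained sketch of the same accumulator idea, assuming the example data from the question. The helper name `cumulative_sample_sums` and its `max_n` and `seed` arguments are made up for illustration. It samples with replacement (as the question asks), collects each sample's sums in a list, and builds the result once at the end, which avoids calling concat on every iteration:

```python
import pandas as pd


def cumulative_sample_sums(df, max_n, seed=None):
    """Hypothetical helper: for n = 1..max_n, draw n rows with
    replacement and sum the numeric columns of that sample."""
    rows = []
    for n in range(1, max_n + 1):
        # Vary the seed per iteration so the samples differ but stay reproducible.
        state = None if seed is None else seed + n
        sample = df.sample(n, replace=True, random_state=state)
        rows.append(sample.sum(numeric_only=True))
    out = pd.DataFrame(rows)
    # Label each row with the sample size it came from.
    out.index = pd.RangeIndex(1, max_n + 1, name="n")
    return out


# Example data from the question.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "value_1": [5, 10, 6, 9],
    "value_2": [10, 30, 8, 12],
})

out = cumulative_sample_sums(df, max_n=6, seed=0)
print(out)
```

Because of replace=True, max_n can be larger than the number of rows in df.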
CodePudding user response:
You can use sample and concat, then groupby.agg:
out = (pd.concat({n: df.sample(n)
                  for n in range(1, len(df))})
         .groupby(level=0)
         .agg({'id': ','.join,
               'value_1': 'sum',
               'value_2': 'sum'
               })
      )
print(out)
Output:
      id  value_1  value_2
1      a        5       10
2    b,c       16       38
3  a,c,d       20       30
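Note that sample draws without replacement by default, so n cannot exceed len(df) in that case. If you want sampling with replacement, as in the question, a variant along these lines should work. This is only a sketch: the upper bound `N = 6` and the use of `random_state=n` for reproducibility are assumptions, not something from the question.

```python
import pandas as pd

# Example data from the question.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "value_1": [5, 10, 6, 9],
    "value_2": [10, 30, 8, 12],
})

N = 6  # assumed upper bound; may exceed len(df) because replace=True

# One sample per size n, keyed by n; the dict keys become index level 0.
samples = pd.concat(
    {n: df.sample(n, replace=True, random_state=n) for n in range(1, N + 1)}
)

# Collapse each sample into a single row: joined ids and column sums.
out = (
    samples.groupby(level=0)
    .agg({"id": ",".join, "value_1": "sum", "value_2": "sum"})
)
print(out)
```

With replacement, the same id can appear more than once in a row of the result.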