Random sampling with replacement, increasing group size, sum and append in a DataFrame

I have a dataframe which I'd like to sample repeatedly, with replacement. Every time I sample the df, I would like to increase the size of the sample (n) by one, up to N.

For example:

id	value_1	value_2
a	5	10
b	10	30
c	6	8
d	9	12

Would result in something like:

ids	sum_of_value_1	sum_of_value_2
b	10	30
a,c	11 (5 + 6)	18 (10 + 8)
b,a,d	24 (10 + 5 + 9)	52 (30 + 10 + 12)

I can do this with a for loop, but can't figure out how to add the summation and the append into the query:

for n in range(200):
    print(df_groups.sample(n))

CodePudding user response：

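You can use pandas.DataFrame.aggregate to sum all columns of each sample, then use pandas.concat to append the resulting single-row dataframe to a dataframe that serves as an accumulator of samples. Note that the result of pd.concat has to be assigned back to the accumulator, and that sample needs replace=True to draw with replacement.

Maybe something like this: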

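import pandas as pd
# Sum each sample; .to_frame().T turns the summed Series into a one-row DataFrame
acc = df_groups.sample(1, replace=True).aggregate('sum').to_frame().T
for n in range(2, df_groups.shape[0]):
    row = df_groups.sample(n, replace=True).aggregate('sum').to_frame().T
    acc = pd.concat([acc, row], ignore_index=True)  # keep the concatenated result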
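
For reference, here is a minimal self-contained sketch of the same accumulate-and-sum idea, using the four-row example frame from the question. The helper name `sample_sums` and its `max_n` and `seed` parameters are hypothetical, introduced only for illustration. Instead of calling pd.concat inside the loop, it collects one dict per sample and builds the result once at the end, which is a design choice rather than a requirement.

```python
import pandas as pd

# Example frame from the question.
df_groups = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "value_1": [5, 10, 6, 9],
    "value_2": [10, 30, 8, 12],
})


def sample_sums(df, max_n, seed=None):
    """Draw samples of size 1..max_n with replacement and sum each one.

    Hypothetical helper, not part of pandas.
    """
    rows = []
    for n in range(1, max_n + 1):
        # A different random_state per size keeps draws reproducible when a seed is given.
        state = None if seed is None else seed + n
        sample = df.sample(n, replace=True, random_state=state)
        rows.append({
            "n": n,
            "ids": ",".join(sample["id"]),
            "sum_of_value_1": sample["value_1"].sum(),
            "sum_of_value_2": sample["value_2"].sum(),
        })
    # Build the result once instead of concatenating inside the loop.
    return pd.DataFrame(rows).set_index("n")


# max_n may exceed len(df_groups) because sampling is done with replacement.
result = sample_sums(df_groups, max_n=6, seed=0)
print(result)
```

Each row of `result` holds the sampled ids and the column sums for one sample size, so the sums can be checked against the ids listed in the same row.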

CodePudding user response：

You can use sample and concat, then groupby.agg:

# replace=True samples with replacement, as asked in the question
out = (pd.concat({n: df.sample(n, replace=True)
                  for n in range(1, len(df))})
         .groupby(level=0)
         .agg({'id': ','.join,
               'value_1': 'sum',
               'value_2': 'sum'
              })
       )

print(out)

Output:

      id  value_1  value_2
1      a        5       10
2    b,c       16       38
3  a,c,d       20       30
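
One point worth illustrating is the role of replacement once the sample size grows past the number of rows, as in the question's range(200) loop. The sketch below assumes the question's four-row frame and an arbitrarily chosen upper size `N = 6`; `N`, `raised`, and the use of `random_state=n` for reproducibility are assumptions made for this example. As far as I know, `DataFrame.sample` raises a `ValueError` when asked for more rows than exist without replacement.

```python
import pandas as pd

# Example frame from the question.
df = pd.DataFrame({
    "id": ["a", "b", "c", "d"],
    "value_1": [5, 10, 6, 9],
    "value_2": [10, 30, 8, 12],
})

N = 6  # assumed upper sample size, larger than len(df) on purpose

# Without replacement, a sample cannot be larger than the frame itself.
try:
    df.sample(N)
    raised = False
except ValueError:
    raised = True

# With replace=True any sample size works; random_state=n keeps each draw reproducible.
out = (
    pd.concat({n: df.sample(n, replace=True, random_state=n)
               for n in range(1, N + 1)})
      .groupby(level=0)
      .agg({"id": ",".join, "value_1": "sum", "value_2": "sum"})
)
print(out)
```

The dict keys passed to pd.concat become the outer index level, which is why groupby(level=0) yields one row per sample size.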