Random sampling with replacement, increasing groupsize, sum and append in dataframe

Time:11-15

I have a DataFrame that I'd like to sample repeatedly, with replacement. Every time I sample the df, I would like to increase the sample size (n) by one, up to N.

For example:

id value_1 value_2
a 5 10
b 10 30
c 6 8
d 9 12

should give something like:

ids sum_of_value_1 sum_of_value_2
b 10 30
a, c 11 (5 6) 18 (10 8)
b,a,d 24 (10 5 9) 52 (30 10 12)

I can do the sampling with a for loop, but I can't figure out how to add the summation and the append into it:

for n in range(1, 201):
    print(df_groups.sample(n, replace=True))
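One detail worth checking: `DataFrame.sample` only draws with replacement when `replace=True` is passed. Without it, any n larger than the number of rows raises a `ValueError`, so a loop up to 200 over a small frame would fail. A minimal check, assuming the four-row example frame from the question (the `random_state=0` seed is only there for reproducibility):

```python
import pandas as pd

# The four-row example frame from the question
df_groups = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                          'value_1': [5, 10, 6, 9],
                          'value_2': [10, 30, 8, 12]})

# With replace=True rows may repeat, so n can exceed len(df_groups)
big = df_groups.sample(6, replace=True, random_state=0)
print(len(big))

# Without replace=True, asking for more rows than exist raises ValueError
try:
    df_groups.sample(6)
    raised = False
except ValueError as err:
    raised = True
    print('ValueError:', err)
```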

Answer:

You can use pandas.DataFrame.aggregate to sum every column, and then use pandas.concat to append each result (converted to a single-row DataFrame) to an accumulator DataFrame that collects one row per sample.

Maybe something like this:

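import pandas as pd
acc = df_groups.sample(1, replace=True).aggregate('sum').to_frame().T
for n in range(2, len(df_groups) + 1):
    row = df_groups.sample(n, replace=True).aggregate('sum').to_frame().T
    acc = pd.concat([acc, row], ignore_index=True)  # concat returns a new frame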
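A self-contained variant of this accumulator idea, assuming the question's four-row frame and N equal to the number of rows. Instead of calling `pd.concat` inside the loop (which copies the accumulator every iteration), it collects one dict per sample in a list and builds the DataFrame once at the end. The column names `ids`, `sum_of_value_1` and `sum_of_value_2` follow the question's desired output; `n` is an extra illustrative column.

```python
import pandas as pd

df_groups = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                          'value_1': [5, 10, 6, 9],
                          'value_2': [10, 30, 8, 12]})

N = len(df_groups)  # assumption: stop at the number of rows; any N works with replace=True
rows = []
for n in range(1, N + 1):
    sample = df_groups.sample(n, replace=True)
    rows.append({'n': n,
                 'ids': ','.join(sample['id']),
                 'sum_of_value_1': sample['value_1'].sum(),
                 'sum_of_value_2': sample['value_2'].sum()})

# Build the result once from the collected rows instead of concatenating per iteration
acc = pd.DataFrame(rows)
print(acc)
```

Because sampling is random, the printed sums change from run to run; pass `random_state` to `sample` if you need repeatable results.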

Answer:

You can use sample and concat, then groupby.agg:

import pandas as pd

out = (pd.concat({n: df.sample(n, replace=True)  # sample size grows 1, 2, 3, ...
                  for n in range(1, len(df))})
         .groupby(level=0)                       # one group per sample size n
         .agg({'id': ','.join,
               'value_1': 'sum',
               'value_2': 'sum'
              })
       )

print(out)

Output:

      id  value_1  value_2
1      a        5       10
2    b,c       16       38
3  a,c,d       20       30
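A self-contained sketch of the same concat/groupby approach, with a few assumptions: the question's four-row frame, a range that runs up to and including N (the snippet above stops at `len(df) - 1`), named aggregation to produce the question's column names, and a seeded `numpy.random.RandomState` passed as `random_state` purely so reruns give the same samples. The actual sums depend on the seed, so no output is shown here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': ['a', 'b', 'c', 'd'],
                   'value_1': [5, 10, 6, 9],
                   'value_2': [10, 30, 8, 12]})

N = len(df)                     # assumption: go up to the number of rows
rs = np.random.RandomState(42)  # fixed seed only for reproducibility

# Keys of the dict become the outer index level, one block per sample size n
out = (pd.concat({n: df.sample(n, replace=True, random_state=rs)
                  for n in range(1, N + 1)})  # + 1 so the last sample has size N
         .groupby(level=0)
         .agg(ids=('id', ','.join),
              sum_of_value_1=('value_1', 'sum'),
              sum_of_value_2=('value_2', 'sum'))
       )
print(out)
```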