Get random sample with dynamic n parameter from a grouped dataframe using group key

Time:10-13

I want to draw random samples from a grouped data frame, setting the n parameter of sample(n="dynamic_value") per group from a value tied to the groupby key instead of using one fixed n. I couldn't find an existing question or answer that covers this.

import pandas as pd

d = {'name': ["n1", "n2", "n3", "n4", "n5", "n6"], 'cc': ["US", "UK", "US", "UK", "US", "US"], 'selected_count':[3, 1, 3, 1, 3, 3], 'view':[4, 64, 52, 2, 65, 21]}
pdf_candidate_names = pd.DataFrame(data=d)

The data frame output looks like this:

  name  cc  selected_count  view
0   n1  US               3     4
1   n2  UK               1    64
2   n3  US               3    52
3   n4  UK               1     2
4   n5  US               3    65
5   n6  US               3    21

Given the sample data frame above, I'd like to draw random rows for each cc using sample(), taking the n parameter from that group's selected_count. For example, n=3 when the groupby key is US and n=1 when it is UK.

I tried the following, but it didn't work because x["selected_count"] is a column (a Series), not an integer:

pdf_selected_names = pd.concat([
    pdf_candidate_names.groupby("cc").apply(lambda x: x.sample(n=x["selected_count"], weights='view')),
    pdf_candidate_names.groupby("cc").apply(lambda x: x.sample(n=x["selected_count"], weights='view'))
]).sample(frac=1.0).reset_index(drop=True)

Answer:

groupby.sample only accepts a single fixed n for all groups, so instead call sample on each group inside groupby.apply and read n from the first selected_count value of that group:

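# group_keys=False keeps the original row index instead of adding 'cc' as an index level
out = (pdf_candidate_names.groupby('cc', group_keys=False)
         # selected_count is constant within a group, so its first value is the scalar n
         .apply(lambda g: g.sample(g['selected_count'].iloc[0]))
       )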

Output (the selected rows vary from run to run because the sample is random):

  name  cc  selected_count  view
3   n4  UK               1     2
2   n3  US               3    52
5   n6  US               3    21
4   n5  US               3    65
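
The answer above leaves out the weights from the question's attempt. A minimal sketch of one way to add them is shown below. It assumes the 'view' column is meant as the sampling weight and that selected_count is constant within each cc group. The helper name sample_per_group and the seed parameter are hypothetical, introduced here only for illustration. It loops over the groups and concatenates the pieces rather than using groupby.apply, whose handling of the grouping column has differed between pandas versions.

```python
import pandas as pd

d = {'name': ["n1", "n2", "n3", "n4", "n5", "n6"],
     'cc': ["US", "UK", "US", "UK", "US", "US"],
     'selected_count': [3, 1, 3, 1, 3, 3],
     'view': [4, 64, 52, 2, 65, 21]}
pdf_candidate_names = pd.DataFrame(data=d)


def sample_per_group(df, key="cc", n_col="selected_count",
                     weight_col="view", seed=None):
    """Hypothetical helper: sample each group with its own n.

    Assumes n_col holds the same value on every row of a group.
    """
    parts = []
    for _, g in df.groupby(key, sort=False):
        # Take the scalar n for this group; cap it at the group size so
        # sampling without replacement cannot ask for too many rows.
        n = min(int(g[n_col].iloc[0]), len(g))
        # weights may be a column name; rows with larger values are
        # more likely to be drawn.
        parts.append(g.sample(n=n, weights=weight_col, random_state=seed))
    return pd.concat(parts)


# Passing a seed makes the draw repeatable; omit it for a fresh sample.
out = sample_per_group(pdf_candidate_names, seed=0)
print(out)
```

The result can then be shuffled and re-indexed with .sample(frac=1.0).reset_index(drop=True), as in the question.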