I want to draw random samples from a pandas groupby object, and I'd like the n parameter of sample(n="dynamic_value") to change dynamically with the groupby key value. I didn't come across a question or answer like this.
import pandas as pd
d = {'name': ["n1", "n2", "n3", "n4", "n5", "n6"], 'cc': ["US", "UK", "US", "UK", "US", "US"], 'selected_count': [3, 1, 3, 1, 3, 3], 'view': [4, 64, 52, 2, 65, 21]}
pdf_candidate_names = pd.DataFrame(data=d)
The data frame output looks like this:
  name  cc  selected_count  view
0   n1  US               3     4
1   n2  UK               1    64
2   n3  US               3    52
3   n4  UK               1     2
4   n5  US               3    65
5   n6  US               3    21
Given the sample data frame above, I'd like to draw random rows for each cc using sample(), setting the n parameter from the number in selected_count. For example, when the groupby key is US, n=3; when it's UK, n=1.
I tried the code below, but it didn't work because x["selected_count"] is not an integer but a column (a Series).
pdf_selected_names = pd.concat([
    pdf_candidate_names.groupby("cc").apply(lambda x: x.sample(n=x["selected_count"], weights='view')),
    pdf_candidate_names.groupby("cc").apply(lambda x: x.sample(n=x["selected_count"], weights='view'))
]).sample(frac=1.0).reset_index(drop=True)
Answer:
Since groupby.sample only accepts a single fixed n for all groups, you can instead call sample on each group inside groupby.apply, taking n from that group's selected_count:
out = (pdf_candidate_names.groupby('cc', group_keys=False)
       # selected_count is constant within each cc group, so use its first value as n
       .apply(lambda g: g.sample(n=g['selected_count'].iloc[0]))
      )
Example output (the rows vary between runs because the sampling is random):
  name  cc  selected_count  view
3   n4  UK               1     2
2   n3  US               3    52
5   n6  US               3    21
4   n5  US               3    65
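
If you also want the view weighting from the question, the same idea can be sketched as below. This is only a minimal sketch under a few assumptions: the sample_group helper and its seed argument are hypothetical names introduced for illustration, and capping n at the group size is my own addition so that sample() does not raise when selected_count exceeds the number of rows in a group. It loops over the groups explicitly instead of using apply, which is just one possible way to write it.

```python
import pandas as pd

# Sample data copied from the question.
pdf_candidate_names = pd.DataFrame({
    "name": ["n1", "n2", "n3", "n4", "n5", "n6"],
    "cc": ["US", "UK", "US", "UK", "US", "US"],
    "selected_count": [3, 1, 3, 1, 3, 3],
    "view": [4, 64, 52, 2, 65, 21],
})


def sample_group(g, seed=None):
    """Draw selected_count rows from one group, weighted by the view column."""
    # Assumption: selected_count is the same for every row of the group,
    # so the first value is used. n is capped at the group size.
    n = min(int(g["selected_count"].iloc[0]), len(g))
    return g.sample(n=n, weights="view", random_state=seed)


# Sample each cc group separately, then stitch the pieces back together.
pdf_selected_names = pd.concat(
    [sample_group(g, seed=0) for _, g in pdf_candidate_names.groupby("cc")]
).reset_index(drop=True)

# Which rows are picked depends on the seed, but the per-group counts
# always follow selected_count: 3 rows for US and 1 row for UK.
print(pdf_selected_names)
```

Passing seed=None (the default in this sketch) gives a different draw on each run; a fixed seed is only there to make the result reproducible.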