import pandas as pd

d = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]}
df = pd.DataFrame(d)
df.sample(n=3, weights='b', random_state=1)
Returns:
   a   b
3  4   8
4  5  10
0  1   2
Whereas I want the sample to always include the rows with the min and max values of b, with the remaining rows sampled as usual:
a b
3 1 2
4 3 6
0 5 10
Removing the weights parameter doesn't guarantee that the minimum value is included either.
CodePudding user response:
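You need to extract the rows holding the min and max of b, then sample the remaining n-2 rows (or n-1 if the min and max are the same row) from the rest of the DataFrame:
n = 3
# Row labels of the min and max of b, de-duplicated in case one row is both
l = list(dict.fromkeys([df['b'].idxmin(), df['b'].idxmax()]))
# Keep the extremes, then sample the remaining rows from the rest
out = pd.concat([df.loc[l], df.drop(l).sample(n=n-len(l))]).sort_index()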
Output (the middle row can differ between runs, since no random_state is set):
   a   b
0  1   2
3  4   8
4  5  10
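
If you need this more than once, the same idea can be wrapped in a small helper. This is only a sketch: the name sample_with_extremes and its parameters are made up for illustration, and it assumes the DataFrame has a unique index (drop removes every row sharing a label) and that the column has no missing values that would confuse idxmin/idxmax.

```python
import pandas as pd


def sample_with_extremes(df, column, n, random_state=None):
    """Sample n rows, always keeping the rows holding the min and max of `column`.

    Hypothetical helper, not part of pandas. Assumes a unique index.
    """
    # Row labels of the extremes; dict.fromkeys drops the duplicate label
    # when a single row is both the minimum and the maximum.
    keep = list(dict.fromkeys([df[column].idxmin(), df[column].idxmax()]))
    if n < len(keep):
        raise ValueError(f"n must be at least {len(keep)} to include the extremes")
    # Draw the remaining rows uniformly from everything except the extremes.
    rest = df.drop(keep).sample(n=n - len(keep), random_state=random_state)
    return pd.concat([df.loc[keep], rest]).sort_index()


df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})
out = sample_with_extremes(df, "b", n=3, random_state=1)
print(out)
```

Passing random_state makes the randomly drawn rows reproducible; the min and max rows are included regardless of the seed.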