Pandas: take a random sample of column values that always includes the min and max

import pandas as pd

d = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]}
df = pd.DataFrame(d)

df.sample(n=3, weights='b', random_state=1)

Returns:

However, I want the rows holding the minimum and maximum values of b to always be included in the sample.

Removing the weights parameter doesn't help either: a plain df.sample(n=3) does not guarantee that the row with the minimum value is included.
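A quick check of why neither call guarantees the extremes, as a minimal sketch using the same df. It relies on two facts: pandas normalizes weights to sum to 1, so with weights='b' the first draw picks each row with probability b / b.sum(); and without weights every 3-row subset is equally likely. The itertools-based count is just an illustration, not part of the question.

```python
from itertools import combinations

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})

# weights='b': first-draw probability of each row is b / b.sum(),
# so the min row (b=2) is the least likely to be picked.
first_draw_p = df["b"] / df["b"].sum()
print(first_draw_p.round(3).tolist())  # [0.067, 0.133, 0.2, 0.267, 0.333]

# No weights: every 3-row subset is equally likely. Count the subsets
# that contain both the min row and the max row of b.
lo, hi = df["b"].idxmin(), df["b"].idxmax()
subsets = list(combinations(df.index, 3))
both = sum(1 for s in subsets if lo in s and hi in s)
print(both, "of", len(subsets))  # 3 of 10
```

So a uniform sample of 3 rows contains both extremes only 30% of the time, which is why the extremes have to be selected explicitly.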

Answer:

Take the rows holding the minimum and maximum of b, then sample the remaining n-2 rows (or n-1 if a single row holds both, i.e. min == max) from the rest of the DataFrame:

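n = 3

# index labels of the min and max rows of b; dict.fromkeys drops the
# duplicate when one row holds both (min == max)
l = list(dict.fromkeys([df['b'].idxmin(), df['b'].idxmax()]))

# keep those rows and fill the rest of the sample from the other rows
out = pd.concat([df.loc[l], df.drop(l).sample(n=n-len(l))]).sort_index()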

Output:
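The same approach can be wrapped in a small reusable helper. This is a sketch under stated assumptions: sample_with_extremes is a hypothetical name, not a pandas API; it assumes the DataFrame index is unique (otherwise df.loc could return several rows per label); and it adds a bounds check on n plus an optional random_state passed through to DataFrame.sample for reproducibility.

```python
import pandas as pd


def sample_with_extremes(df, col, n, random_state=None):
    """Hypothetical helper: return n rows of df that always include the rows
    holding the min and max of df[col]; the other rows are drawn uniformly.
    Assumes df has a unique index."""
    # Deduplicate: if one row holds both the min and the max, keep it once.
    keep = list(dict.fromkeys([df[col].idxmin(), df[col].idxmax()]))
    if not len(keep) <= n <= len(df):
        raise ValueError(f"n must be between {len(keep)} and {len(df)}")
    # Uniformly sample the remaining slots from the rows not already kept.
    rest = df.drop(keep).sample(n=n - len(keep), random_state=random_state)
    return pd.concat([df.loc[keep], rest]).sort_index()


df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})
out = sample_with_extremes(df, "b", n=3, random_state=1)
print(out)
```

Raising on an out-of-range n is a design choice: DataFrame.sample itself raises when asked for more rows than are available without replacement, and an explicit check gives a clearer message.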