Pandas take normal distribution sample of column values including min and max

Time:01-06

import pandas as pd

d = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]}
df = pd.DataFrame(d)

df.sample(n=3, weights='b', random_state=1)

Returns:

    a   b
3   4   8
4   5   10
0   1   2

Instead, I want the sample to always include the rows holding the min and max values of b, for example:

    a   b
0   1   2
2   3   6
4   5   10

Removing the weights parameter does not guarantee that the minimum value is included either.
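One way to check this is to repeat the draw over many seeds and count how often both extremes of b appear. This is only a rough sketch: the helper name has_extremes and the seed range are made up for illustration and are not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})

def has_extremes(sample, col="b"):
    """Return True if the sample contains both the min and the max of df[col]."""
    values = sample[col].values
    return df[col].min() in values and df[col].max() in values

seeds = range(200)  # arbitrary number of repetitions

# Count the draws that happen to contain both the min and the max of b
unweighted_hits = sum(
    has_extremes(df.sample(n=3, random_state=s)) for s in seeds
)
weighted_hits = sum(
    has_extremes(df.sample(n=3, weights="b", random_state=s)) for s in seeds
)

print(unweighted_hits, weighted_hits, len(seeds))
```

Both counts should come out well below the number of draws, so neither call can be relied on to return the extremes.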

CodePudding user response:

You need to extract the min/max, then sample n-2 (or n-1 if min == max) of the rest of the DataFrame:

n = 3

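# index labels of the min and max of b, deduplicated in case min == max
l = list(dict.fromkeys([df['b'].idxmin(), df['b'].idxmax()]))

# always keep those rows, then sample the remaining n-len(l) rows from the rest
out = pd.concat([df.loc[l], df.drop(l).sample(n=n-len(l))]).sort_index()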

Output (one possible result; the sampled middle row varies because no random_state is set):

   a   b
0  1   2
3  4   8
4  5  10
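The same idea can be wrapped in a small reusable function. This is a minimal sketch, not a pandas API: the name sample_with_extremes and its parameters are hypothetical, and it assumes the DataFrame has a unique index so that idxmin/idxmax each identify exactly one row.

```python
import pandas as pd

def sample_with_extremes(df, col, n, random_state=None):
    """Return n rows of df, always including the rows with the min and max of col.

    Assumes df has a unique index (hypothetical helper, not part of pandas).
    """
    # Labels of the extreme rows; dict.fromkeys drops the duplicate when min == max
    keep = list(dict.fromkeys([df[col].idxmin(), df[col].idxmax()]))
    if n < len(keep):
        raise ValueError(f"n must be at least {len(keep)} to include min and max")
    # Randomly sample the remaining rows from everything except the extremes
    rest = df.drop(keep).sample(n=n - len(keep), random_state=random_state)
    return pd.concat([df.loc[keep], rest]).sort_index()

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10]})
out = sample_with_extremes(df, "b", n=3, random_state=1)
print(out)
```

Passing random_state makes the sampled middle row reproducible, while the min and max rows are included regardless of the seed.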