I am trying to generate random data. Here is my example:
import numpy as np
import pandas as pd
import random
df_categories = pd.DataFrame(np.random.choice(a=["0", "1"], size=100, p=[0.7, 0.3]),
columns = ['number'])
df_categories
This code works well and generates the data. Now I want to change it so that, instead of "1", it generates integers in a range from 1 to 100.
df_categories = pd.DataFrame(np.random.choice(a=[0, random.randint(0, 100)], size=100, p=[0.7, 0.3]),
columns = ['number'])
df_categories
I tried the code above, but it produces only a single value in the roughly 30% of rows that are non-zero. Can anybody help me generate different numbers instead of only one?
CodePudding user response:
Why don't you use numpy.random.randint and a mask?
# random integers
a = np.random.randint(0, 100, size=100)
# random mask for ~70% of values
m = np.random.choice([True, False], size=100, p=[0.7, 0.3])
df_categories = pd.DataFrame(np.where(m, 0, a),
columns=['number'])
df_categories
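For context, the attempt in the question yields a single non-zero value because random.randint(0, 100) is evaluated once, before np.random.choice runs, so the array passed as `a` holds only two fixed values. Below is a minimal sketch of the mask approach wrapped in a function. The name `zero_inflated_ints`, its parameters, and the use of `np.random.default_rng` with a seed are my own assumptions, not part of the answer above. It draws from 1 to 100 inclusive, as the question describes.

```python
import numpy as np
import pandas as pd


def zero_inflated_ints(size, p_zero=0.7, low=1, high=100, seed=None):
    """Return `size` integers: 0 with probability p_zero, else uniform in [low, high].

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng(seed)
    # Draw a candidate integer for every row (endpoint=True makes `high` inclusive).
    values = rng.integers(low, high, size=size, endpoint=True)
    # Boolean mask: True marks the rows that should become 0.
    zero_mask = rng.random(size) < p_zero
    return np.where(zero_mask, 0, values)


df_categories = pd.DataFrame(zero_inflated_ints(100, seed=0), columns=["number"])
print(df_categories["number"].nunique())  # number of distinct values drawn
```

Because every row gets its own candidate draw, the non-zero rows differ from one another instead of repeating one value.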
CodePudding user response:
You need this:
import pandas as pd
import numpy as np
import random
my_range=100
df_categories = pd.DataFrame(np.random.choice(a=[0] + list(np.arange(0, my_range)), size=100, p=[0.7] + [(0.3/my_range)]*my_range),
columns = ['number'])
df_categories
Output:
number
0 0
1 8
2 40
3 73
4 0
... ...
95 75
96 94
97 4
98 0
99 25
100 rows × 1 columns
CodePudding user response:
You can do the following:
n = 100
prob_0 = 0.7
a = [0] + list(np.arange(0, n)) # [0, 0, 1, 2, 3, ..., 99]
p = [prob_0] + [(1 - prob_0)/n] * n # [0.7, 0.003, ..., 0.003]
df_categories = pd.DataFrame(np.random.choice(a=a, size=n, p=p), columns=['number'])
Output (for example):
number
0 0
1 32
2 0
3 39
4 0
.. ...
95 0
96 63
97 55
98 0
99 0
[100 rows x 1 columns]
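One thing to be aware of: in the np.random.choice approach above, 0 appears twice in `a` (once with weight 0.7 and once inside the arange), so zeros actually occur with probability 0.7 + 0.3/100 = 0.703, and the largest value is 99. If the non-zero values should cover 1 to 100 as the question states, a variant along these lines might be closer. This is only a sketch, and the use of np.concatenate and np.full is my own choice:

```python
import numpy as np
import pandas as pd

n = 100        # number of rows, also used as the upper bound of the range
prob_0 = 0.7   # probability of drawing 0

# Candidates: 0 followed by 1..n, so 0 is listed only once.
a = np.concatenate(([0], np.arange(1, n + 1)))
# 0 gets prob_0; the remaining probability is split evenly over 1..n.
p = np.concatenate(([prob_0], np.full(n, (1 - prob_0) / n)))

df_categories = pd.DataFrame(np.random.choice(a=a, size=n, p=p), columns=["number"])
```

Here each of 1..100 has probability 0.003 and 0 has exactly 0.7.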