Let's say I have the following df -
data={'Location':[1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4]}
df = pd.DataFrame(data=data)
df
Location
0 1
1 1
2 1
3 2
4 2
5 2
6 2
7 2
8 2
9 3
10 3
11 3
12 3
13 3
14 3
15 4
16 4
17 4
In addition, I have the following dict:
Unlock={
1:"A",
2:"B",
3:"C",
4:"D",
5:"E",
6:"F",
7:"G",
8:"H",
9:"I",
10:"J"
}
I'd like to create another column that will randomly select a string from the 'Unlock' dict based on the condition that Location<=Unlock. So for example - for Location 2 some rows will get 'A' and some rows will get 'B'.
I've tried to do the following but with no luck (I'm getting an error) -
df['Name']=np.select(df['Location']<=Unlock,np.random.choice(Unlock,size=len(df))
Thanks in advance for your help!
CodePudding user response:
You can create a range from 1 to Location + 1 (exclusive), choose a random number from this range, and then use pandas.Series.map to look up the value for that key in the dict.
import random
df['intermediate'] = df['Location'].apply(lambda x: random.choice(range(1,x+1)))
df['random_location'] = df['intermediate'].map(Unlock)
print(df)
Location intermediate random_location
0 1 1 A
1 1 1 A
2 1 1 A
3 2 2 B
4 2 1 A
5 2 1 A
6 2 2 B
7 2 2 B
8 2 1 A
9 3 2 B
10 3 1 A
11 3 2 B
12 3 3 C
13 3 2 B
14 3 2 B
15 4 2 B
16 4 4 D
17 4 3 C
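The same "pick a key between 1 and Location" idea can also be written without a per-row lambda. The following is a minimal sketch, assuming NumPy's Generator API (np.random.default_rng) is available; the column names are the ones used above, and rng is just an illustrative variable name. It relies on Generator.integers accepting an array for the upper bound, so each row gets its own range.

```python
import numpy as np
import pandas as pd

Unlock = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
          6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}
df = pd.DataFrame({'Location': [1, 1, 1, 2, 2, 2, 2, 2, 2,
                                3, 3, 3, 3, 3, 3, 4, 4, 4]})

# rng is an illustrative name; pass a seed to default_rng() for repeatable draws.
rng = np.random.default_rng()

# One integer per row, drawn from 1..Location inclusive (endpoint=True).
df['intermediate'] = rng.integers(1, df['Location'].to_numpy(), endpoint=True)

# Translate each drawn key into its string via the dict.
df['random_location'] = df['intermediate'].map(Unlock)
print(df)
```

Because the draws are random, the printed values will differ from run to run unless a seed is supplied.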
CodePudding user response:
You can convert your dictionary values to a list and randomly select from a subset of this list: only the first Location elements.
With Python versions >= 3.7, dict maintains insertion order. For lower versions, see below.
lst = list(Unlock.values())
df['Name'] = df['Location'].transform(lambda loc: np.random.choice(lst[:loc]))
Example output:
Location Name
0 1 A
1 1 A
2 1 A
3 2 B
4 2 B
5 2 B
6 2 B
7 2 A
8 2 B
9 3 B
10 3 B
11 3 C
12 3 C
13 3 C
14 3 B
15 4 A
16 4 C
17 4 D
If you are using a lower version of Python, you can build a list of dictionary values, sorted by key:
lst = [value for key, value in sorted(Unlock.items())]
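To see why the sort matters, here is a small sketch that uses a deliberately out-of-order dict and then checks that every chosen name satisfies the original condition. The shortened Unlock dict, the reverse lookup, and the ok flag are illustrative assumptions, not part of the question; apply is used here for the per-row call.

```python
import numpy as np
import pandas as pd

# Keys inserted out of order on purpose (hypothetical shortened dict).
Unlock = {3: "C", 1: "A", 4: "D", 2: "B"}
df = pd.DataFrame({'Location': [1, 1, 2, 2, 3, 3, 4, 4]})

# Sorting by key guarantees lst[i] belongs to key i + 1,
# regardless of how the dict was built.
lst = [value for key, value in sorted(Unlock.items())]

# Pick one of the first `loc` values for each row.
df['Name'] = df['Location'].apply(lambda loc: np.random.choice(lst[:loc]))

# Sanity check: map each name back to its key and compare with Location.
reverse = {value: key for key, value in Unlock.items()}
ok = bool((df['Name'].map(reverse) <= df['Location']).all())
print(df)
print(ok)
```

Note that the slice lst[:loc] only lines up with the keys when they are the consecutive integers 1, 2, 3, ... as in the question.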
CodePudding user response:
For a vectorized method, multiply by a random value in (0, 1] and take the ceiling, then map with your dictionary.
This will give you an equiprobable value between 1 and the current value (inclusive):
import numpy as np
df['random'] = (np.ceil(df['Location'].mul(1-np.random.random(size=len(df))))
.astype(int).map(Unlock)
)
Output (reproducible with np.random.seed(0)):
Location random
0 1 A
1 1 A
2 1 A
3 2 B
4 2 A
5 2 B
6 2 A
7 2 B
8 2 B
9 3 B
10 3 C
11 3 B
12 3 B
13 3 C
14 3 A
15 4 A
16 4 A
17 4 D
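The equiprobability claim can be checked empirically. This is a rough sketch, assuming a large sample with a fixed Location of 4; the sample size, the seed, and the names loc, draws and freq are arbitrary illustrative choices. Since the random value lies in [0, 1), one minus it lies in (0, 1], so the ceiling never produces 0.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# 100,000 rows that all have Location == 4 (illustrative sample).
loc = pd.Series(np.full(100_000, 4))

# Same trick as above: scale by a value in (0, 1], then round up.
draws = np.ceil(loc.mul(1 - rng.random(size=len(loc)))).astype(int)

# Relative frequency of each drawn key; each should be close to 0.25.
freq = draws.value_counts(normalize=True).sort_index()
print(freq)
```

Each of the keys 1 to 4 should appear in roughly a quarter of the rows, with small fluctuations from sampling noise.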