How to randomly choose a string from a list and apply it across a dataframe based on a condition?

Let's say I have the following df -

import pandas as pd

data={'Location':[1,1,1,2,2,2,2,2,2,3,3,3,3,3,3,4,4,4]}

df = pd.DataFrame(data=data)
df

    Location
0   1
1   1
2   1
3   2
4   2
5   2
6   2
7   2
8   2
9   3
10  3
11  3
12  3
13  3
14  3
15  4
16  4
17  4

In addition, I have the following dict:

Unlock={
1:"A",
2:"B",
3:"C",
4:"D",
5:"E",
6:"F",
7:"G",
8:"H",
9:"I",
10:"J"
}

I'd like to create another column that randomly selects a string from the 'Unlock' dict, restricted to the entries whose key is <= Location. So for example, for Location 2 some rows will get 'A' and some rows will get 'B'.

I've tried the following, but with no luck (it raises an error):

df['Name']=np.select(df['Location']<=Unlock,np.random.choice(Unlock,size=len(df))

Thanks in advance for your help!

CodePudding user response：

You can build the range of integers from 1 to Location (that is, range(1, Location + 1)), choose a random number from this range, and then use pandas.Series.map to look up the value for that key in the dict.

import random

df['intermediate'] = df['Location'].apply(lambda x: random.choice(range(1, x + 1)))

df['random_location'] = df['intermediate'].map(Unlock)
print(df)

    Location  intermediate random_location
0          1             1               A
1          1             1               A
2          1             1               A
3          2             2               B
4          2             1               A
5          2             1               A
6          2             2               B
7          2             2               B
8          2             1               A
9          3             2               B
10         3             1               A
11         3             2               B
12         3             3               C
13         3             2               B
14         3             2               B
15         4             2               B
16         4             4               D
17         4             3               C
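
The same idea can be written as a self-contained script. This is only a minimal sketch: the seeded random.Random instance (named rng here) is an assumption added for repeatable output, and random.randint is swapped in for random.choice(range(...)) because randint(1, x) already includes both endpoints.

```python
import random

import pandas as pd

# Data and dictionary copied from the question.
df = pd.DataFrame({'Location': [1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4]})
Unlock = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
          6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}

# Assumption: a dedicated, seeded generator so repeated runs give the same draw.
rng = random.Random(0)

# randint(1, x) draws an integer key from 1 to x, both ends included.
df['intermediate'] = df['Location'].apply(lambda x: rng.randint(1, int(x)))

# Look up the string that belongs to each drawn key.
df['random_location'] = df['intermediate'].map(Unlock)
print(df)
```

Dropping the seed (random.Random() with no argument) gives a different draw on every run.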

CodePudding user response：

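You can convert your dictionary values to a list and randomly select from a slice of that list containing only the first Location elements.

This relies on the keys being the consecutive integers 1, 2, ... inserted in ascending order, so that slicing by position matches selecting by key. With Python versions >= 3.7, dict maintains insertion order. For lower versions, see below.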

import numpy as np

lst = list(Unlock.values())

df['Name'] = df['Location'].transform(lambda loc: np.random.choice(lst[:loc]))

Example output:

    Location Name
0          1    A
1          1    A
2          1    A
3          2    B
4          2    B
5          2    B
6          2    B
7          2    A
8          2    B
9          3    B
10         3    B
11         3    C
12         3    C
13         3    C
14         3    B
15         4    A
16         4    C
17         4    D

If you are using an older version of Python, you can build a list of dictionary values sorted by key:

lst = [value for key, value in sorted(Unlock.items())]
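
If the keys might not be consecutive or might not start at 1, slicing by position no longer matches the condition key <= Location. A possible sketch that filters on the keys directly is shown here; the helper name pick_name and the seeded generator rng are illustrative assumptions, not part of the answer above.

```python
import numpy as np
import pandas as pd

# Data and dictionary copied from the question.
df = pd.DataFrame({'Location': [1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4]})
Unlock = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
          6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}

# Assumption: seeded only so that the output is repeatable.
rng = np.random.default_rng(0)


def pick_name(loc, mapping=Unlock):
    """Hypothetical helper: return a random value whose key is <= loc."""
    # Filter on the keys themselves, so gaps or ordering do not matter.
    candidates = [value for key, value in sorted(mapping.items()) if key <= loc]
    # rng.choice raises ValueError if no key qualifies (not the case here).
    return str(rng.choice(candidates))


df['Name'] = df['Location'].apply(pick_name)
print(df)
```

This rebuilds the candidate list for every row, so it trades speed for generality; for the dictionary in the question the simple slice behaves the same way.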

CodePudding user response：

For a vectorized method, multiply Location by a random value in (0, 1], take the ceiling, and then map the result through your dictionary.

This gives an equiprobable key between 1 and the current Location (inclusive):

import numpy as np
df['random'] = (np.ceil(df['Location'].mul(1-np.random.random(size=len(df))))
                  .astype(int).map(Unlock)
               )

output (reproducible with np.random.seed(0)):

    Location random
0          1      A
1          1      A
2          1      A
3          2      B
4          2      A
5          2      B
6          2      A
7          2      B
8          2      B
9          3      B
10         3      C
11         3      B
12         3      B
13         3      C
14         3      A
15         4      A
16         4      A
17         4      D
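
A variant of the same vectorized idea draws the integer keys directly. This is a sketch that assumes NumPy's Generator API (NumPy 1.17 or later), where Generator.integers accepts an array as the upper bound; the names rng and keys are illustrative.

```python
import numpy as np
import pandas as pd

# Data and dictionary copied from the question.
df = pd.DataFrame({'Location': [1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4]})
Unlock = {1: "A", 2: "B", 3: "C", 4: "D", 5: "E",
          6: "F", 7: "G", 8: "H", 9: "I", 10: "J"}

# Assumption: seeded only so that the output is repeatable.
rng = np.random.default_rng(0)

# One draw per row: the upper bound is the row's Location, and
# endpoint=True makes that bound inclusive, so each key lies in 1..Location.
keys = rng.integers(1, df['Location'].to_numpy(), endpoint=True)

# Map the drawn keys to their strings, keeping the dataframe's index.
df['random'] = pd.Series(keys, index=df.index).map(Unlock)
print(df)
```

Because the keys are drawn as integers, no floating-point rounding step is involved.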