I'm trying to select a random subset of rows of a pd.DataFrame and set a value in a certain column for those rows. Here's a toy example:
import pandas as pd
df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})
species name group
0 platypus mike control
1 monkey paul control
2 possum doug control
I tried the following to randomly assign two people to the experimental group, but it doesn't work:
df.sample(2)['group'] = 'experimental'
This doesn't work either:
df.iloc[[0, 1]]['group'] = 'experimental'
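Both attempts likely fail for the same reason: the first indexing step (df.sample(2) or df.iloc[[0, 1]]) returns a new DataFrame, and ['group'] = ... then assigns into that temporary copy rather than into df. A minimal demonstration, assuming a recent pandas; pandas may emit a warning about chained assignment here, which is silenced only to keep the output readable:

```python
import warnings

import pandas as pd

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control'],
})

with warnings.catch_warnings():
    # Silence any chained-assignment warning pandas may raise for this demo
    warnings.simplefilter("ignore")
    # sample() builds a new DataFrame, so this sets a column on a temporary copy
    df.sample(2)['group'] = 'experimental'
    # iloc with a list of positions (fancy indexing) also returns a copy
    df.iloc[[0, 1]]['group'] = 'experimental'

print(df['group'].tolist())  # ['control', 'control', 'control']
```

The fix is to do the selection and the assignment in a single indexing call on df itself, e.g. with .loc.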
Answer:
You can use df.sample(2).index to get the index labels of the randomly sampled rows, then pass them to .loc to set the group column to 'experimental' for exactly those rows in df:
df.loc[df.sample(2).index, 'group'] = 'experimental'
Example output (which rows are picked varies from run to run):
species name group
0 platypus mike experimental
1 monkey paul experimental
2 possum doug control
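Because the sampled rows change on every run, you may want the assignment to be repeatable (for example, in tests). DataFrame.sample accepts a random_state argument for that; the seed 42 below is an arbitrary choice, and which rows it selects is not asserted here:

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control'],
})

# Index labels of 2 randomly chosen rows; random_state makes the draw repeatable
picked = df.sample(n=2, random_state=42).index

# One .loc call selects the rows and the column on df itself, so the write sticks
df.loc[picked, 'group'] = 'experimental'
print(df)
```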
Answer:
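Here is an approach that picks a random number of rows at random. Note that the same row can be picked more than once, so fewer distinct rows than the number of picks may end up changed.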
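import pandas as pd
import random

def custom_randomizer(df, col):
    # Number of random picks: an integer from 1 to len(df), inclusive
    total_randoms = random.randint(1, len(df))
    for _ in range(total_randoms):
        # Pick a random index label (repeats are possible) and overwrite its value
        df.loc[random.choice(df.index), col] = 'experimental'
    return df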
df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})
df = custom_randomizer(df, 'group')
print(df)
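If each picked row should be distinct, a sketch of the same idea is to draw the count first and then draw that many labels without replacement. The helper name random_assign and its value and seed parameters are hypothetical, introduced only for this example; it assumes NumPy's default_rng Generator and works on a copy so the input frame is left unchanged:

```python
import numpy as np
import pandas as pd

def random_assign(df, col, value='experimental', seed=None):
    # Hypothetical helper, not a pandas API.
    rng = np.random.default_rng(seed)
    # Number of rows to change: an integer from 1 to len(df) inclusive
    # (Generator.integers excludes the upper bound by default).
    k = int(rng.integers(1, len(df) + 1))
    # k distinct index labels, drawn without replacement
    chosen = rng.choice(df.index.to_numpy(), size=k, replace=False)
    out = df.copy()  # leave the caller's DataFrame untouched
    out.loc[chosen, col] = value
    return out

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control'],
})

result = random_assign(df, 'group', seed=0)
print(result)
```

Returning a modified copy instead of mutating in place is a design choice here; assign the result back to df if you want the original to change.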