How do I assign a value to a random subset of a Pandas DataFrame?

I'm trying to select a random subset of the rows of a pd.DataFrame and assign a value to one of its columns for those rows. Here's a toy example:

import pandas as pd

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})

    species  name    group
0  platypus  mike  control
1    monkey  paul  control
2    possum  doug  control

I tried the following to randomly assign two rows to the experimental group, but it doesn't work:

df.sample(2)['group'] = 'experimental'

This won't work either, in fact:

df.iloc[[0, 1]]['group'] = 'experimental'

CodePudding user response：

df.sample(2) returns a new DataFrame, so assigning to it never changes df. Instead, use df.sample(2).index to get the index labels of the randomly sampled rows, then pass those labels to .loc to set the group column on df itself:

df.loc[df.sample(2).index, 'group'] = 'experimental'

Output (one possible result, since the sample is random):

    species  name         group
0  platypus  mike  experimental
1    monkey  paul  experimental
2    possum  doug       control
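
If the assignment needs to be repeatable, for example in a test, one option is to fix the seed. The sketch below is the same .loc approach with the random_state argument of DataFrame.sample; the seed value 42 and the names idx and n_experimental are illustrative assumptions, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})

# Draw two rows without replacement. random_state (an assumed seed)
# makes the same rows come out on every run.
idx = df.sample(n=2, random_state=42).index

# Assign on the original frame by label, not on the sampled copy.
df.loc[idx, 'group'] = 'experimental'

# Exactly two rows should now be in the experimental group.
n_experimental = int((df['group'] == 'experimental').sum())
print(df)
print(n_experimental)
```

Dropping random_state restores a different random pair on each run.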

CodePudding user response：

Here is something that picks random index labels a random number of times:

import pandas as pd
import random

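def custom_randomizer(df, col):
    # Number of assignments to make: a random index label plus 1.
    # This assumes the default integer index (0, 1, 2, ...).
    total_randoms = random.choice(df.index) + 1
    for _ in range(total_randoms):
        # The same row can be picked more than once, so the number of
        # distinct rows changed may be smaller than total_randoms.
        df.loc[random.choice(df.index), col] = 'experimental'

    return df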
    
df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})

df = custom_randomizer(df, 'group')

print(df)
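
Because the loop above draws with replacement, the count it picks and the number of rows it actually changes can differ. If the goal is a random number of distinct rows, a variant along these lines may fit better. This is a minimal sketch, not the answer's method: assign_random_rows and its value parameter are hypothetical names, and it uses random.randint and random.sample from the standard library.

```python
import random

import pandas as pd


def assign_random_rows(df, col, value='experimental'):
    # Hypothetical helper: choose how many rows to change (at least 1,
    # at most all of them); randint is inclusive on both ends.
    k = random.randint(1, len(df))
    # random.sample draws k distinct labels, so no row is picked twice.
    chosen = random.sample(list(df.index), k)
    df.loc[chosen, col] = value
    return chosen


df = pd.DataFrame({
    'species': ['platypus', 'monkey', 'possum'],
    'name': ['mike', 'paul', 'doug'],
    'group': ['control', 'control', 'control']
})

chosen = assign_random_rows(df, 'group')
print(df)
```

Unlike the version above, this does not depend on the index being 0, 1, 2, ..., since it samples from the labels directly.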