How can I insert random values where some condition is met?

Let's say I have this dataset:

id        col1         col2
AB1       10           2022-01-05
AB1       20           2022-05-10
CC2       3            2022-03-01
CC2       5            2022-04-01
DD1       100          2022-01-01

And I want a new column that receives a random value if col1 is greater than 5 and col2 is later than 2022-03-10.

I tried using np.where, just like this:

df['new_column'] = np.where((df['col1'] > 5) & (df['col2'] > '2022-03-01'),
                            np.random.random(),
                            np.nan)

However, the values generated are all the same: only a single random value was produced.

I tried a couple of other things, nothing worked. Any ideas?

CodePudding user response：

Give this a shot:

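import numpy as np

def add_column(df):
    """
    df: pandas dataframe
    """
    # Draw one random integer in [0, 100) for every row
    df['col3'] = np.random.randint(0, 100, len(df))
    # Keep it only where both conditions hold, NaN elsewhere
    # (the string comparison assumes col2 holds ISO-format date strings)
    df['col3'] = df.apply(lambda x: x['col3'] if x['col1'] > 5 and x['col2'] > '2022-03-10' else np.nan, axis=1)
    return df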
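
The row-wise apply can get slow on large frames. Here is a vectorized sketch of the same idea, assuming col2 holds ISO-format date strings (so plain string comparison orders them correctly). The function name add_column_vectorized is made up for illustration, and unlike the version above it returns a copy instead of modifying df in place:

```python
import numpy as np
import pandas as pd

def add_column_vectorized(df):
    """Hypothetical vectorized variant of add_column; returns a new DataFrame."""
    out = df.copy()
    # Boolean mask: True where both conditions hold
    mask = (out['col1'] > 5) & (out['col2'] > '2022-03-10')
    # One random integer in [0, 100) per row; np.where keeps it only where mask is True
    out['col3'] = np.where(mask, np.random.randint(0, 100, len(out)), np.nan)
    return out

# Sample data from the question, with col2 kept as strings (assumption)
df = pd.DataFrame({
    'id': ['AB1', 'AB1', 'CC2', 'CC2', 'DD1'],
    'col1': [10, 20, 3, 5, 100],
    'col2': ['2022-01-05', '2022-05-10', '2022-03-01', '2022-04-01', '2022-01-01'],
})
result = add_column_vectorized(df)
```

np.where works here because its second argument is an array with one entry per row rather than a single scalar.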

CodePudding user response：

No need to use np.where. Just use df.loc with a boolean mask, and draw one random value per matching row:

mask = (df['col1'] > 5) & (df['col2'] > '2022-03-01')
df.loc[mask, 'new_column'] = np.random.random(mask.sum())
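
One thing to watch with this kind of assignment: np.random.random() called with no argument returns a single float, which pandas broadcasts to every selected row. Passing the number of selected rows as the size gives one draw per row. A small sketch of the difference (the toy frame and the column names 'same' and 'different' are made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [10, 20, 30, 1]})  # toy data, for illustration only
mask = df['col1'] > 5

# Scalar draw: a single float is broadcast to every selected row
df.loc[mask, 'same'] = np.random.random()

# Sized draw: one independent float per selected row
df.loc[mask, 'different'] = np.random.random(mask.sum())

# Number of distinct values among the selected rows in the scalar case
n_same = df.loc[mask, 'same'].nunique()
```

Rows that do not match the mask are left as NaN in both columns.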

CodePudding user response：

You can try this:

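# col2 must be a datetime column for the pd.Timestamp comparison
m = (df['col1'] > 5) & (df['col2'] > pd.Timestamp('2022-03-01'))
# one random value per matching row
df.loc[m, 'new'] = np.random.random(m.sum())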
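
Putting it together on the sample data from the question, as a sketch. It assumes col2 starts out as strings, so it is converted with pd.to_datetime first (the comparison with pd.Timestamp needs a datetime column). The seeded np.random.default_rng generator is an addition here, only there to make the run reproducible:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'id': ['AB1', 'AB1', 'CC2', 'CC2', 'DD1'],
    'col1': [10, 20, 3, 5, 100],
    'col2': ['2022-01-05', '2022-05-10', '2022-03-01', '2022-04-01', '2022-01-01'],
})
# Assumption: col2 arrives as strings, so parse it into datetime64 first
df['col2'] = pd.to_datetime(df['col2'])

rng = np.random.default_rng(0)  # seeded generator, for reproducibility only

m = (df['col1'] > 5) & (df['col2'] > pd.Timestamp('2022-03-01'))
# One uniform draw in [0, 1) per matching row; other rows stay NaN
df.loc[m, 'new'] = rng.random(m.sum())
```

With this data only the second row (AB1, 20, 2022-05-10) satisfies both conditions, so it is the only row that gets a value.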