Let's say I have this dataset:
id col1 col2
AB1 10 2022-01-05
AB1 20 2022-05-10
CC2 3 2022-03-01
CC2 5 2022-04-01
DD1 100 2022-01-01
I want a new column that receives a random value if col1 is greater than 5 and col2 is later than 2022-03-10.
I tried using np.where, just like this:
import numpy as np
df['new_column'] = np.where((df['col1'] > 5) & (df['col2'] > '2022-03-10'),
                            np.random.random(),
                            np.nan)
However, all the generated values are the same: only a single random value was generated.
I tried a couple of other things, but nothing worked. Any ideas?
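The single value comes from np.random.random() being evaluated once: with no size argument it returns one float, and np.where broadcasts that scalar to every matching row. A minimal sketch that keeps the np.where approach but passes an array with one random number per row, assuming col2 can be parsed by pd.to_datetime (the sample data is the table above):

```python
import numpy as np
import pandas as pd

# Sample data from the question, with col2 parsed as datetimes
df = pd.DataFrame({
    'id': ['AB1', 'AB1', 'CC2', 'CC2', 'DD1'],
    'col1': [10, 20, 3, 5, 100],
    'col2': pd.to_datetime(['2022-01-05', '2022-05-10', '2022-03-01',
                            '2022-04-01', '2022-01-01']),
})

# Parenthesize each comparison: & binds tighter than > in Python
mask = (df['col1'] > 5) & (df['col2'] > pd.Timestamp('2022-03-10'))

# len(df) random numbers, one per row; np.where keeps them only where mask is True
df['new_column'] = np.where(mask, np.random.random(len(df)), np.nan)
print(df)
```

With this sample only the AB1 row dated 2022-05-10 passes both tests, so exactly one row gets a number; on larger data each matching row gets its own draw.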
Answer:
Give this a shot:
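import numpy as np
import pandas as pd

def add_column(df):
    """
    df: pandas dataframe with numeric 'col1' and date-like 'col2'
    """
    # one random integer per row, rather than a single scalar
    df['col3'] = np.random.randint(0, 100, len(df))
    # keep it only where both conditions hold; pd.Timestamp() accepts strings or Timestamps
    df['col3'] = df.apply(lambda x: x['col3'] if x['col1'] > 5 and pd.Timestamp(x['col2']) > pd.Timestamp('2022-03-10') else np.nan, axis=1)
    return df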
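The apply call evaluates the condition row by row in Python. A vectorized sketch of the same idea (draw a random integer for every row, then blank out the rows that fail the test) can use Series.where, which keeps values where the condition is True and puts NaN elsewhere. The function name add_column_vectorized is hypothetical; the 0 to 99 range and the 2022-03-10 cutoff follow the answer above.

```python
import numpy as np
import pandas as pd

def add_column_vectorized(df):
    """Hypothetical vectorized variant of add_column."""
    # pd.to_datetime accepts either string or datetime columns
    cond = (df['col1'] > 5) & (pd.to_datetime(df['col2']) > pd.Timestamp('2022-03-10'))
    # One random integer in [0, 100) per row, aligned to df's index
    values = pd.Series(np.random.randint(0, 100, len(df)), index=df.index)
    # Keep the value where cond is True, NaN elsewhere (column becomes float)
    df['col3'] = values.where(cond)
    return df

df = pd.DataFrame({
    'id': ['AB1', 'AB1', 'CC2', 'CC2', 'DD1'],
    'col1': [10, 20, 3, 5, 100],
    'col2': ['2022-01-05', '2022-05-10', '2022-03-01', '2022-04-01', '2022-01-01'],
})
print(add_column_vectorized(df))
```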
Answer:
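No need to use np.where; just use pandas .loc with a boolean mask, and pass the number of selected rows to np.random.random so each row gets its own value:
m = (df['col1'] > 5) & (df['col2'] > '2022-03-10')
df.loc[m, 'new_column'] = np.random.random(m.sum())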
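To see why the size argument matters, here is a small check on a hypothetical frame in which several rows match. It uses a seeded np.random.default_rng Generator, NumPy's newer random API, swapped in here only so the run is reproducible; the legacy np.random.random behaves the same way with respect to the size argument.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: the first three rows satisfy the condition
df = pd.DataFrame({'col1': [10, 20, 30, 1]})
mask = df['col1'] > 5

# No size argument: one float, broadcast to every selected row
df.loc[mask, 'scalar'] = rng.random()
# Size equal to the number of selected rows: one float per row
df.loc[mask, 'per_row'] = rng.random(mask.sum())
print(df)
```

Rows outside the mask are left as NaN in both new columns.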
Answer:
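You can try this (it compares col2 with a pd.Timestamp, so col2 needs a datetime dtype; if it holds strings, convert it first with df['col2'] = pd.to_datetime(df['col2'])):
import numpy as np
import pandas as pd
m = (df['col1'] > 5) & (df['col2'] > pd.Timestamp('2022-03-10'))
df.loc[m, 'new'] = np.random.random(m.sum())  # one draw per matching row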