I have a DataFrame df:
I want a new column: if Date1 > Date2, then id1, else id2.
Output:
How can I do this without using loops? Any hints, please?
CodePudding user response:
You can use numpy.where:
import numpy as np
df['output'] = np.where(df['Date1'].gt(df['Date2']), df['Id1'], df['Id2'])
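As a self-contained illustration of the line above, here is a minimal sketch. The sample values are hypothetical, since the question's DataFrame is not shown, and the column names Id1, Id2, Date1 and Date2 are assumed to match the snippet. The dates are parsed with pd.to_datetime first so the comparison is chronological rather than string-based.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the original DataFrame is not shown in the question.
df = pd.DataFrame({
    "Id1": [1, 2, 3],
    "Date1": ["2021-01-05", "2021-02-01", "2021-03-10"],
    "Id2": [10, 20, 30],
    "Date2": ["2021-01-01", "2021-02-15", "2021-03-10"],
})

# Parse the strings so the comparison is chronological.
df["Date1"] = pd.to_datetime(df["Date1"])
df["Date2"] = pd.to_datetime(df["Date2"])

# Pick Id1 where Date1 is strictly later than Date2, otherwise Id2.
df["output"] = np.where(df["Date1"].gt(df["Date2"]), df["Id1"], df["Id2"])
print(df)
```

Note that rows with equal dates fall into the else branch and get Id2, because gt is a strict comparison.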
CodePudding user response:
Steps:
- Convert date1 and date2 to datetime objects using pd.to_datetime():
import pandas as pd
df['date1'] = pd.to_datetime(df['date1'])
df['date2'] = pd.to_datetime(df['date2'])
- Initialize an empty NumPy array, iterate through the rows with df.iterrows() to fill in the appropriate values, and then add the array as a new column.
import numpy as np
n = np.empty(len(df))
# enumerate gives a positional index, which stays valid even if df's index is not 0..n-1
for pos, (idx, row) in enumerate(df.iterrows()):
    # idx is the index label, row holds the row data
    if row['date1'] > row['date2']:
        n[pos] = 1
    else:
        n[pos] = 0
df['output'] = n
There are shorthands for this code as well.
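One such shorthand, as a minimal sketch: compare the two datetime columns directly and cast the boolean result to integers, which gives the same 1/0 flag as the loop without iterating. The sample values are hypothetical, and the column names date1 and date2 are assumed to match the steps above.

```python
import pandas as pd

# Hypothetical sample data for illustration only.
df = pd.DataFrame({
    "date1": ["2021-01-05", "2021-02-01"],
    "date2": ["2021-01-01", "2021-02-15"],
})
df["date1"] = pd.to_datetime(df["date1"])
df["date2"] = pd.to_datetime(df["date2"])

# The comparison yields a boolean Series; astype(int) maps True to 1 and False to 0.
df["output"] = (df["date1"] > df["date2"]).astype(int)
print(df)
```

To return id1 or id2 instead of a flag, the same boolean condition can be passed to numpy.where.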