I've two dfs. I wanted to assign df1.date = df2.start_date if df1.date <= df2.end_date.
df1 = {"date": ['2020-12-23 18:20:37', '2021-08-20 12:17:41.487'], "result": [ 'pass', 'fail']}
df2 = {"start_date": ['2021-08-19 12:17:41.487','2021-08-12 12:17:41.487', '2021-08-26 12:17:41.487'],
"end_date": ['2021-08-26 12:17:41.487', '2021-08-19 12:17:41.487', '2021-09-02 12:17:41.487']}
I just give two rows while in real I'm doing this query on 100,000 rows. How do I achieve this?
CodePudding user response:
We can make use of numpy
's where.
# df1.date = df2.start_date if df1.date <= df2.end_date
import numpy as np
df1.date = np.where(df1.date <= df2.end_date, df2.start_date, df1.date)
New df1
date result
0 2015-01-01 pass
1 2015-02-01 pass
2 2015-03-01 fail
3 2015-04-01 fail
4 2015-05-01 pass
Data used
df1 = {"date": [ '2014-12-29', '2015-01-26', '2015-02-26', '2015-03-08', '2015-04-10' ],
"result": [ 'pass', 'pass', 'fail', 'fail', 'pass' ]}
df2 = {"start_date": [ '2015-01-01', '2015-02-01', '2015-03-01', '2015-04-01', '2015-05-01' ],
"end_date": [ '2015-01-25', '2015-02-20', '2015-03-15', '2015-04-24', '2015-05-23' ]}
df1 = pd.DataFrame(data = df1)
df2 = pd.DataFrame(data = df2)
CodePudding user response:
assuming im understanding your question correctly and that both your dataframes line up with each other. you could loop through each row and do a compare across to the other df. however if you have thousands of records this could take some time.
df1 = pd.DataFrame({"date": [datetime.date(2014, 12, 29), datetime.date(2015, 1, 26), datetime.date(2015, 2, 26), datetime.date(2015, 3, 8), datetime.date(2015, 4, 10)],
"result": ['pass', 'fail', 'fail', 'pass', 'pass']})
df2 = pd.DataFrame({'start_date': [datetime.date(2015, 1, 1), datetime.date(2015, 2, 1), datetime.date(2015, 3, 1), datetime.date(2015, 4, 1), datetime.date(2015, 5, 1)],
'end_date': [datetime.date(2015, 1, 25), datetime.date(2015, 2, 20), datetime.date(2015, 3, 15), datetime.date(2015, 4, 24), datetime.date(2015, 5, 23)]})
for i in range(len(df1)):
if (df1.date[i] <= df2.end_date[i]):
df1.date[i] = df2.start_date[i]
but again this is assuming that both data frames have the same length and its a direct compare across