Home > Mobile >  Update dataframe value with other dataframe value if condition met?
Update dataframe value with other dataframe value if condition met?

Time:07-23

I've two dfs. I wanted to assign df1.date = df2.start_date if df1.date <= df2.end_date.

df1 = {"date": ['2020-12-23 18:20:37', '2021-08-20 12:17:41.487'], "result": [ 'pass', 'fail']}

df2 = {"start_date": ['2021-08-19 12:17:41.487','2021-08-12 12:17:41.487', '2021-08-26 12:17:41.487'],
"end_date": ['2021-08-26 12:17:41.487', '2021-08-19 12:17:41.487', '2021-09-02 12:17:41.487']}

I just give two rows while in real I'm doing this query on 100,000 rows. How do I achieve this?

CodePudding user response:

We can make use of numpy's where.

# df1.date = df2.start_date if df1.date <= df2.end_date
import numpy as np 
df1.date = np.where(df1.date <= df2.end_date, df2.start_date, df1.date)

New df1

    date    result
0   2015-01-01  pass
1   2015-02-01  pass
2   2015-03-01  fail
3   2015-04-01  fail
4   2015-05-01  pass

Data used

df1 = {"date": [ '2014-12-29', '2015-01-26', '2015-02-26', '2015-03-08', '2015-04-10' ], 
"result": [ 'pass', 'pass', 'fail', 'fail', 'pass' ]}

df2 = {"start_date": [ '2015-01-01', '2015-02-01', '2015-03-01', '2015-04-01', '2015-05-01' ],
"end_date": [ '2015-01-25', '2015-02-20', '2015-03-15', '2015-04-24', '2015-05-23' ]}

df1 = pd.DataFrame(data = df1)
df2 = pd.DataFrame(data = df2)

CodePudding user response:

assuming im understanding your question correctly and that both your dataframes line up with each other. you could loop through each row and do a compare across to the other df. however if you have thousands of records this could take some time.

    df1 = pd.DataFrame({"date": [datetime.date(2014, 12, 29), datetime.date(2015, 1, 26), datetime.date(2015, 2, 26), datetime.date(2015, 3, 8), datetime.date(2015, 4, 10)], 
"result": ['pass', 'fail', 'fail', 'pass', 'pass']})

df2 = pd.DataFrame({'start_date': [datetime.date(2015, 1, 1), datetime.date(2015, 2, 1), datetime.date(2015, 3, 1), datetime.date(2015, 4, 1), datetime.date(2015, 5, 1)], 
                    'end_date': [datetime.date(2015, 1, 25), datetime.date(2015, 2, 20), datetime.date(2015, 3, 15), datetime.date(2015, 4, 24), datetime.date(2015, 5, 23)]})

for i in range(len(df1)):
    if (df1.date[i] <= df2.end_date[i]):
        df1.date[i] = df2.start_date[i]

but again this is assuming that both data frames have the same length and its a direct compare across

  • Related