Good morning,
I'm working on this code :
df1 = pd.DataFrame()
df1['Date'] = ["29/07/2021", "29/07/2021", "29/07/2021", "29/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "31/07/2021", "31/07/2021", "01/08/2021", "01/08/2021", "02/08/2021", "02/08/2021"]
df1['Time'] = ["06:48:00", "06:48:00", "06:56:00", "06:56:00", "07:14:00", "07:14:00", "07:40:00", "07:40:00", "08:42:00", "08:42:00", "08:52:00", "08:52:00", "09:07:00", "09:07:00"]
df1["Column1"] = ['NaN', 'NaN', 0.038807581, 0.018807581, 0.025931434, 0.025163517, 0.026561283, 0.027743659, 0.028854, 0.000383506, 0.000543031, 0.000342, 'NaN', 'NaN']
df1["Column2"] = [0.000270475, 0.000313769, 'NaN', 'NaN', 0.000483506, 0.000643031, 0.000533131, 0.000543031, 0.000342, 0.056263517, 0.042163517, 0.035163517, 0.025163517, 0.026363517]
df2 = pd.DataFrame()
df2['Date'] = ["29/07/2021", "29/07/2021", "29/07/2021", "29/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "30/07/2021", "31/07/2021", "31/07/2021", "01/08/2021", "01/08/2021", "02/08/2021", "02/08/2021"]
df2['Time'] = ["06:48:00", "06:48:00", "06:56:00", "06:56:00", "07:14:00", "07:14:00", "07:40:00", "07:40:00", "08:42:00", "08:42:00", "08:52:00", "08:52:00", "09:07:00", "09:07:00"]
df2["Column1"] = [0.041807581, 0.019607581, 'NaN', 'NaN', 0.025931434, 0.025163517, 0.026561283, 0.027743659, 0.028854, 0.000383506, 'NaN', 'NaN', 0.000313769, 0.000413769]
df2["Column2"] = [0.000270475, 0.000313769, 0.000383506, 0.000583506, 'NaN', 'NaN', 0.000533131, 0.000543031, 'NaN', 'NaN', 0.042163517, 0.035163517, 0.025163517, 0.026363517]
diff_df = pd.concat([df1, df2]).drop_duplicates().reset_index(drop=True)
The output is like this :
------------ ---------- ------------- -------------
| Date | Time | Column1 | Column2 |
------------ ---------- ------------- -------------
| 29/07/2021 | 06:48:00 | NaN | 0,000270475 |
| 29/07/2021 | 06:48:00 | NaN | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,038807581 | NaN |
| 29/07/2021 | 06:56:00 | 0,018807581 | NaN |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,000483506 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,000643031 |
| 30/07/2021 | 07:40:00 | 0,026561283 | 0,000533131 |
| 30/07/2021 | 07:40:00 | 0,027743659 | 0,000543031 |
| 31/07/2021 | 08:42:00 | 0,028854 | 0,000342 |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,056263517 |
| 01/08/2021 | 08:52:00 | 0,000543031 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,000342 | 0,035163517 |
| 02/08/2021 | 09:07:00 | NaN | 0,025163517 |
| 02/08/2021 | 09:07:00 | NaN | 0,026363517 |
| 29/07/2021 | 06:48:00 | 0,041807581 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,019607581 | 0,000313769 |
| 29/07/2021 | 06:56:00 | NaN | 0,000383506 |
| 29/07/2021 | 06:56:00 | NaN | 0,000583506 |
| 30/07/2021 | 07:14:00 | 0,025931434 | NaN |
| 30/07/2021 | 07:14:00 | 0,025163517 | NaN |
| 31/07/2021 | 08:42:00 | 0,028854 | NaN |
| 31/07/2021 | 08:42:00 | 0,000383506 | NaN |
| 01/08/2021 | 08:52:00 | NaN | 0,042163517 |
| 01/08/2021 | 08:52:00 | NaN | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,000313769 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,000413769 | 0,026363517 |
------------ ---------- ------------- -------------
What i need is that if there is NaN value for one of the dataframes, it will consider to use the other one value considering the fact that they have the same date and time. which is something like this :
------------ ---------- ------------- -------------
| Date | Time | Column1 | Column2 |
------------ ---------- ------------- -------------
| 29/07/2021 | 06:48:00 | 0,041807581 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,019607581 | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,038807581 | 0,000383506 |
| 29/07/2021 | 06:56:00 | 0,018807581 | 0,000583506 |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,000483506 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,000643031 |
| 30/07/2021 | 07:40:00 | 0,026561283 | 0,000533131 |
| 30/07/2021 | 07:40:00 | 0,027743659 | 0,000543031 |
| 31/07/2021 | 08:42:00 | 0,028854 | 0,000342 |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,056263517 |
| 01/08/2021 | 08:52:00 | 0,000543031 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,000342 | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,000313769 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,000413769 | 0,026363517 |
------------ ---------- ------------- -------------
Thank you for your time and have a great day !
EDIT######## Using these lines :
diff_df.Column1 = diff_df.Column1.fillna(diff_df.Column2)
diff_df.Column2 = diff_df.Column2.fillna(diff_df.Column1)
Will give me this output which is not what i needed :
------------ ---------- ------------- -------------
| Date | Time | Column1 | Column2 |
------------ ---------- ------------- -------------
| 29/07/2021 | 06:48:00 | 0,000270475 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,000313769 | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,038807581 | 0,038807581 |
| 29/07/2021 | 06:56:00 | 0,018807581 | 0,018807581 |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,000483506 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,000643031 |
| 30/07/2021 | 07:40:00 | 0,026561283 | 0,000533131 |
| 30/07/2021 | 07:40:00 | 0,027743659 | 0,000543031 |
| 31/07/2021 | 08:42:00 | 0,028854 | 0,000342 |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,056263517 |
| 01/08/2021 | 08:52:00 | 0,000543031 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,000342 | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,025163517 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,026363517 | 0,026363517 |
| 29/07/2021 | 06:48:00 | 0,041807581 | 0,000270475 |
| 29/07/2021 | 06:48:00 | 0,019607581 | 0,000313769 |
| 29/07/2021 | 06:56:00 | 0,000383506 | 0,000383506 |
| 29/07/2021 | 06:56:00 | 0,000583506 | 0,000583506 |
| 30/07/2021 | 07:14:00 | 0,025931434 | 0,025931434 |
| 30/07/2021 | 07:14:00 | 0,025163517 | 0,025163517 |
| 31/07/2021 | 08:42:00 | 0,028854 | 0,028854 |
| 31/07/2021 | 08:42:00 | 0,000383506 | 0,000383506 |
| 01/08/2021 | 08:52:00 | 0,042163517 | 0,042163517 |
| 01/08/2021 | 08:52:00 | 0,035163517 | 0,035163517 |
| 02/08/2021 | 09:07:00 | 0,000313769 | 0,025163517 |
| 02/08/2021 | 09:07:00 | 0,000413769 | 0,026363517 |
------------ ---------- ------------- -------------
CodePudding user response:
Use DataFrame.fillna
with convert values to index if possible different datetimes:
df1 = df1.replace('NaN', np.nan)
df2 = df2.replace('NaN', np.nan)
df = df1.set_index(['Date','Time']).fillna(df2.set_index(['Date','Time'])).reset_index()
print (df)
Date Time Column1 Column2
0 29/07/2021 06:48:00 0.041808 0.000270
1 29/07/2021 06:48:00 0.019608 0.000314
2 29/07/2021 06:56:00 0.038808 0.000384
3 29/07/2021 06:56:00 0.018808 0.000584
4 30/07/2021 07:14:00 0.025931 0.000484
5 30/07/2021 07:14:00 0.025164 0.000643
6 30/07/2021 07:40:00 0.026561 0.000533
7 30/07/2021 07:40:00 0.027744 0.000543
8 31/07/2021 08:42:00 0.028854 0.000342
9 31/07/2021 08:42:00 0.000384 0.056264
10 01/08/2021 08:52:00 0.000543 0.042164
11 01/08/2021 08:52:00 0.000342 0.035164
12 02/08/2021 09:07:00 0.000314 0.025164
13 02/08/2021 09:07:00 0.000414 0.026364
If always same index and same rows between both DataFrames:
df1 = df1.replace('NaN', np.nan)
df2 = df2.replace('NaN', np.nan)
df = df1.fillna(df2)
print (df)
Date Time Column1 Column2
0 29/07/2021 06:48:00 0.041808 0.000270
1 29/07/2021 06:48:00 0.019608 0.000314
2 29/07/2021 06:56:00 0.038808 0.000384
3 29/07/2021 06:56:00 0.018808 0.000584
4 30/07/2021 07:14:00 0.025931 0.000484
5 30/07/2021 07:14:00 0.025164 0.000643
6 30/07/2021 07:40:00 0.026561 0.000533
7 30/07/2021 07:40:00 0.027744 0.000543
8 31/07/2021 08:42:00 0.028854 0.000342
9 31/07/2021 08:42:00 0.000384 0.056264
10 01/08/2021 08:52:00 0.000543 0.042164
11 01/08/2021 08:52:00 0.000342 0.035164
12 02/08/2021 09:07:00 0.000314 0.025164
13 02/08/2021 09:07:00 0.000414 0.026364