I am trying to manipulate excel sheet data to automate a process on excel(not a developer), i do have 2 dataframes:
one looking like the below(only difference is more columns)
Date Val1 Val2
0 2020-09-29 13:22:57 5.34 3.2
1 2020-09-29 13:23:12 4.5 Nan
2 2020-09-29 13:23:44 Nan 56.4
3 2020-09-29 13:24:01 24 0.3
as we notice for the one above the index are in order, and all the fields have dates populated but do not necessarily have all the other columns populated.
the second dataframe have the following characteristics, equal or more rows without any additional dates, also no duplicate dates but the additional rows are empty ones(NaT for Date and Nan for all other columns), the index of df2 are also not in order due to other processes:
Date Val1 Val2
0 2020-09-29 13:22:57 Nan Nan
5 Nat Nan Nan
1 2020-09-29 13:23:12 4.5 Nan
4 NaT Nan Nan
6 Nat Nan Nan
2 2020-09-29 13:23:44 Nan Nan
3 2020-09-29 13:24:01 24 0.3
what i basically need is to check for the matching dates, and if a date in df2 matching a date in df1 to populate the same exact value for the entire row of that date in df2 as in df without changing the location of the empty rows in df2 or adding columns:
expected output:
Date Val1 Val2
0 2020-09-29 13:22:57 5.34 3.2
5 Nat Nan Nan
1 2020-09-29 13:23:12 4.5 Nan
4 NaT Nan Nan
6 Nat Nan Nan
2 2020-09-29 13:23:44 Nan 56.4
3 2020-09-29 13:24:01 24 0.3
I've tried multiple approaches including:
data_frames = [df,df_2]
df_merged = reduce(lambda left, right: pd.merge(left, right, on=['Date'],
how='outer'), data_frames)
print(df_merged)
and also:
df_f = pd.merge(df, df_2, on='Date', how='outer').fillna(method='ffill')
also tried changing the how
to inner
, left
, right
and so on but did not get the result i want i'd just get combined columns.
Edit:
df1 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', '2020-09-29 13:23:12', '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
'Val1': [5.34, 4.5, np.nan, 24],
'Val2': [3.2, np.nan, 56.4, 0.3]})
df2 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', np.nan, '2020-09-29 13:23:12', np.nan, np.nan, '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
'Val1': [5.34, np.nan, 4.5, np.nan, np.nan, np.nan, 24],
'Val2': [3.2, np.nan, np.nan, np.nan, np.nan, 56.4, 0.3]},
index=[0,5,1,4,6,2,3])
f_f1 = df1.merge(df2["Date"], on="Date", how="right").set_index(df2.index)
print(f_f1)
CodePudding user response:
IIUC, try:
#convert to datetime if needed
df1["Date"] = pd.to_datetime(df1["Date"])
df2["Date"] = pd.to_datetime(df2["Date"])
f_f1 = df1.merge(df2["Date"], on="Date", how="right").set_index(df2.index)
>>> df_f
Date Val1 Val2
0 2020-09-29 13:22:57 5.34 3.2
5 NaN NaN NaN
1 2020-09-29 13:23:12 4.50 NaN
4 NaN NaN NaN
6 NaN NaN NaN
2 2020-09-29 13:23:44 NaN 56.4
3 2020-09-29 13:24:01 24.00 0.3
Inputs:
df1 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', '2020-09-29 13:23:12', '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
'Val1': [5.34, 4.5, np.nan, 24],
'Val2': [3.2, np.nan, 56.4, 0.3]})
df2 = pd.DataFrame({'Date': ['2020-09-29 13:22:57', np.nan, '2020-09-29 13:23:12', np.nan, np.nan, '2020-09-29 13:23:44', '2020-09-29 13:24:01'],
'Val1': [5.34, np.nan, 4.5, np.nan, np.nan, np.nan, 24],
'Val2': [3.2, np.nan, np.nan, np.nan, np.nan, 56.4, 0.3]},
index=[0,5,1,4,6,2,3])