I have three dataframes:
df_1 =
Name Description Date Quant Value
0 B100 text123 2021-01-02 3 89.1
1 B101 text567 2021-01-03 2 90.1
2 A200 text820 2021-03-02 1 90.2
3 B101 text567 2021-03-02 6 90.2
4 A500 text758 2021-03-06 1 94.0
5 A500 text758 2021-03-06 2 94.0
6 A500 text758 2021-03-07 2 94.0
7 A200 text820 2021-04-02 1 90.2
8 A999 text583 2021-05-05 2 90.6
9 A998 text834 2021-05-09 1 99.9
df_2 = # the index is not contiguous because I did some manipulations and dropped some NaNs before
Code Name Person
0 900 B100 600
1 901 B100 610
2 959 B101 670
3 979 A999 670
6 944 A200 388
7 921 A500 663
8 988 B300 794
df_3 =
Code StartDate EndDate RealValue
0 900 2000-01-01 2007-12-31 80.9
1 901 2008-01-01 2099-12-31 98.8
2 902 2000-01-01 2020-02-02 98.3
3 903 2000-01-01 2007-01-10 90.6
4 903 2007-01-11 2099-12-31 90.7
5 959 2020-04-09 2099-12-31 98.9
6 979 2000-01-01 2009-02-12 87.6
7 979 2009-02-13 2021-06-13 78.0
8 979 2021-06-15 2099-12-31 89.5
9 944 2020-04-09 2099-12-31 98.9
10 921 2020-04-09 2099-12-31 98.9
I want to do the following: start from df_1 and, for each Name in df_1, find the corresponding Code(s) in df_2. Then I look into df_1 to see what the Value and Quant were at every Date, and I compare that Value with the RealValue of the df_3 row whose date range contains my Date. The difficult part is selecting the right Code and then the right date range. So the result should be:
Name Date Code Value RealValue Quant
B100 2021-01-02 901 89.1 98.8 3
B101 2021-01-03 959 90.1 98.9 2
A200 2021-03-02 944 90.2 98.9 1
B101 2021-03-02 959 90.2 98.9 6
A500 2021-03-06 921 94.0 98.9 1
A500 2021-03-06 921 94.0 98.9 2
A500 2021-03-07 921 94.0 98.9 2
A200 2021-04-02 944 90.2 98.9 1
A999 2021-05-05 979 90.6 78.0 2
What I did was merge everything into one table, but since my real dataset is huge and many records do not appear in all three dataframes, I might have lost some data or ended up with NaNs. So I would rather leave the dataframes as they are here and navigate through them for every record in df_1 (roughly like the sketch below). Is that possible?
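Roughly, this per-row lookup is what I mean (just a sketch, assuming the date columns are datetimes or ISO-formatted strings, and that at most one df_3 range matches per row):
import pandas as pd

rows = []
for _, r in df_1.iterrows():
    # all Codes registered for this Name in df_2
    codes = df_2.loc[df_2['Name'] == r['Name'], 'Code']
    # among those Codes, the df_3 row whose date range contains this Date
    match = df_3[df_3['Code'].isin(codes)
                 & (df_3['StartDate'] <= r['Date'])
                 & (r['Date'] <= df_3['EndDate'])]
    if not match.empty:
        rows.append({'Name': r['Name'], 'Date': r['Date'],
                     'Code': match.iloc[0]['Code'], 'Value': r['Value'],
                     'RealValue': match.iloc[0]['RealValue'], 'Quant': r['Quant']})

result = pd.DataFrame(rows)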
CodePudding user response:
First, map the Name column from df2 onto df3, then merge df1 and df3 on the Name column. Finally, keep only the rows where Date is between StartDate and EndDate:
# append RealValue to the original df1 columns for the final selection
COLS = df1.columns.tolist() + ['RealValue']
# bring Name into df3 via the Code -> Name mapping from df2
df3['Name'] = df3['Code'].map(df2.set_index('Code')['Name'])
# merge on Name, then keep only the date range that contains Date
# (if query() rejects the method call, add engine='python')
out = df1.merge(df3, on='Name', how='left') \
         .query('Date.between(StartDate, EndDate)')[COLS]
Output:
>>> out
Name Description Date Quant Value RealValue
1 B100 text123 2021-01-02 3 89.1 98.8
2 B101 text567 2021-01-03 2 90.1 98.9
3 A200 text820 2021-03-02 1 90.2 98.9
4 B101 text567 2021-03-02 6 90.2 98.9
5 A500 text758 2021-03-06 1 94.0 98.9
6 A500 text758 2021-03-06 2 94.0 98.9
7 A500 text758 2021-03-07 2 94.0 98.9
8 A200 text820 2021-04-02 1 90.2 98.9
10 A999 text583 2021-05-05 2 90.6 78.0
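If the date columns are plain strings in mixed formats, it may be safer to parse them before the merge so that between() compares actual dates (not needed if they are already datetime64):
# parse the date columns up front so the comparison is a real date comparison
df1['Date'] = pd.to_datetime(df1['Date'])
df3['StartDate'] = pd.to_datetime(df3['StartDate'])
df3['EndDate'] = pd.to_datetime(df3['EndDate'])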
CodePudding user response:
Example of keeping only the rows whose date falls between two dates, using boolean masks:
# df['date'] should be datetime (or ISO-formatted strings) for these comparisons
start_date = "2019-01-01"
end_date = "2019-01-31"
after_start_dates = df['date'] >= start_date
before_end_dates = df['date'] <= end_date
between_dates = after_start_dates & before_end_dates
filtered_dates = df.loc[between_dates]
print(filtered_dates)
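Applied to this question, the same mask idea could look roughly like this (a sketch, assuming the date columns compare correctly as dates or ISO strings):
# join Name -> Code, then Code -> date ranges, then keep the range
# that actually contains each row's Date
merged = df_1.merge(df_2[['Code', 'Name']], on='Name').merge(df_3, on='Code')
in_range = (merged['Date'] >= merged['StartDate']) & (merged['Date'] <= merged['EndDate'])
out = merged.loc[in_range, ['Name', 'Date', 'Code', 'Value', 'RealValue', 'Quant']]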
CodePudding user response:
try this:
# outer-join df_1 with df_2 on Name, then with df_3 on Code,
# so nothing is dropped even when a record is missing somewhere
try1 = pd.merge(df_1, df_2, on='Name', how='outer')
try2 = pd.merge(try1, df_3, on='Code', how='outer')
try2
Then you can navigate in try2 and select the columns you need:
try2[['Name','Date','Code','Value','RealValue','Quant']]
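Note that the outer merges keep every date range for each Code (plus rows that match nothing), so the right range still has to be picked per row, for example with a mask along these lines (again assuming the date columns compare as dates or ISO strings):
# keep only the df_3 range that actually contains each row's Date
in_range = (try2['Date'] >= try2['StartDate']) & (try2['Date'] <= try2['EndDate'])
try2.loc[in_range, ['Name', 'Date', 'Code', 'Value', 'RealValue', 'Quant']]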