Home > Blockchain >  Finding the date range of a date in different dataframes
Finding the date range of a date in different dataframes

Time:02-16

I have three dataframes:

df_1 = 

     Name    Description           Date         Quant      Value 
 0   B100        text123     2021-01-02             3       89.1
 1   B101        text567     2021-01-03             2       90.1
 2   A200        text820     2021-03-02             1       90.2
 3   B101        text567     2021-03-02             6       90.2
 4   A500        text758     2021-03-06             1       94.0
 5   A500        text758     2021-03-06             2       94.0
 6   A500        text758     2021-03-07             2       94.0
 7   A200        text820     2021-04-02             1       90.2
 8   A999        text583     2021-05-05             2       90.6
 9   A998        text834     2021-05-09             1       99.9

df_2 = # the index is funny because I did some manipulations and dropped some NaNs before
       Code     Name     Person
  0    900      B100        600
  1    901      B100        610
  2    959      B101        670
  3    979      A999        670
  6    944      A200        388
  7    921      A500        663
  8    988      B300        794

df_3 =

     Code        StartDate       EndDate      RealValue
0     900       2000-01-01    2007-12-31           80.9
1     901       2008-01-01    2099-12-31           98.8
2     902       2000-01-01    2020-02-02           98.3
3     903       2000-01-01    2007-01-10           90.6
4     903       2007-01-11    2099-12-31           90.7
5     959       2020-04-09    2099-12-31           98.9
6     979       2000-01-01    2009-02-12           87.6
7     979       2009-02-13    2021-06-13           78.0
8     979       2021-06-15    2099-12-31           89.5
9     944       2020-04-09    2099-12-31           98.9
10    921       2020-04-09    2099-12-31           98.9

I want to do the following:

Start from df_1, find the corresponding Code(s) in df_2 for each Name in df_1. Now I look into df_1 to see what the Value and Quant were at every Date, and I compare the Value with every RealValue from the date range where my Date is. The difficult part is selecting the right Code and then the right data range. So:

Name         Date   Code     Value      RealValue     Quant
B100   2021-01-02    901      89.1           98.8         3
B101   2021-01-03    959      90.1           98.9         2
A200   2021-03-02    944      90.2           98.9         1
B101   2021-03-02    959      90.1           98.9         6
A500   2021-03-06    921      94.0           98.9         1
A500   2021-03-06    921      94.0           98.9         2
A500   2021-03-07    921      94.0           98.9         2
A200   2021-05-05    944      90.2           98.9         2
A999   2021-05-05    979      90.6           78.0         2

What I did was merging everything in one table, but since my real dataset is huge and there are many records that do not appear everywhere, I might have lost some data or ended up with NaNs. So I would leave the dataframes as they are here and navigate through them for every record in df_1. Is that possible?

CodePudding user response:

First, map Name column from df2 to df3 then merge df1 and df3 on Name column. Finally, filter out rows where Date is between StartDate and EndDate:

COLS = df1.columns.tolist()   ['RealValue']

df3['Name'] = df3['Code'].map(df2.set_index('Code')['Name'])

out = df1.merge(df3, on='Name', how='left') \
         .query('Date.between(StartDate, EndDate)')[COLS]

Output:

>>> out
    Name Description        Date  Quant  Value  RealValue
1   B100     text123  2021-01-02      3   89.1       98.8
2   B101     text567  2021-01-03      2   90.1       98.9
3   A200     text820  2021-03-02      1   90.2       98.9
4   B101     text567  2021-03-02      6   90.2       98.9
5   A500     text758  2021-03-06      1   94.0       98.9
6   A500     text758  2021-03-06      2   94.0       98.9
7   A500     text758  2021-03-07      2   94.0       98.9
8   A200     text820  2021-04-02      1   90.2       98.9
10  A999     text583  2021-05-05      2   90.6       78.0

CodePudding user response:

Example:

start_date = "2019-1-1"
end_date="2019-1-31"

after_start_dates=df['date']>= start_date
before_star_dates=df['date']<= end_date

between_date=after_start_dates & before_end_dates
filter_dates=df.loc[between_dates]

print(filtered_dates)

CodePudding user response:

try this:

try1 = pd.merge(df_1, df_2, on = 'Name', how = 'outer')
try2 = pd.merge(try1, df_3, on = 'Code', how = 'outer')
try2

and then, you try to navigate in try2.

try2[['Name','Date','Code','Value','RealValue','Quant']]
  • Related