Subset a pandas dataframe based on column value using chain rule

Time:04-17

Let's say I have the code below:

import pandas as pd
dat = pd.DataFrame({'A' : ['2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01'], 'B' : [1,2,3,4]})


start = pd.to_datetime('2011Q1').to_period('Q').start_time
end = pd.to_datetime('2012Q1').to_period('Q').start_time

dat['A1'] = pd.to_datetime(dat['A'])
dat1 = dat[dat['A1'].between(start, end)]
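For reference, both boundaries above resolve to the first instant of a quarter, so end is 2012-01-01 00:00:00 rather than the end of 2012Q1. The sketch below builds the same boundaries directly from pd.Period objects (we assume these match the to_datetime(...).to_period('Q').start_time expressions in the question). It also shows Period.end_time, which you could use instead if the goal is to keep all of 2012Q1; whether that is the goal is an assumption. The mid-February date is a hypothetical extra row added only to show the difference.

```python
import pandas as pd

# Quarter boundaries taken directly from Period objects
start = pd.Period('2011Q1').start_time          # 2011-01-01 00:00:00
end_first_day = pd.Period('2012Q1').start_time  # 2012-01-01 00:00:00 (first instant of 2012Q1)
end_whole_q = pd.Period('2012Q1').end_time      # 2012-03-31 23:59:59.999999999

# Hypothetical dates, including one in the middle of 2012Q1
dates = pd.to_datetime(pd.Series(['2011-01-01', '2012-01-01', '2012-02-15', '2013-01-01']))

# between() includes both endpoints by default
print(dates[dates.between(start, end_first_day)].tolist())  # 2012-02-15 is excluded
print(dates[dates.between(start, end_whole_q)].tolist())    # 2012-02-15 is included
```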

As you can see, in the second-to-last line I create a new datetime column A1, and in the last line I subset the DataFrame based on that newly created column.

Is there a way to use method chaining to perform the above two steps in a single line of code?

Any pointers would be very helpful.

Answer:

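You can use DataFrame.pipe, which calls a function on the DataFrame produced by the previous step, so a filter that needs to reference that intermediate frame can be placed inside the chain: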

out = dat.assign(A1=pd.to_datetime(dat['A'])).pipe(lambda df: df[df['A1'].between(start, end)])
print(out)

            A  B         A1
1  2011-01-01  2 2011-01-01
2  2012-01-01  3 2012-01-01
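DataFrame.pipe also forwards any extra positional or keyword arguments to the function, so the lambda can be replaced by a small reusable helper. A sketch, where in_range is a hypothetical helper name (not a pandas function) and start and end are hard-coded to the values the question's expressions produce:

```python
import pandas as pd

dat = pd.DataFrame({'A': ['2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01'],
                    'B': [1, 2, 3, 4]})
start = pd.Timestamp('2011-01-01')
end = pd.Timestamp('2012-01-01')

def in_range(df, col, lo, hi):
    """Hypothetical helper: keep rows whose `col` lies in [lo, hi], both ends inclusive."""
    return df[df[col].between(lo, hi)]

# pipe calls in_range(<frame from the previous step>, 'A1', start, end)
out = (
    dat
    .assign(A1=pd.to_datetime(dat['A']))
    .pipe(in_range, 'A1', start, end)
)
print(out)
```

This should keep the same two rows (index 1 and 2) as the lambda version.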

Answer:

Alternatively, combine assign with query. Inside the query string, the @ prefix refers to the local Python variables start and end:

dat.assign(A1=pd.to_datetime(dat['A'])).query('@start <= A1 <= @end')

Output:

            A  B         A1
1  2011-01-01  2 2011-01-01
2  2012-01-01  3 2012-01-01
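A variant that needs neither pipe nor query: both assign and .loc accept callables that receive the intermediate DataFrame, so the chain never refers back to dat by name, which helps when the chain starts from an earlier transformation. A sketch, again with start and end hard-coded to the question's values:

```python
import pandas as pd

dat = pd.DataFrame({'A': ['2010-01-01', '2011-01-01', '2012-01-01', '2013-01-01'],
                    'B': [1, 2, 3, 4]})
start = pd.Timestamp('2011-01-01')
end = pd.Timestamp('2012-01-01')

out = (
    dat
    # the lambda receives the frame being assigned to, not the global `dat`
    .assign(A1=lambda d: pd.to_datetime(d['A']))
    # .loc with a callable filters the frame produced by the previous step
    .loc[lambda d: d['A1'].between(start, end)]
)
print(out)
```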