Home > Blockchain >  Can't get df.loc to work using dates within a loop?
Can't get df.loc to work using dates within a loop?

Time:04-30

I want my loop to go through 3 dfs with the same format, and do 3 things:

  1. Transform "Date" object column to datetime.
  2. Use loc to filter only the dates after year 2021.
  3. Set date as index.

Say I have 3 dataframes named df1, df2 and df3, they are all shaped like this:

Index Date Information
1 2020-01-01 Blablabla
2 2021-01-01 Blablabla
3 2022-01-01 Blablabla

After running my code, I want all dfs to be shaped like this:

Date Information
2021-01-01 Blablabla
2022-01-01 Blablabla

The code I'm running:

dfs = [df1, df2, df3]

for i in dfs:
    i['Date'] = pd.to_datetime(i['Date'])
    i = i.loc[i['Date'].dt.year >= 2021]
    i.set_index('Date', inplace=True)

I can't seem to get it to work. The first part of the loop is working, it transformms to datetime, but the filter is not working neither the indexing.

CodePudding user response:

Let's this using a few tricks of pandas and python:

df1, df2, df3 = [i.assign(Date=pd.to_datetime(i['Date'])).set_index('Date').loc['2021':] 
                 for i in [df1, df2, df3]]
  1. Unpacking
  2. pd.Dataframe.assign
  3. With datetime in index you can use partial string indexing with loc
  4. List comprehension

Output:

print(f'{df1=}\n\n{df2=}\n\n{df3=}')

df1=            Index Information
Date                         
2021-01-01      2   Blablabla
2022-01-01      3   Blablabla

df2=            Index Information
Date                         
2021-01-01      2   Blablabla
2022-01-01      3   Blablabla

df3=            Index Information
Date                         
2021-01-01      2   Blablabla
2022-01-01      3   Blablabla

Fixing your for loop can be done like this:

dfs = [df1,df2,df3]
for i in dfs:
    i.assign(Date=pd.to_datetime(i['Date']), inplace=True)
    i.query('Date >= "2021-01-01"', inplace=True)
    i.set_index('Date', inplace=True)

df1

CodePudding user response:

step1: transform date column to 'real date'

d1['date'] = pd.to_datetime(d1['date'])
d2['date'] = pd.to_datetime(d2['date'])
d3['date'] = pd.to_datetime(d3['date'])

step2: select >= 2021 (year) rows

d1_s = d1.loc[d1['date'].dt.year >= 2021,:]
d2_s = d2.loc[d2['date'].dt.year >= 2021,:]
d3_s = d2.loc[d2['date'].dt.year >= 2021,:]

step3: pd.concat(d1_s,d2_s,d3_s) named a new dataframe 'data_all'

data_all = pd.concat([d1_s,d2_s,d3_s],ignore_index=True)

step4:set 'date' column as index

data_all.set_index(['date'], inplace=True)
  • Related