Pandas include single row in df after filtering with .loc

Time:06-15

So, in this function:

def filter_by_freq(df, frequency):
    filtered_df = df.copy()
    
    if frequency.upper() == 'DAY':
        pass
        
    else:
        date_obj = filtered_df['Date'].values[0]
        target_day = pd.to_datetime(date_obj).day
        target_month = pd.to_datetime(date_obj).month
        
        final_date_obj = filtered_df['Date'].values[-1]
        
        if frequency.upper() == 'MONTH':
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.day.eq(target_day)]
        
        elif frequency.upper() == 'YEAR':
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.day.eq(target_day)]
            filtered_df = filtered_df.loc[filtered_df['Date'].dt.month.eq(target_month)]
    
    return filtered_df

How can I also include the very last row of the original df in the .loc selection? I tried this (for the month frequency): filtered_df = filtered_df.loc[(filtered_df['Date'].dt.day.eq(target_day)) | (filtered_df['Date'].dt.date.eq(final_date_obj))] but it didn't work.

Thanks for your time!

CodePudding user response:

Here's one way you could do it. In this example I have a df and I want to filter out all rows that have c1 > 0.5, but keep the last row no matter what. I create a boolean Series called lte_half to track the first condition, and then a second boolean mask called end_ind (here a NumPy array; a boolean Series would work equally well), which is True only for the last row. The filtered table is created by taking all rows that pass either condition, combined with the | operator.

import pandas as pd
import numpy as np

np.random.seed(0)

df = pd.DataFrame({'c1':np.random.rand(20)})
lte_half = df['c1'].le(0.5)
end_ind = df.index == df.index[-1]

filt_df = df[lte_half | end_ind]
print(filt_df)
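
Applied to the function in the question, the same idea might look like the sketch below. It assumes that df['Date'] already has a datetime64 dtype, that the index labels are unique, and that the df is non-empty; none of that is stated in the question. A likely reason the original attempt matched nothing extra is that .dt.date produces Python date objects while .values[-1] is a numpy.datetime64, so the two never compare equal. Building the "last row" mask from the index sidesteps that comparison entirely.

```python
import pandas as pd

def filter_by_freq(df, frequency):
    # Assumption: df['Date'] is datetime64 and df.index has unique labels.
    freq = frequency.upper()
    if freq not in ('MONTH', 'YEAR'):
        # 'DAY' (and anything unrecognised) returns an unfiltered copy,
        # matching the behaviour of the function in the question.
        return df.copy()

    first_date = df['Date'].iloc[0]  # a pandas Timestamp

    # Rows falling on the same day of the month as the first row.
    mask = df['Date'].dt.day.eq(first_date.day)
    if freq == 'YEAR':
        # For yearly filtering the month has to match as well.
        mask &= df['Date'].dt.month.eq(first_date.month)

    # True only for the last row of the original df.
    is_last = df.index == df.index[-1]

    return df.loc[mask | is_last].copy()


# Small usage example with made-up daily data.
example = pd.DataFrame({'Date': pd.date_range('2023-01-15', '2023-04-20', freq='D')})
print(filter_by_freq(example, 'month'))
```

If the last row already satisfies the day/month condition it is simply kept once, since the two masks are OR-ed together rather than concatenated.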
