How to select rows for each group whose values fall outside certain limits?

I have a data set like this:

import pandas as pd

dfdict = {
    'year' : [2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022],
    'value' : [1,2,3,4,5,6,7,8,9,10]
}
df = pd.DataFrame(dfdict)

I also have a dictionary whose keys are years (as strings) and whose values are the [lower, upper] limits I want to apply to each year:

limitdict = {
    '2021' : [2, 4],
    '2022' : [7, 8]
}

How can I select the rows of df whose value is either smaller than the lower limit or larger than the upper limit given in limitdict for that row's year? The result should look like:

    year    value
0   2021    1
4   2021    5
5   2022    6
8   2022    9
9   2022    10

Answer:

I suggest splitting the dataframe by year and then using between to filter out the values in the range specified in limitdict. Note that I use the ~ operator to invert the boolean mask, so values within the range (bounds included, since between is inclusive by default) are filtered out: df_year[~df_year.value.between(limitdict[str(year)][0],limitdict[str(year)][1])].

list_of_dataframes = []
for year in df.year.unique():
    df_year = df[df.year == year]
    lower, upper = limitdict[str(year)]  # the keys of limitdict are strings
    # ~ keeps only the rows outside [lower, upper]
    list_of_dataframes.append(df_year[~df_year.value.between(lower, upper)])
output_df = pd.concat(list_of_dataframes)

This returns:

    year    value
0   2021    1
4   2021    5
5   2022    6
8   2022    9
9   2022    10
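A minimal sketch of what between and ~ do here, using the 2021 limits [2, 4] from limitdict on a standalone Series. Because between includes both bounds by default, 2 and 4 count as inside the range and are dropped by the negated mask, which matches the "smaller than the lower limit or larger than the upper limit" condition in the question.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5])

# between(2, 4) is True for 2, 3 and 4 (both bounds are inclusive by default)
inside = values.between(2, 4)

# ~ inverts the boolean mask, keeping only the values outside [2, 4]
outside = values[~inside]
print(outside.tolist())  # [1, 5]
```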

Answer:

Another possible solution:

# astype is needed because your dictionary keys are strings
year = df['year'].astype('str')

df[(
  df['value'].lt([limitdict[x][0] for x in year]) |
  df['value'].gt([limitdict[x][1] for x in year])
)]
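If you would rather not convert every year to a string, one variant (not part of the answer above) is to convert the dictionary keys to integers once and look up each row's limits with Series.map. This is a sketch assuming the df and limitdict from the question; the names int_limits, lower and upper are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022],
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})
limitdict = {'2021': [2, 4], '2022': [7, 8]}

# Convert the string keys to ints once, so they match the int 'year' column
int_limits = {int(k): v for k, v in limitdict.items()}

# Look up each row's lower and upper limit by its year
lower = df['year'].map({k: v[0] for k, v in int_limits.items()})
upper = df['year'].map({k: v[1] for k, v in int_limits.items()})

# Keep rows strictly below the lower limit or strictly above the upper limit
result = df[(df['value'] < lower) | (df['value'] > upper)]
print(result)
```

Since lower and upper share df's index, the comparisons align row by row and the original index labels (0, 4, 5, 8, 9) are preserved.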

Or:

year = df['year'].astype('str')
# z1 holds each row's lower limit, z2 each row's upper limit
z1, z2 = zip(*[limitdict[x] for x in year])
df[df['value'].lt(z1) | df['value'].gt(z2)]
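In the second variant, zip(*...) transposes the per-row list of [lower, upper] pairs into one tuple of lower limits and one tuple of upper limits, each aligned with the rows of df. A small sketch with three rows' worth of pairs:

```python
# One [lower, upper] pair per row, as produced by [limitdict[x] for x in year]
pairs = [[2, 4], [2, 4], [7, 8]]

# zip(*pairs) groups all first elements together and all second elements together
z1, z2 = zip(*pairs)
print(z1)  # (2, 2, 7)
print(z2)  # (4, 4, 8)
```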

Output:

   year  value
0  2021      1
4  2021      5
5  2022      6
8  2022      9
9  2022     10