I have a data set like this:
import pandas as pd

dfdict = {
    'year': [2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022],
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
}
df = pd.DataFrame(dfdict)
I also have a dictionary whose keys are years and whose values are the lower and upper limits I want to apply as a condition for each year:
limitdict = {
'2021' : [2, 4],
'2022' : [7, 8]
}
How can I show the rows of df whose value, for each year, is either smaller than the lower limit or larger than the upper limit given in limitdict? The result should look like:
year value
0 2021 1
4 2021 5
5 2022 6
8 2022 9
9 2022 10
CodePudding user response:
I suggest splitting the dataframe by year and then using between to identify the values that fall in the range specified in limitdict. Note that I am using the ~ operator to negate that mask, so the values within the range are filtered out and only the rows outside it are kept: df_year[~df_year.value.between(limitdict[str(year)][0], limitdict[str(year)][1])].
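list_of_dataframes = []
for year in df.year.unique():
    df_year = df[df.year == year]
    # limitdict keys are strings, so convert the year before the lookup
    lower, upper = limitdict[str(year)]
    # between() includes both endpoints by default; ~ keeps the rows outside the range
    list_of_dataframes.append(df_year[~df_year.value.between(lower, upper)])
output_df = pd.concat(list_of_dataframes)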
This returns:
year value
0 2021 1
4 2021 5
5 2022 6
8 2022 9
9 2022 10
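As a quick check of why negating between gives the wanted rows, here is a minimal sketch on the 2021 values alone. The names values_2021, inside and outside are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# The five 2021 values from the question, with limits [2, 4]
values_2021 = pd.Series([1, 2, 3, 4, 5])

# between() is inclusive on both ends by default, so 2 and 4 count as "inside"
inside = values_2021.between(2, 4)

# ~ flips the boolean mask, leaving only values below 2 or above 4
outside = ~inside

print(values_2021[outside].tolist())
```

Because the endpoints count as inside the range, values equal to a limit are dropped, which matches the "smaller than the lower limit or larger than the upper limit" wording of the question.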
CodePudding user response:
Another possible solution:
# astype is needed because your dictionary keys are strings
year = df['year'].astype('str')
df[(
    df['value'].lt([limitdict[x][0] for x in year]) |
    df['value'].gt([limitdict[x][1] for x in year])
)]
Or:
year = df['year'].astype('str')
z1, z2 = zip(*[limitdict[x] for x in year])
df[(df['value'].lt(z1) | df['value'].gt(z2))]
Output:
year value
0 2021 1
4 2021 5
5 2022 6
8 2022 9
9 2022 10
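A variation on the same idea, offered only as a sketch: instead of building Python lists, the limits can be looked up per row with Series.map. This assumes every year in df has an entry in limitdict (a missing year would map to NaN and its rows would be silently dropped by the comparison). The names lower, upper and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'year': [2021, 2021, 2021, 2021, 2021, 2022, 2022, 2022, 2022, 2022],
    'value': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
})
limitdict = {'2021': [2, 4], '2022': [7, 8]}

# Convert the string keys to int so they match the dtype of df['year']
lower = df['year'].map({int(k): v[0] for k, v in limitdict.items()})
upper = df['year'].map({int(k): v[1] for k, v in limitdict.items()})

# Keep rows strictly below the lower limit or strictly above the upper limit
result = df[(df['value'] < lower) | (df['value'] > upper)]
print(result)
```

Since lower and upper are Series aligned on the index of df, the comparison is done row by row without an explicit loop over years.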