Create a subset based on the output of value_counts

My dataset looks as follows:

Area	Year	Value	Crop
Afghanistan	1961	1000	Corn
Afghanistan	1961	1100	Potato
Afghanistan	1961	1200	Wheat
Afghanistan	1962	1304	Potato
Afghanistan	1962	1923	Corn
Bhutan	1961	1236	Corn
Bhutan	1961	1836	Potato
Bhutan	1961	1031	Wheat
Bhutan	1962	1111	Corn

I have 60 years of data in total. Each country can have at most one row per crop per year, for a maximum of 240 rows per country. I used .value_counts() to find which countries are incomplete, and would like to create a subset containing only the countries that have a complete set of data (i.e. where .value_counts() returned 240 for that country).

I have tried the following: df_240 = df.loc[df["Area"].value_counts == 240], but it does not work.

The error I am receiving is:

KeyError: 'False: boolean label can not be used without a boolean index'

CodePudding user response：

This is what you are looking for:

df1 = df.groupby("Area").filter(lambda x: len(x) == 240)
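
For reference, the attempt in the question fails because value_counts is missing its call parentheses: the method object is compared to 240, which evaluates to a plain False, and df.loc[False] raises the KeyError shown. Even with the parentheses, the counts are indexed by country rather than by row, so they cannot serve directly as a row mask. Below is a minimal sketch of a value_counts-based alternative. The toy data and the threshold of 3 (standing in for 240) are made up for illustration.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "Area": ["Afghanistan"] * 3 + ["Bhutan"] * 2,
    "Year": [1961, 1961, 1961, 1961, 1961],
    "Value": [1000, 1100, 1200, 1236, 1836],
    "Crop": ["Corn", "Potato", "Wheat", "Corn", "Potato"],
})
FULL_COUNT = 3  # hypothetical threshold; use 240 for the real dataset

# Rows per country; note the call parentheses on value_counts().
counts = df["Area"].value_counts()

# Names of the countries whose row count equals the threshold.
complete = counts[counts == FULL_COUNT].index

# Keep only rows that belong to those countries.
df_full = df[df["Area"].isin(complete)]
print(df_full)
```

Here only the Afghanistan rows are kept, since Bhutan has two rows rather than three.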

CodePudding user response：

This groups by 'Area', selects the 'Year' column, and passes it to transform('count'), which returns each country's row count aligned to every original row. Comparing that with 240 gives a boolean mask, which is then used to select rows. Finally, reset_index(drop=True) discards the old index and renumbers the rows.

import pandas as pd

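# For every row, the number of rows belonging to the same country
ind = df.groupby('Area')['Year'].transform('count') >= 240
# Keep only rows of complete countries and renumber the index
df1 = df[ind].reset_index(drop=True)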

print(df1)
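
To try the transform approach without the full dataset, here is a small self-contained sketch. The data is made up, and THRESHOLD = 3 is a hypothetical stand-in for 240.

```python
import pandas as pd

# Toy data, invented for illustration only.
df = pd.DataFrame({
    "Area": ["Bhutan"] * 2 + ["Afghanistan"] * 3,
    "Year": [1961, 1961, 1961, 1961, 1961],
    "Value": [1236, 1836, 1000, 1100, 1200],
    "Crop": ["Corn", "Potato", "Corn", "Potato", "Wheat"],
})
THRESHOLD = 3  # hypothetical; use 240 for the real dataset

# transform('count') broadcasts each group's count back onto its rows,
# so the result has the same length and index as df.
row_counts = df.groupby("Area")["Year"].transform("count")

# Boolean mask: True for rows whose country meets the threshold.
mask = row_counts >= THRESHOLD

# Select the matching rows and renumber them from 0.
df1 = df[mask].reset_index(drop=True)
print(df1)
```

Note that 'count' ignores missing values in the selected column. If 'Year' could contain NaN, transform('size') counts every row instead.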