Hello to the people of the web,
I have a dataframe with 'DATE' (datetime) as the index and TMAX as a column of values.
What I'm trying to do is count, for every month of each year, the number of samples (each TMAX value counts as one sample). If a month has fewer than 28 samples, I want to drop that particular month (of that particular year) and all its samples.
I have the following code:
import pandas as pd

if __name__ == '__main__':
    df = pd.read_csv("2961941.csv")
    # parse 'DATE' as datetime objects and use it as the index
    # (set_index removes the 'DATE' column, which avoids repetition)
    # speed up parsing using infer_datetime_format=True.
    df['DATE'] = pd.to_datetime(df['DATE'], infer_datetime_format=True)
    df.set_index('DATE', inplace=True)
    # create new table out of 'DATE' (now the index) and 'TMAX'
    tmax = df.filter(['DATE', 'TMAX'], axis=1)
    # erase rows with missing data (dropna returns a new frame, so assign it back)
    tmax = tmax.dropna()
    # create snow table & delete rows with missing info
    snow = df.filter(['DATE', 'SNOW']).dropna()
    # for index, row in tmax.iterrows():
Thanks for the help.
CodePudding user response:
I can suggest trying the following. I group the rows by calendar month and store the per-month count of samples in a variable 'a', with each row carrying the count for its own month. Then I keep only the rows whose month has at least 28 samples, which drops every month with fewer than 28. It worked for me.
a = df.groupby(pd.Grouper(level='DATE', freq="M")).transform('count')
print(df[a['TMAX'] >= 28])
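To check the idea without the original CSV, here is a minimal sketch on made-up data. The helper name drop_sparse_months, its parameters, and the synthetic dates are assumptions for illustration only. It groups by the year and month of the index instead of using pd.Grouper, which should give the same per-month counts.

```python
import pandas as pd


def drop_sparse_months(df, column="TMAX", min_samples=28):
    """Keep only rows whose (year, month) has at least min_samples non-null values.

    Hypothetical helper; assumes df has a DatetimeIndex.
    """
    # transform("count") returns one value per row: the number of non-null
    # samples in that row's (year, month) group.
    counts = df.groupby([df.index.year, df.index.month])[column].transform("count")
    return df[counts >= min_samples]


# Synthetic data: January and March 2020 are complete, February has only 10 days.
dates = pd.date_range("2020-01-01", "2020-03-31", freq="D")
demo = pd.DataFrame({"TMAX": range(len(dates))}, index=dates)
demo.index.name = "DATE"
demo = demo[~((demo.index.month == 2) & (demo.index.day > 10))]

filtered = drop_sparse_months(demo)
print(filtered.index.month.unique().tolist())  # [1, 3]
```

February is removed entirely because it has only 10 samples, while the 31 rows of January and of March are kept. Rows where TMAX is NaN are not counted as samples, so a month with 30 rows but only 27 valid readings would also be dropped.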