I'm working with a dataframe that holds daily data from February 2013 to May 2022 and has the following format:
   Unnamed: 0  prod                   und      proc   tipo       min    mcom    max    merc  date        year  month  day
0  0           Bacalhau               Cx.25Kg  NOR    Saith      NaN    437.50  NaN    Est   2013/02/01  2013  2      1
1  1           Camarao                Kg       NaN    7 Barba    NaN    NaN     NaN    Sin   2013/02/01  2013  2      1
2  2           Camarao com casca      Kg       NaN    Grande     NaN    NaN     NaN    Sin   2013/02/01  2013  2      1
3  3           Camarao com casca      Kg       NaN    Medio      NaN    NaN     NaN    Sin   2013/02/01  2013  2      1
4  4           Camarao com casca      Kg       NaN    Pequeno    NaN    NaN     NaN    Sin   2013/02/01  2013  2      1
5  5           Peixe de agua salgada  Kg       RS     Albacora   7.80   10.00   10.00  Est   2013/02/01  2013  2      1
6  6           Peixe de agua salgada  Kg       RS-SC  Anchova    8.80   8.80    9.00   Est   2013/02/01  2013  2      1
7  7           Peixe de agua salgada  Kg       RS-SC  Castanha   NaN    5.00    NaN    Est   2013/02/01  2013  2      1
8  8           Peixe de agua salgada  Kg       RS-SC  Cavalinha  4.00   4.00    4.38   Est   2013/02/01  2013  2      1
9  9           Peixe de agua salgada  Kg       RS-SC  Cioba      15.98  15.98   16.50  Est   2013/02/01  2013  2      1
While doing an exploratory analysis, I realized that there are some dates that must be removed. They are in the rows where the month value is 2, the day value is 29 and the year value is one of 2013, 2014, 2015, 2017, 2018, 2019, 2021 or 2022.
Since removing them row by row would clutter the code, I tried to remove them with a for loop, using the following code:
anos = [2013,2014,2015,2017,2018,2019,2021,2022]
for df['year'].values in anos:
df = df.drop(df[(df['month'] == 2) & (df['day'] == 29)].index, inplace= True)
But it didn't work. Could someone help me?
CodePudding user response:
You don't need to loop over it; it can all be done in a single statement (drop returns a new dataframe, so assign the result back):
df = df.drop(df[df['year'].isin(anos) & (df['month'] == 2) & (df['day'] == 29)].index)
PS: Can you post the dataframe example as a CSV?
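As a rough illustration of the drop approach, here is a minimal sketch on a tiny made-up dataframe (the rows are hypothetical and only carry the year, month and day columns from the question). It also shows one likely reason the original attempt failed: with inplace=True, drop returns None, so assigning that result back to df throws the dataframe away.

```python
import pandas as pd

# Toy data (hypothetical rows, not taken from the real dataframe)
df = pd.DataFrame({
    "year":  [2013, 2013, 2016, 2016],
    "month": [2, 2, 2, 3],
    "day":   [28, 29, 29, 1],
})
anos = [2013, 2014, 2015, 2017, 2018, 2019, 2021, 2022]

# With inplace=True, drop() modifies the frame it is called on and
# returns None, so "df = df.drop(..., inplace=True)" sets df to None.
returned = df.copy().drop(index=[0], inplace=True)

# Rows to remove: 29 February in one of the listed years
mask = df["year"].isin(anos) & (df["month"] == 2) & (df["day"] == 29)

# Without inplace, drop() returns a new frame, so keep the result.
cleaned = df.drop(df[mask].index)
```

In this toy frame only the 2013-02-29 row is dropped; the 2016-02-29 row stays because 2016 is not in anos.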
CodePudding user response:
This will do the job without loops:
anos = [2013,2014,2015,2017,2018,2019,2021,2022]
df = df[(df['month'] != 2) | (df['day'] != 29) | ~df['year'].isin(anos)]
df
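The filter above is the logical negation of "month is 2 and day is 29 and year is in anos", rewritten with "or". A small sketch on made-up rows (hypothetical data, only the three date columns) that checks both forms keep the same rows:

```python
import pandas as pd

# Toy data (hypothetical rows, not taken from the real dataframe)
df = pd.DataFrame({
    "year":  [2013, 2013, 2016, 2022],
    "month": [2, 2, 2, 2],
    "day":   [28, 29, 29, 29],
})
anos = [2013, 2014, 2015, 2017, 2018, 2019, 2021, 2022]

# Rows considered invalid: all three conditions hold at once
bad = (df["month"] == 2) & (df["day"] == 29) & df["year"].isin(anos)

# Form 1: negate the combined mask
kept_by_negation = df[~bad]

# Form 2: the filter from the answer (negation pushed into each condition)
kept_by_filter = df[(df["month"] != 2) | (df["day"] != 29) | ~df["year"].isin(anos)]

# True when both selections hold the same rows and values
same = kept_by_negation.equals(kept_by_filter)
```

Which form to use is mostly a matter of taste; building the "bad" mask first and negating it may be easier to read back later.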