Home > Back-end >  Pandas and Excel results are inconsistent
Pandas and Excel results are inconsistent

Time:04-16

I am using a csv file with a table of all of the different pokemon types, with all different pokemon names in the rows and pieces of information such as their typing, and my goal is to find the average of the pokemon Total(sum of all of their stats) which are of a certain typing. I have attempted that with this coding

pokemons = pd.read_csv('pokemon.csv')
print('Bug pokemon average:', pokemons['Total'][(pokemons['Type 1'] == "Bug") | (pokemons['Type 2'] == "Bug")].mean())

And I obtain the following result Bug pokemon average: 379.52777777777777 When I was expecting the average to be 365.3611111 which I had worked out in an excel document earlier. I am wondering if I am doing something blatantly wrong? The csv file is this one https://gist.github.com/armgilles/194bcff35001e7eb53a2a8b441e8b2c6.

CodePudding user response:

The AVERAGEIF function in Excel does not consider the range C2:D801 as an OR function. That selection is not filtering the lines that you want. A functioning solution that gives the same results as you expect is =(SUMIF(C2:C801,"Bug",E1:E801) SUMIF(D2:D801,"Bug",E1:E801))/(COUNTIF(C2:C801,"Bug") COUNTIF(D2:D802,"Bug")).

Basically you have to compute the sums separately, then average. This gives the expected result of 379.5278.

I don't think the AVERAGEIF function works readily with OR conditions. I hope it helps :)

I found this related answer: Excel averageifs with or function

  • Related