Home > Mobile >  Find "most used items" per "level" in big csv file with Pandas
Find "most used items" per "level" in big csv file with Pandas

Time:07-13

I have a rather big csv file and I want to find out which items are used the most at a certain player level.

So one column I'm looking at has all the player levels (from 1 to 30) another column has all the item names (e.g. knife_1, knife_2, etc.) and yet another column lists backpacks (backback_1, backpack_2, etc.). Now I want to check which is the most used knife and backpack for player level 1, for player level 2, player level 3, etc.

What I've tried was this but when I tried to verify it in Excel (with countifs) the results were different:

import pandas as pd 

df = pd.read_csv('filename.csv')
#getting the columns I need:
df = df[["playerLevel", "playerKnife", "playerBackpack"]]

print(df.loc[df["playerLevel"] == 1].mode())

In my head, this should locate all the rows with playerLevel 1 and then only print out the most used items for that level. However, I wanted to double-check and used "countifs" in excel which gave me a different result.

Maybe I'm thinking too simple (or complicated) so I hope you can either verify that my code should be correct or point out the error.

I'm also looking for an easy way to then go through all levels automatically and print out the most used items for each level.

Thanks in advance.

Edit: Dataframe example. Just imagine there are thousands of players that can range from level 1 to level 30. And especially on higher levels, they have access to a lot of knives and backpacks. So the combinations are limitless.

index   playerLevel  playerKnife  playerBackpack
0       1        knife_1      backpack_1
1       2        knife_2      backpack_1
2       3        knife_1      backpack_2
3       1        knife_2      backpack_1
4       2        knife_3      backpack_2
5       1        knife_1      backpack_1
6       15       knife_13     backpack_12
7       13       knife_10     backpack_9
8       1        knife_1      backpack_2

CodePudding user response:

Try the following:

data = """\
index   playerLevel  playerKnife  playerBackpack
0       1        knife_1      backpack_1
1       2        knife_2      backpack_1
2       3        knife_1      backpack_2
3       1        knife_2      backpack_1
4       2        knife_3      backpack_2
5       1        knife_1      backpack_1
6       15       knife_13     backpack_12
7       13       knife_10     backpack_9
8       1        knife_1      backpack_2
"""

import io
import pandas as pd

stream = io.StringIO(data)
df = pd.read_csv(stream, sep='\s ')
df = df.drop('index', axis='columns')
print(df.groupby('playerLevel').agg(pd.Series.mode))

yields

                    playerKnife            playerBackpack
playerLevel
1                       knife_1                backpack_1
2            [knife_2, knife_3]  [backpack_1, backpack_2]
3                       knife_1                backpack_2
13                     knife_10                backpack_9
15                     knife_13               backpack_12

Note that the result of df.groupby('playerLevel').agg(pd.Series.mode) is a DataFrame, so you can assign that result and use it as a normal dataframe.


For data plain from a CSV file, simply use

df = pd.read_csv('filename.csv')
df = df[['playerLevel, 'playerKnife', 'playerBackpack']] # or whichever columns you want
stats = df.groupby('playerLevel').agg(pd.Series.mode))  # stats will be dataframe as well
  • Related