I have a rather big csv file and I want to find out which items are used the most at a certain player level.
So one column I'm looking at has all the player levels (from 1 to 30) another column has all the item names (e.g. knife_1, knife_2, etc.) and yet another column lists backpacks (backback_1, backpack_2, etc.). Now I want to check which is the most used knife and backpack for player level 1, for player level 2, player level 3, etc.
What I've tried was this but when I tried to verify it in Excel (with countifs) the results were different:
import pandas as pd
df = pd.read_csv('filename.csv')
#getting the columns I need:
df = df[["playerLevel", "playerKnife", "playerBackpack"]]
print(df.loc[df["playerLevel"] == 1].mode())
In my head, this should locate all the rows with playerLevel 1 and then only print out the most used items for that level. However, I wanted to double-check and used "countifs" in excel which gave me a different result.
Maybe I'm thinking too simple (or complicated) so I hope you can either verify that my code should be correct or point out the error.
I'm also looking for an easy way to then go through all levels automatically and print out the most used items for each level.
Thanks in advance.
Edit: Dataframe example. Just imagine there are thousands of players that can range from level 1 to level 30. And especially on higher levels, they have access to a lot of knives and backpacks. So the combinations are limitless.
index playerLevel playerKnife playerBackpack
0 1 knife_1 backpack_1
1 2 knife_2 backpack_1
2 3 knife_1 backpack_2
3 1 knife_2 backpack_1
4 2 knife_3 backpack_2
5 1 knife_1 backpack_1
6 15 knife_13 backpack_12
7 13 knife_10 backpack_9
8 1 knife_1 backpack_2
CodePudding user response:
Try the following:
data = """\
index playerLevel playerKnife playerBackpack
0 1 knife_1 backpack_1
1 2 knife_2 backpack_1
2 3 knife_1 backpack_2
3 1 knife_2 backpack_1
4 2 knife_3 backpack_2
5 1 knife_1 backpack_1
6 15 knife_13 backpack_12
7 13 knife_10 backpack_9
8 1 knife_1 backpack_2
"""
import io
import pandas as pd
stream = io.StringIO(data)
df = pd.read_csv(stream, sep='\s ')
df = df.drop('index', axis='columns')
print(df.groupby('playerLevel').agg(pd.Series.mode))
yields
playerKnife playerBackpack
playerLevel
1 knife_1 backpack_1
2 [knife_2, knife_3] [backpack_1, backpack_2]
3 knife_1 backpack_2
13 knife_10 backpack_9
15 knife_13 backpack_12
Note that the result of df.groupby('playerLevel').agg(pd.Series.mode)
is a DataFrame, so you can assign that result and use it as a normal dataframe.
For data plain from a CSV file, simply use
df = pd.read_csv('filename.csv')
df = df[['playerLevel, 'playerKnife', 'playerBackpack']] # or whichever columns you want
stats = df.groupby('playerLevel').agg(pd.Series.mode)) # stats will be dataframe as well