To illustrate my question, consider the following Pandas DataFrame:
import pandas as pd
df = pd.DataFrame({'player': ['Bob', 'Jane', 'Alice'],
                   'hand': [['two', 'ace'], ['queen', 'king'], ['three', 'five']]})
I would like to sort the list in each hand. I've tried using lambdas and looping through df with iterrows, but I couldn't get either to work.
BONUS: The reason I want it sorted is so that I can do a groupby on that column to identify all players holding the same hand. Perhaps there is a more direct way of doing it.
CodePudding user response:
You can use apply(sorted):
df['hand'] = df['hand'].apply(sorted)
Output:
player hand
0 Bob [ace, two]
1 Jane [king, queen]
2 Alice [five, three]
On its own, this won't let you group on the column, because lists are not hashable.
If your goal is to group or compare, and the cards within a hand are unique, you could also use a frozenset:
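To see why the list column cannot serve as a grouping key, here is a minimal plain-Python sketch (no pandas needed). The variable names are illustrative only; it simply checks which container types can be hashed.

```python
hand = ['two', 'ace']

# A sorted list is still a list, and lists cannot be hashed.
try:
    hash(sorted(hand))
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# Immutable containers built from the same cards can be hashed.
tuple_key = tuple(sorted(hand))
set_key = frozenset(hand)
keys_are_hashable = isinstance(hash(tuple_key), int) and isinstance(hash(set_key), int)

print(list_is_hashable, keys_are_hashable)
```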
df['hand'] = df['hand'].apply(frozenset)
Or, if you need to account for duplicated cards (e.g., ace ace), sort and convert to a tuple:
df['hand'] = df['hand'].apply(lambda x: tuple(sorted(x)))
Output:
player hand
0 Bob (ace, two)
1 Jane (king, queen)
2 Alice (five, three)
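The difference between the two keys only shows up when a hand repeats a card. A small sketch, using two made-up hands that are not part of the question's data:

```python
# Hypothetical hands: one holds the ace twice, the other only once.
hand_a = ['ace', 'ace']
hand_b = ['ace']

# frozenset drops duplicates, so the two hands compare equal.
same_as_frozenset = frozenset(hand_a) == frozenset(hand_b)

# A sorted tuple keeps every card, so the two hands stay distinct.
same_as_tuple = tuple(sorted(hand_a)) == tuple(sorted(hand_b))

print(same_as_frozenset, same_as_tuple)
```

So frozenset is the simpler choice when duplicates cannot occur, and the sorted tuple is the safer one when they can.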
Then you can groupby hand to list the players with the same hand:
df.groupby('hand')['player'].apply(list)
Output:
hand
(ace, two) [Bob]
(five, three) [Alice]
(king, queen) [Jane]
Name: player, dtype: object
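Putting the steps together, here is a self-contained sketch. It assumes an extra player, Carol, who holds Bob's cards in a different order (she is not in the original data), and it stores the key in a separate hand_key column (an illustrative name) so the original lists are kept.

```python
import pandas as pd

# Carol is a hypothetical fourth player added so that one hand is shared.
df = pd.DataFrame({
    'player': ['Bob', 'Jane', 'Alice', 'Carol'],
    'hand': [['two', 'ace'], ['queen', 'king'], ['three', 'five'], ['ace', 'two']],
})

# Build a hashable, order-independent key for each hand.
df['hand_key'] = df['hand'].apply(lambda x: tuple(sorted(x)))

# Collect the players for each distinct hand.
players_by_hand = df.groupby('hand_key')['player'].apply(list).to_dict()

# Keep only the hands held by more than one player.
shared = {hand: players for hand, players in players_by_hand.items() if len(players) > 1}
print(shared)
```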
CodePudding user response:
I would use explode. For your next step, you can then groupby the hand and agg the player. Note that this groups players by individual card rather than by complete hand:
df.explode('hand').groupby('hand').player.agg(list)
hand
ace [Bob]
five [Alice]
king [Jane]
queen [Jane]
three [Alice]
two [Bob]
Name: player, dtype: object
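Because explode produces one row per card, this approach answers a slightly different question: which players hold a given card. A sketch under the assumption of a hypothetical extra player, Carol, who shares only the ace with Bob:

```python
import pandas as pd

# Carol is a hypothetical player who shares one card (the ace) with Bob.
df = pd.DataFrame({
    'player': ['Bob', 'Jane', 'Alice', 'Carol'],
    'hand': [['two', 'ace'], ['queen', 'king'], ['three', 'five'], ['ace', 'king']],
})

# One row per (player, card), then collect the players for each card.
players_by_card = df.explode('hand').groupby('hand')['player'].agg(list).to_dict()
print(players_by_card)
```

Here Bob and Carol appear together under ace even though their full hands differ, so this is not a substitute for grouping on the whole hand.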
CodePudding user response:
I think that using sorted is one of the best options; it is also suggested in this question.
>>> df['hand'] = [tuple(sorted(x)) for x in df['hand']]
>>> df
player hand
0 Bob (ace, two)
1 Jane (king, queen)
2 Alice (five, three)