In a Pandas DataFrame where each cell is an array, sort each subarray

To illustrate my question, consider the following Pandas DataFrame:

df = pd.DataFrame({'player': ['Bob', 'Jane', 'Alice'], 
                   'hand': [['two','ace'], ['queen','king'], ['three','five']]})

I would like to sort each hand array. I've tried using lambdas or looping through df with iterrows, but I couldn't get either to work.

BONUS: The reason I want it sorted is so I can do a groupby on that column to identify all players having the same hand. Perhaps there is a more direct way of doing it.

CodePudding user response：

You can apply(sorted):

df['hand'] = df['hand'].apply(sorted)

Output:

  player           hand
0    Bob     [ace, two]
1   Jane  [king, queen]
2  Alice  [five, three]

Note that this alone won't let you group by hand, because lists are not hashable.
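As a rough illustration of the hashability point, here is a minimal pure-Python sketch. It uses a plain dict instead of groupby, on the assumption that the same restriction on keys applies; the variable names are made up for the example.

```python
hand = ['two', 'ace']

# A list cannot be used as a dictionary key: it is mutable and unhashable.
try:
    {hand: 'Bob'}
    list_is_hashable = True
except TypeError:
    list_is_hashable = False

# A tuple of strings is immutable, so it can serve as a key.
key = tuple(sorted(hand))
players_by_hand = {key: ['Bob']}

print(list_is_hashable)   # False
print(players_by_hand)    # {('ace', 'two'): ['Bob']}
```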

If your goal is to group or compare, and the cards are unique, you could also use a frozenset:

df['hand'] = df['hand'].apply(frozenset)

Or, if you want to take duplicated cards into account (e.g., ace, ace), sort and convert to a tuple:

df['hand'] = df['hand'].apply(lambda x: tuple(sorted(x)))

Output:

  player           hand
0    Bob     (ace, two)
1   Jane  (king, queen)
2  Alice  (five, three)
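To see why duplicates matter when choosing between the two conversions, here is a small sketch in plain Python. The two hands are invented for the example and are not part of the original data.

```python
# Two hypothetical hands that differ only by a repeated card.
hand_a = ['ace', 'ace']
hand_b = ['ace']

# A frozenset drops duplicates, so both hands collapse to the same key.
same_as_sets = frozenset(hand_a) == frozenset(hand_b)

# A sorted tuple keeps every card, so the hands stay distinct.
same_as_tuples = tuple(sorted(hand_a)) == tuple(sorted(hand_b))

print(same_as_sets)    # True
print(same_as_tuples)  # False
```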

Then you can groupby hand to list the players with the same hand:

df.groupby('hand')['player'].apply(list)

Output:

hand
(ace, two)         [Bob]
(five, three)    [Alice]
(king, queen)     [Jane]
Name: player, dtype: object
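In the sample data every hand is different, so each group holds a single player. The sketch below adds a hypothetical fourth player, Carl, who holds the same cards as Bob in a different order, and then keeps only the hands shared by more than one player. The names players_by_hand and shared are illustrative only.

```python
import pandas as pd

# Sample data plus a hypothetical player 'Carl' who shares Bob's hand.
df = pd.DataFrame({
    'player': ['Bob', 'Jane', 'Alice', 'Carl'],
    'hand': [['two', 'ace'], ['queen', 'king'], ['three', 'five'], ['ace', 'two']],
})

# Normalize each hand to a sorted tuple so it is hashable and order-independent.
df['hand'] = df['hand'].apply(lambda x: tuple(sorted(x)))

# Collect the players for each distinct hand.
players_by_hand = df.groupby('hand')['player'].apply(list)

# Keep only the hands held by more than one player.
shared = players_by_hand[players_by_hand.apply(len) > 1]
print(shared)
```

Filtering on the list length is just one way to do this; a boolean mask from df['hand'].duplicated(keep=False) should work as well once the hands are tuples.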

CodePudding user response：

I would use explode. For your next step, you can then group by hand and aggregate the players into a list. Note that this groups by individual card rather than by whole hand:

df.explode('hand').groupby('hand').player.agg(list)

Output:

hand
ace        [Bob]
five     [Alice]
king      [Jane]
queen     [Jane]
three    [Alice]
two        [Bob]
Name: player, dtype: object
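Since explode produces one row per card, this grouping answers "who holds this card?" rather than "who holds this exact hand?". A minimal sketch, again with a hypothetical extra player Carl whose hand overlaps two others:

```python
import pandas as pd

# Sample data plus a hypothetical player 'Carl' holding an ace and a king.
df = pd.DataFrame({
    'player': ['Bob', 'Jane', 'Alice', 'Carl'],
    'hand': [['two', 'ace'], ['queen', 'king'], ['three', 'five'], ['ace', 'king']],
})

# One row per (player, card), then the list of players for each card.
players_by_card = df.explode('hand').groupby('hand')['player'].agg(list)

print(players_by_card['ace'])   # ['Bob', 'Carl']
print(players_by_card['king'])  # ['Jane', 'Carl']
```

Here Carl appears under both ace and king even though nobody shares his full hand, so the tuple or frozenset approach is probably the better fit for matching whole hands.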

CodePudding user response：

I think that using sorted is one of the best options; it has also been suggested elsewhere in this question. A list comprehension works too:

>>> df['hand'] = [tuple(sorted(x)) for x in df['hand']]
>>> df
  player           hand
0    Bob     (ace, two)
1   Jane  (king, queen)
2  Alice  (five, three)