I am looking for an efficient method to pivot
or groupyby
a data frame where the output is the list of unique items in that subgroup. I am not interested in using loops and would prefer to retain the result as a data frame. Thank you in advance.
Example df:
number | letter | fruit |
---|---|---|
101 | a | apple |
101 | b | melon |
101 | a | peach |
201 | b | grape |
101 | b | orange |
101 | a | pear |
201 | b | apple |
201 | a | peach |
My Code:
df=pd.DataFrame({'number':['101','101','101','201','101','101','201','201'], 'letter': ['a','b','a','b','b','a','b','a'], 'fruit':['apple','melon','peach','grape','orange','pear','apple','peach'] })
pd.pivot_table(df, index='number', columns='letter', values='fruit', aggfunc='sum')
Output:
letter | a | b |
---|---|---|
number | ||
101 | appleapplepear | melonorange |
201 | peach | grapeapple |
Desired Output:
letter | a | b |
---|---|---|
number | ||
101 | [apple ,pear] | [melon ,orange] |
201 | [peach] | [grape ,apple] |
CodePudding user response:
Try:
>>> pivoted = pd.pivot_table(df, index='number', columns='letter', values='fruit', aggfunc=lambda x: list(set(x)))
letter a b
number
101 [apple, peach, pear] [melon, orange]
201 [peach] [grape, apple]
CodePudding user response:
You can also use pd.unique
for the aggfunc
, as follows:
pd.pivot_table(df, index='number', columns='letter', values='fruit', aggfunc=pd.unique)
Note that the output for a single item is not within a list. Some people prefer this but see whether it fits your preference.
Result:
letter a b
number
101 [apple, peach, pear] [melon, orange]
201 peach [grape, apple]
If you like to put single item also into a list, you can modify the call to pd.unique
a bit, as follows:
pd.pivot_table(df, index='number', columns='letter', values='fruit', aggfunc=lambda x: pd.unique(x).tolist())
Result:
letter a b
number
101 [apple, peach, pear] [melon, orange]
201 [peach] [grape, apple]
Thus, you have 2 choices of output layouts for single item now.