I am trying to split the cells into multiple rows by comma and make the groupby count. One complication is that sometimes I get strange whitespace after the split (I don't understand why and I cannot replicate the strange case). This would make the groupby count wrong. To overcome this, I could strip out the whitespace after each split. My question is how to make the process more 'integrated' - accommodating the whitespace stripping and fewer repeated lines.
data = {'place':["US", "Japan", "UK", "Japan", "UK"],
'colour':["red", "yellow", "blue", "red", "green"],
'fruit1':["organge, apple", "apple", "organge, dragon fruit, watermelon", "organge, others", "others"],
'fruit2':["apple, organge", "others", "watermelon, dragon fruit, organge", "watermelon", "others"]}
df = pd.DataFrame(data)
df
dt = (df.set_index(['place', 'colour', 'fruit1'])
.apply(lambda x: x.str.split(',').explode())
.reset_index())
dt2 = (dt.set_index(['place', 'colour', 'fruit2'])
.apply(lambda x: x.str.split(',').explode())
.reset_index())
dt2['fruit1'] = dt2['fruit1'].str.strip()
dt2['fruit2'] = dt2['fruit2'].str.strip()
dt2.groupby(['fruit1','fruit2']).size().reset_index(name='counts')
Expected output:
fruit1 fruit2 counts
0 apple organge 1
1 apple apple 1
2 dragon fruit dragon fruit 1
3 dragon fruit organge 1
4 dragon fruit watermelon 1
5 others watermelon 1
6 watermelon dragon fruit 1
7 watermelon organge 1
8 watermelon watermelon 1
9 apple others 1
10 organge dragon fruit 1
11 organge organge 2
12 organge apple 1
13 organge watermelon 2
14 others others 1
CodePudding user response:
You can split by ,
with space both columns and then create product of them, last count them:
dt = df[['fruit1','fruit2']].apply(lambda x: x.str.split(', '))
from itertools import product
dt = pd.DataFrame([j for i in dt.to_numpy() for j in product(*i)],
columns=['fruit1','fruit2'])
df = dt.groupby(['fruit1','fruit2']).size().reset_index(name='counts')
print (df)
fruit1 fruit2 counts
0 apple apple 1
1 apple organge 1
2 apple others 1
3 dragon fruit dragon fruit 1
4 dragon fruit organge 1
5 dragon fruit watermelon 1
6 organge apple 1
7 organge dragon fruit 1
8 organge organge 2
9 organge watermelon 2
10 others others 1
11 others watermelon 1
12 watermelon dragon fruit 1
13 watermelon organge 1
14 watermelon watermelon 1