I want to split two columns at their comma and bring them back to the original pandas dataframe. I tried to explode()
but I got an error with ValueError: cannot handle a non-unique multi-index!
I wonder how I can overcome this error.
import pandas as pd
data = {'fruit_tag': {0: 'apple, organge', 1: 'watermelon', 2: 'banana', 3: 'banana', 4: 'apple, banana'}, 'location': {0: 'Hong Kong , London', 1: 'New York, Tokyo', 2: 'Singapore', 3: 'Singapore, Hong Kong', 4: 'Tokyo'}, 'rating': {0: 'bad', 1: 'good', 2: 'good', 3: 'bad', 4: 'good'}, 'measure_score': {0: 0.9529434442520142, 1: 0.952498733997345, 2: 0.9080725312232971, 3: 0.8847543001174927, 4: 0.8679852485656738}}
dt = pd.DataFrame.from_dict(data)
dt.\
set_index(['rating', 'measure_score']).\
apply(lambda x: x.str.split(',').explode())
CodePudding user response:
When you explode, the index are the same for (each) old rows. Pandas doesn't know (or like) to align these indexes, because intention of users can be different from case to case, e.g. align by order, or cross merge. In your case, for example, what do you expect to get from row 1
where you have 2 entries for each column? How about row 2?
If you want a cross merge, you would need to explode manually:
def explode(x, col): return x.assign(**{col:x[col].str.split(', ')}).explode(col)
explode(explode(dt, 'fruit_tag'), 'location')
Output:
fruit_tag location rating measure_score
0 apple Hong Kong bad 0.952943
0 apple London bad 0.952943
0 organge Hong Kong bad 0.952943
0 organge London bad 0.952943
1 watermelon New York good 0.952499
1 watermelon Tokyo good 0.952499
2 banana Singapore good 0.908073
3 banana Singapore bad 0.884754
3 banana Hong Kong bad 0.884754
4 apple Tokyo good 0.867985
4 banana Tokyo good 0.867985