I have a dataframe with multiple values in some columns:
import pandas as pd
df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})
I want to separate the values in the fruits column so that my dataframe becomes:
id fruits
1 apple
1 Apple
2 Orange
2 grapefruit
3 Melon
Answer:
Reassign the column by splitting the comma-separated strings (on a comma plus any following whitespace, so no leading spaces are left behind), then explode the resulting lists:
df = df.assign(fruits=df['fruits'].str.split(r',\s*')).explode('fruits')
Output will be:
id fruits count
0 1 apple 2
0 1 Apple 2
1 2 Orange 2
1 2 grapefruit 2
2 3 Melon 1
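One caveat: splitting on ',' alone keeps the space that follows each comma, so values such as ' Apple' carry a leading blank that the printed table hides. A minimal check, assuming the question's df, that compares a plain comma split with a regex split on a comma plus optional whitespace:

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})

# Plain comma split: the space after each comma stays attached to the next value
plain = df.assign(fruits=df['fruits'].str.split(',')).explode('fruits')

# Regex split on a comma followed by any amount of whitespace
clean = df.assign(fruits=df['fruits'].str.split(r',\s*')).explode('fruits')

print(plain['fruits'].tolist())  # ['apple', ' Apple', 'Orange', ' grapefruit', 'Melon']
print(clean['fruits'].tolist())  # ['apple', 'Apple', 'Orange', 'grapefruit', 'Melon']
```

Calling .str.strip() on the exploded column is an equivalent way to clean up the values afterwards.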
If you want a fresh index without duplicate labels, reset it:
df = df.reset_index(drop=True)
Output will be:
id fruits count
0 1 apple 2
1 1 Apple 2
2 2 Orange 2
3 2 grapefruit 2
4 3 Melon 1
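On pandas 1.1 or later, DataFrame.explode also accepts ignore_index=True, which renumbers the rows as part of the explode, so the separate reset_index call can be folded into one chain. A minimal sketch, assuming the question's df:

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})

# Split, explode, and renumber the index 0..n-1 in a single chain
out = (df.assign(fruits=df['fruits'].str.split(r',\s*'))
         .explode('fruits', ignore_index=True))

print(out)
```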
Answer:
You can use the str methods together with explode on the relevant column:
ser = df.fruits.str.split(r",\s*").explode()
Output:
0 apple
0 Apple
1 Orange
1 grapefruit
2 Melon
Name: fruits, dtype: object
If you wish to keep the id column:
df2 = pd.DataFrame(ser)
df2['id'] = df['id']
The assignment aligns rows on the index that df and df2 share, so every exploded fruit picks up the id of the row it came from:
fruits id
0 apple 1
0 Apple 1
1 Orange 2
1 grapefruit 2
2 Melon 3
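Instead of creating df2 and copying id across by hand, you can let DataFrame.join do the same index alignment; it also carries over any other columns, such as count. A minimal sketch, assuming the question's df (ser is recomputed here so the block runs on its own):

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})

# One fruit per row; the index labels repeat the original row numbers
ser = df['fruits'].str.split(r',\s*').explode()

# join matches rows on the index, so each fruit picks up its row's id and count
out = df.drop(columns='fruits').join(ser)
print(out)
```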
Answer:
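frames = []
for i, row in df.iterrows():
    # split the row's fruits and trim the space left after each comma
    fruits = [fruit.strip() for fruit in row['fruits'].split(',')]
    # one row per fruit, all sharing the original index label i
    frames.append(pd.DataFrame({'id': row['id'], 'fruits': fruits}, index=[i] * len(fruits)))
# DataFrame.append was removed in pandas 2.0, so concatenate the pieces once at the end
df2 = pd.concat(frames)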
This approach is slower than the previous answers because it loops over the rows in Python instead of relying on vectorized pandas methods (str.split, assign, explode), but the algorithm is easier to follow step by step.
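If you prefer the explicit loop, a middle ground is to collect plain Python tuples and build the DataFrame once at the end, which keeps the step-by-step logic while avoiding a new DataFrame per row. A minimal sketch, assuming the question's df; the variable names labels and rows are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})

labels, rows = [], []
# itertuples is generally faster than iterrows and exposes the index as row.Index
for row in df.itertuples():
    for fruit in row.fruits.split(','):
        labels.append(row.Index)               # keep the original row label
        rows.append((row.id, fruit.strip()))   # strip the space after each comma

df2 = pd.DataFrame(rows, columns=['id', 'fruits'], index=labels)
print(df2)
```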