Separate values in a specific DataFrame column into multiple rows

Time:11-02

I have a DataFrame with multiple values in some columns:

import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})

I want to separate the values in the fruits column so that my DataFrame looks like this:

id  fruits
 1  apple
 1  Apple
 2  Orange
 2  grapefruit
 3  Melon

CodePudding user response:

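Split the comma-separated strings into lists, assign the lists back to the column, and explode the result. Note that splitting on ',' alone leaves a leading space on every value after the first (' Apple', ' grapefruit'); call .str.strip() afterwards if that matters.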

df = df.assign(fruits=df['fruits'].str.split(',')).explode('fruits')

Output will be:

  id       fruits  count
0  1        apple      2
0  1        Apple      2
1  2       Orange      2
1  2   grapefruit      2
2  3        Melon      1

If you want a sequential index without duplicates, reset it:

df = df.reset_index(drop=True)

Output will be:

  id       fruits  count
0  1        apple      2
1  1        Apple      2
2  2       Orange      2
3  2   grapefruit      2
4  3        Melon      1
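
As a variant of the approach above, the split, the whitespace cleanup, and the index reset can be combined into one chain. This is only a sketch: it assumes pandas 1.1 or later (for the ignore_index parameter of explode), and the names parts and result are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})

# Split on a comma plus any following whitespace, so no piece keeps a leading space.
parts = df['fruits'].str.split(r',\s*')

# ignore_index=True renumbers the rows 0..n-1 while exploding,
# which replaces the separate reset_index(drop=True) step.
result = df.assign(fruits=parts).explode('fruits', ignore_index=True)

# Keep only the two columns shown in the question.
result = result[['id', 'fruits']]
print(result)
```

Dropping the count column is optional; it is done here only to match the layout asked for in the question.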

CodePudding user response:

You can use the str methods with explode on the relevant column (a raw string keeps \s from being read as a string escape):

ser = df.fruits.str.split(r",\s").explode()

Output:

0         apple
0         Apple
1        Orange
1    grapefruit
2         Melon
Name: fruits, dtype: object

If you wish to keep the id:

df2 = pd.DataFrame(ser)
df2['id'] = df['id']

The assignment aligns on the index shared by df and df2, so each exploded row receives the id of the row it came from:

       fruits id
0       apple  1
0       Apple  1
1      Orange  2
1  grapefruit  2
2       Melon  3
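
The same index alignment can also be written as a join, which puts id first and avoids the intermediate DataFrame. This is a sketch of an alternative, not the answer's own method; it splits on r',\s*' so that a missing space after a comma is tolerated as well.

```python
import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})

# The exploded Series keeps the original row labels: 0, 0, 1, 1, 2.
ser = df['fruits'].str.split(r',\s*').explode()

# join matches on the index, so every exploded value is paired with
# the id from the row of df that carries the same index label.
df2 = df[['id']].join(ser)
print(df2)
```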

CodePudding user response:

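frames = []

for i, row in df.iterrows():
    # Split the row's string and strip the whitespace around each fruit
    fruits = [fruit.strip() for fruit in row['fruits'].split(',')]
    # One output row per fruit, all carrying the original index label i
    frames.append(pd.DataFrame({'id': row['id'], 'fruits': fruits}, index=[i] * len(fruits)))

# DataFrame.append was removed in pandas 2.0; concatenate once at the end instead
df2 = pd.concat(frames)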

This answer is slower than the previous ones because it iterates over the rows in Python instead of using vectorized pandas methods (str.split, explode), but the algorithm may be easier to follow.
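
To compare the two styles, the sketch below wraps a row-by-row version and an explode version in functions, checks that they produce the same rows, and times them with timeit. The function names are made up for this example, and no timing figures are claimed because they depend on the machine, the pandas version, and the size of the data.

```python
import timeit

import pandas as pd

df = pd.DataFrame({'id': ['1', '2', '3'],
                   'fruits': ['apple, Apple', 'Orange, grapefruit', 'Melon'],
                   'count': [2, 2, 1]})


def split_with_loop(frame):
    """Row-by-row version: build one small DataFrame per input row."""
    pieces = []
    for i, row in frame.iterrows():
        fruits = [f.strip() for f in row['fruits'].split(',')]
        pieces.append(pd.DataFrame({'id': row['id'], 'fruits': fruits},
                                   index=[i] * len(fruits)))
    return pd.concat(pieces)


def split_with_explode(frame):
    """Vectorized version: split every string at once, then explode."""
    parts = frame['fruits'].str.split(r',\s*')
    return frame.assign(fruits=parts).explode('fruits')[['id', 'fruits']]


loop_result = split_with_loop(df)
explode_result = split_with_explode(df)

# Both approaches should yield the same rows in the same order.
same_rows = loop_result.values.tolist() == explode_result.values.tolist()
print(same_rows)

# Total seconds for 50 runs of each version on this tiny frame.
print(timeit.timeit(lambda: split_with_loop(df), number=50))
print(timeit.timeit(lambda: split_with_explode(df), number=50))
```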
