Home > Enterprise >  How to split values in a single column and assign to matching columns in pandas data frame
How to split values in a single column and assign to matching columns in pandas data frame

Time:02-18

I have a dataframe with a column contain multiple values. I need to split the values and assigned them to separate but matching columns. Example Fruits column contain different fruits. I need to splits these fruits and assign apple to apple column, banana to banana column ..etc.

    fruits
0   apple/mango
1   grapes
2   orange/apple
3   grapes/orange/apple/banana
4   banana/grapes/apple

I created data frame but when I split values in a row ,these values are added to multiple rows instead of single row. There should be 5 rows in new data frame but I'm getting 12 rows

columns_name =['apple','banana','grapes','orange','mango']
fruits =df['fruits'].to_list()
columnlists_list =[]
for column in columns_name:
  column_list = column
  column_list =[]
  columnlists_list.append(column_list)
  for fruit in fruits:
    one_fruit = fruit.split('/')
    for i in range(0,len(one_fruit)):
      if column == one_fruit[i]:
        column_list.append('1')
      else:
        column_list.append('0')
#data frame 
fruits_df = pd.DataFrame(columnlists_list)
fruits_df =fruits_df.transpose()
fruits_df.columns =[columns_name]

This the result I got.

    apple   banana  grapes  orange  mango
0   1       0       0       0        0
1   0       0       0       0        1
2   0       0       1       0        0
3   0       0       0       1        0
4   1       0       0       0        0
5   0       0       1       0        0
6   0       0       0       1        0

Please advise how to correct the error. Thank you

CodePudding user response:

Try using the get_dummies method in the str accessor (.str) from pandas working with text data:

df['fruits'].str.get_dummies(sep='/')

Output:

   apple  banana  grapes  mango  orange
0      1       0       0      1       0
1      0       0       1      0       0
2      1       0       0      0       1
3      1       1       1      0       1
4      1       1       1      0       0

CodePudding user response:

You can start by unnesting your data using explode().This requires a column of list type values which you can achieve with str.split('/').

Then you can use get_dummies() which will use all the unique values of your 'fruits' column, and chain groupby to get one row per index:

# Unnest your dataframe
expl_df =  df.fruits.str.split('/').explode().reset_index()
>>> print(expl_df)

    index  fruits
0       0   apple
1       0   mango
2       1  grapes
3       2  orange
4       2   apple
5       3  grapes
6       3  orange
7       3   apple
8       3  banana
9       4  banana
10      4  grapes
11      4   apple

# Convert your fruits to columns
pd.get_dummies(expl_df, columns=['fruits'],prefix='', prefix_sep='').groupby(['index'], as_index=False).sum().drop('index',axis=1)

   apple  banana  grapes  mango  orange
0      1       0       0      1       0
1      0       0       1      0       0
2      1       0       0      0       1
3      1       1       1      0       1
4      1       1       1      0       0

CodePudding user response:

Here is my attempt:

df['fruits']=df['fruits'].str.split("/")
columns_name =['apple','banana','grapes','orange','mango']

membership_info = []
for elem in df['fruits']:
    membership = []
    for fruit in columns_name:
        membership.append(1) if fruit in elem else membership.append(0)
    membership_info.append(membership)

result_df = pd.DataFrame(membership_info, columns= columns_name)
print(result_df)

Output:

   apple  banana  grapes  orange  mango
0      1       0       0       0      1
1      0       0       1       0      0
2      1       0       0       1      0
3      1       1       1       1      0
4      1       1       1       0      0
  • Related