I have a dataframe with a column contain multiple values. I need to split the values and assigned them to separate but matching columns. Example Fruits column contain different fruits. I need to splits these fruits and assign apple to apple column, banana to banana column ..etc.
fruits
0 apple/mango
1 grapes
2 orange/apple
3 grapes/orange/apple/banana
4 banana/grapes/apple
I created data frame but when I split values in a row ,these values are added to multiple rows instead of single row. There should be 5 rows in new data frame but I'm getting 12 rows
columns_name =['apple','banana','grapes','orange','mango']
fruits =df['fruits'].to_list()
columnlists_list =[]
for column in columns_name:
column_list = column
column_list =[]
columnlists_list.append(column_list)
for fruit in fruits:
one_fruit = fruit.split('/')
for i in range(0,len(one_fruit)):
if column == one_fruit[i]:
column_list.append('1')
else:
column_list.append('0')
#data frame
fruits_df = pd.DataFrame(columnlists_list)
fruits_df =fruits_df.transpose()
fruits_df.columns =[columns_name]
This the result I got.
apple banana grapes orange mango
0 1 0 0 0 0
1 0 0 0 0 1
2 0 0 1 0 0
3 0 0 0 1 0
4 1 0 0 0 0
5 0 0 1 0 0
6 0 0 0 1 0
Please advise how to correct the error. Thank you
CodePudding user response:
Try using the get_dummies
method in the str accessor (.str) from pandas working with text data:
df['fruits'].str.get_dummies(sep='/')
Output:
apple banana grapes mango orange
0 1 0 0 1 0
1 0 0 1 0 0
2 1 0 0 0 1
3 1 1 1 0 1
4 1 1 1 0 0
CodePudding user response:
You can start by unnesting your data using explode()
.This requires a column of list type values which you can achieve with str.split('/')
.
Then you can use get_dummies()
which will use all the unique values of your 'fruits' column, and chain groupby
to get one row per index:
# Unnest your dataframe
expl_df = df.fruits.str.split('/').explode().reset_index()
>>> print(expl_df)
index fruits
0 0 apple
1 0 mango
2 1 grapes
3 2 orange
4 2 apple
5 3 grapes
6 3 orange
7 3 apple
8 3 banana
9 4 banana
10 4 grapes
11 4 apple
# Convert your fruits to columns
pd.get_dummies(expl_df, columns=['fruits'],prefix='', prefix_sep='').groupby(['index'], as_index=False).sum().drop('index',axis=1)
apple banana grapes mango orange
0 1 0 0 1 0
1 0 0 1 0 0
2 1 0 0 0 1
3 1 1 1 0 1
4 1 1 1 0 0
CodePudding user response:
Here is my attempt:
df['fruits']=df['fruits'].str.split("/")
columns_name =['apple','banana','grapes','orange','mango']
membership_info = []
for elem in df['fruits']:
membership = []
for fruit in columns_name:
membership.append(1) if fruit in elem else membership.append(0)
membership_info.append(membership)
result_df = pd.DataFrame(membership_info, columns= columns_name)
print(result_df)
Output:
apple banana grapes orange mango
0 1 0 0 0 1
1 0 0 1 0 0
2 1 0 0 1 0
3 1 1 1 1 0
4 1 1 1 0 0