Home > Blockchain >  Convert dataframe column to array and inset back into dataframe
Convert dataframe column to array and inset back into dataframe

Time:04-20

I have pandas data frame table like this:

Unique ID Count Samples
A-1 3 6.1 (Apple)
A-1 3 4.5 (Banana)
A-1 3 5.6 (Kiwi)
A-2 2 7.2 (Orange)
A-2 2 3.6 (Mango)

I would like to convert the column "Samples" to a list/array and obtain only the first occurrence of the other columns. The resulting table should look like this

Unique ID Count SampleSummary
A-1 3 6.1 (Apple), 4.5 (Banana), 5.6 (Kiwi)
A-2 2 7.2 (Orange), 3.6(Mango)

So far, this is what I have in my code:

for i in range(len(dataframe['unique_id'])):
     SampleSummary= np.asarray(pd.concat([dataframe['Sample']]))

CodePudding user response:

Create dictionary for aggregate by GroupBy.first without column Unique ID, for column Samples aggregate by join by GroupBy.agg:

d = dict.fromkeys(dataframe.columns.difference(['Unique ID']), 'first')
d['Samples'] = ', '.join

df  = dataframe.groupby('Unique ID', as_index=False).agg(d)
print (df)
  Unique ID  Count                                Samples
0       A-1      3  6.1 (Apple), 4.5 (Banana), 5.6 (Kiwi)
1       A-2      2              7.2 (Orange), 3.6 (Mango)

Or is possible use GroupBy.transform for column filled by joined strings and then use DataFrame.drop_duplicates:

dataframe['Samples'] = dataframe.groupby('Unique ID')['Samples'].transform(', '.join)
df = dataframe.drop_duplicates('Unique ID')
print (df)
  Unique ID  Count                                Samples
0       A-1      3  6.1 (Apple), 4.5 (Banana), 5.6 (Kiwi)
3       A-2      2              7.2 (Orange), 3.6 (Mango)
  • Related