Home > Blockchain >  Create a new column for each group and merge multiple element in a column
Create a new column for each group and merge multiple element in a column

Time:08-13

I have a dataframe such as

Groups  Species        Assignation
G1      Homo_sapiens   assign1
G1      Homo_sapiens   assign2
G1      Homo_sapiens   assign3
G1      Danio_rerio    NA
G1      Mus_musculus   NA
G1      Apis_mellifera NA 
G2      Danio_rerio    NA
G2      Homo_sapiens   assign4
G2      Cavius_percel  NA
G3      Danio_rerio    NA
G3      Mus_musculus   NA

And I would like to create a New_assignation column where for each Groups if there is multiple Assignation, I merge them separated by a "-" such as:

Groups  Species        Assignation   New_assignation
G1      Homo_sapiens   assign1       assign1-assign2-assign3
G1      Homo_sapiens   assign2       assign1-assign2-assign3
G1      Homo_sapiens   assign3       assign1-assign2-assign3
G1      Danio_rerio    NA            assign1-assign2-assign3
G1      Mus_musculus   NA            assign1-assign2-assign3
G1      Apis_mellifera NA            assign1-assign2-assign3
G2      Danio_rerio    NA            assign4
G2      Homo_sapiens   assign4       assign4
G2      Cavius_percel  NA            assign4
G3      Danio_rerio    NA            NA
G3      Mus_musculus   NA            NA

Here is the dict format of the dataframe if it can helps:

{'Groups ': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G2', 7: 'G2', 8: 'G2', 9: 'G3', 10: 'G3'}, 'Species': {0: 'Homo_sapiens', 1: 'Homo_sapiens', 2: 'Homo_sapiens', 3: 'Danio_rerio', 4: 'Mus_musculus', 5: 'Apis_mellifera', 6: 'Danio_rerio ', 7: 'Homo_sapiens', 8: 'Cavius_percel', 9: 'Danio_rerio', 10: 'Mus_musculus'}, 'Assignation': {0: 'assign1', 1: 'assign2', 2: 'assign3', 3: nan, 4: nan, 5: 'NA ', 6: nan, 7: 'assign4', 8: nan, 9: nan, 10: nan}}

CodePudding user response:

Use GroupBy.transform with lambda function for join and remove missing values:

f = lambda x: '-'.join(x.dropna())
df['New_assignation'] = df.groupby('Groups')['Assignation'].transform(f).replace('', np.nan)
print (df)
   Groups         Species Assignation          New_assignation
0      G1    Homo_sapiens     assign1  assign1-assign2-assign3
1      G1    Homo_sapiens     assign2  assign1-assign2-assign3
2      G1    Homo_sapiens     assign3  assign1-assign2-assign3
3      G1     Danio_rerio         NaN  assign1-assign2-assign3
4      G1    Mus_musculus         NaN  assign1-assign2-assign3
5      G1  Apis_mellifera         NaN  assign1-assign2-assign3
6      G2     Danio_rerio         NaN                  assign4
7      G2    Homo_sapiens     assign4                  assign4
8      G2   Cavius_percel         NaN                  assign4
9      G3     Danio_rerio         NaN                      NaN
10     G3    Mus_musculus         NaN                      NaN

If need also remove duplicated values in original order use dict.fromkeys trick:

print (df)
   Groups         Species Assignation
0      G1    Homo_sapiens     assign1
1      G1    Homo_sapiens     assign1
2      G1    Homo_sapiens     assign3
3      G1     Danio_rerio         NaN
4      G1    Mus_musculus         NaN
5      G1  Apis_mellifera         NaN
6      G2     Danio_rerio         NaN
7      G2    Homo_sapiens     assign4
8      G2   Cavius_percel         NaN
9      G3     Danio_rerio         NaN
10     G3    Mus_musculus         NaN

f = lambda x: '-'.join(dict.fromkeys(x.dropna()))
df['New_assignation'] = df.groupby('Groups')['Assignation'].transform(f).replace('', np.nan)
print (df)
   Groups         Species Assignation  New_assignation
0      G1    Homo_sapiens     assign1  assign1-assign3
1      G1    Homo_sapiens     assign1  assign1-assign3
2      G1    Homo_sapiens     assign3  assign1-assign3
3      G1     Danio_rerio         NaN  assign1-assign3
4      G1    Mus_musculus         NaN  assign1-assign3
5      G1  Apis_mellifera         NaN  assign1-assign3
6      G2     Danio_rerio         NaN          assign4
7      G2    Homo_sapiens     assign4          assign4
8      G2   Cavius_percel         NaN          assign4
9      G3     Danio_rerio         NaN              NaN
10     G3    Mus_musculus         NaN              NaN
  • Related