I have a dataframe such as
Groups Species Assignation
G1 Homo_sapiens assign1
G1 Homo_sapiens assign2
G1 Homo_sapiens assign3
G1 Danio_rerio NA
G1 Mus_musculus NA
G1 Apis_mellifera NA
G2 Danio_rerio NA
G2 Homo_sapiens assign4
G2 Cavius_percel NA
G3 Danio_rerio NA
G3 Mus_musculus NA
And I would like to create a New_assignation
column where for each Groups
if there is multiple Assignation
, I merge them separated by a "-"
such as:
Groups Species Assignation New_assignation
G1 Homo_sapiens assign1 assign1-assign2-assign3
G1 Homo_sapiens assign2 assign1-assign2-assign3
G1 Homo_sapiens assign3 assign1-assign2-assign3
G1 Danio_rerio NA assign1-assign2-assign3
G1 Mus_musculus NA assign1-assign2-assign3
G1 Apis_mellifera NA assign1-assign2-assign3
G2 Danio_rerio NA assign4
G2 Homo_sapiens assign4 assign4
G2 Cavius_percel NA assign4
G3 Danio_rerio NA NA
G3 Mus_musculus NA NA
Here is the dict format of the dataframe if it can helps:
{'Groups ': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G1', 6: 'G2', 7: 'G2', 8: 'G2', 9: 'G3', 10: 'G3'}, 'Species': {0: 'Homo_sapiens', 1: 'Homo_sapiens', 2: 'Homo_sapiens', 3: 'Danio_rerio', 4: 'Mus_musculus', 5: 'Apis_mellifera', 6: 'Danio_rerio ', 7: 'Homo_sapiens', 8: 'Cavius_percel', 9: 'Danio_rerio', 10: 'Mus_musculus'}, 'Assignation': {0: 'assign1', 1: 'assign2', 2: 'assign3', 3: nan, 4: nan, 5: 'NA ', 6: nan, 7: 'assign4', 8: nan, 9: nan, 10: nan}}
CodePudding user response:
Use GroupBy.transform
with lambda function for join
and remove missing values:
f = lambda x: '-'.join(x.dropna())
df['New_assignation'] = df.groupby('Groups')['Assignation'].transform(f).replace('', np.nan)
print (df)
Groups Species Assignation New_assignation
0 G1 Homo_sapiens assign1 assign1-assign2-assign3
1 G1 Homo_sapiens assign2 assign1-assign2-assign3
2 G1 Homo_sapiens assign3 assign1-assign2-assign3
3 G1 Danio_rerio NaN assign1-assign2-assign3
4 G1 Mus_musculus NaN assign1-assign2-assign3
5 G1 Apis_mellifera NaN assign1-assign2-assign3
6 G2 Danio_rerio NaN assign4
7 G2 Homo_sapiens assign4 assign4
8 G2 Cavius_percel NaN assign4
9 G3 Danio_rerio NaN NaN
10 G3 Mus_musculus NaN NaN
If need also remove duplicated values in original order use dict.fromkeys
trick:
print (df)
Groups Species Assignation
0 G1 Homo_sapiens assign1
1 G1 Homo_sapiens assign1
2 G1 Homo_sapiens assign3
3 G1 Danio_rerio NaN
4 G1 Mus_musculus NaN
5 G1 Apis_mellifera NaN
6 G2 Danio_rerio NaN
7 G2 Homo_sapiens assign4
8 G2 Cavius_percel NaN
9 G3 Danio_rerio NaN
10 G3 Mus_musculus NaN
f = lambda x: '-'.join(dict.fromkeys(x.dropna()))
df['New_assignation'] = df.groupby('Groups')['Assignation'].transform(f).replace('', np.nan)
print (df)
Groups Species Assignation New_assignation
0 G1 Homo_sapiens assign1 assign1-assign3
1 G1 Homo_sapiens assign1 assign1-assign3
2 G1 Homo_sapiens assign3 assign1-assign3
3 G1 Danio_rerio NaN assign1-assign3
4 G1 Mus_musculus NaN assign1-assign3
5 G1 Apis_mellifera NaN assign1-assign3
6 G2 Danio_rerio NaN assign4
7 G2 Homo_sapiens assign4 assign4
8 G2 Cavius_percel NaN assign4
9 G3 Danio_rerio NaN NaN
10 G3 Mus_musculus NaN NaN