I have a dataframe such as :
Groups Species_1 Species_2
G1 SP1_A SP1
G1 SP1B SP1
G1 SP2_AZ SP2
G1 SP3_12:A SP3
G1 SP4-2 SP4
G2 SP1_2 SP2
G2 SP3:21 SP3
G3 SP5(2) SP5
And I would like to create a new dataframe with a for each Species_2
values create a column where for each Groups
I add a row with the corresponding Species_1
values.
If there are multiple Species_1
, I separate them with a ;
in the cell as in the G1-SP1 example.
I should then get the following result;
Groups SP1 SP2 SP3 SP4 SP5
G1 SP1_A;SP1B SP2_AZ SP3_12:A SP4-2 NA
G2 NA SP1_2 SP3:21 NA NA
G3 NA NA NA NA SP5(2)
Here is the dataframe in dict format if it can help :
{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G2', 6: 'G2', 7: 'G3'}, 'Species_1': {0: 'SP1_A', 1: 'SP1B', 2: 'SP2_AZ', 3: 'SP3_12:A', 4: 'SP4-2', 5: 'SP1_2', 6: 'SP3:21', 7: 'SP5(2)'}, 'Species_2': {0: 'SP1', 1: 'SP1', 2: 'SP2', 3: 'SP3', 4: 'SP4', 5: 'SP2', 6: 'SP3', 7: 'SP5'}}
Thanks a lot for your help
CodePudding user response:
Try this:
df1 = df.groupby(['Groups', 'Species_2'], as_index=False).agg({'Species_1': lambda x: ';'.join(x)})
df1
> Groups Species_2 Species_1
0 G1 SP1 SP1_A;SP1B
1 G1 SP2 SP2_AZ
2 G1 SP3 SP3_12:A
3 G1 SP4 SP4-2
4 G2 SP2 SP1_2
5 G2 SP3 SP3:21
6 G3 SP5 SP5(2)
df1.pivot(index='Groups', columns=['Species_2'], values='Species_1')
>Species_2 SP1 SP2 SP3 SP4 SP5
Groups
G1 SP1_A;SP1B SP2_AZ SP3_12:A SP4-2 NaN
G2 NaN SP1_2 SP3:21 NaN NaN
G3 NaN NaN NaN NaN SP5(2)
CodePudding user response:
Try with pivot_table
out =df.pivot_table(index ='Groups', columns = 'Species_2',values = 'Species_2',aggfunc = ';'.join).reset_index()
Out[73]:
Species_2 Groups SP1 SP2 SP3 SP4 SP5
0 G1 SP1_A;SP1B SP2_AZ SP3_12:A SP4-2 NaN
1 G2 NaN SP1_2 SP3:21 NaN NaN
2 G3 NaN NaN NaN NaN SP5(2)