Home > Mobile >  Transforme a dataframe with a column for each value within a colum in pandas
Transforme a dataframe with a column for each value within a colum in pandas

Time:07-13

I have a dataframe such as :

Groups  Species_1 Species_2
G1      SP1_A     SP1
G1      SP1B      SP1
G1      SP2_AZ    SP2
G1      SP3_12:A  SP3
G1      SP4-2     SP4
G2      SP1_2     SP2
G2      SP3:21    SP3
G3      SP5(2)    SP5

And I would like to create a new dataframe with a for each Species_2 values create a column where for each Groups I add a row with the corresponding Species_1 values.

If there are multiple Species_1, I separate them with a ; in the cell as in the G1-SP1 example.

I should then get the following result;

Groups SP1          SP2     SP3      SP4   SP5
G1     SP1_A;SP1B   SP2_AZ  SP3_12:A SP4-2 NA
G2     NA           SP1_2   SP3:21   NA    NA
G3     NA           NA      NA       NA    SP5(2)

Here is the dataframe in dict format if it can help :

{'Groups': {0: 'G1', 1: 'G1', 2: 'G1', 3: 'G1', 4: 'G1', 5: 'G2', 6: 'G2', 7: 'G3'}, 'Species_1': {0: 'SP1_A', 1: 'SP1B', 2: 'SP2_AZ', 3: 'SP3_12:A', 4: 'SP4-2', 5: 'SP1_2', 6: 'SP3:21', 7: 'SP5(2)'}, 'Species_2': {0: 'SP1', 1: 'SP1', 2: 'SP2', 3: 'SP3', 4: 'SP4', 5: 'SP2', 6: 'SP3', 7: 'SP5'}}

Thanks a lot for your help

CodePudding user response:

Try with pivot_table

out =df.pivot_table(index ='Groups', columns = 'Species_2',values = 'Species_2',aggfunc = ';'.join).reset_index()
Out[73]: 
Species_2 Groups         SP1     SP2       SP3    SP4     SP5
0             G1  SP1_A;SP1B  SP2_AZ  SP3_12:A  SP4-2     NaN
1             G2         NaN   SP1_2    SP3:21    NaN     NaN
2             G3         NaN     NaN       NaN    NaN  SP5(2)

CodePudding user response:

Try this:

df1 = df.groupby(['Groups', 'Species_2'], as_index=False).agg({'Species_1': lambda x: ';'.join(x)})
df1
>   Groups  Species_2   Species_1
0   G1  SP1 SP1_A;SP1B
1   G1  SP2 SP2_AZ
2   G1  SP3 SP3_12:A
3   G1  SP4 SP4-2
4   G2  SP2 SP1_2
5   G2  SP3 SP3:21
6   G3  SP5 SP5(2)

df1.pivot(index='Groups', columns=['Species_2'], values='Species_1')
>Species_2  SP1 SP2 SP3 SP4 SP5
Groups                  
G1  SP1_A;SP1B  SP2_AZ  SP3_12:A    SP4-2   NaN
G2  NaN SP1_2   SP3:21  NaN NaN
G3  NaN NaN NaN NaN SP5(2)

  • Related