Home > Back-end >  Sweep a table in Python in a particular way
Sweep a table in Python in a particular way

Time:11-21

I have a table such as:

Groups SP1 SP2 SP3 SP4_1 SP4_2 SP5_1 SP5_2
G1     3   4   NA  2     4     2     1
G2     NA  1   NA  3     NA    NA    NA
G3     1   2   NA  NA    NA    8     NA
G4     4   6   NA  NA    NA    NA    NA
G5     8   9   NA  NA    NA    NA    2 

And I would like to sweep that table into:

       G1               G2       G3       G4     G5 
SP1    SP1-3            NA       SP1-1    SP1-4  SP1-8
SP2    SP2-4            SP2-1    SP2-2    SP2-6  SP2-9
SP3    NA               NA       NA       NA     NA
SP4    SP4_1-2;SP4_2-4  SP4_1-3  NA       NA     NA
SP5    SP5_1-2;SP5_2-1  NA       SP5_1-8  NA     SP5_2-2

Let me explain:

Let's take the G1 to explain, The Idea is first to create a new column G1 and add all SPn present as rows:

     G1  
SP1  
SP2    
SP3    
SP4    
SP5    

Then, in G1, I have one value for SP1 which is 3, then I add a row SP1-3

     G1  
SP1  SP1-3
SP2    
SP3    
SP4    
SP5    

I have one value for SP2 which is 4, then I add a row SP1-4

     G1  
SP1  SP1-3
SP2  SP1-4
SP3    
SP4    
SP5 

I have no value for SP3

     G1  
SP1  SP1-3
SP2  SP1-4
SP3  NA
SP4    
SP5 

I have two values for SP4 which are 2 in SP4_1 and 4 in SP4_2, then I merge them by a semicolon ";" within the cell and add a row SP4_1-2;SP4_2-4

     G1  
SP1  SP1-3
SP2  SP1-4
SP3  NA
SP4  SP4_1-2;SP4_2-4
SP5 

And finally I have two values for SP5 which are 2 in SP5_1 and 1 in SP5_2, then I merge them by a semicolon ";" within the cell and add a row SP5_1-2;SP5_2-1

     G1  
SP1  SP1-3
SP2  SP1-4
SP3  NA
SP4  SP4_1-2;SP4_2-4
SP5  SP5_1-2;SP5_2-1

And so on for the other groups.

Does someone have an idea using python please?

CodePudding user response:

so this is my attempt, doesn't look nice but seems working:

t = df.melt('Groups')
t['val'] = t['variable'].str.cat(t['value'].dropna().astype(str),sep='-')
t['col'] = t['variable'].str[:3]

def f(x):
    return x.dropna().str.cat(sep=';') or pd.NA

res = t.pivot_table('val','col','Groups',f)

print(res)
'''
Groups                   G1         G2         G3       G4         G5
col                                                                  
SP1                 SP1-3.0        NaN    SP1-1.0  SP1-4.0    SP1-8.0
SP2                 SP2-4.0    SP2-1.0    SP2-2.0  SP2-6.0    SP2-9.0
SP4     SP4_1-2.0;SP4_2-4.0  SP4_1-3.0        NaN      NaN        NaN
SP5     SP5_1-2.0;SP5_2-1.0        NaN  SP5_1-8.0      NaN  SP5_2-2.0
  • Related