I have a table such as:
Groups SP1 SP2 SP3 SP4_1 SP4_2 SP5_1 SP5_2
G1 3 4 NA 2 4 2 1
G2 NA 1 NA 3 NA NA NA
G3 1 2 NA NA NA 8 NA
G4 4 6 NA NA NA NA NA
G5 8 9 NA NA NA NA 2
And I would like to sweep that table into:
G1 G2 G3 G4 G5
SP1 SP1-3 NA SP1-1 SP1-4 SP1-8
SP2 SP2-4 SP2-1 SP2-2 SP2-6 SP2-9
SP3 NA NA NA NA NA
SP4 SP4_1-2;SP4_2-4 SP4_1-3 NA NA NA
SP5 SP5_1-2;SP5_2-1 NA SP5_1-8 NA SP5_2-2
Let me explain:
Let's take the G1 to explain,
The Idea is first to create a new column G1
and add all SPn
present as rows:
G1
SP1
SP2
SP3
SP4
SP5
Then, in G1
, I have one value for SP1 which is 3, then I add a row SP1-3
G1
SP1 SP1-3
SP2
SP3
SP4
SP5
I have one value for SP2 which is 4, then I add a row SP1-4
G1
SP1 SP1-3
SP2 SP1-4
SP3
SP4
SP5
I have no value for SP3
G1
SP1 SP1-3
SP2 SP1-4
SP3 NA
SP4
SP5
I have two values for SP4 which are 2 in SP4_1 and 4 in SP4_2, then I merge them by a semicolon ";" within the cell and add a row SP4_1-2;SP4_2-4
G1
SP1 SP1-3
SP2 SP1-4
SP3 NA
SP4 SP4_1-2;SP4_2-4
SP5
And finally I have two values for SP5 which are 2 in SP5_1 and 1 in SP5_2, then I merge them by a semicolon ";" within the cell and add a row SP5_1-2;SP5_2-1
G1
SP1 SP1-3
SP2 SP1-4
SP3 NA
SP4 SP4_1-2;SP4_2-4
SP5 SP5_1-2;SP5_2-1
And so on for the other groups.
Does someone have an idea using python please?
CodePudding user response:
so this is my attempt, doesn't look nice but seems working:
t = df.melt('Groups')
t['val'] = t['variable'].str.cat(t['value'].dropna().astype(str),sep='-')
t['col'] = t['variable'].str[:3]
def f(x):
return x.dropna().str.cat(sep=';') or pd.NA
res = t.pivot_table('val','col','Groups',f)
print(res)
'''
Groups G1 G2 G3 G4 G5
col
SP1 SP1-3.0 NaN SP1-1.0 SP1-4.0 SP1-8.0
SP2 SP2-4.0 SP2-1.0 SP2-2.0 SP2-6.0 SP2-9.0
SP4 SP4_1-2.0;SP4_2-4.0 SP4_1-3.0 NaN NaN NaN
SP5 SP5_1-2.0;SP5_2-1.0 NaN SP5_1-8.0 NaN SP5_2-2.0