I have the following dataframe:
d1 = [{'AB':['Record=1','POS=2'], 'AVF':[], 'HI':['Record=343', 'POS=4'], 'version_1':[]},
{'AB':[], 'AVF':['Record=4', 'POS=454'], 'AVF':[], 'HI':[], 'version_2':[]},
{'AB':['Record=4', 'POS=32'], 'AVF':[], 'HI':[], 'version_3':[]}]
frame = pd.DataFrame(d1)
AB AVF HI version_1 version_2 version_3
0 [Record=1, POS=2] [] [Record=343, POS=4] [] NaN NaN
1 [] [] [] NaN [] NaN
2 [Record=4, POS=32][] [] NaN NaN []
Desired output:
AB AVF HI
version_1 [Record=1, POS=2] [] [Record=343, POS=4]
version_2 [] [] []
version_3 [Record=4, POS=32][] []
I have tried:
frame.set_index(['version1','version2','version3']).stack()
The real dictionary is very large with over 30 versions so simply typing out the version numbers into a list is not an option.
thanks
CodePudding user response:
Try this:
frame.melt(['AB', 'AVF', 'HI']).dropna().drop('value', axis=1).set_index('variable')
Output:
AB AVF HI
variable
version_1 [Record=1, POS=2] [] [Record=343, POS=4]
version_2 [] [] []
version_3 [Record=4, POS=32] [] []