I have a data structure like this:
lst = ['name, age, sex, height, weight',
'underweight,overweight,normal',
'David, 22, M, 185, -,-,78',
'Lily, 18, F, 165,-,75,-',
..............................]
The weight is split into three more columns, which are named in the second item of the list. How can I write this to a pandas DataFrame?
What I have done so far is create a DataFrame directly from the list using:
pd.DataFrame(lst)
But this is not the whole solution; the header needs more complicated logic.
Please help me out.
CodePudding user response:
The output you expect is not fully clear, but you can preprocess your data with a list comprehension:
import pandas as pd
lst2 = [list(map(str.strip, e.split(','))) for e in lst] # split on commas, strip whitespace
pd.DataFrame(lst2[2:], columns=lst2[0][:-1] + lst2[1]) # use the first 2 items to build the header
# the rest is data
output:
    name age sex height underweight overweight normal
0  David  22   M    185           -          -     78
1   Lily  18   F    165           -         75      -
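If a follow-up step is needed, here is a minimal, self-contained sketch. It assumes that '-' means "no value" and that each row has exactly one weight filled in; neither is confirmed by the question. The `weight` and `category` columns are hypothetical names added only for illustration, and only the two sample rows from the question are used.

```python
import pandas as pd

# Sample data copied from the question (only the two rows that were shown).
lst = ['name, age, sex, height, weight',
       'underweight,overweight,normal',
       'David, 22, M, 185, -,-,78',
       'Lily, 18, F, 165,-,75,-']

# Split every item on commas and strip surrounding whitespace.
rows = [[cell.strip() for cell in line.split(',')] for line in lst]
cats = rows[1]                      # ['underweight', 'overweight', 'normal']
header = rows[0][:-1] + cats        # drop 'weight', append the three categories
df = pd.DataFrame(rows[2:], columns=header)

# Assumption: '-' is a placeholder for a missing value, so coerce it to NaN.
num_cols = ['age', 'height'] + cats
df[num_cols] = df[num_cols].apply(pd.to_numeric, errors='coerce')

# Hypothetical summary columns: the single filled-in weight and its category.
df['weight'] = df[cats].max(axis=1)
df['category'] = df[cats].idxmax(axis=1)
```

Whether collapsing the three columns like this is useful depends on what you want to do with the data afterwards.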
MultiIndex
Although feasible, I don't recommend this, as it will be much harder to work with:
lst2 = [list(map(str.strip, e.split(','))) for e in lst]
cols = pd.MultiIndex.from_arrays([lst2[0][:-1] + [lst2[0][-1]]*3,
                                  ['']*4 + lst2[1]])
pd.DataFrame(lst2[2:], columns=cols)
output:
    name age sex height      weight
                        underweight overweight normal
0  David  22   M    185           -          -     78
1   Lily  18   F    165           -         75      -
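To give a feel for what working with the MultiIndex version looks like, here is a small self-contained sketch using only the two sample rows from the question. The variable names (`rows`, `weights`, `normal`, `names`) are made up for this example.

```python
import pandas as pd

# Sample data copied from the question (only the two rows that were shown).
lst = ['name, age, sex, height, weight',
       'underweight,overweight,normal',
       'David, 22, M, 185, -,-,78',
       'Lily, 18, F, 165,-,75,-']

rows = [[cell.strip() for cell in line.split(',')] for line in lst]

# Top level: name, age, sex, height, then 'weight' repeated three times.
# Second level: empty strings for the first four columns, then the categories.
cols = pd.MultiIndex.from_arrays([rows[0][:-1] + [rows[0][-1]] * 3,
                                  [''] * 4 + rows[1]])
df = pd.DataFrame(rows[2:], columns=cols)

weights = df['weight']              # sub-DataFrame with the three category columns
normal = df[('weight', 'normal')]   # a single column, selected with a tuple
names = df[('name', '')]            # top-level columns need the empty second level
```

Every column is addressed by a two-part key, which is the main reason the flat header is usually easier to handle.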