(Question rewritten as per comment-suggestions)
Suppose I have data like this:
{
2012: [ ('A', 9), ('C', 7), ('D', 4) ],
2013: [ ('B', 7), ('C', 6), ('E', 1) ]
}
How would I construct a dataframe that will account for the 'missing columns' in the rows?
i.e.
year A B C D E
0 2012 9 0 7 4 0
1 2013 0 7 6 0 1
I suppose I can perform a trivial preliminary manipulation to get:
[
[ ('year', 2012), ('A', 9), ('C', 7), ('D', 4) ],
[ ('year', 2013), ('B', 7), ('C', 6), ('E', 1) ]
]
CodePudding user response:
You could first apply the method suggested in this post
by @jezrael
, create a df
with the standard constructor, and then use df.pivot
to get the df
in the desired shape:
import pandas as pd
data = {
2012: [ ('A', 9), ('C', 7), ('D', 4) ],
2013: [ ('B', 7), ('C', 6), ('E', 1) ]
}
L = [(k, *t) for k, v in data.items() for t in v]
df = pd.DataFrame(L).rename(columns={0:'year'})\
.pivot(index='year', columns=1, values=2).fillna(0).reset_index(drop=False)
df.columns.name = None
print(df)
year A B C D E
0 2012 9.0 0.0 7.0 4.0 0.0
1 2013 0.0 7.0 6.0 0.0 1.0
If the values are all ints
, you could do .fillna(0).astype(int).reset_index(drop=False)
, of course.
CodePudding user response:
import pandas as pd
data = {
2012: [ ('A', 9), ('C', 7), ('D', 4) ],
2013: [ ('B', 7), ('C', 6), ('E', 1) ]
}
L = [(k, *t) for k, v in data.items() for t in v]
df = pd.DataFrame(L).rename(columns={0:'year'})\
.pivot(index='year', columns=1, values=2).fillna(0).reset_index(drop=False)
df.columns.name = None
df = df.astype({'A':'int','B':'int','C':'int','D':'int','E':'int'})