Home > Software design >  Create dataframe from rows which comprise differing column names
Create dataframe from rows which comprise differing column names

Time:10-02

(Question rewritten as per comment-suggestions)

Suppose I have data like this:

{
    2012: [ ('A', 9), ('C', 7), ('D', 4) ],
    2013: [ ('B', 7), ('C', 6), ('E', 1) ]
}

How would I construct a dataframe that will account for the 'missing columns' in the rows?

i.e.

  year A B C D E
0 2012 9 0 7 4 0
1 2013 0 7 6 0 1

I suppose I can perform a trivial preliminary manipulation to get:

[
    [ ('year', 2012), ('A', 9), ('C', 7), ('D', 4) ],
    [ ('year', 2013), ('B', 7), ('C', 6), ('E', 1) ]
]

CodePudding user response:

You could first apply the method suggested in this post by @jezrael, create a df with the standard constructor, and then use df.pivot to get the df in the desired shape:

import pandas as pd

data = {
    2012: [ ('A', 9), ('C', 7), ('D', 4) ],
    2013: [ ('B', 7), ('C', 6), ('E', 1) ]
}

L = [(k, *t) for k, v in data.items() for t in v]

df = pd.DataFrame(L).rename(columns={0:'year'})\
    .pivot(index='year', columns=1, values=2).fillna(0).reset_index(drop=False)
df.columns.name = None

print(df)

   year    A    B    C    D    E
0  2012  9.0  0.0  7.0  4.0  0.0
1  2013  0.0  7.0  6.0  0.0  1.0

If the values are all ints, you could do .fillna(0).astype(int).reset_index(drop=False), of course.

CodePudding user response:

import pandas as pd

data = {
    2012: [ ('A', 9), ('C', 7), ('D', 4) ],
    2013: [ ('B', 7), ('C', 6), ('E', 1) ]
}

L = [(k, *t) for k, v in data.items() for t in v]

df = pd.DataFrame(L).rename(columns={0:'year'})\
    .pivot(index='year', columns=1, values=2).fillna(0).reset_index(drop=False)
df.columns.name = None

df = df.astype({'A':'int','B':'int','C':'int','D':'int','E':'int'})
  • Related