transpose pandas df based on multiple header rows-CodePudding

I have the following df from a vendor:

Unnamed: 0  Unnamed: 1  Unnamed: 2  agg metrics 10/20/22    10/20/22    10/21/22    10/21/22
title     content title season      episode     start       hours       start       hours
book      blue          1           3           2           2           5           2
movie     orange        2           4           11          4           7           4

I need the output like this:

title   content title   season  episode date    start   hours
book    blue            1       3       10/20/22    2   2
book    blue            1       3       10/21/22    5   2
movie   orange          2       4       10/20/22    11  4
movie   orange          2       4       10/21/22    7   4

df = pd.read_csv('file')
df = df.drop(labels=0, axis=0)
df1 = df.melt(['Unnamed: 0','Unnamed: 1', 'Unnamed: 2', 'agg metrics'],var_name='Date', value_name='Value')

but this doesn't return the proper output. apologies for not knowing how to represent this properly. hopefully my IP/OP helps.

Essentially, i'm having trouble transposing multiple headers.

Thanks for your help!

CodePudding user response：

You could do this and this is what QuangHoang's thought too I believe:

# Read csv with top two rows as headers resulting in multiindex, from your code I figure 
# you are not doing that.
df = pd.read_csv(
    StringIO(
        """
Unnamed: 0,Unnamed: 1,Unnamed: 2,agg metrics,10/20/22,10/20/22,10/21/22,10/21/22
title,content title,season,episode,start,hours,start,hours
book,blue,1,3,2,2,5,2
movie,orange,2,4,11,4,7,4
        """
    ),
    header=[0, 1],
)

# Then filter columns that are date like and stack at level 0 and reset_index
t = df.filter(regex="\d /\d /\d ")
t1 = t.stack(0).rename_axis(["", "date"]).reset_index(1)

# Then get other columns and reindex to the index of the intermediate output you got above.
t2 = df[df.columns.difference(t.columns)].droplevel(0, axis=1).reindex(t1.index)

# Then concat both along axis 1
out = pd.concat([t2, t1], axis=1)

print(out)

   title content title  season  episode      date  hours  start
                                                               
0   book          blue       1        3  10/20/22      2      2
0   book          blue       1        3  10/21/22      2      5
1  movie        orange       2        4  10/20/22      4     11
1  movie        orange       2        4  10/21/22      4      7

CodePudding user response：

Here's an example of what I mean:

# mock csv file with StringIO
s = StringIO('''
Unnamed: 0  Unnamed: 1  Unnamed: 2  agg metrics  10/20/22    10/20/22    10/21/22    10/21/22
title     content title  season      episode     start       hours       start       hours
book      blue          1           3           2           2           5           2
movie     orange        2           4           11          4           7           4
''')

# forget `sep` argument if your file is Comma Separated Value
df = pd.read_csv(s, sep='\s\s ', header=[0,1], index_col=[0,1,2,3])
df.stack(level=0).reset_index()

Output (rename your columns accordingly):

title level_0 level_1  level_2  level_3 Unnamed: 0  hours  start
0        book    blue        1        3   10/20/22      2      2
1        book    blue        1        3   10/21/22      2      5
2       movie  orange        2        4   10/20/22      4     11
3       movie  orange        2        4   10/21/22      4      7