Home > Software design >  Merge dataframes and complete indexes
Merge dataframes and complete indexes

Time:04-29

I have 2 dataframes like:

df1
    (index)
    start_time   user   class
    10/20/1998   0.13   worker
    10/21/1998   1341   worker
    10/22/1998   1303   worker

df2
    (index)
    start_time  user   class
    10/20/1998   192   empl
    10/21/1998   12    empl
    10/23/1998   19    empl

And I want to merge them, completing the indexes that are missing/different as multiindex and keeping one column for both:

(multindex)
class  start_time        user
worker 10/20/1998        0.13  
empl   10/20/1998        192  
worker 10/21/1998        1341   
empl   10/21/1998        12   
worker 10/22/1998        1303   
empl   10/23/1998        19   

CodePudding user response:

Use concat with DataFrame.set_index and append=True for MultiIndex, for swapping levels is used DataFrame.swaplevel:

df = pd.concat([df1.set_index('class', append=True),
                df2.set_index('class', append=True)]).swaplevel(1, 0)
print (df)
                      user
class  start_time         
worker 10/20/1998     0.13
       10/21/1998  1341.00
       10/22/1998  1303.00
empl   10/20/1998   192.00
       10/21/1998    12.00
       10/23/1998    19.00

If ordering is important create DatetimeIndex and sorting by second level:

df1.index = pd.to_datetime(df1.index)
df2.index = pd.to_datetime(df2.index)

df = (pd.concat([df1.set_index('class', append=True),
                df2.set_index('class', append=True)])
        .swaplevel(1, 0)
        .sort_index(level=1, sort_remaining=False))
print (df)
                      user
class  start_time         
worker 1998-10-20     0.13
empl   1998-10-20   192.00
worker 1998-10-21  1341.00
empl   1998-10-21    12.00
worker 1998-10-22  1303.00
empl   1998-10-23    19.00

CodePudding user response:

You can try reset the start_time index and then set the class and start_time columns as index.

df = pd.concat([df1.reset_index().set_index(['class', 'start_time']),
                df2.reset_index().set_index(['class', 'start_time'])])
print(df)

                      user
class  start_time
worker 10/20/1998     0.13
       10/21/1998  1341.00
       10/22/1998  1303.00
empl   10/20/1998   192.00
       10/21/1998    12.00
       10/23/1998    19.00
  • Related