Home > OS >  Transpose and Compare | Python
Transpose and Compare | Python

Time:05-29

I'm attempting to compare two data frames. Item and Summary variables correspond to various dates and quantities. I'd like to transpose the dates into one column of data along with the associated quantities. I'd then like to compare the two data frames and see what changed from PreviousData to CurrentData.

Previous Data:

PreviousData = { 'Item' : ['abc','def','ghi','jkl','mno','pqr','stu','vwx','yza','uaza','fupa'],
                'Summary' : ['party','weekend','food','school','tv','photo','camera','python','r','rstudio','spyder'],
                '2022-01-01' : [1, np.nan, np.nan, 1.0, np.nan, 1.0, np.nan, np.nan, np.nan,np.nan,2],
                '2022-02-01' : [1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-03-01' : [np.nan,np.nan,np.nan,1,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan],
                '2022-04-01' : [np.nan,np.nan,3,np.nan,np.nan,3,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-05-01' : [np.nan,np.nan,np.nan,3,np.nan,np.nan,2,np.nan,np.nan,3,np.nan],
                '2022-06-01' : [np.nan,np.nan,np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-07-01' : [np.nan,1,np.nan,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan],
                '2022-08-01' : [np.nan,np.nan,np.nan,1,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-09-01' : [np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,1,np.nan],
                '2022-10-01' : [np.nan,np.nan,1,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-11-01' : [np.nan,2,np.nan,np.nan,1,1,1,np.nan,np.nan,np.nan,np.nan],
                '2022-12-01' : [np.nan,np.nan,np.nan,np.nan,3,np.nan,np.nan,2,np.nan,np.nan,np.nan],
                '2023-01-01' : [np.nan,np.nan,1,np.nan,1,np.nan,np.nan,np.nan,2,np.nan,np.nan],
                '2023-02-01' : [np.nan,np.nan,np.nan,2,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2023-03-01' : [np.nan,3,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2023-04-01' : [np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan],
                '2023-05-01' : [np.nan,np.nan,2,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,2,np.nan],
                '2023-06-01' : [1,1,np.nan,np.nan,9,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2023-07-01' : [np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2023-08-01' : [np.nan,1,np.nan,np.nan,1,np.nan,1,np.nan,np.nan,np.nan,np.nan],
                '2023-09-01' : [np.nan,1,1,np.nan,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan],
               }
PreviousData = pd.DataFrame(PreviousData)

PreviousData

Current Data:

CurrentData = { 'Item' : ['ghi','stu','abc','mno','jkl','pqr','def','vwx','yza'],
               'Summary' : ['food','camera','party','tv','school','photo','weekend','python','r'],
                '2022-01-01' : [3, np.nan, np.nan, 1.0, np.nan, 1.0, np.nan, np.nan, np.nan],
                '2022-02-01' : [np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-03-01' : [np.nan,1,1,1,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-04-01' : [np.nan,np.nan,1,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-05-01' : [np.nan,np.nan,3,np.nan,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-06-01' : [2,np.nan,np.nan,np.nan,4,np.nan,np.nan,np.nan,np.nan],
                '2022-07-01' : [np.nan,np.nan,np.nan,np.nan,np.nan,4,np.nan,np.nan,np.nan],
                '2022-08-01' : [np.nan,np.nan,3,np.nan,4,np.nan,np.nan,np.nan,np.nan],
                '2022-09-01' : [np.nan,np.nan,3,3,3,np.nan,np.nan,5,5],
                '2022-10-01' : [np.nan,np.nan,np.nan,np.nan,5,np.nan,np.nan,np.nan,np.nan],
                '2022-11-01' : [np.nan,np.nan,np.nan,5,np.nan,np.nan,np.nan,np.nan,np.nan],
                '2022-12-01' : [np.nan,4,np.nan,np.nan,np.nan,1,np.nan,np.nan,np.nan],
                '2023-01-01' : [np.nan,np.nan,np.nan,np.nan,1,1,np.nan,np.nan,np.nan],
                '2023-02-01' : [np.nan,np.nan,np.nan,2,1,np.nan,np.nan,np.nan,np.nan],
                '2023-03-01' : [np.nan,np.nan,np.nan,np.nan,2,np.nan,2,np.nan,2],
                '2023-04-01' : [np.nan,np.nan,np.nan,np.nan,np.nan,2,np.nan,np.nan,2],
               }
CurrentData = pd.DataFrame(CurrentData)
CurrentData

As requested, here's an example of a difference:

enter image description here

Any tips on how to transpose and compare these two sets would be greatly appreciated!

CodePudding user response:

One way of doing this is the following. Transpose both dataframes:

PreviousData_t = PreviousData.melt(id_vars=["Item", "Summary"], 
    var_name="Date", 
    value_name="value1")

which is

   Item  Summary        Date  value1
0     abc    party  2022-01-01     1.0
1     def  weekend  2022-01-01     NaN
2     ghi     food  2022-01-01     NaN
3     jkl   school  2022-01-01     1.0
4     mno       tv  2022-01-01     NaN
..    ...      ...         ...     ...
226   stu   camera  2023-09-01     NaN
227   vwx   python  2023-09-01     1.0
228   yza        r  2023-09-01     NaN
229  uaza  rstudio  2023-09-01     NaN
230  fupa   spyder  2023-09-01     NaN

and

CurrentData_t = CurrentData.melt(id_vars=["Item", "Summary"], 
    var_name="Date", 
    value_name="value2")

  Item  Summary        Date  value2
0    ghi     food  2022-01-01     3.0
1    stu   camera  2022-01-01     NaN
2    abc    party  2022-01-01     NaN
3    mno       tv  2022-01-01     1.0
4    jkl   school  2022-01-01     NaN
..   ...      ...         ...     ...
139  jkl   school  2023-04-01     NaN
140  pqr    photo  2023-04-01     2.0
141  def  weekend  2023-04-01     NaN
142  vwx   python  2023-04-01     NaN
143  yza        r  2023-04-01     2.0

[144 rows x 4 columns]

THen merge:

Compare = PreviousData_t.merge(CurrentData_t, on =['Date','Item','Summary'], how = 'left')
     Item  Summary        Date  value1  value2
0     abc    party  2022-01-01     1.0     NaN
1     def  weekend  2022-01-01     NaN     NaN
2     ghi     food  2022-01-01     NaN     3.0
3     jkl   school  2022-01-01     1.0     NaN
4     mno       tv  2022-01-01     NaN     1.0
..    ...      ...         ...     ...     ...
226   stu   camera  2023-09-01     NaN     NaN
227   vwx   python  2023-09-01     1.0     NaN
228   yza        r  2023-09-01     NaN     NaN
229  uaza  rstudio  2023-09-01     NaN     NaN
230  fupa   spyder  2023-09-01     NaN     NaN

[231 rows x 5 columns]

and compare by creating a column marking differences

Compare['diff'] = np.where(Compare['value1']!=Compare['value2'], 1,0)

Item  Summary        Date  value1  value2  diff
0     abc    party  2022-01-01     1.0     NaN     1
1     def  weekend  2022-01-01     NaN     NaN     1
2     ghi     food  2022-01-01     NaN     3.0     1
3     jkl   school  2022-01-01     1.0     NaN     1
4     mno       tv  2022-01-01     NaN     1.0     1
..    ...      ...         ...     ...     ...   ...
226   stu   camera  2023-09-01     NaN     NaN     1
227   vwx   python  2023-09-01     1.0     NaN     1
228   yza        r  2023-09-01     NaN     NaN     1
229  uaza  rstudio  2023-09-01     NaN     NaN     1
230  fupa   spyder  2023-09-01     NaN     NaN     1

[231 rows x 6 columns]

If you only want to compare those entries that are common to both, do this:

Compare = PreviousData_t.merge(CurrentData_t, on =['Date','Item','Summary'])

Compare['diff'] = np.where(Compare['value1']!=Compare['value2'], 1,0)

 Item  Summary        Date  value1  value2  diff
0    abc    party  2022-01-01     1.0     NaN     1
1    def  weekend  2022-01-01     NaN     NaN     1
2    ghi     food  2022-01-01     NaN     3.0     1
3    jkl   school  2022-01-01     1.0     NaN     1
4    mno       tv  2022-01-01     NaN     1.0     1
..   ...      ...         ...     ...     ...   ...
139  mno       tv  2023-04-01     NaN     NaN     1
140  pqr    photo  2023-04-01     NaN     2.0     1
141  stu   camera  2023-04-01     NaN     NaN     1
142  vwx   python  2023-04-01     1.0     NaN     1
143  yza        r  2023-04-01     NaN     2.0     1

[144 rows x 6 columns]
  • Related