Two Dataframes have same values and dtypes but are still not equal under df1.equals(df2)

Time:08-13

I am testing new functionality for a project I'm on, and I can't get the test to pass. The code behaves as intended, and as far as I can tell the outputs have identical values and dtypes. What is going on here?

Sorry for the amount of code here, but this is as minimal as I think I can make it. It should run as-is in a Jupyter notebook cell or a .py file.

import pandas as pd
import numpy as np
from datetime import date as dtm_date

def days_since_dec_30_1899(input_date:pd._libs.tslibs.timestamps.Timestamp) -> str:
    date_list = [int(i) for i in str(input_date).split(' ')[0].split('-')]
    return str((dtm_date(date_list[0], date_list[1], date_list[2]) - 
                dtm_date(1899, 12, 30)).days)

def create_combo_npi_and_date_col(data:pd.core.frame.DataFrame,
                                  date_name:str = 'Date'
) -> pd.core.frame.DataFrame:
    temp = data.copy()
    days_since_1899 = temp[date_name].apply(lambda date: str(days_since_dec_30_1899(date)))
    
    # deals with float->str issue of having .0 at the end
    if str(temp['NPI'].dtype) == 'float64':
        temp['NPI'] = temp['NPI'].astype('Int64')
    
    temp['Combo (NPI & Date)'] = temp['NPI'].astype('str') + days_since_1899
    temp['Combo (NPI & Date)'] = temp['Combo (NPI & Date)']\
                                 .apply(lambda x : x if '<NA>' not in x else np.nan)
    temp['NPI'] = temp['NPI'].astype('str')
    
    return temp


import unittest
from datetime import date

class Test_Methods(unittest.TestCase):
    '''A test class for the methods in Main.py'''
    def test_create_combo_npi_and_date_col(self):
        '''Tests the create_combo_npi_and_date_col method.'''
        # Test input.
        input = pd.DataFrame([[pd._libs.tslibs.timestamps.Timestamp('2022-08-11 00:00:00'), '1234567890'],
                                   [pd._libs.tslibs.timestamps.Timestamp('2022-07-14 00:00:00'), '0987654321']],
                                  columns=['Date', 'NPI'])

        # Test ground truth output.
        output = \
        pd.DataFrame([[pd._libs.tslibs.timestamps.Timestamp('2022-08-11 00:00:00'), '1234567890', '123456789044784'],
                      [pd._libs.tslibs.timestamps.Timestamp('2022-07-14 00:00:00'), '0987654321', '098765432144756']],
                      columns=['Date', 'NPI', 'Combo (NPI & Date)'])
        
        print(type(str(output.NPI.dtype)))
        print()
        print(output)
        print()
        print(create_combo_npi_and_date_col(input))
        print()
        print(output.compare(create_combo_npi_and_date_col(input)))
        print()
        print(output.dtypes)
        print(create_combo_npi_and_date_col(input).dtypes)
        self.assertTrue(output.equals(create_combo_npi_and_date_col(input)), 'test 1 has failed') # test 1

        # tests if floats act properly

        input['NPI'] = input['NPI'].astype('float64')

        self.assertTrue(output.equals(create_combo_npi_and_date_col(input)), 'test 2 has failed') # test 2


unittest.main(argv=[''], verbosity=2, exit=False)

Some of the docstring content is private, so I am sorry there are no docs. Can someone tell me why my tests fail?

Edit: import statements.

CodePudding user response:

Use the assert_frame_equal function from pandas.testing instead of df1.equals(df2).

assert_frame_equal will give you feedback on what exactly is different. Sometimes it has to do with attributes that aren't obvious.
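As a rough illustration of an attribute that isn't obvious, here is a minimal sketch with two made-up frames (`left` and `right` are hypothetical and not taken from the question). They print identically, but one column uses a NumPy integer dtype and the other uses pandas' nullable Int64. `equals` only returns False, while `assert_frame_equal` raises an AssertionError whose message names the attribute that differs.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical frames: same printed values, different integer dtypes.
left = pd.DataFrame({'n': [1, 2]})                            # NumPy integer dtype
right = pd.DataFrame({'n': pd.array([1, 2], dtype='Int64')})  # nullable Int64

print(left.equals(right))  # False, with no explanation

try:
    assert_frame_equal(left, right)
    report = ''
except AssertionError as err:
    report = str(err)  # the message points at the differing "dtype" attribute

print(report)
```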

from pandas.testing import assert_frame_equal

# ... all of your other code before the assert ... #

    assert_frame_equal(output, create_combo_npi_and_date_col(input))
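One difference that can be checked without running the whole test concerns test 2 in the question: converting the NPI strings to float64 drops the leading zero of '0987654321', so the string rebuilt by the function can no longer match the expected output. The sketch below reproduces only that conversion chain on a hypothetical one-column frame (`expected` and `actual` are illustrative names); it is not a diagnosis of test 1.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Hypothetical one-column frame modelled on the NPI values in the question.
expected = pd.DataFrame({'NPI': ['1234567890', '0987654321']})

# The same chain the question's function applies to a float64 NPI column:
# float64 -> Int64 -> str. The leading zero does not survive the float step.
actual = expected.copy()
actual['NPI'] = actual['NPI'].astype('float64').astype('Int64').astype('str')

print(expected.equals(actual))  # False

try:
    assert_frame_equal(expected, actual)
    message = ''
except AssertionError as err:
    message = str(err)  # reports the column and the values that differ

print(message)
```

If the NPI values must keep their leading zeros, they need to stay strings from input to output; once they have been numeric, the original text cannot be recovered.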