Unit Test: Testing if a dataframe contains specific columns

Time:08-05

I'm writing unit tests for some functions. In one of the tests, I would like to check whether the new columns were created, i.e. that the names of certain columns are present in the output dataframe df_output returned by one of the functions. I have a list, List_Match, containing the names of the expected newly created columns. How can I do that as a unit test?

A simplified example of my data: 
import pandas as pd

d = {'ID_EMPLOYEE': [12, 35, 56, 46], 'Number': [0, 1, 2, 30], 'Location_EMPLOYEE': ["US", "US", "Austria", "France"], 'Salary': [100, 200, 100, 160]}
df_output = pd.DataFrame(d)

List_Match=["Location_EMPLOYEE","ID_EMPLOYEE"]

CodePudding user response:

Try this,

assert all([col in df_output.columns for col in List_Match])

Alternative solution without an explicit loop:

assert len(set(List_Match)&set(df_output.columns))==len(set(List_Match))
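The same set-based idea can be written as a set difference, which has the side benefit of naming the missing columns when the assertion fails. This is only a sketch using the df_output and List_Match from the question; the variable name missing is illustrative and not part of the original answer.

```python
import pandas as pd

d = {'ID_EMPLOYEE': [12, 35, 56, 46], 'Number': [0, 1, 2, 30],
     'Location_EMPLOYEE': ["US", "US", "Austria", "France"],
     'Salary': [100, 200, 100, 160]}
df_output = pd.DataFrame(d)
List_Match = ["Location_EMPLOYEE", "ID_EMPLOYEE"]

# Expected names that are absent from the dataframe
# (an empty set when every expected column is present).
missing = set(List_Match) - set(df_output.columns)

# The message lists the missing names if the check fails.
assert not missing, f"Missing columns: {sorted(missing)}"
```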

Explanation:

  1. Check whether each expected column name is in df_output.columns.
  2. Wrap the checks in all() to verify that every expected column is present.
  3. Use assert so that the test fails if any expected column is missing.
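If the tests are run with the standard unittest module, the steps above could be wrapped in a test case roughly as follows. This is a minimal sketch, not a definitive setup: build_output is a hypothetical stand-in for the function under test, and the class and method names are assumptions.

```python
import unittest

import pandas as pd


def build_output():
    """Hypothetical stand-in for the function that creates the new columns."""
    d = {'ID_EMPLOYEE': [12, 35, 56, 46], 'Number': [0, 1, 2, 30],
         'Location_EMPLOYEE': ["US", "US", "Austria", "France"],
         'Salary': [100, 200, 100, 160]}
    return pd.DataFrame(d)


class TestOutputColumns(unittest.TestCase):
    def test_expected_columns_present(self):
        df_output = build_output()
        List_Match = ["Location_EMPLOYEE", "ID_EMPLOYEE"]
        for col in List_Match:
            # subTest reports each missing column separately
            # instead of stopping at the first failure.
            with self.subTest(column=col):
                self.assertIn(col, df_output.columns)
```

Saved in a test file, this can be run with python -m unittest. Using assertIn per column rather than a single all() makes the failure output show which column is missing.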