Missing columns in two different dataframes

Time:04-22

I have two data frames that hold different values but should have the same columns. When I checked, df1 has, for example, 10 columns and df2 has 15. In reality my data frames have thousands of columns. How can I check which columns are missing between the two data frames?

CodePudding user response:

For this example I assume the full data frame df has 100 columns rather than 1000, and I check df1 and df2 against it.

import pandas as pd
import numpy as np

# Sample frames with default integer column labels (0, 1, 2, ...)
df1 = pd.DataFrame(np.random.randint(0, 100, size=(10, 10)))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(10, 15)))
# Reference frame; use 1000 inside size if you want 1000 columns in df
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 100)))

# Columns that exist in df but not in df1 / df2
missing_columns_in_df1 = df.columns.difference(df1.columns).tolist()
missing_columns_in_df2 = df.columns.difference(df2.columns).tolist()
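
If there is no separate reference frame and you only want to compare df1 and df2 with each other, the same difference call can be applied in both directions. This is a minimal sketch; the column names a to e are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical empty frames with named columns, for illustration only
df1 = pd.DataFrame(columns=["a", "b", "c"])
df2 = pd.DataFrame(columns=["b", "c", "d", "e"])

# Columns present in one frame but absent from the other
only_in_df1 = df1.columns.difference(df2.columns).tolist()
only_in_df2 = df2.columns.difference(df1.columns).tolist()

# Columns that appear in exactly one of the two frames
in_one_only = df1.columns.symmetric_difference(df2.columns).tolist()

print(only_in_df1, only_in_df2, in_one_only)
```

Only the column labels are compared, so this should stay cheap even with thousands of columns.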

CodePudding user response:

As Tim pointed out, there is too little information. Going by what is given, I take it there are two dataframes whose column sets differ slightly, and you want to identify the columns that differ.

There are several approaches. First find the list of common columns using any one of the following:

  1. NumPy's intersect1d function
  2. The intersection method of a pandas Index
  3. The & operator on Python sets built from the column labels

then subtract it from the list of all columns.

Or use the difference method of a pandas Index directly.
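
The approaches listed above could look roughly like this. It is only a sketch: the frames and their column names are invented for the example, and the & operator is applied to plain Python sets rather than to the Index objects themselves.

```python
import numpy as np
import pandas as pd

# Hypothetical frames with named columns, for illustration only
df1 = pd.DataFrame(columns=["a", "b", "c"])
df2 = pd.DataFrame(columns=["b", "c", "d", "e"])

# 1. NumPy: sorted unique labels found in both column arrays
common_np = np.intersect1d(df1.columns.to_numpy(), df2.columns.to_numpy())

# 2. pandas: Index.intersection
common_pd = df1.columns.intersection(df2.columns)

# 3. & operator on Python sets built from the labels
common_set = set(df1.columns) & set(df2.columns)

# Subtract the common columns from the list of all columns
all_columns = df1.columns.union(df2.columns)
not_shared = all_columns.difference(common_pd).tolist()

print(sorted(common_set), not_shared)
```

All three give the same set of common labels here; not_shared then holds the columns that are missing from one frame or the other.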
