I have two DataFrames with different values that should have the same columns. When I checked, for example, df1 has 10 columns and df2 has 15 columns, but in reality there are thousands of columns in my DataFrames. How can I find which columns are missing from one DataFrame compared to the other?
Answer:
For the example, I assumed the reference DataFrame df has 100 columns instead of 1000:
import pandas as pd
import numpy as np
# Two frames to compare: df1 has 10 columns, df2 has 15
df1 = pd.DataFrame(np.random.randint(0, 100, size=(10, 10)))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(10, 15)))
# Reference frame holding every expected column (use 1000 inside size for 1000 columns)
df = pd.DataFrame(np.random.randint(0, 100, size=(10, 100)))
# Columns of df that are absent from df1 and from df2
missing_columns_in_df1 = df.columns.difference(df1.columns).tolist()
missing_columns_in_df2 = df.columns.difference(df2.columns).tolist()
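If there is no separate reference frame and you only want to compare df1 and df2 against each other, a minimal sketch (assuming the same illustrative frames as above) is to call Index.difference in both directions; Index.symmetric_difference gives the columns that appear in exactly one of the two frames.

```python
import numpy as np
import pandas as pd

# Illustrative frames with integer column labels 0..9 and 0..14
df1 = pd.DataFrame(np.random.randint(0, 100, size=(10, 10)))
df2 = pd.DataFrame(np.random.randint(0, 100, size=(10, 15)))

# Columns present in df2 but not in df1
missing_in_df1 = df2.columns.difference(df1.columns).tolist()
# Columns present in df1 but not in df2
missing_in_df2 = df1.columns.difference(df2.columns).tolist()
# Columns that appear in only one of the two frames
only_in_one = df1.columns.symmetric_difference(df2.columns).tolist()

print("missing in df1:", missing_in_df1)
print("missing in df2:", missing_in_df2)
print("in only one frame:", only_in_one)
```

With these frames, columns 10 to 14 are missing from df1 and nothing is missing from df2. The same calls work unchanged with thousands of named columns.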
Answer:
As Tim pointed out, there is too little information in the question. From what is given, it sounds like there are two DataFrames whose sets of columns partly differ, and you want to identify the columns that are not shared.
There are multiple approaches. First find the list of common columns using any one of the following:
- NumPy's np.intersect1d
- pandas' Index.intersection
- the & operator on Python sets of column names
then subtract the common columns from the list of all columns.
Alternatively, use pandas' Index.difference directly.
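The approaches listed above can be sketched as follows. This is a minimal illustration with hypothetical column names "a" to "d", where "c" exists only in df1 and "d" only in df2; note that the & operator is applied to Python sets here, since on a pandas Index it is not a set intersection in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical frames: "c" is only in df1, "d" is only in df2
df1 = pd.DataFrame(columns=["a", "b", "c"])
df2 = pd.DataFrame(columns=["a", "b", "d"])

# Every column label that appears in either frame
all_cols = df1.columns.union(df2.columns)

# Three ways to get the common columns
common_np = np.intersect1d(df1.columns, df2.columns)  # sorted NumPy array
common_pd = df1.columns.intersection(df2.columns)     # pandas Index
common_set = set(df1.columns) & set(df2.columns)      # Python set

# Subtract the common columns from the list of all columns
not_shared = [c for c in all_cols if c not in common_set]

# Or use pandas' difference directly, once in each direction
missing_in_df1 = df2.columns.difference(df1.columns).tolist()
missing_in_df2 = df1.columns.difference(df2.columns).tolist()

print("not shared:", not_shared)
print("missing in df1:", missing_in_df1)
print("missing in df2:", missing_in_df2)
```

The difference-based version is usually the simplest, because it tells you directly which frame each missing column belongs to.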