I have three space delimited text files. One of them (lets call it the master file) has all possible column headers, I need to add two more csv files (lets call them file1 and file2) to this master file, but file1 and file2 may not have all the columns that are present in the master file. In case a column is missing, I want a null object indicated for that csv in that particular column.
I have tried using pandas but haven't been able to figure out a logic. Since, if the missing column name is not at the end of the row of headers, it just doesn't work. Also note that because of the size of the master file, it is not feasible to read the master file into a data frame (takes too long) and concatenate with the other two files.
CodePudding user response:
You can try this(a is the master file):
import pandas as pd
a=pd.DataFrame([[1,2,3]],columns=['a','b','c'])
b=pd.DataFrame([[4,5]],columns=['b','c'])
c=pd.DataFrame(columns=a.columns)
pd.concat([b,c],ignore_index=True)
with concat,empty columns will fill with "Nan".