I need to update some dataframes in which the columns are not consistent. Consider:
df1 = ['A', 'B', 'C']
df2 = ['A', 'B', 'C', 'E']
df3 = ['A', 'B', 'C', 'E', 'D']
required_columns = ['A', 'B', 'C', 'D', 'E']
Here, I need to have df1, df2, df3 such that it also has columns: ['A', 'B', 'C', 'D', 'E']. Here, in df1, D, E columns can be either NA or null in case D and E are not present, and D can be NA or null in case of df2. df3 has all these columns, but order doesn't match, so it should be updated to ['A', 'B', 'C', 'D', 'E']
CodePudding user response:
Do you mean:
df.reindex(['A','B','C','D','E'], axis=1)
Pass fill_value=0
if you want new columns filled with 0
.
CodePudding user response:
Appening dataframes to a main dataframe
df_main = df1.append(df2, ignore_index=True)
df_main = df_main.append(df3, ignore_index=True)
usually sets missing values as NaN or something, so that shouldn't be a problem. Just use df_main = df_main.fillna(0)
To add to Quang Hoang's answer, should you have a changing columns list that you want to have sorted, you can also use df.columns.sort_values()
:
new_cols = list(df.columns.sort_values().array)
df = df.reindex(new_cols, axis=1)
df1:
a b c e d
0 1.0 2.0 3.0 5.0 4.0
1 1.1 2.2 3.3 5.5 4.4
sorted columns:
a b c d e
0 1.0 2.0 3.0 4.0 5.0
1 1.1 2.2 3.3 4.4 5.5