Home > Net >  How to update dataframe columns to a specific list and if those columns are not present, they can be
How to update dataframe columns to a specific list and if those columns are not present, they can be

Time:11-06

I need to update some dataframes in which the columns are not consistent. Consider:

df1 = ['A', 'B', 'C']
df2 = ['A', 'B', 'C', 'E'] 
df3 = ['A', 'B', 'C', 'E', 'D']

required_columns = ['A', 'B', 'C', 'D', 'E']

Here, I need to have df1, df2, df3 such that it also has columns: ['A', 'B', 'C', 'D', 'E']. Here, in df1, D, E columns can be either NA or null in case D and E are not present, and D can be NA or null in case of df2. df3 has all these columns, but order doesn't match, so it should be updated to ['A', 'B', 'C', 'D', 'E']

CodePudding user response:

Do you mean:

df.reindex(['A','B','C','D','E'], axis=1)

Pass fill_value=0 if you want new columns filled with 0.

CodePudding user response:

Appening dataframes to a main dataframe

df_main = df1.append(df2, ignore_index=True)
df_main = df_main.append(df3, ignore_index=True)

usually sets missing values as NaN or something, so that shouldn't be a problem. Just use df_main = df_main.fillna(0)

To add to Quang Hoang's answer, should you have a changing columns list that you want to have sorted, you can also use df.columns.sort_values():

new_cols = list(df.columns.sort_values().array)
df = df.reindex(new_cols, axis=1)

df1:

    a   b   c   e   d
0   1.0 2.0 3.0 5.0 4.0
1   1.1 2.2 3.3 5.5 4.4

sorted columns:

    a   b   c   d   e
0   1.0 2.0 3.0 4.0 5.0
1   1.1 2.2 3.3 4.4 5.5
  • Related