Home > Enterprise >  Update Column Names Using Data Dictionary
Update Column Names Using Data Dictionary

Time:08-06

I have a dataframe with column names 0, 1, 2, 3, etc. I can easily assign the field names when I load the data but the issue becomes if the column name changes, I would not know. There is another file where the column names 0, 1, 2, 3 are identified along with their description. 0 is ID Number, 1, is Title, etc.

I did convert the dataframe to a dictionary.

    layout_dict = layout_df.to_dict('records')
    
    [{'FIELD_NUMBER': '0',
    'FIELD_DESCRIPTION': 'ID Number',
    'FIELD_TYPE': 'String',
    'COMMENTS': 'SURE Voter ID number'},
    {'FIELD_NUMBER': '1',
    'FIELD_DESCRIPTION': 'Title',
    'FIELD_TYPE': 'String',
    'COMMENTS': nan},

From the layout_df dataframe with the column names 0, 1, 2, 3 is there a way to programmatically assign the column names to the FIELD_NUMBER in the dictionary and pull the FIELD_DESCRIPTION into the column name.

Here is how the data appears in the norco_df dataframe

0 1
123 dr
234 mr

Once the mapping occurs to the data dictionary, the column names in norco_df dataframe should be renamed to:

ID Number Title
123 dr
234 mr

CodePudding user response:

You can create a dictionary of the same shape as what .rename() takes from your layout_df:

columns = layout_df.set_index('FIELD_NUMBER')['FIELD_DESCRIPTION'].to_dict()
norco_df = norco_df.rename(columns=columns)
print(norco_df)

Output:

   ID Number Title
0        123    dr
1        234    mr

CodePudding user response:

Note that field numbers are string in layout_dict. The following list comprehension converts the column names

norco_df.columns = [item['FIELD_DESCRIPTION'] for old_col in norco_df.columns for item in layout_dict if item['FIELD_NUMBER'] == str(old_col)]

CodePudding user response:

Use the layout_df DataFrame to set up a Series and reindex:


df = pd.DataFrame(columns=list('ABC'), index=[0])

s = layout_df.astype({'FIELD_NUMBER': int}).set_index('FIELD_NUMBER')['FIELD_DESCRIPTION']
df.columns = s.reindex(range(df.shape[1])).to_list()

Output:

  ID Number Title  NaN
0       NaN   NaN  NaN
  • Related