Converting pandas dataframe to dictionary while modifying column names-CodePudding

How can I convert a dataframe, for example looking like this:

id    | column! 1 | column 2?> | column 3
ABC1  | 1234      | text text  | text!
ABC2  | 1234      | text text  | text!

Into a dictionary, but while converting the column names to remove special characters and spaces. The conversion could look like something like label_id = re.sub('[^A-Za-z0-9] ', '', str(label_name)) but I am not sure how I could apply this to the dataframe -> dictionary conversion.

CodePudding user response：

Use str.replace on columns with '\W '. \W matches any character which is not a word character.

d = df.rename(columns=dict(zip(df.columns, df.columns.str.replace('\W ', '', regex=True)))).to_dict()
print(d)

# Output:
{'id': {0: 'ABC1', 1: 'ABC2'},
 'column1': {0: 1234, 1: 1234},
 'column2': {0: 'text text', 1: 'text text'},
 'column3': {0: 'text!', 1: 'text!'}}

Details:

>>> dict(zip(df.columns, df.columns.str.replace('\W ', '', regex=True)))

# Output:
{'id': 'id',
 'column! 1': 'column1',
 'column 2?>': 'column2',
 'column 3': 'column3'}

Or more simple:

df.columns = df.columns.str.replace('\W ', '', regex=True)
print(df)

# Output:
     id  column1    column2 column3
0  ABC1     1234  text text   text!
1  ABC2     1234  text text   text!