How can I convert a dataframe, for example looking like this:
id | column! 1 | column 2?> | column 3
ABC1 | 1234 | text text | text!
ABC2 | 1234 | text text | text!
Into a dictionary, but while converting the column names to remove special characters and spaces. The conversion could look like something like label_id = re.sub('[^A-Za-z0-9] ', '', str(label_name))
but I am not sure how I could apply this to the dataframe -> dictionary conversion.
CodePudding user response:
Use str.replace
on columns with '\W '
. \W
matches any character which is not a word character.
d = df.rename(columns=dict(zip(df.columns, df.columns.str.replace('\W ', '', regex=True)))).to_dict()
print(d)
# Output:
{'id': {0: 'ABC1', 1: 'ABC2'},
'column1': {0: 1234, 1: 1234},
'column2': {0: 'text text', 1: 'text text'},
'column3': {0: 'text!', 1: 'text!'}}
Details:
>>> dict(zip(df.columns, df.columns.str.replace('\W ', '', regex=True)))
# Output:
{'id': 'id',
'column! 1': 'column1',
'column 2?>': 'column2',
'column 3': 'column3'}
Or more simple:
df.columns = df.columns.str.replace('\W ', '', regex=True)
print(df)
# Output:
id column1 column2 column3
0 ABC1 1234 text text text!
1 ABC2 1234 text text text!