So I am dealing with csv file that the column names are changeable.
For example, the column names always have like "First Name", "Last Name", "Phone Number", etc. However, every time I download this csv file (update everyday), column names could change to "FirstName" or "first name" or "firstname" or even "firstNAME". Since I have to match the column to my own database columns, I could not use index.
The solution I could think of for now is to change all columns every time the script read the csv file.
csv_table = pd.read_csv(filename)
custom_col_name = ['First Name', 'Last Name', 'Phone Number']
csv_table.columns = custom_col_name
But if someday they (who generates the csv file) decide to change the order of columns, my script and result would be mess up.
Is there any way that I could set a pattern to look for and then change it.
Something like this:
- read csv
- look for column similar to 'first name'
- change that column to 'First Name'
- (repeat for all columns)
CodePudding user response:
Rename the columns by converting the column names to lowercase and removing whitespace.
import re
df.columns= df.columns.str.lower()
df = df.rename(columns=lambda x: re.sub('\s ', '', x))
CodePudding user response:
# Define the desired column names (order doesn't matter)
target_names = ['First Name', 'Last Name', 'Phone Number']
# Build mapping dictionary
mapper = {}
for col_name in df.columns:
for target_name in target_names:
for_compare_1 = col_name.lower().replace(' ', '')
for_compare_2 = target_name.lower().replace(' ', '')
if for_compare_1 == for_compare_2:
mapper[col_name] = target_name
break
# Rename the columns
df = df.rename(columns=mapper)