How to detect string regardless uppercase, lowercase, with/without space in python?-CodePudding

So I am dealing with csv file that the column names are changeable.

For example, the column names always have like "First Name", "Last Name", "Phone Number", etc. However, every time I download this csv file (update everyday), column names could change to "FirstName" or "first name" or "firstname" or even "firstNAME". Since I have to match the column to my own database columns, I could not use index.

The solution I could think of for now is to change all columns every time the script read the csv file.

csv_table = pd.read_csv(filename)
custom_col_name = ['First Name', 'Last Name', 'Phone Number']
csv_table.columns = custom_col_name

But if someday they (who generates the csv file) decide to change the order of columns, my script and result would be mess up.

Is there any way that I could set a pattern to look for and then change it.

Something like this:

read csv
look for column similar to 'first name'
change that column to 'First Name'
(repeat for all columns)

CodePudding user response：

Rename the columns by converting the column names to lowercase and removing whitespace.

import re
df.columns= df.columns.str.lower()
df = df.rename(columns=lambda x: re.sub('\s ', '', x))

CodePudding user response：

# Define the desired column names (order doesn't matter)
target_names = ['First Name', 'Last Name', 'Phone Number']

# Build mapping dictionary
mapper = {}
for col_name in df.columns:
    for target_name in target_names:
        for_compare_1 = col_name.lower().replace(' ', '')
        for_compare_2 = target_name.lower().replace(' ', '')
        if for_compare_1 == for_compare_2:
            mapper[col_name] = target_name
            break

# Rename the columns
df = df.rename(columns=mapper)