How to detect a string regardless of uppercase, lowercase, or spaces in Python?

Time:09-30

I am working with a CSV file whose column names keep changing.

For example, the column names are always something like "First Name", "Last Name", "Phone Number", etc. However, every time I download this CSV file (it is updated every day), the column names may change to "FirstName", "first name", "firstname" or even "firstNAME". Since I have to match these columns to my own database columns, I cannot rely on column positions (indexes).
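The variants listed above differ only in letter case and whitespace, so one way to treat them as equal is to reduce every name to a canonical key before comparing. A minimal sketch, assuming case and whitespace are the only things that vary (the helper name `normalize` is hypothetical):

```python
import re


def normalize(name: str) -> str:
    """Lowercase a column name and drop all whitespace, so that spelling
    variants such as 'First Name', 'FirstName' and 'firstNAME' compare equal."""
    return re.sub(r"\s+", "", name).lower()


variants = ["First Name", "FirstName", "first name", "firstname", "firstNAME"]
print({normalize(v) for v in variants})  # {'firstname'}
```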

The only solution I can think of for now is to overwrite all the column names every time the script reads the CSV file:

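import pandas as pd

csv_table = pd.read_csv(filename)
# Overwrites the names by position, so this only works while the column order never changes
custom_col_name = ['First Name', 'Last Name', 'Phone Number']
csv_table.columns = custom_col_name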

But if the people who generate the CSV file ever change the order of the columns, this positional assignment would put the wrong names on the wrong data and mess up my script and its results.
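To make the risk concrete, here is a small sketch with hypothetical data in which the first two columns have been swapped; positional assignment relabels them without raising any error:

```python
import pandas as pd

# Hypothetical download in which the provider swapped the first two columns
df = pd.DataFrame(
    {"last name": ["Smith"], "FirstName": ["Ann"], "Phone Number": ["555-0100"]}
)

# Positional assignment ignores what the columns actually contain
df.columns = ["First Name", "Last Name", "Phone Number"]
print(df.loc[0, "First Name"])  # Smith  (wrong: this is the last name)
```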

Is there any way to define a pattern to look for and then rename the matching column?

Something like this:

  1. read csv
  2. look for column similar to 'first name'
  3. change that column to 'First Name'
  4. (repeat for all columns)
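The four steps above could be sketched with pandas as follows. This is a sketch under stated assumptions: names differ only in case and whitespace, the helpers `normalize` and `standardize_columns` are hypothetical, and an in-memory string stands in for the real CSV file. Columns that match no target keep their original names.

```python
import io
import re

import pandas as pd

# Canonical names we want to end up with (order does not matter)
TARGETS = ["First Name", "Last Name", "Phone Number"]


def normalize(name) -> str:
    # Case- and whitespace-insensitive key, e.g. 'firstNAME' -> 'firstname'
    return re.sub(r"\s+", "", str(name)).lower()


def standardize_columns(df: pd.DataFrame, targets=TARGETS) -> pd.DataFrame:
    # Map each normalized target key back to its canonical spelling
    lookup = {normalize(t): t for t in targets}
    # Rename only the columns whose normalized form matches a target
    mapper = {c: lookup[normalize(c)] for c in df.columns if normalize(c) in lookup}
    return df.rename(columns=mapper)


# 1. read csv (hypothetical contents with reordered, oddly cased headers)
raw = "phone number,FirstName,last NAME\n555-0100,Ann,Smith\n"
df = pd.read_csv(io.StringIO(raw))

# 2-4. find every column similar to a target and rename it
df = standardize_columns(df)
print(list(df.columns))  # ['Phone Number', 'First Name', 'Last Name']
```

Because matching is done on the normalized key rather than on position, a change in column order does not affect the result.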

Answer:

Rename the columns by converting the column names to lowercase and removing all whitespace, then refer to them by their normalized names (for example 'firstname'):

import re

# Lowercase every column name, then delete every run of whitespace
df.columns = df.columns.str.lower()
df = df.rename(columns=lambda x: re.sub(r'\s+', '', x))
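On a small frame with hypothetical column names, the two steps above produce lowercase, space-free names, so later code has to use keys such as 'firstname' rather than 'First Name'. The same result can also be obtained in a single expression with the vectorized `str` methods on the column Index:

```python
import re

import pandas as pd

# Hypothetical headers as they might arrive in a given day's download
df = pd.DataFrame(columns=["First Name", "lastName", "PHONE NUMBER"])

# The answer's two steps: lowercase, then strip every run of whitespace
df.columns = df.columns.str.lower()
df = df.rename(columns=lambda x: re.sub(r"\s+", "", x))
print(list(df.columns))  # ['firstname', 'lastname', 'phonenumber']

# Equivalent one-liner using Index.str.replace with a regex
df2 = pd.DataFrame(columns=["First Name", "lastName", "PHONE NUMBER"])
df2.columns = df2.columns.str.replace(r"\s+", "", regex=True).str.lower()
```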

Answer:

# Define the desired column names (order doesn't matter)
target_names = ['First Name', 'Last Name', 'Phone Number']

# Build mapping dictionary
mapper = {}
for col_name in df.columns:
    for target_name in target_names:
        for_compare_1 = col_name.lower().replace(' ', '')
        for_compare_2 = target_name.lower().replace(' ', '')
        if for_compare_1 == for_compare_2:
            mapper[col_name] = target_name
            break

# Rename the columns
df = df.rename(columns=mapper)
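A possible way to reuse this mapping on every daily download is to wrap it in a function that also reports targets it could not find, so that a renamed or dropped column fails loudly instead of silently. The function name `rename_to_targets` and the choice to raise `ValueError` are assumptions, not part of the answer:

```python
import pandas as pd


def rename_to_targets(df: pd.DataFrame, target_names: list) -> pd.DataFrame:
    """Hypothetical helper: rename columns matching a target, ignoring case and spaces."""
    mapper = {}
    for col_name in df.columns:
        for target_name in target_names:
            if col_name.lower().replace(" ", "") == target_name.lower().replace(" ", ""):
                mapper[col_name] = target_name
                break
    renamed = df.rename(columns=mapper)
    # Fail loudly if a target column no longer appears in the file
    missing = [t for t in target_names if t not in renamed.columns]
    if missing:
        raise ValueError(f"columns not found: {missing}")
    return renamed


# Reordered, oddly cased headers plus an extra column that matches no target
df = pd.DataFrame(columns=["phone number", "Email", "lastName", "firstNAME"])
df = rename_to_targets(df, ["First Name", "Last Name", "Phone Number"])
print(list(df.columns))  # ['Phone Number', 'Email', 'Last Name', 'First Name']
```

The extra 'Email' column passes through unchanged, and the renamed columns keep the file's original order.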