Inserting a character after the last set of numbers in a Pandas column

Time:12-13

I'm interested in inserting a character (a comma in this case) before the last set of numbers, if present, in the values of a Pandas column. A sample of the original dataframe is below:

import pandas as pd
data = {'ID': ['1', '2', '3'], 'Address': ['123 Nelson Avenue, Redmont Central, Redmont 0987', '123 Nelson Avenue, Redmont Central, Redmont', '123 Nelson Avenue, Redmont Central, Redmont 87']}
df_addresses = pd.DataFrame(data)

The expected output is as below:

data_expected = {'ID': ['1', '2', '3'], 'Address': ['123 Nelson Avenue, Redmont Central, Redmont, 0987', '123 Nelson Avenue, Redmont Central, Redmont', '123 Nelson Avenue, Redmont Central, Redmont, 87']}
df_addresses_expected = pd.DataFrame(data_expected)

Ideally, a comma is inserted before the last set of numbers in the column value. If the last set of characters is not a number-like value, the value is left as it is. Any thoughts on this?

CodePudding user response:

You could define a function to insert a comma into a string as you describe, and apply that function to the address column:

def insert_comma(string):
    if string == '':
        return string
    if string[-1] not in '0123456789':
        return string
    words = string.split(' ')
    if len(words) <= 1:
        return string
    # Append the comma to the word that precedes the trailing number
    words[-2] += ','
    return ' '.join(words)

df_addresses['Address'] = df_addresses['Address'].apply(insert_comma)
df_addresses
    ID  Address
0   1   123 Nelson Avenue, Redmont Central, Redmont, 0987
1   2   123 Nelson Avenue, Redmont Central, Redmont
2   3   123 Nelson Avenue, Redmont Central, Redmont, 87

CodePudding user response:

Something like this?

def check_for_final_comma(input_str):
    input_str_split = input_str.split(' ')
    last_word = input_str_split[-1]

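    # Only act when the last word is numeric and a preceding word exists
    if last_word.isnumeric() and len(input_str_split) > 1:
        # Skip values where the preceding word already ends with a comma
        if not input_str_split[-2].endswith(","):
            input_str_split[-2] += ","
            return " ".join(input_str_split)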
    
    return input_str

df_addresses['Address_New'] = df_addresses['Address'].apply(check_for_final_comma)

CodePudding user response:

You can do it without using apply like this:

df['Address'] = df['Address'].str.extract(r'(.+?)\s*(\d+)?$').fillna('').assign(t=', ')[[0, 't', 1]].sum(axis=1).str.strip(', ')

Output:

>>> df
  ID                                            Address
0  1  123 Nelson Avenue, Redmont Central, Redmont, 0987
1  2        123 Nelson Avenue, Redmont Central, Redmont
2  3    123 Nelson Avenue, Redmont Central, Redmont, 87
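
A possibly simpler vectorized variant is a single regex substitution with `Series.str.replace`. This is only a sketch, not part of the answer above: the pattern is my own assumption about what counts as "the last set of numbers" (a run of digits at the very end of the string, preceded by whitespace), and it deliberately skips values that already have a comma before the number.

```python
import pandas as pd

data = {
    'ID': ['1', '2', '3'],
    'Address': [
        '123 Nelson Avenue, Redmont Central, Redmont 0987',
        '123 Nelson Avenue, Redmont Central, Redmont',
        '123 Nelson Avenue, Redmont Central, Redmont 87',
    ],
}
df = pd.DataFrame(data)

# (?<![,\s])  the whitespace run must not follow a comma (or more whitespace)
# \s+         whitespace separating the address from the trailing number
# (\d+)$      the trailing run of digits, captured as group 1
pattern = r'(?<![,\s])\s+(\d+)$'

# Replace the separating whitespace with ", " and keep the digits
df['Address'] = df['Address'].str.replace(pattern, r', \1', regex=True)
```

Rows without a trailing number do not match the pattern, so they are returned unchanged.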