pandas convert comma to dash inside parentheses

Time:12-04

Given a data set that has commas as part of the text, is there a good/easy way to convert them so I can parse the rest of the data using the 'real' commas? The commas I want to ignore/translate are always inside parentheses.

import pandas as pd

# Create two Series that share the same index
s = pd.Series(['one,two,ten', 'first,second,third(twenty,thirty,forty),last', 'ten,eleven,twelve'], index=['buz', 'bas', 'bur'])
k = pd.Series(['y', 'n', 'o'], index=['buz', 'bas', 'bur'])

# Create DataFrame df from the two Series
df = pd.DataFrame({'first': s, 'second': k})

My thought is that for each row in column first I need to check for a "(" and then, if there is a ",", convert it to "-". When I get to the ")", I stop the translation. In the end I will have third(twenty-thirty-forty).

Is there a char-by-char parser that can be triggered by a "("?

Expected output:

#Create Series
s = pd.Series(['one,two,ten','first,second,third(twenty-thirty-forty),last','ten,eleven,twelve'],['buz','bas','bur'])
k = pd.Series(['y','n','o'],['buz','bas','bur'])
df = pd.DataFrame({'first':s,'second':k})

CodePudding user response:

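Let us try str.replace with a replacement lambda function. The pattern matches each parenthesised group, and the lambda swaps the commas inside the matched text: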

repl = lambda g: g.group().replace(',', '-')
df['first'] = df['first'].str.replace(r'\((.*?)\)', repl, regex=True)

print(df)

                                            first second
buz                                   one,two,ten      y
bas  first,second,third(twenty-thirty-forty),last      n
bur                             ten,eleven,twelve      o
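For reference, here is a minimal sketch of what the callable replacement does, using the standard-library re module on plain strings instead of pandas. pandas documents that a callable repl is passed the regex match object, as with re.sub. The names PAREN_GROUP and commas_to_dashes are made up for this illustration and do not come from the answer above.

```python
import re

# Non-greedy match of one parenthesised group, e.g. "(twenty,thirty,forty)"
PAREN_GROUP = re.compile(r'\((.*?)\)')

def commas_to_dashes(match):
    # match.group() is the whole matched text, parentheses included
    return match.group().replace(',', '-')

rows = ['one,two,ten',
        'first,second,third(twenty,thirty,forty),last',
        'ten,eleven,twelve']

# Apply the substitution to each string, one at a time
converted = [PAREN_GROUP.sub(commas_to_dashes, row) for row in rows]
print(converted[1])  # first,second,third(twenty-thirty-forty),last
```

Strings without parentheses come back unchanged, because the pattern never matches in them.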

CodePudding user response:

You can create a character-by-character parser and apply it to each value in the column:

def replace_comma(x):
    # copy the string into a list of characters that can be modified
    new_xlist = list(x)

    # flag that is True while we are inside parentheses
    in_parenthesis = False

    # iterate through the characters of the original string
    for count, character in enumerate(x):
        if character == '(':
            in_parenthesis = True
        elif character == ')':
            in_parenthesis = False
        elif character == ',' and in_parenthesis:
            # inside parentheses: replace the comma with '-'
            new_xlist[count] = '-'

    # return the modified characters joined back into a string
    return ''.join(new_xlist)

Use pandas apply on the column and assign the result back, so the rest of df is kept:

df['first'] = df['first'].apply(replace_comma)
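If the data could ever contain nested parentheses (an assumption; the sample in the question does not), a single True/False flag switches off at the first ")". A possible sketch that counts depth instead is below. The name replace_comma_nested and its replacement parameter are hypothetical, chosen only for this example.

```python
def replace_comma_nested(text, replacement='-'):
    depth = 0   # number of parentheses currently open
    out = []
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')' and depth > 0:
            depth -= 1
        elif ch == ',' and depth > 0:
            # comma inside at least one pair of parentheses
            ch = replacement
        out.append(ch)
    return ''.join(out)

print(replace_comma_nested('a,b(c,(d,e),f),g'))  # a,b(c-(d-e)-f),g
```

It can be passed to apply in the same way as the function above.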

CodePudding user response:

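Using str.replace with a regex lookahead. Pass regex=True explicitly, because newer pandas versions treat the pattern as a literal string by default:

df['first'].str.replace(r",(?=[^()]*\))", '-', regex=True)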

Output:
buz                                     one,two,ten
bas    first,second,third(twenty-thirty-forty),last
bur                               ten,eleven,twelve
Name: first, dtype: object
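To see why the lookahead works, here is a small sketch on a single string with the standard-library re module. A comma is replaced only when a ")" follows it with no other "(" or ")" in between. The name COMMA_IN_PARENS is made up for illustration, and the sketch assumes balanced, non-nested parentheses as in the question's data.

```python
import re

# A comma followed by a ')' with no '(' or ')' in between
COMMA_IN_PARENS = re.compile(r',(?=[^()]*\))')

text = 'first,second,third(twenty,thirty,forty),last'
converted = COMMA_IN_PARENS.sub('-', text)
print(converted)  # first,second,third(twenty-thirty-forty),last

# The remaining commas are the 'real' separators
fields = converted.split(',')
print(fields)  # ['first', 'second', 'third(twenty-thirty-forty)', 'last']
```

With nested parentheses this pattern would miss a comma that is followed by another "(" before the closing ")", so it is best kept to flat groups.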