Replace a comma with a decimal point in a text file where every number is followed by a comma (Python)

Arun,Mishra,108,23,34,45,56,Mumbai

The output I want is:

Arun,Mishra,108.23,34,45,56,Mumbai

I tried text.replace(',', '.'), but it replaces all the commas, including the delimiters, with dots.

Answer:

You can use a regular expression for this kind of task:

import re

old_str = 'Arun,Mishra,108,23,34,45,56,Mumbai'
# Replace only the first comma that sits between two runs of digits
new_str = re.sub(r'(\d+)(,)(\d+)', r'\1.\3', old_str, count=1)
print(new_str)  # Arun,Mishra,108.23,34,45,56,Mumbai

The search pattern r'(\d+)(,)(\d+)' finds a comma between two runs of digits. It has three capture groups, so the replacement can refer to them: in r'\1.\3', \1 and \3 are the first and third groups (the digits), and the comma captured in group 2 is replaced by a dot. old_str is the input string, and count=1 limits the substitution to the first occurrence, so the commas between 23, 34, 45 and 56 are kept.
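Since the question is about a text file rather than a single string, here is a minimal sketch that applies the same digit-comma-digit idea to every line of a file. It assumes each line needs only its first such comma fixed; fix_line, fix_file and the file names are hypothetical, and it uses two capture groups instead of three because the comma does not need to be kept.

```python
import re
import tempfile
from pathlib import Path

# A run of digits, a comma, a run of digits (the comma itself is not captured)
DIGIT_COMMA_DIGIT = re.compile(r'(\d+),(\d+)')


def fix_line(line: str) -> str:
    # count=1: only the first digit,digit comma in each line becomes a dot
    return DIGIT_COMMA_DIGIT.sub(r'\1.\2', line, count=1)


def fix_file(src: Path, dst: Path) -> None:
    # Process line by line so large files are never fully loaded into memory
    with src.open(encoding='utf-8') as fin, dst.open('w', encoding='utf-8') as fout:
        for line in fin:
            fout.write(fix_line(line))


if __name__ == '__main__':
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp) / 'input.txt'   # hypothetical file names
        dst = Path(tmp) / 'output.txt'
        src.write_text('Arun,Mishra,108,23,34,45,56,Mumbai\n', encoding='utf-8')
        fix_file(src, dst)
        print(dst.read_text(encoding='utf-8'), end='')
```

Writing to a separate output file keeps the original data intact in case the pattern turns out to match something it should not.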

Answer:

It may be instructive to show how this can be done without importing any modules.

The idea is to search the string for commas. Once the index of a comma has been found, examine the characters on either side of it (checking for digits). If both are digits, replace that comma with a dot and stop.

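s = 'Arun,Mishra,108,23,34,45,56,Mumbai'

pos = 1  # start at 1 so that s[pos-1] always exists

# The := operator needs Python 3.8+. Searching only up to len(s)-1
# means a found comma is never the last character, so s[pos+1] exists.
while (pos := s.find(',', pos, len(s)-1)) > 0:
    if s[pos-1].isdigit() and s[pos+1].isdigit():
        s = s[:pos] + '.' + s[pos+1:]
        break  # only the first matching comma is replaced
    pos += 1

print(s)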
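The := operator used above is only available from Python 3.8 onward. A minimal sketch of the same search as a plain loop over indices, which also runs on older versions; the function name is hypothetical:

```python
def first_digit_comma_to_dot(s: str) -> str:
    """Replace the first comma that has a digit on both sides with a dot."""
    # Skip index 0 and the last index so s[i - 1] and s[i + 1] always exist
    for i in range(1, len(s) - 1):
        if s[i] == ',' and s[i - 1].isdigit() and s[i + 1].isdigit():
            return s[:i] + '.' + s[i + 1:]
    return s  # no such comma: return the string unchanged


print(first_digit_comma_to_dot('Arun,Mishra,108,23,34,45,56,Mumbai'))
```

Returning early replaces the break in the while loop, and wrapping the logic in a function makes it easy to apply to each line of a file.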

Output:

Arun,Mishra,108.23,34,45,56,Mumbai

Answer:

Assuming you have a plain CSV file like your single-line example, there are 8 columns and you want to 'merge' columns 3 and 4. You can do this with a regular expression, as shown below. Here I explicitly capture the 8 columns into 8 groups, treating everything that is not a comma as a column value, and then write the 8 columns back out separated by commas, except between columns 3 and 4, where I put the period/dot you need.

$ echo "Arun,Mishra,108,23,34,45,56,Mumbai" | sed -r "s/([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)/\1,\2,\3.\4,\5,\6,\7,\8/"
Arun,Mishra,108.23,34,45,56,Mumbai

This regex is for your exact data. A generic regex that replaces any comma between two sets of digits might give false matches on other data, however, so I think explicitly matching the exact columns you have is the safest way to do it.
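To make the false-match concern concrete, here is a small sketch comparing a generic digit-comma-digit substitution (with no count limit) against the column-based pattern on the same line:

```python
import re

line = 'Arun,Mishra,108,23,34,45,56,Mumbai'

# Generic pattern, no count limit: every digit,digit comma is replaced,
# so 34 and 45 are wrongly merged as well
greedy = re.sub(r'(\d+),(\d+)', r'\1.\2', line)
print(greedy)   # Arun,Mishra,108.23,34.45,56,Mumbai

# Column-based pattern: only the boundary between columns 3 and 4 changes
columns = re.sub(r'([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)',
                 r'\1,\2,\3.\4,\5,\6,\7,\8', line)
print(columns)  # Arun,Mishra,108.23,34,45,56,Mumbai
```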

You can take the above regex and use it in your Python code as shown below.

import re

inLine = 'Arun,Mishra,108,23,34,45,56,Mumbai'
# Capture all 8 columns, then rejoin them with a dot between columns 3 and 4
outLine = re.sub(r'([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)',
                 r'\1,\2,\3.\4,\5,\6,\7,\8', inLine)
print(outLine)
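Since the data is CSV, a different technique is Python's standard csv module, which splits on delimiters for you and merges the columns by index instead of by regex. A minimal sketch, assuming the two columns to merge are adjacent (0-based indices 2 and 3 here); merge_columns is a hypothetical helper, and io.StringIO stands in for real files opened with newline='':

```python
import csv
import io


def merge_columns(rows, index=2):
    """Yield each row with columns `index` and `index + 1` joined by a dot.

    Rows with too few fields are passed through unchanged.
    """
    for row in rows:
        if len(row) > index + 1:
            row = row[:index] + [row[index] + '.' + row[index + 1]] + row[index + 2:]
        yield row


src = io.StringIO('Arun,Mishra,108,23,34,45,56,Mumbai\n')
dst = io.StringIO()
csv.writer(dst, lineterminator='\n').writerows(merge_columns(csv.reader(src)))
print(dst.getvalue(), end='')
```

Unlike the 8-group regex, this does not depend on the exact column count, only on the position of the two columns to merge.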

As Tim Biegeleisen pointed out in a comment on the question, if you have access to the original source data you would be better off fixing the formatting there. Of course, that is not always possible.

Answer:

First split the string with s.split(','), then join the 3rd and 4th elements with a dot, and finally join the list back together with commas.

s = 'Arun,Mishra,108,23,34,45,56,Mumbai'
ls = s.split(',')
ls[2] = '.'.join([ls[2], ls[3]])  # '108' and '23' -> '108.23'
ls.pop(3)                          # drop the 4th element, now merged into the 3rd
s = ','.join(ls)
print(s)
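A variation on the same split idea uses the maxsplit argument of str.split, so only the first three commas are split and the rest of the line is left alone. This sketch assumes every line has at least four fields; otherwise parts[3] raises an IndexError.

```python
s = 'Arun,Mishra,108,23,34,45,56,Mumbai'

# maxsplit=3: only the first three commas split; the rest stays in parts[3]
parts = s.split(',', 3)   # ['Arun', 'Mishra', '108', '23,34,45,56,Mumbai']

# Put the first two commas back, and a dot between '108' and '23'
s = ','.join(parts[:3]) + '.' + parts[3]
print(s)  # Arun,Mishra,108.23,34,45,56,Mumbai
```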

Answer:

This changes every comma that has a digit immediately before and after it into a dot.

txt = "2459,12 is the best number. lets change the dots . with commas , 458,45."

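commaindex = txt.find(",")

while commaindex != -1:
    # A comma at either end of the text has no neighbour on one side, so skip it
    if (0 < commaindex < len(txt) - 1
            and txt[commaindex - 1].isnumeric()
            and txt[commaindex + 1].isnumeric()):
        txt = txt[:commaindex] + "." + txt[commaindex + 1:]
    # Continue searching after the current position
    commaindex = txt.find(",", commaindex + 1)

print(txt)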
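The same "comma between two digits" rule can be written as a single regular expression with lookarounds, a swapped-in technique rather than the loop above. (?<=\d) and (?=\d) check for a digit on each side without consuming it, so only the comma is replaced. For ASCII digits this behaves like the isnumeric() checks; for other Unicode characters the two can differ slightly.

```python
import re

txt = "2459,12 is the best number. lets change the dots . with commas , 458,45."

# Replace every comma that has a digit directly before and after it
txt = re.sub(r'(?<=\d),(?=\d)', '.', txt)
print(txt)  # 2459.12 is the best number. lets change the dots . with commas , 458.45.
```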

Best Regards, Devrim