Python/Pandas How to put comma after every number in a string

Time:11-26

For example, you have "blablabla 2342345, neemememem 5688234 hhojvz 345, yoea".

Output should look like this "blablabla 2342345, neemememem 5688234, hhojvz 345, yoea"

If a number is already followed by a comma, leave it as it is.

Note: The text is in a DataFrame column with many rows like this, so ideally the solution would use pandas. All numbers and text are unique (no duplicates). The number of digits in each number varies.

text
blablabla 2342345 neemememem 5688234 hhojvz 345 yoea
asdffgh 645655 neemememem 5688234 hhojvz 345 yoeablablabla 2342345 neemememem 5688234 hhojvz 345 yoeablablabla 2342345 ghhjfg 777777 hhojvz 345 ertert 698666666 neemememem 5688234 hhojvz 345 yoea
blablabla 2342345 neemememem 5688234 hhojvz 345 yoeablablabla 2342345 neemememem 5688234 hhojvz 345 yoeablablabla 2342345 neemememem 5688234 hhojvz 345 yoeablablabla 2342345 neemememem 5688234 hhojvz 345 yoeablablabla
5688234 hhojvz 345 yoeablablabla 2342345 neemememem 5688234 hhojvz 345 yoeablablabla 2342345 neemememem 5688234 hhojvz 345 yoeablablabla 2342345 neemememem 5688234 hhojvz 345 yoea
blablabla 2342345 neemememem 5688234 hhojvz 345 yoea
sdf 2345

Answer:

You can use a regex:

df['text'] = df['text'].str.replace(r'(\d+)(?!,)\b', r'\1,', regex=True)

How it works

(\d+) # capture one or more digits
(?!,) # not followed by comma
\b    # ensure word boundary

Replace with: captured group (\1) and comma

output:

text
0                                                                                                                                                                                blablabla 2342345, neemememem 5688234, hhojvz 345, yoea
1                        asdffgh 645655, neemememem 5688234, hhojvz 345, yoeablablabla 2342345, neemememem 5688234, hhojvz 345, yoeablablabla 2342345, ghhjfg 777777, hhojvz 345, ertert 698666666, neemememem 5688234, hhojvz 345, yoea
2  blablabla 2342345, neemememem 5688234, hhojvz 345, yoeablablabla 2342345, neemememem 5688234, hhojvz 345, yoeablablabla 2342345, neemememem 5688234, hhojvz 345, yoeablablabla 2342345, neemememem 5688234, hhojvz 345, yoeablablabla
3                                         5688234, hhojvz 345, yoeablablabla 2342345, neemememem 5688234, hhojvz 345, yoeablablabla 2342345, neemememem 5688234, hhojvz 345, yoeablablabla 2342345, neemememem 5688234, hhojvz 345, yoea
4                                                                                                                                                                                blablabla 2342345, neemememem 5688234, hhojvz 345, yoea
5                                                                                                                                                                                                                              sdf 2345,
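
To check the pattern without a DataFrame, here is a minimal sketch using only the standard re module. The helper name add_commas and the two sample strings are illustrative assumptions, not part of the original answer; the pattern is the one from the answer, written with \d+.

```python
import re

# Digits, not followed by a comma, ending at a word boundary.
PATTERN = re.compile(r"(\d+)(?!,)\b")

def add_commas(text: str) -> str:
    """Append a comma to every number that is not already followed by one."""
    return PATTERN.sub(r"\1,", text)

# The question's example: only 5688234 is missing its comma.
print(add_commas("blablabla 2342345, neemememem 5688234 hhojvz 345, yoea"))
# A number at the very end of the string also gets a comma.
print(add_commas("sdf 2345"))
```

If pandas is available, the same helper could presumably be applied row by row with df['text'].map(add_commas), which should give the same result as the str.replace call.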

Alternative regex: (\d+)(?!,)(?!\d). This drops the word-boundary condition, and the extra (?!\d) lookahead still prevents '123,' from being transformed into '12,3,'.
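
The practical difference between the two patterns only shows up when digits sit directly next to letters. The sketch below compares them on a few made-up strings; the constant names and inputs are assumptions for illustration, and both patterns are written with \d+.

```python
import re

WITH_BOUNDARY = re.compile(r"(\d+)(?!,)\b")
WITHOUT_BOUNDARY = re.compile(r"(\d+)(?!,)(?!\d)")

samples = ["id 123, next", "id 123 next", "abc123def"]

for text in samples:
    # Apply the same replacement with each pattern and show both results.
    strict = WITH_BOUNDARY.sub(r"\1,", text)
    loose = WITHOUT_BOUNDARY.sub(r"\1,", text)
    print(f"{text!r}: with \\b -> {strict!r}, without \\b -> {loose!r}")
```

On the first two strings both patterns agree. On 'abc123def' the word-boundary version leaves the text untouched, while the alternative inserts a comma after 123, so the choice depends on whether numbers glued to letters should count.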
