I have a dataframe with multiple columns and I want to separate the numbers from the letters with a space in one column. In this example I want to add space in the third column. do you know how to do so?
import pandas as pd
data = {'first_column': ['first_value', 'second_value', 'third_value'],
'second_column': ['first_value', 'second_value', 'third_value'],
'third_column':['AA6589', 'GG6589', 'BXV6589'],
'fourth_column':['first_value', 'second_value', 'third_value'],
}
df = pd.DataFrame(data)
print (df)
CodePudding user response:
Use str.replace
with a short regex:
df['third_column'] = df['third_column'].str.replace(r'(\D )(\d )',
r'\1 \2', regex=True)
regex:
(\D ) # capture one or more non-digits
(\d ) # capture one or more digits
replace with \1 \2
(first captured group, then space, then second captured group).
Alternative with lookarounds:
df['third_column'] = df['third_column'].str.replace(r'(?<=\D)(?=\d)',
' ', regex=True)
meaning: insert a space at any position in-between a non-digit and a digit.
CodePudding user response:
Similarly you could extract the digits and non digit characters from your 'third_column' and place them together with a space in between:
df.assign(
third_column=df["third_column"].str.extract(r'(\D )') " " df["third_column"].str.extract(r'(\d )')
)
first_column second_column third_column fourth_column
0 first_value first_value AA 6589 first_value
1 second_value second_value GG 6589 second_value
2 third_value third_value BXV 6589 third_value