How to split a string column into two columns on the last occurrence of a variable-length space delimiter

Time:08-30

I am looking for a way to split a string column in pandas into two columns on a space delimiter. I have tried the code below:

df[['A', 'B']] = df['AB'].str.split(' ', n=1, expand=True)

But this works only when the delimiter is exactly one space. I would like to know whether a column can be split on the last occurrence of a space delimiter whose length varies.

Example:

If the column value is "aa bb cc", the resulting column values should be "aa bb" and "cc".

If the column value is "dd ee (more than one space delimiter) ff", the resulting column values should be "dd ee" and "ff".

So the string column needs to be split on the last occurrence of the delimiter, and that delimiter can be any number of spaces.

Any help will be much appreciated.

CodePudding user response:

You can use this regex to split on:

\s+(?!.*\s)

This matches a run of whitespace that has no whitespace anywhere after it in the string, so it can match only once and the split yields at most two values.

Usage:

import pandas as pd

df = pd.DataFrame({'AB': ['aa bb cc', 'dd ee    ff']})
print(df)
# Split on the last run of whitespace only
df[['A', 'B']] = df['AB'].str.split(r'\s+(?!.*\s)', expand=True)
print(df)

Output:

            AB
0     aa bb cc
1  dd ee    ff

            AB      A   B
0     aa bb cc  aa bb  cc
1  dd ee    ff  dd ee  ff
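
To check how the pattern behaves outside pandas, here is a minimal sketch using the standard `re` module. The helper name `split_last_ws` is made up for illustration, and the third call shows what I would expect for a value with no whitespace at all (it comes back unsplit).

```python
import re

# Pattern from above: a run of whitespace with no whitespace after it.
LAST_WS = re.compile(r'\s+(?!.*\s)')

def split_last_ws(text):
    """Hypothetical helper: split text on its last run of whitespace."""
    # maxsplit=1 is only a safety net; the lookahead already limits
    # the pattern to a single match per string.
    return LAST_WS.split(text, maxsplit=1)

print(split_last_ws('aa bb cc'))     # ['aa bb', 'cc']
print(split_last_ws('dd ee    ff'))  # ['dd ee', 'ff']
print(split_last_ws('single'))       # ['single']
```

In pandas, a row like `'single'` would presumably leave the second column empty, so it may be worth checking for missing values after the split.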

CodePudding user response:

You can use rindex(' '), which returns the index of the last occurrence of a space.

origColumn = 'Part-A Part-B'
# Index of the last space in the string
separator = origColumn.rindex(' ')
newColumn1 = origColumn[0:separator]
newColumn2 = origColumn[separator + 1:]
print(f"column:{origColumn} is now split into {newColumn1}:{newColumn2}")

output: column:Part-A Part-B is now split into Part-A:Part-B
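
One caveat: when the delimiter is more than one space, slicing at rindex(' ') leaves the extra spaces on the end of the first part. A minimal sketch of two ways around that, assuming plain Python strings; the function names `split_last_rindex` and `split_last_rsplit` are made up for illustration.

```python
def split_last_rindex(text):
    """Hypothetical helper: rindex-based split, trimming leftover spaces."""
    sep = text.rindex(' ')  # raises ValueError if there is no space
    # rstrip removes the remaining spaces of a multi-space delimiter
    return text[:sep].rstrip(' '), text[sep + 1:]

def split_last_rsplit(text):
    """Hypothetical helper: same idea using str.rsplit."""
    # With no separator given, rsplit treats any run of whitespace
    # as a single delimiter; maxsplit=1 splits once from the right.
    return text.rsplit(None, 1)

print(split_last_rindex('dd ee    ff'))  # ('dd ee', 'ff')
print(split_last_rsplit('dd ee    ff'))  # ['dd ee', 'ff']
print(split_last_rsplit('aa bb cc'))     # ['aa bb', 'cc']
```

For a DataFrame column, df['AB'].str.rsplit(n=1, expand=True) should behave like the second helper, since pandas splits on whitespace when no pattern is given.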
