I want to split one string into several columns however I encounter issue with str.split
method.
Example:
While performing str.split
on column Name
which consist such strings it work as expected.
data1 = {'Name': ['Alice 23', 'Philip 12', 'Krish 64', 'John 29']}
df1 = pd.DataFrame(data1)
df1
Name
0 Alice 23
1 Philip 12
2 Krish 64
3 John 29
performing split:
df1[['Name', 'age']] = df1['Name'].str.split(' ', 1, expand=True)
df1
Name age
0 Alice 23
1 Philip 12
2 Krish 64
3 John 29
All good, as I wanted but if I need to place other separator like ||
its not working properly.
data2 = {'Name': ['Alice||23', 'Philip||12', 'Krish||64', 'John||29']}
df2 = pd.DataFrame(data)
df2
Name
0 Alice||23
1 Philip||12
2 Krish||64
3 John||29
performing split...
df2[['Name', 'age']] = df2['Name'].str.split('[||]',1,expand = True)
df3[['Name', 'age']] = df2['Name'].str.split('||',1,expand = True)
results in not what I expected
df2
Name age
0 Alice |23
1 Philip |12
2 Krish |64
3 John |29
df3
Name age
0 Alice||23
1 Philip||12
2 Krish||64
3 John||29
What's the reason for this behaviour and how get expected result which is as df1
?
CodePudding user response:
The issue you're hitting is that pandas
defaults to assuming your string to split on is a regular expression. In regular expressions the "|"
character is a special character that enables you to match either expression on the left or right of that character (e.g. you can match 'a' or 'b' with the expression '(a|b)'
.
In your case, we don't want to pass a regular expression, so you can pass .str.split(…, regex=False)
.
>>> df2['Name'].str.split('||', regex=False, expand=True)
0 1
0 Alice 23
1 Philip 12
2 Krish 64
3 John 29
CodePudding user response:
escape the |
:
df2['Name'].str.split('\|\|',expand = True)
Output:
0 1
0 Alice 23
1 Philip 12
2 Krish 64
3 John 29