I have a column in a pandas DataFrame that stores long strings, in which different chunks of information are separated by "|||". This is an example:
"intermediation|"mechanical turk"|precarious "public policy" ||| intermediation|"mechanical turk"|precarious high-level
I need to split this column into multiple columns, each column containing the string between the separators "|||".
However, when I run the following code:
df['query_ids'].str.split('|||', n=5, expand=True)
the string is split at every single character, like this:
0 1 2 3 4 5
0 " r e g ulatory capture"|"political lobbying" policy-m...
I suspect it's because "|" is a Python operator, but I cannot think of a suitable workaround.
CodePudding user response:
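You need to escape |. A separator longer than one character is treated as a regular expression, and in a regex an unescaped | means "or", so '|||' matches the empty string at every position. Escape each | (a raw string keeps the backslashes intact):
df['query_ids'].str.split(r'\|\|\|', n=5, expand=True)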
Alternatively, pass regex=False (pandas 1.4 or later) so the separator is treated as a literal string:
df['query_ids'].str.split('|||', n=5, expand=True, regex=False)
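To check that both approaches give the same columns, here is a minimal sketch on made-up data. The sample strings and the variable names (pattern, by_regex, by_literal, cleaned) are assumptions for illustration and do not come from the original DataFrame. It uses re.escape to build the escaped pattern instead of typing the backslashes by hand, and it assumes pandas 1.4 or later for the regex=False call.

```python
import re

import pandas as pd

# Hypothetical sample data, loosely modeled on the string in the question.
df = pd.DataFrame({
    "query_ids": [
        'intermediation|"mechanical turk"|precarious ||| "public policy" ||| high-level',
        "a|b ||| c",
    ]
})

# re.escape turns "|||" into the regex \|\|\| so each pipe is matched literally.
pattern = re.escape("|||")

# Option 1: escaped regular expression.
by_regex = df["query_ids"].str.split(pattern, n=5, expand=True)

# Option 2: literal separator, no regex involved (assumes pandas >= 1.4).
by_literal = df["query_ids"].str.split("|||", n=5, expand=True, regex=False)

# The chunks keep the spaces around the separator, so strip them per column.
cleaned = by_literal.apply(lambda col: col.str.strip())

print(cleaned)
```

Single | characters inside a chunk are left alone in both variants, because only three consecutive pipes count as the separator. Rows with fewer chunks than the widest row are padded with missing values.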