Python: string not splitting correctly at "|||" substring

Time:08-20

I have a column in a pandas DataFrame that stores long strings in which different chunks of information are separated by "|||". Here is an example:

"intermediation|"mechanical turk"|precarious "public policy" ||| intermediation|"mechanical turk"|precarious high-level

I need to split this column into multiple columns, each column containing the string between the separators "|||".

However, when I run the following code:

df['query_ids'].str.split('|||', n=5, expand=True)

the string is split at every single character instead, like this:

     0   1  2  3  4                                                  5
0        "  r  e  g  ulatory capture"|"political lobbying" policy-m...

I suspect it's because "|" is a Python operator, but I cannot think of a suitable workaround.
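The per-character split can be reproduced with Python's standard re module alone, which points at regular-expression parsing rather than Python operators. This is a minimal sketch; the sample text is a fragment of the output above. It assumes Python 3.7 or later, where re.split is allowed to split on empty matches.

```python
import re

# As a regex, "|||" is four empty alternatives joined by "or",
# so it matches the empty string at any position.
empty_match = re.compile("|||")
print(repr(empty_match.match("abc").group()))  # ''

# Splitting on a pattern that only matches the empty string cuts between
# every character, just like the DataFrame output in the question.
print(re.split("|||", '"regulatory capture"', maxsplit=5))
# ['', '"', 'r', 'e', 'g', 'ulatory capture"']

# With the bars escaped, the pattern matches the literal text "|||".
print(re.split(r"\|\|\|", "a ||| b"))  # ['a ', ' b']
```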

Answer:

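You need to escape the | characters. When the pattern is longer than one character, pandas treats it as a regular expression, and in regex syntax | means "or", so '|||' matches the empty string and the split happens between every character: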

df['query_ids'].str.split(r'\|\|\|', n=5, expand=True)
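Here is a runnable sketch of the escaped-pattern approach. The one-row DataFrame is hypothetical, built from the example string in the question. The raw string (r'...') passes the backslashes through to the regex engine unchanged, and re.escape("|||") builds the same pattern without typing them by hand. The last variant, which also absorbs surrounding whitespace with \s*, is an optional extra and not part of the original answer.

```python
import re

import pandas as pd

# Hypothetical one-row DataFrame using the example string from the question.
df = pd.DataFrame({
    "query_ids": [
        'intermediation|"mechanical turk"|precarious "public policy" ||| '
        'intermediation|"mechanical turk"|precarious high-level'
    ]
})

# Escaped pattern as a raw string: matches the literal "|||" only,
# so the single "|" inside each chunk is left alone.
escaped = df["query_ids"].str.split(r"\|\|\|", n=5, expand=True)

# re.escape produces the same escaped pattern from the literal separator.
also_escaped = df["query_ids"].str.split(re.escape("|||"), n=5, expand=True)

# Optional: consume spaces around the separator so cells have no padding.
trimmed = df["query_ids"].str.split(r"\s*\|\|\|\s*", n=5, expand=True)

print(trimmed.iloc[0].tolist())
```

Only two columns are produced here because the sample value contains a single "|||"; n=5 is an upper limit on splits, not a fixed column count.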

or pass regex=False (available in pandas 1.4 and later) so that the separator is treated as a literal string:

df['query_ids'].str.split('|||', n=5, expand=True, regex=False)
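The literal-separator route can be sketched the same way. This assumes pandas 1.4 or later, since older versions of Series.str.split do not accept the regex keyword. The two-row Series is hypothetical: it shows that a row with fewer chunks is padded with missing values, and that .str.strip() removes the spaces left around each chunk.

```python
import pandas as pd

# Hypothetical column: the second row has only one "|||" separator.
s = pd.Series([
    "a|b ||| c ||| d",
    "e ||| f|g",
])

# regex=False: "|||" is matched as plain text, no escaping needed.
parts = s.str.split("|||", n=5, expand=True, regex=False)

# Strip leading/trailing spaces in every column; missing cells stay missing.
parts = parts.apply(lambda col: col.str.strip())

print(parts)
```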