How to split a Pandas DataFrame column into multiple columns if the column is a string of varying le

Time:12-04

I have a Pandas DataFrame that was created by reading a table from a PDF with tabula. The PDF isn't parsed perfectly, so I end up with a few table columns smushed into one column in the resulting DataFrame. The issue is that one of the table columns in the PDF is text, so its value is sometimes one word and sometimes two words. Example:

            Col_1  Col_2
0       Hello X Y      A
1 Hello world Q R      B
2          Hi S T      C

I would like to split Col_1 into 3 columns. I'm not sure how to do this, given that the first new column would sometimes consist of one word, as in the case of Rows 0 & 2, and sometimes consist of two words, as in the case of Row 1.

I have tried splitting the strings of Col_1 with df['Col_1'].str.split(' ', 4, expand=True), but this starts the splitting from the beginning of the string (from the left), whereas I would like the splitting to be done from the right, I suppose.

Answer:

You can try using str.rsplit:

Splits string around given separator/delimiter, starting from the right.

df['Col_1'].str.rsplit(' ', n=2, expand=True)

Output:

             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T
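
For anyone who wants to reproduce this, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the example in the question (an assumption; the real data comes from tabula), and the split limit is passed as the keyword n=2, which recent pandas releases expect.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's example (assumed, not read from a PDF)
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# n=2 allows at most two splits, counted from the right, so everything
# to the left of the last two tokens stays together in column 0.
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True)
print(parts)
```

Because the splits are counted from the right, it does not matter whether the leading text is one word or two.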

As a full dataframe:

df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)

Output:

        nCol_0 nCol_1 nCol_2            Col_1 Col_2
0        Hello      X      Y        Hello X Y     A
1  Hello world      Q      R  Hello world Q R     B
2           Hi      S      T           Hi S T     C
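
If the original Col_1 is no longer needed afterwards, a possible variation is to name the new columns and drop the source column before joining. This is only a sketch: the column names Text, Code_1 and Code_2 are hypothetical placeholders, and the sample data is again rebuilt by hand from the question.

```python
import pandas as pd

# Sample data rebuilt by hand from the question's example
df = pd.DataFrame({
    "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
    "Col_2": ["A", "B", "C"],
})

# Hypothetical names for the three recovered columns
new_cols = ["Text", "Code_1", "Code_2"]

# Split from the right into at most three pieces, then label them
split = df["Col_1"].str.rsplit(" ", n=2, expand=True)
split.columns = new_cols

# Join the new columns with the rest of the frame, minus the original Col_1
result = split.join(df.drop(columns="Col_1"))
print(result)
```

The join works on the index, so the rows stay aligned as long as the index is not changed between the split and the join.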