I have a Pandas DataFrame that was created by reading a table from a PDF with tabula. The PDF isn't parsed perfectly, so I end up with a few table columns smushed into one column in the resulting DataFrame. The issue is that one of the table columns in the PDF is text, so that column sometimes contains one word and sometimes two. Example:
             Col_1  Col_2
0        Hello X Y      A
1  Hello world Q R      B
2           Hi S T      C
I would like to split Col_1 into 3 columns. I'm not sure how to do this, given that the first new column would sometimes consist of one word, as in the case of Rows 0 & 2, and sometimes consist of two words, as in the case of Row 1.
I have tried splitting the strings of Col_1 with df['Col_1'].str.split(' ', 4, expand=True), but this starts the splitting from the beginning of the string (from the left), whereas I would like the splitting to be done from the right, I suppose.
CodePudding user response:
You can try using str.rsplit:
Splits string around given separator/delimiter, starting from the right.
df['Col_1'].str.rsplit(' ', n=2, expand=True)
Output:
             0  1  2
0        Hello  X  Y
1  Hello world  Q  R
2           Hi  S  T
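For anyone who wants to run this end to end, here is a minimal sketch. The DataFrame is rebuilt by hand from the example in the question (the real one would come from tabula), and n is passed as a keyword because recent pandas versions accept only pat positionally.

```python
import pandas as pd

# Hand-built copy of the example frame from the question
# (an assumption; the real data would come from tabula).
df = pd.DataFrame(
    {
        "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
        "Col_2": ["A", "B", "C"],
    }
)

# Split at most twice, counting from the right, so any extra
# words on the left stay together in the first new column.
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True)
print(parts)
```

If the parsed text may contain runs of several spaces, leaving out the separator (so it defaults to any whitespace) is probably the safer choice, since splitting on a single ' ' would produce empty strings in that case.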
As a full dataframe:
df['Col_1'].str.rsplit(' ', n=2, expand=True).add_prefix('nCol_').join(df)
Output:
        nCol_0  nCol_1  nCol_2            Col_1  Col_2
0        Hello       X       Y        Hello X Y      A
1  Hello world       Q       R  Hello world Q R      B
2           Hi       S       T           Hi S T      C
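If you would rather replace Col_1 than keep it next to the new columns, something along these lines should work. The names Text, Code_1 and Code_2 are made up for illustration, and the frame is again rebuilt by hand from the question's example.

```python
import pandas as pd

# Hand-built copy of the example frame from the question (assumed values).
df = pd.DataFrame(
    {
        "Col_1": ["Hello X Y", "Hello world Q R", "Hi S T"],
        "Col_2": ["A", "B", "C"],
    }
)

# Right-split into three pieces, then give them readable names.
# These column names are hypothetical; pick whatever fits the PDF table.
parts = df["Col_1"].str.rsplit(" ", n=2, expand=True)
parts.columns = ["Text", "Code_1", "Code_2"]

# Drop the smushed column and put the split columns in its place.
result = pd.concat([parts, df.drop(columns="Col_1")], axis=1)
print(result)
```

This assumes every row of Col_1 has at least three space-separated pieces; rows with fewer would end up with missing values in some of the new columns.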