Splitting a column of a dataframe into multiple columns

I have a df containing option data, as follows:

    ABCD
1   AARTIIND 29APR21 1100 PE
2   AARTIIND 29APR21 1100 PE
3   AARTIIND 29APR21 1100 PE
4   AARTIIND-I
5   AARTIIND-I
6   AARTIIND-I
7   AARTIIND-I
8   AARTIIND-I
9   AARTIIND-I
10  AARTIIND-I
11  AARTIIND-I
12  AARTIIND-I
13  AARTIIND-I
14  AARTIIND-I
15  AARTIIND-I
16  AARTIIND-I
17  AARTIIND-I
18  AARTIIND-I

In the above dataframe, some of the rows are separated by spaces into 4 parts. Others are single words.

I intend to do the following:

Rows that are separated into 4 parts by spaces should be split into 4 individual columns, where each column contains one part.

e.g. AARTIIND 29APR21 1100 PE should be split into 4 columns, where column 1 contains AARTIIND, column 2 contains the date, column 3 contains the price, and column 4 contains the type of option, i.e. PE.

Single words that are not separated should be placed in column 1, while the other columns should contain NA.

e.g. AARTIIND-I is a single word, hence column 1 will contain AARTIIND-I while columns 2, 3 and 4 will display NA.

Hence, after the transformation, the final df should look like this:

A           B           C           D
AARTIIND    29-Apr-21   1100        PE
AARTIIND    29-Apr-21   1100        PE
AARTIIND    29-Apr-21   1100        PE
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA
AARTIIND-I  NA          NA          NA

To split the strings on whitespace I use:

df[['A', 'B', 'C', 'D']] = df.ABCD.str.split(expand=True)

But since the spacing is not consistent, it gives me an error:

C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\python.exe "C:/Users/sadik/PycharmProjects/Katwal_Asset_Management/import data.py"
Traceback (most recent call last):
  File "C:\Users\sadik\PycharmProjects\Katwal_Asset_Management\import data.py", line 6, in <module>
    df[['A', 'B', 'C', 'D']] = df.ABCD.str.split(expand=True)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3600, in __setitem__
    self._setitem_array(key, value)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3639, in _setitem_array
    check_key_length(self.columns, key, value)
  File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\indexers.py", line 428, in check_key_length
    raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key

So is there any way I can accomplish the above task using str.split, or is there any other method in Python to achieve the desired output?
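A note on the error: str.split(expand=True) returns as many columns as the longest row has parts, so assigning the result to four column names only works when that count is exactly four. The rows that break this are not shown in the question, so the sketch below assumes a hypothetical row with one extra token to reproduce the failure.

```python
import pandas as pd

# Hypothetical data: the last row carries an extra token ("EXTRA"),
# which is an assumption and does not appear in the question.
df = pd.DataFrame(
    {
        "ABCD": [
            "AARTIIND 29APR21 1100 PE",
            "AARTIIND-I",
            "AARTIIND 29APR21 1100 PE EXTRA",
        ]
    }
)

# One output column per part of the longest row
parts = df["ABCD"].str.split(expand=True)
n_parts = parts.shape[1]

# Assigning a 5-column result to 4 keys fails
try:
    df[["A", "B", "C", "D"]] = parts
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(n_parts, error_message)
```

Checking parts.shape[1] before the assignment is a quick way to see whether the real data has rows with more (or fewer) parts than expected.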

CodePudding user response:

Try the following code:

import io
import pandas as pd

text = """    ABCD
1   AARTIIND 29APR21 1100 PE
2   AARTIIND 29APR21 1100 PE
3   AARTIIND 29APR21 1100 PE
4   AARTIIND-I
5   AARTIIND-I
6   AARTIIND-I
7   AARTIIND-I
8   AARTIIND-I
9   AARTIIND-I
10  AARTIIND-I
11  AARTIIND-I
12  AARTIIND-I
13  AARTIIND-I
14  AARTIIND-I
15  AARTIIND-I
16  AARTIIND-I
17  AARTIIND-I
18  AARTIIND-I"""

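# The text has no commas, so read_csv yields a single column named '    ABCD'
df = pd.read_csv(io.StringIO(text))

# Split on any run of whitespace; shorter rows get missing values
df = df['    ABCD'].str.split(expand=True)

# The first piece is the row label that was pasted along with the data
df.columns = ['idx', 'A', 'B', 'C', 'D']
df = df.set_index('idx')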
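
If the data is already in a dataframe with a column named ABCD, as in the question, the split can also be applied to that column directly. The following is a minimal sketch, not a definitive solution: it assumes at most four meaningful parts per row, caps the split with n=3, and uses reindex so the result always has exactly four columns. The date reformatting to 29-Apr-21 is optional and assumes every expiry looks like 29APR21 and that month names are in English.

```python
import pandas as pd

# Sample data modelled on the question
df = pd.DataFrame({"ABCD": ["AARTIIND 29APR21 1100 PE"] * 3 + ["AARTIIND-I"] * 4})

# n=3 allows at most three splits, i.e. at most four pieces per row
parts = df["ABCD"].str.split(n=3, expand=True)

# Guarantee exactly four columns, even if no row has four pieces
parts = parts.reindex(columns=range(4))
parts.columns = ["A", "B", "C", "D"]

# Optional: turn 29APR21 into 29-Apr-21; rows without a date stay missing
parts["B"] = pd.to_datetime(parts["B"], format="%d%b%y").dt.strftime("%d-%b-%y")

result = parts
print(result)
```

With n=3, any extra tokens in a row end up in column D instead of creating a fifth column, which avoids the "Columns must be same length as key" error at the cost of leaving such rows to be cleaned up separately.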