I have a DataFrame df containing option data, as shown below:
ABCD
1 AARTIIND 29APR21 1100 PE
2 AARTIIND 29APR21 1100 PE
3 AARTIIND 29APR21 1100 PE
4 AARTIIND-I
5 AARTIIND-I
6 AARTIIND-I
7 AARTIIND-I
8 AARTIIND-I
9 AARTIIND-I
10 AARTIIND-I
11 AARTIIND-I
12 AARTIIND-I
13 AARTIIND-I
14 AARTIIND-I
15 AARTIIND-I
16 AARTIIND-I
17 AARTIIND-I
18 AARTIIND-I
In the above DataFrame, some rows are separated by spaces into 4 parts. Others are single words.
I intend to do the following:
- Rows that are separated into 4 parts by spaces should be split into 4 individual columns, each containing one part
e.g. AARTIIND 29APR21 1100 PE should be split into 4 columns: column 1 will contain AARTIIND, column 2 the date, column 3 the price, and column 4 the option type, i.e. PE
- Single words that contain no spaces should be placed in column 1, with NA in the other columns
e.g. AARTIIND-I is a single word, so column 1 will contain AARTIIND-I while columns 2, 3 and 4 will display NA
Hence, after the transformation, the final df should look like this:
A B C D
AARTIIND 29-Apr-21 1100 PE
AARTIIND 29-Apr-21 1100 PE
AARTIIND 29-Apr-21 1100 PE
AARTIIND-I NA NA NA
AARTIIND-I NA NA NA
AARTIIND-I NA NA NA
AARTIIND-I NA NA NA
To split the strings on whitespace I use:
df[['A', 'B', 'C', 'D']] = df.ABCD.str.split(expand=True)
But since the spacing is not consistent, it raises an error:
C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\python.exe "C:/Users/sadik/PycharmProjects/Katwal_Asset_Management/import data.py"
Traceback (most recent call last):
File "C:\Users\sadik\PycharmProjects\Katwal_Asset_Management\import data.py", line 6, in <module>
df[['A', 'B', 'C', 'D']] = df.ABCD.str.split(expand=True)
File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3600, in __setitem__
self._setitem_array(key, value)
File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\frame.py", line 3639, in _setitem_array
check_key_length(self.columns, key, value)
File "C:\Users\sadik\anaconda3\envs\Katwal_Asset_Management\lib\site-packages\pandas\core\indexers.py", line 428, in check_key_length
raise ValueError("Columns must be same length as key")
ValueError: Columns must be same length as key
So, is there any way I can accomplish the above task using str.split, or is there another method in Python to achieve the desired output?
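For reference, pandas raises this ValueError when the frame returned by str.split(expand=True) does not have exactly as many columns as the list of names on the left-hand side. A minimal sketch of how the mismatch could be checked, assuming a small hypothetical sample in place of the real data:

```python
import pandas as pd

# Hypothetical sample mirroring the two kinds of rows in the question
df = pd.DataFrame({"ABCD": ["AARTIIND 29APR21 1100 PE", "AARTIIND-I"]})

# Number of whitespace-separated tokens in each row
token_counts = df["ABCD"].str.split().str.len()
print(token_counts.value_counts())

# Width of the frame that expand=True would produce;
# the assignment to 4 column names only works when this equals 4
width = df["ABCD"].str.split(expand=True).shape[1]
print(width)
```

If the width printed for the real data is not 4, some rows contain more tokens than expected, or no row contains 4 tokens at all.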
CodePudding user response:
Try the following code:
import io
import pandas as pd
text = """ ABCD
1 AARTIIND 29APR21 1100 PE
2 AARTIIND 29APR21 1100 PE
3 AARTIIND 29APR21 1100 PE
4 AARTIIND-I
5 AARTIIND-I
6 AARTIIND-I
7 AARTIIND-I
8 AARTIIND-I
9 AARTIIND-I
10 AARTIIND-I
11 AARTIIND-I
12 AARTIIND-I
13 AARTIIND-I
14 AARTIIND-I
15 AARTIIND-I
16 AARTIIND-I
17 AARTIIND-I
18 AARTIIND-I"""
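df = pd.read_csv(io.StringIO(text))
# Split on any run of whitespace; shorter rows are padded with missing values
df = df.iloc[:, 0].str.split(expand=True)
# The first token is the row number embedded in the sample text
df.columns = ['row', 'A', 'B', 'C', 'D']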
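If the number of pieces per row cannot be relied on, the result can be forced to exactly four columns before naming them. The following is only a sketch under assumptions: the sample rows and the names cols, parts, expiry and result are made up for illustration, the column is assumed to be called ABCD, and the date step assumes English month abbreviations (the usual default locale).

```python
import pandas as pd

# Hypothetical sample, including a row with irregular spacing
df = pd.DataFrame({"ABCD": [
    "AARTIIND 29APR21 1100 PE",
    "AARTIIND  29APR21   1100  PE",
    "AARTIIND-I",
]})

cols = ["A", "B", "C", "D"]

# With no pattern, split() treats any run of whitespace as one separator;
# n=3 caps the result at four pieces per row
parts = df["ABCD"].str.strip().str.split(n=3, expand=True)

# reindex guarantees exactly four columns, even if every row is a single word
parts = parts.reindex(columns=range(len(cols)))
parts.columns = cols

# Reformat the expiry from 29APR21 to 29-Apr-21; rows without a date stay missing
expiry = pd.to_datetime(parts["B"], format="%d%b%y", errors="coerce")
parts["B"] = expiry.dt.strftime("%d-%b-%y")

# Keep the original column alongside the four new ones
result = df.join(parts)
print(result)
```

Because the column count is fixed by reindex rather than by the data, the assignment of the four names cannot fail with "Columns must be same length as key". Note that C stays a string column here; converting it to a number would need an extra pd.to_numeric step.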