I have created a dataframe like this from a list.
Name
0 Security Name % to Net Assets* DEBENtURES 0.04
1 Britannia Industries Ltd. EQUity & RELAtED 96.83
2 HDFC Bank 6.98
3 ICICI 4.82
4 Infosys 4.37
using df = pd.DataFrame(list1, columns=['Name'])
I want to split Name
in another such that the dataframe looks like this:
Name value
0 Security Name % to Net Assets* DEBENtURES 0.04
1 Britannia Industries Ltd. EQUity & RELAtED 96.83
2 HDFC Bank 6.98
3 ICICI 4.82
4 Infosys 4.37
I tried doing something like df = pd.DataFrame(df.row.str.split(",", 1).tolist(), columns=["Security Name", "Weights"])
however it gave an error like AttributeError: 'DataFrame' object has no attribute 'row'
How exactly do I split a column such that I get another column? Please help
CodePudding user response:
There is no column row
and no separator comma, so use Series.str.rsplit
for split from right with n=1
for first space:
print (df.columns.tolist())
Name
df[['Name','value']] = df['Name'].str.rsplit(n=1, expand=True)
print (df)
Name value
0 Security Name % to Net Assets* DEBENtURES 0.04
1 Britannia Industries Ltd. EQUity & RELAtED 96.83
2 HDFC Bank 6.98
3 ICICI 4.82
4 Infosys 4.37
If possible some numbers missing is possible use Series.str.extract
with match numbers after last space in regex:
print (df)
Name
0 Security Name % to Net Assets* DEBENtURES 0.04
1 Britannia Industries Ltd. EQUity & RELAtED 96.83
2 HDFC Bank 6.98
3 ICICI 4.82
4 Infosys 4.37
5 new
6 aa 58
df = (df['Name'].str.extract('(?P<Name>.*)\s (?P<value>\d \.\d |\d )$')
.fillna({'Name':df['Name']}))
print (df)
Name value
0 Security Name % to Net Assets* DEBENtURES 0.04
1 Britannia Industries Ltd. EQUity & RELAtED 96.83
2 HDFC Bank 6.98
3 ICICI 4.82
4 Infosys 4.37
5 new NaN
6 aa 58