Home > database >  How do I split a dataframe column values in pandas to get another column using python?
How do I split a dataframe column values in pandas to get another column using python?

Time:12-08

I have created a dataframe like this from a list.

                                        Name
0     Security Name % to Net Assets* DEBENtURES 0.04
1   Britannia Industries Ltd. EQUity & RELAtED 96.83
2                                     HDFC Bank 6.98
3                                         ICICI 4.82
4                                       Infosys 4.37

using df = pd.DataFrame(list1, columns=['Name']) I want to split Name in another such that the dataframe looks like this:

                                        Name        value
0     Security Name % to Net Assets* DEBENtURES      0.04
1   Britannia Industries Ltd. EQUity & RELAtED      96.83
2                                     HDFC Bank      6.98
3                                         ICICI      4.82
4                                       Infosys      4.37

I tried doing something like df = pd.DataFrame(df.row.str.split(",", 1).tolist(), columns=["Security Name", "Weights"]) however it gave an error like AttributeError: 'DataFrame' object has no attribute 'row'

How exactly do I split a column such that I get another column? Please help

CodePudding user response:

There is no column row and no separator comma, so use Series.str.rsplit for split from right with n=1 for first space:

print (df.columns.tolist())
Name


df[['Name','value']] = df['Name'].str.rsplit(n=1, expand=True)
print (df)
                                         Name  value
0   Security Name % to Net Assets* DEBENtURES   0.04
1  Britannia Industries Ltd. EQUity & RELAtED  96.83
2                                   HDFC Bank   6.98
3                                       ICICI   4.82
4                                     Infosys   4.37

If possible some numbers missing is possible use Series.str.extract with match numbers after last space in regex:

print (df)
                                               Name
0    Security Name % to Net Assets* DEBENtURES 0.04
1  Britannia Industries Ltd. EQUity & RELAtED 96.83
2                                    HDFC Bank 6.98
3                                        ICICI 4.82
4                                      Infosys 4.37
5                                               new
6                                             aa 58


df = (df['Name'].str.extract('(?P<Name>.*)\s (?P<value>\d \.\d |\d )$')
                .fillna({'Name':df['Name']}))
print (df)
                                         Name  value
0   Security Name % to Net Assets* DEBENtURES   0.04
1  Britannia Industries Ltd. EQUity & RELAtED  96.83
2                                   HDFC Bank   6.98
3                                       ICICI   4.82
4                                     Infosys   4.37
5                                         new    NaN
6                                          aa     58
  • Related