Indexing a dataframe column in Pandas

Time:06-24

How can I split the values in a pandas column and store the parts in new columns? This is what I'm trying to do:

Original:

       Data    
0  0010-AAAA    
1  0010-BBBB    
2  0010-CCCC    
3  0011-DDDD    
4  0011-EEEE    

Desired result, with two columns added:

       Data    col_2   col_3  
0  0010-AAAA    0010    AAAA
1  0010-BBBB    0010    BBBB
2  0010-CCCC    0010    CCCC
3  0011-DDDD    0011    DDDD
4  0011-EEEE    0011    EEEE

CodePudding user response:

df[['col_2', 'col_3']] = df['Data'].str.split('-', expand=True)
df

output:

        Data col_2 col_3
0  0010-AAAA  0010  AAAA
1  0010-BBBB  0010  BBBB
2  0010-CCCC  0010  CCCC
3  0011-DDDD  0011  DDDD
4  0011-EEEE  0011  EEEE

CodePudding user response:

Looks like you need a split:

df[['col_2', 'col_3']] = df['Data'].str.split('-', n=1, expand=True)

output:

        Data col_2 col_3
0  0010-AAAA  0010  AAAA
1  0010-BBBB  0010  BBBB
2  0010-CCCC  0010  CCCC
3  0011-DDDD  0011  DDDD
4  0011-EEEE  0011  EEEE
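
For anyone who wants to run the split approach end to end, here is a minimal, self-contained sketch. The third row, with two dashes, is a made-up value that is not in the question; it is only there to show what n=1 changes.

```python
import pandas as pd

# Sample data; "0011-DD-DD" is a hypothetical value with a second dash.
df = pd.DataFrame({"Data": ["0010-AAAA", "0010-BBBB", "0011-DD-DD"]})

# n=1 splits only at the first dash, so exactly two columns come back
# even when a value contains more than one dash.
df[["col_2", "col_3"]] = df["Data"].str.split("-", n=1, expand=True)

print(df)
```

Both new columns hold strings, so the leading zeros in "0010" are preserved. Without n=1, the hypothetical third row would split into three parts and the two-column assignment would no longer line up.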

If there is no dash

Then use a regex with str.extract.

In this case: one or more digits (\d+), followed by one or more non-digits (\D+):

df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)(\D+)')

output:

       Data col_2 col_3
0  0010AAAA  0010  AAAA
1  0010BBBB  0010  BBBB
2  0010CCCC  0010  CCCC
3  0011DDDD  0011  DDDD
4  0011EEEE  0011  EEEE
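
A minimal runnable sketch of the str.extract approach, assuming every value is a run of digits immediately followed by a run of non-digits. The sample frame is rebuilt here so the snippet stands alone.

```python
import pandas as pd

# Same values as in the question, but without the dash.
df = pd.DataFrame({"Data": ["0010AAAA", "0010BBBB", "0011DDDD"]})

# Each pair of parentheses is a capture group; with two groups,
# str.extract returns a two-column DataFrame, one column per group.
df[["col_2", "col_3"]] = df["Data"].str.extract(r"(\d+)(\D+)")

print(df)
```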

Or, to handle both cases at once, use r'(\d+)\W*(\D+)' (digits / optional non-word characters / non-digits):

df[['col_2', 'col_3']] = df['Data'].str.extract(r'(\d+)\W*(\D+)')

example:

        Data col_2 col_3
0  0010-AAAA  0010  AAAA
1   0010BBBB  0010  BBBB
2  0010-CCCC  0010  CCCC
3   0011DDDD  0011  DDDD
4  0011-EEEE  0011  EEEE
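
The combined pattern can be checked with a short sketch like the one below. The last row, "no-digits", is a hypothetical value that is not in the question; it is included to show what happens when the pattern does not match.

```python
import pandas as pd

# Mixed input: with a dash, without a dash, and one hypothetical
# value that contains no digits at all.
df = pd.DataFrame({"Data": ["0010-AAAA", "0010BBBB", "no-digits"]})

# \W* consumes an optional separator between the two capture groups.
pattern = r"(\d+)\W*(\D+)"
df[["col_2", "col_3"]] = df["Data"].str.extract(pattern)

print(df)
```

Rows that do not match the pattern get NaN in both new columns rather than raising an error, so it may be worth checking for missing values after the extract.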