Pandas - How to convert a string column into Integer... then convert into String with 10 characters

Time:08-18

I'm performing a data analysis where one of the steps is to create a key by combining several fields.

Unfortunately, the number of digits in a given field is not always the same.

Some information:
  • The datatype of my_field is object.
  • NaN values have been replaced by the '-' character.
  • The values in my_field are essentially integers stored as text.
Code
import pandas as pd
import numpy as np

data = {'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
        'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)
df
Raw Data
  product       my_field
0     PA1            001
1     PA2  0000000000002
2     PA3              3
3     PA4             04
4     PA5              -
5     PA6              5
6     PA7              -
7     PA8              6
My Approach:
df['my_field'] = np.where(df['my_field'] == '-', '-' , df['my_field'].str.zfill(10) )
df
My Output:
  product       my_field
0     PA1     0000000001
1     PA2  0000000000002
2     PA3     0000000003
3     PA4     0000000004
4     PA5              -
5     PA6     0000000005
6     PA7              -
7     PA8     0000000006
Desired Output:
  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006

The problem: some outputs end up with more than 10 characters.
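
The root cause can be reproduced in isolation. The sketch below uses three values from the sample data and suggests that `str.zfill` only pads on the left: a string that is already longer than the requested width comes back unchanged. The variable names `s` and `padded` are just for illustration.

```python
import pandas as pd

# Three values taken from the question's my_field column
s = pd.Series(['001', '0000000000002', '3'])

# zfill adds zeros on the left up to the given width; it never shortens a string
padded = s.str.zfill(10)

# The second value was already longer than 10 characters, so it is left as-is
print((padded.str.len() > 10).tolist())  # [False, True, False]
```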

CodePudding user response:

You can slice after zfill so that only the last 10 characters are kept:

df['my_field'] = np.where(df['my_field'] == '-', '-', df['my_field'].str.zfill(10).str[-10:])

Alternative with boolean indexing:

df.loc[df['my_field'] != '-',
       'my_field'] = df['my_field'].str.zfill(10).str[-10:]

Output:

  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006
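
For reference, here is the slicing approach as a self-contained sketch rebuilt from the question's sample data. One caveat may be worth checking on real data: `.str[-10:]` keeps the last 10 characters regardless of what they are, so a value with more than 10 significant digits would be shortened silently. The value '12345678901' and the names `is_value` and `too_long` are hypothetical and not part of the question.

```python
import pandas as pd

data = {'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
        'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)

# Only touch real values; rows holding the '-' placeholder stay as they are
is_value = df['my_field'] != '-'

# Pad to at least 10 characters, then keep only the last 10
df.loc[is_value, 'my_field'] = df.loc[is_value, 'my_field'].str.zfill(10).str[-10:]
print(df['my_field'].tolist())

# Hypothetical value (not in the question) with 11 significant digits:
# slicing drops the leading '1' without any warning
too_long = pd.Series(['12345678901'])
print(too_long.str.zfill(10).str[-10:].tolist())  # ['2345678901']
```

If such values can occur, comparing `df['my_field'].str.lstrip('0').str.len()` against 10 before slicing is one way to spot them.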

CodePudding user response:

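def myfield_format(x):
    # Keep the '-' placeholder as-is; otherwise left-pad with zeros and keep the last 10 characters
    field = ((10 - len(str(x))) * '0' + str(x))[-10:] if x != '-' else '-'
    return field

# Apply the formatter to every value in the column
df['my_field'] = df['my_field'].map(myfield_format)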
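
Since the title asks about going through an integer first, here is a minimal sketch of that route: parse each value with `int` (which discards leading zeros) and format it back with a zero-padded minimum width of 10. The helper name `to_key` and its `width` parameter are hypothetical, and the sketch assumes every value other than '-' is a valid non-negative integer string.

```python
import pandas as pd

data = {'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
        'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)

def to_key(x, width=10):
    # Keep the '-' placeholder untouched
    if x == '-':
        return x
    # int() drops leading zeros; the format spec pads back to at least `width` digits
    return f'{int(x):0{width}d}'

df['my_field'] = df['my_field'].map(to_key)
print(df['my_field'].tolist())
```

Unlike slicing, this never drops significant digits: a number with more than 10 digits keeps all of them, so the result can still exceed 10 characters in that case.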


    