Pandas - How to convert a string column into Integer... then convert into String with 10 characters

I'm performing a data analysis where one of the steps is to create a key by combining several fields.

Unfortunately, the number of digits in a given field is not always the same.

Some information

The dtype of my_field is object;
NaN values have been replaced by the '-' character.
Basically, my_field holds integers stored as text.

Code

import pandas as pd
import numpy as np

data = {'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
        'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6']}
df = pd.DataFrame(data)
df

Raw Data

	product	my_field
0	PA1	001
1	PA2	0000000000002
2	PA3	3
3	PA4	04
4	PA5	-
5	PA6	5
6	PA7	-
7	PA8	6

My Approach:

df['my_field'] = np.where(df['my_field'] == '-', '-' , df['my_field'].str.zfill(10) )
df

My Output:

	product	my_field
0	PA1	0000000001
1	PA2	0000000000002
2	PA3	0000000003
3	PA4	0000000004
4	PA5	-
5	PA6	0000000005
6	PA7	-
7	PA8	0000000006

Desired Output:

	product	my_field
0	PA1	0000000001
1	PA2	0000000002
2	PA3	0000000003
3	PA4	0000000004
4	PA5	-
5	PA6	0000000005
6	PA7	-
7	PA8	0000000006

The problem: some outputs end up with more than 10 characters.
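
For what it's worth, the extra length comes from how zfill works: it only pads a string up to a minimum width and never shortens one that is already longer. A minimal sketch on three of the sample values (the Series name s is just for illustration):

```python
import pandas as pd

s = pd.Series(['001', '0000000000002', '3'])

# zfill(10) guarantees a minimum width of 10;
# strings that are already longer pass through unchanged.
padded = s.str.zfill(10)

print(padded.tolist())
print(padded.str.len().tolist())  # the second value keeps its 13 characters
```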

CodePudding user response：

What about slicing after zfill? That way you keep only the last 10 characters:

df['my_field'] = np.where(df['my_field'] == '-', '-', df['my_field'].str.zfill(10).str[-10:])

Alternative with boolean indexing:

df.loc[df['my_field'] != '-',
       'my_field'] = df['my_field'].str.zfill(10).str[-10:]

Output:

  product    my_field
0     PA1  0000000001
1     PA2  0000000002
2     PA3  0000000003
3     PA4  0000000004
4     PA5           -
5     PA6  0000000005
6     PA7           -
7     PA8  0000000006
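
Another option, closer to what the title describes, is to go through an integer first and then pad. This is only a sketch, assuming every value other than '-' parses as an integer. Unlike slicing, it does not silently cut off digits: a value with more than 10 significant digits would simply stay longer than 10 characters. The mask name is_number is made up for the example.

```python
import pandas as pd

df = pd.DataFrame({
    'product': ['PA1', 'PA2', 'PA3', 'PA4', 'PA5', 'PA6', 'PA7', 'PA8'],
    'my_field': ['001', '0000000000002', '3', '04', '-', '5', '-', '6'],
})

# Rows holding the '-' placeholder are left untouched.
is_number = df['my_field'] != '-'

# Converting to int64 drops the existing leading zeros;
# zfill(10) then pads the digits back to a width of 10.
df.loc[is_number, 'my_field'] = (
    df.loc[is_number, 'my_field']
      .astype('int64')
      .astype(str)
      .str.zfill(10)
)

print(df)
```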

CodePudding user response：

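def myfield_format(x):
    # Keep the '-' placeholder unchanged.
    if x == '-':
        return '-'
    # Drop existing leading zeros, then left-pad with zeros to 10 characters.
    digits = str(x).lstrip('0')
    return (10 - len(digits)) * '0' + digits

df['my_field'] = df['my_field'].map(myfield_format)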
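
Whichever approach you pick, it may be worth checking the result before building the key. Below is a rough sketch of such a check; has_valid_width is a hypothetical helper, not a pandas function, and the sample Series is made up to include one value that is too long.

```python
import pandas as pd

def has_valid_width(series, width=10, placeholder='-'):
    """True where the value is the placeholder or exactly `width` digits."""
    is_placeholder = series == placeholder
    is_fixed_width = (series.str.len() == width) & series.str.isdigit()
    return is_placeholder | is_fixed_width

formatted = pd.Series(['0000000001', '0000000000002', '-', '0000000006'])

# Flag the rows that would break a fixed-width key.
valid = has_valid_width(formatted)
print(formatted[~valid].tolist())
```

Any value printed here is either longer or shorter than 10 characters, or contains something other than digits.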