Home > Blockchain >  Leading zeros removed by pandas import
Leading zeros removed by pandas import

Time:03-22

Why is my leading zeros not staying in column_item and line_item?

#format columns
dtype_dic= {'line_item': str, 
            'column_item' : str}

# loop over the list of csv files
for f in csv_files:
      
    # read the csv file
    df = pd.read_csv(f, sep=";", dtype = dtype_dic)
    df_list.append(df)

As an example line_item should be: 0010 0036 0230 1929

but after import it turns into: 10 36 230 1929

Here is the CSV file:

entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22

I hope you can point me in the right direction.

CodePudding user response:

Can you try using converters:

dict_conv={'line_item': lambda x: str(x),
           'column_item': lambda x: str(x)}

df = pd.read_csv('data.csv', sep=';', converters=dict_conv)

Update

In fact, you don't even need to use lambda function here (but I'm not sure)

dict_conv={'line_item': str,
           'column_item': str}

df = pd.read_csv('data.csv', sep=';', converters=dict_conv)

CodePudding user response:

Your code behaves as expected, here using io.StringIO as a file proxy:

data='''entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22'''

dtype_dic= {'line_item': str, 
            'column_item' : str}

df = pd.read_csv(io.StringIO(data), sep=";", dtype = dtype_dic)

dtypes:

entity                   int64
business_line_group     object
conso_level_entity       int64
report_name             object
line_item               object
column_item             object
z_axis                  object
value_text              object
amount                 float64
approval_text           object
dtype: object

output:

    entity business_line_group  conso_level_entity report_name line_item column_item     z_axis value_text        amount approval_text
0      456                test                 456   C_72_00_a      0070        0010  UNDEFINED    Value 1  5.198630e 06     28-feb-22
1      456                test                 456   C_72_00_a      0190        0010  UNDEFINED    Value 1  8.358922e 08     28-feb-22
2      456                test                 456   C_72_00_a      0260        0010  UNDEFINED    Value 1  4.745984e 09     28-feb-22
3      456                test                 456   C_73_00_a      0035        0010  UNDEFINED    Value 2  2.542482e 10     28-feb-22
4      456                test                 456   C_73_00_a      0070        0010  UNDEFINED    Value 2 -3.321623e 10     28-feb-22
5      456                test                 456   C_73_00_a      0080        0010  UNDEFINED    Value 1 -2.096612e 10     28-feb-22
6      456                test                 456   C_73_00_a      0110        0010  UNDEFINED    Value 1 -9.384699e 09     28-feb-22
7      456                test                 456   C_73_00_a      0230        0010  UNDEFINED    Value 1  2.193606e 09     28-feb-22
8      456                test                 456   C_73_00_a      0250        0010  UNDEFINED    Value 1 -5.737692e 08     28-feb-22
9      456                test                 456   C_73_00_a      0260        0010  UNDEFINED    Value 1  3.333715e 09     28-feb-22
10     456                test                 456   C_73_00_a      0918        0010  UNDEFINED    Value 1  1.243660e 05     28-feb-22
11     456                test                 456   C_74_00_a      0160        0010  UNDEFINED    Value 5 -5.434580e 10     28-feb-22
12     456                test                 456   C_74_00_a      0260        0010  UNDEFINED    Value 5  1.503482e 05     28-feb-22
13     456                test                 456   C_73_00_a      1100        0010  UNDEFINED    Value 5 -3.763345e 10     28-feb-22
14     456                test                 456   C_73_00_a      1100        0020  UNDEFINED    Value 5 -3.764350e 09     28-feb-22
15     456                test                 456   C_73_00_a      1040        0040  UNDEFINED    Value 3  3.376435e 10     28-feb-22
16     456                test                 456   C_73_00_a      1045        0040  UNDEFINED    Value 3  3.376435e 10     28-feb-22
17     456                test                 456   C_73_00_a      1045        0030  UNDEFINED    Value 3  3.350982e 08     28-feb-22
18     456                test                 456   C_73_00_a      1040        0010  UNDEFINED    Value 3  7.449687e 06     28-feb-22
19     456                test                 456   C_73_00_a      1045        0010  UNDEFINED    Value 1  7.644969e 07     28-feb-22
  • Related