Why is my leading zeros not staying in column_item and line_item?
#format columns
dtype_dic= {'line_item': str,
'column_item' : str}
# loop over the list of csv files
for f in csv_files:
# read the csv file
df = pd.read_csv(f, sep=";", dtype = dtype_dic)
df_list.append(df)
As an example line_item should be: 0010 0036 0230 1929
but after import it turns into: 10 36 230 1929
Here is the CSV file:
entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22
I hope you can point me in the right direction.
CodePudding user response:
Can you try using converters
:
dict_conv={'line_item': lambda x: str(x),
'column_item': lambda x: str(x)}
df = pd.read_csv('data.csv', sep=';', converters=dict_conv)
Update
In fact, you don't even need to use lambda
function here (but I'm not sure)
dict_conv={'line_item': str,
'column_item': str}
df = pd.read_csv('data.csv', sep=';', converters=dict_conv)
CodePudding user response:
Your code behaves as expected, here using io.StringIO
as a file proxy:
data='''entity;business_line_group;conso_level_entity;report_name;line_item;column_item;z_axis;value_text;amount;approval_text
456;test;456;C_72_00_a;0070;0010;UNDEFINED;Value 1;05198630.14;28-feb-22
456;test;456;C_72_00_a;0190;0010;UNDEFINED;Value 1;835892217;28-feb-22
456;test;456;C_72_00_a;0260;0010;UNDEFINED;Value 1;4745984333;28-feb-22
456;test;456;C_73_00_a;0035;0010;UNDEFINED;Value 2;25424822307.28;28-feb-22
456;test;456;C_73_00_a;0070;0010;UNDEFINED;Value 2;-33216232069.67;28-feb-22
456;test;456;C_73_00_a;0080;0010;UNDEFINED;Value 1;-20966122130.53;28-feb-22
456;test;456;C_73_00_a;0110;0010;UNDEFINED;Value 1;-9384698955.80;28-feb-22
456;test;456;C_73_00_a;0230;0010;UNDEFINED;Value 1;2193605666.84;28-feb-22
456;test;456;C_73_00_a;0250;0010;UNDEFINED;Value 1;-573769151.28;28-feb-22
456;test;456;C_73_00_a;0260;0010;UNDEFINED;Value 1;3333715453.55;28-feb-22
456;test;456;C_73_00_a;0918;0010;UNDEFINED;Value 1;124366;28-feb-22
456;test;456;C_74_00_a;0160;0010;UNDEFINED;Value 5;-54345799619.07;28-feb-22
456;test;456;C_74_00_a;0260;0010;UNDEFINED;Value 5;150348.16;28-feb-22
456;test;456;C_73_00_a;1100;0010;UNDEFINED;Value 5;-37633449687.15;28-feb-22
456;test;456;C_73_00_a;1100;0020;UNDEFINED;Value 5;-3764349687.15;28-feb-22
456;test;456;C_73_00_a;1040;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0040;UNDEFINED;Value 3;33764349687.15;28-feb-22
456;test;456;C_73_00_a;1045;0030;UNDEFINED;Value 3;335098209.05;28-feb-22
456;test;456;C_73_00_a;1040;0010;UNDEFINED;Value 3;7449687.15;28-feb-22
456;test;456;C_73_00_a;1045;0010;UNDEFINED;Value 1;76449687.15;28-feb-22'''
dtype_dic= {'line_item': str,
'column_item' : str}
df = pd.read_csv(io.StringIO(data), sep=";", dtype = dtype_dic)
dtypes:
entity int64
business_line_group object
conso_level_entity int64
report_name object
line_item object
column_item object
z_axis object
value_text object
amount float64
approval_text object
dtype: object
output:
entity business_line_group conso_level_entity report_name line_item column_item z_axis value_text amount approval_text
0 456 test 456 C_72_00_a 0070 0010 UNDEFINED Value 1 5.198630e 06 28-feb-22
1 456 test 456 C_72_00_a 0190 0010 UNDEFINED Value 1 8.358922e 08 28-feb-22
2 456 test 456 C_72_00_a 0260 0010 UNDEFINED Value 1 4.745984e 09 28-feb-22
3 456 test 456 C_73_00_a 0035 0010 UNDEFINED Value 2 2.542482e 10 28-feb-22
4 456 test 456 C_73_00_a 0070 0010 UNDEFINED Value 2 -3.321623e 10 28-feb-22
5 456 test 456 C_73_00_a 0080 0010 UNDEFINED Value 1 -2.096612e 10 28-feb-22
6 456 test 456 C_73_00_a 0110 0010 UNDEFINED Value 1 -9.384699e 09 28-feb-22
7 456 test 456 C_73_00_a 0230 0010 UNDEFINED Value 1 2.193606e 09 28-feb-22
8 456 test 456 C_73_00_a 0250 0010 UNDEFINED Value 1 -5.737692e 08 28-feb-22
9 456 test 456 C_73_00_a 0260 0010 UNDEFINED Value 1 3.333715e 09 28-feb-22
10 456 test 456 C_73_00_a 0918 0010 UNDEFINED Value 1 1.243660e 05 28-feb-22
11 456 test 456 C_74_00_a 0160 0010 UNDEFINED Value 5 -5.434580e 10 28-feb-22
12 456 test 456 C_74_00_a 0260 0010 UNDEFINED Value 5 1.503482e 05 28-feb-22
13 456 test 456 C_73_00_a 1100 0010 UNDEFINED Value 5 -3.763345e 10 28-feb-22
14 456 test 456 C_73_00_a 1100 0020 UNDEFINED Value 5 -3.764350e 09 28-feb-22
15 456 test 456 C_73_00_a 1040 0040 UNDEFINED Value 3 3.376435e 10 28-feb-22
16 456 test 456 C_73_00_a 1045 0040 UNDEFINED Value 3 3.376435e 10 28-feb-22
17 456 test 456 C_73_00_a 1045 0030 UNDEFINED Value 3 3.350982e 08 28-feb-22
18 456 test 456 C_73_00_a 1040 0010 UNDEFINED Value 3 7.449687e 06 28-feb-22
19 456 test 456 C_73_00_a 1045 0010 UNDEFINED Value 1 7.644969e 07 28-feb-22