pandas.read_table - leading zeros of numbers are removed

When I read a file into a dataframe and one of its columns contains integers with leading zeros, the zeros are removed. How can I prevent this?

Example:

file: "test.txt" has the following content:

one two three
a 025700 's'
b 005930 7
cc 125945 hi
ddd 000003 9.0

Now I am reading it into a dataframe:

import pandas as pd

filename = "test.txt"
df = pd.read_table(filename, sep=" ")

The output is:

print(df)

   one     two three
0    a   25700   's'
1    b    5930     7
2   cc  125945    hi
3  ddd       3   9.0

I would like the second column of the dataframe to hold exactly the same content as the file:

   one      two three
0    a   025700   's'
1    b   005930     7
2   cc   125945    hi
3  ddd   000003   9.0

CodePudding user response:

Use the dtype parameter to read column 'two' as strings instead of integers:

df = pd.read_table(filename, sep=" ", dtype={'two': str})
print(df)

# Output
   one     two three
0    a  025700   's'
1    b  005930     7
2   cc  125945    hi
3  ddd  000003   9.0

Or, if you don't want pandas to infer any data types, read every column as object:

df = pd.read_table(filename, sep=" ", dtype=object)
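To check this without creating a file, here is a minimal sketch that feeds the question's data through io.StringIO (an in-memory stand-in for test.txt, which is an assumption of this example) and compares the default parsing with the two dtype variants:

```python
import io

import pandas as pd

# In-memory stand-in for test.txt (same content as in the question).
data = """one two three
a 025700 's'
b 005930 7
cc 125945 hi
ddd 000003 9.0
"""

# Default parsing: column "two" is inferred as an integer column,
# so the leading zeros are lost.
df_default = pd.read_table(io.StringIO(data), sep=" ")

# Forcing only "two" to str keeps the text exactly as written in the file.
df_str = pd.read_table(io.StringIO(data), sep=" ", dtype={"two": str})

# dtype=object turns off type inference for every column.
df_obj = pd.read_table(io.StringIO(data), sep=" ", dtype=object)

print(df_default["two"].tolist())
print(df_str["two"].tolist())
print(df_obj["two"].tolist())
```

The first list contains plain integers, while the other two contain the original strings such as '025700'.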

CodePudding user response:

Another simple way is to convert the column to str after reading and pad it with zeros up to the original width (6 characters in the example file):

df['two'] = df['two'].astype(str).str.zfill(6)

The same as above, but calculating the required max_length automatically:

max_length = df['two'].astype(str).str.len().max()
df['two'] = df['two'].astype(str).str.zfill(max_length)
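One caveat with padding after the fact: the width is derived from the values that have already been parsed, so it only matches the file if at least one value had no leading zero. A small sketch with hypothetical data (a column that was "007" and "042" in the file) shows the difference:

```python
import pandas as pd

# Hypothetical column as it looks after "007" and "042" were parsed as integers.
parsed = pd.Series([7, 42])

# Width inferred from the parsed values is 2, although the file used 3 characters.
inferred_width = int(parsed.astype(str).str.len().max())
padded_inferred = parsed.astype(str).str.zfill(inferred_width)

# Width taken from knowledge of the file format (assumed to be 3 here).
padded_known = parsed.astype(str).str.zfill(3)

print(padded_inferred.tolist())
print(padded_known.tolist())
```

If the original width is not known in advance, reading the column as str in the first place is the safer option.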