When I read a file that has a column of integers with leading zeros into a DataFrame, the zeros are removed. How can I prevent this?
Example:
The file "test.txt" has the following content:
one two three
a 025700 's'
b 005930 7
cc 125945 hi
ddd 000003 9.0
Now I am reading it into a dataframe:
import pandas as pd
filename = "test.txt"
df = pd.read_table(filename, sep=" ")
The output is:
print(df)
one two three
0 a 25700 's'
1 b 5930 7
2 cc 125945 hi
3 ddd 3 9.0
I would like the second column of the DataFrame to have the same content as in the file:
one two three
0 a 025700 's'
1 b 005930 7
2 cc 125945 hi
3 ddd 000003 9.0
CodePudding user response:
Use the dtype parameter:
df = pd.read_table(filename, sep=" ", dtype={'two': str})
print(df)
# Output
one two three
0 a 025700 's'
1 b 005930 7
2 cc 125945 hi
3 ddd 000003 9.0
Or, if you don't want pandas to infer any data types, read every column as object:
df = pd.read_table(filename, sep=" ", dtype=object)
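To try this without creating a file, here is a minimal sketch that feeds the question's sample data through io.StringIO (the in-memory text and the variable names inferred and as_text are only for illustration, not part of the original answer). It compares the default type inference with dtype={'two': str}:

```python
import io
import pandas as pd

# In-memory stand-in for test.txt, same content as in the question.
data = """one two three
a 025700 's'
b 005930 7
cc 125945 hi
ddd 000003 9.0
"""

# Default inference: column "two" is parsed as integers, so the zeros are lost.
inferred = pd.read_table(io.StringIO(data), sep=" ")

# dtype={'two': str} keeps the raw text of that one column.
as_text = pd.read_table(io.StringIO(data), sep=" ", dtype={"two": str})

print(inferred["two"].tolist())  # [25700, 5930, 125945, 3]
print(as_text["two"].tolist())   # ['025700', '005930', '125945', '000003']
```

Only the column named in dtype is affected; the other columns are still inferred as usual.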
CodePudding user response:
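If the DataFrame has already been loaded, a simple way is to convert the column to str and pad it with zeros back to the original width (6 digits in this example):
df['two'] = df['two'].astype(str).str.zfill(6)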
Same as above, but calculating the required max_length automatically (this only works if at least one value still has the full width):
max_length = df['two'].astype(str).str.len().max()
df['two'] = df['two'].astype(str).str.zfill(max_length)
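As a rough, self-contained sketch of the padding approach, the snippet below starts from a DataFrame that has already lost its zeros. The width of 6 is an assumption taken from the sample file, and the variable name width is only illustrative:

```python
import pandas as pd

# Column "two" as it looks after the leading zeros were dropped on read.
df = pd.DataFrame({"two": [25700, 5930, 125945, 3]})

# Assumed fixed field width of the original file (6 digits in the sample).
width = 6

# Convert to str, then left-pad with zeros up to the assumed width.
df["two"] = df["two"].astype(str).str.zfill(width)

print(df["two"].tolist())  # ['025700', '005930', '125945', '000003']
```

Padding after the fact cannot recover the original text if the file mixes widths, so reading the column as str in the first place is the safer option.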