ValueError: Must be all encoded bytes when reading csv with 0 and 1 in pandas

Time:01-17

I am trying to read a CSV containing 1s and 0s and convert them to True and False. Because I have a lot of columns, I would like to use the true_values and false_values arguments, but I get ValueError: Must be all encoded bytes:

from io import StringIO
import numpy as np
import pandas as pd

pd.read_csv(StringIO("""var1, var2
0,   0
0,   1
1,   1
0,   0
0,   1
1,   0"""), true_values=[1],false_values=[0])

I cannot find the problem with the code that I wrote.

CodePudding user response:

You don't need true_values and false_values parameters. Use dtype instead:

>>> pd.read_csv(StringIO("""var1,var2
0,0
0,1
1,1
0,0
0,1
1,0"""), dtype={'var1': bool, 'var2': bool})

    var1   var2
0  False  False
1  False   True
2   True   True
3  False  False
4  False   True
5   True  False
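
With a lot of columns, typing out the dtype mapping by hand gets tedious. Here is a minimal sketch, assuming every column in the file holds only 0 and 1: read just the header row first, then build the mapping from it. The name csv_text is a placeholder for the real file, and skipinitialspace=True is there only because the CSV in the question has spaces after the commas. As for the original error, it appears to come from passing the integers 1 and 0: true_values and false_values seem to be matched against the raw text of each cell, so they expect strings.

```python
from io import StringIO

import pandas as pd

# Placeholder data mirroring the question, including the spaces after commas.
csv_text = """var1, var2
0,   0
0,   1
1,   1
0,   0
0,   1
1,   0"""

# Read only the header row to get the column names (nrows=0 parses no data rows).
header = pd.read_csv(StringIO(csv_text), nrows=0, skipinitialspace=True)

# Build the dtype mapping programmatically instead of typing every column.
bool_dtypes = {col: bool for col in header.columns}

df = pd.read_csv(StringIO(csv_text), dtype=bool_dtypes, skipinitialspace=True)
print(df.dtypes)
```

If every column really is boolean, passing dtype=bool directly should have the same effect without the extra header read.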

If your columns share the same prefix, use filter:

df = pd.read_csv(StringIO("""..."""))
cols = df.filter(like='var').columns
df[cols] = df[cols].astype(bool)

If your columns are consecutive, use iloc:

df = pd.read_csv(StringIO("""..."""))
cols = df.iloc[:, 0:2].columns
df[cols] = df[cols].astype(bool)

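To auto-detect the columns whose minimum is 0 and maximum is 1:

m = df.min().eq(0) & df.max().eq(1)
cols = m[m].index  # labels of the columns whose min is 0 and max is 1
df[cols] = df[cols].astype(bool)  # assigning by label replaces the columns, so they get the bool dtype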
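
Note that a min/max check also accepts a column such as 0, 0.5, 1, which would then be cast to bool as well. Below is a slightly stricter sketch, under the assumption that a column should count as boolean only when every value is exactly 0 or 1. binary_columns is a hypothetical helper, not a pandas function, and the sample frame is made up for illustration.

```python
import pandas as pd


def binary_columns(df: pd.DataFrame) -> list:
    """Return the names of columns whose values are all exactly 0 or 1."""
    # isin() is False for NaN, so columns with missing values are left out.
    return [col for col in df.columns if df[col].isin([0, 1]).all()]


# Made-up sample: two flag columns, one count column, one fractional column.
df = pd.DataFrame({
    "var1": [0, 0, 1],
    "var2": [0, 1, 1],
    "visits": [0, 2, 1],
    "ratio": [0.0, 0.5, 1.0],
})

cols = binary_columns(df)
df[cols] = df[cols].astype(bool)  # only var1 and var2 are converted
print(df.dtypes)
```

Here ratio has a minimum of 0 and a maximum of 1, so the min/max test would convert it, while the isin check leaves it as a float column.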