I am trying to read a CSV of 1s and 0s and convert them to True and False. Because I have a lot of columns, I would like to use the true_values and false_values arguments, but the code below raises
ValueError: Must be all encoded bytes
from io import StringIO
import numpy as np
import pandas as pd
pd.read_csv(StringIO("""var1, var2
0, 0
0, 1
1, 1
0, 0
0, 1
1, 0"""), true_values=[1],false_values=[0])
I cannot find the problem with the code that I wrote.
Answer:
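The error comes from passing integers: true_values and false_values are compared with the raw text of each field, so they must be lists of strings such as ['1'] and ['0'], not [1] and [0]. You don't need these parameters here, though. Use dtype instead. The sample below also drops the spaces after the commas; if your file has them, add skipinitialspace=True, otherwise the second column is read as ' var2':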
>>> pd.read_csv(StringIO("""var1,var2
0,0
0,1
1,1
0,0
0,1
1,0"""), dtype={'var1': bool, 'var2': bool})
    var1   var2
0  False  False
1  False   True
2   True   True
3  False  False
4  False   True
5   True  False
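With many columns, the dtype mapping does not have to be typed by hand: one option is to read only the header first and build the dict from the column names. This is a sketch that assumes the 0/1 columns are exactly the ones whose names start with 'var'; the extra column 'other' is hypothetical and only shows that non-matching columns keep their default type:

```python
from io import StringIO

import pandas as pd

data = """var1,var2,other
0,0,5
0,1,7
1,1,9
"""

# nrows=0 parses no data rows, so this only reads the column names.
header = pd.read_csv(StringIO(data), nrows=0).columns

# Assumption: every column whose name starts with 'var' holds only 0/1 values.
dtypes = {col: bool for col in header if col.startswith("var")}

# A fresh StringIO is needed because the first one has already been consumed.
df = pd.read_csv(StringIO(data), dtype=dtypes)
print(df.dtypes)
```

If every column in the file is 0/1, passing a single dtype=bool instead of a dict should apply the same conversion to all columns.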
If your columns share a common prefix, select them with filter:
df = pd.read_csv(StringIO("""..."""))
# select every column whose name contains 'var'
cols = df.filter(like='var').columns
df[cols] = df[cols].astype(bool)
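Note that like= matches the substring anywhere in a column name, not only at the start. If other columns happen to contain the same letters, the regex= parameter of filter can anchor the match to the beginning of the name. In this sketch the columns 'id' and 'covariate' are made up to show the difference:

```python
from io import StringIO

import pandas as pd

data = """id,var1,var2,covariate
10,0,0,3
11,0,1,4
12,1,1,5
"""
df = pd.read_csv(StringIO(data))

# like='var' would also pick up 'covariate', which contains "var" in the middle.
# regex=r"^var" only matches names that start with "var".
cols = df.filter(regex=r"^var").columns
df[cols] = df[cols].astype(bool)
print(df.dtypes)
```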
If your columns are consecutive, select them by position with iloc:
df = pd.read_csv(StringIO("""..."""))
# columns at positions 0 and 1 (the end of the slice is exclusive)
cols = df.iloc[:, 0:2].columns
df[cols] = df[cols].astype(bool)
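The positional slice can also be taken directly on df.columns, which gives the same labels as df.iloc[:, start:stop].columns. A small sketch, assuming a hypothetical layout where an 'id' column comes first and the 0/1 columns follow it:

```python
from io import StringIO

import pandas as pd

data = """id,var1,var2,var3
10,0,0,1
11,0,1,0
12,1,1,1
"""
df = pd.read_csv(StringIO(data))

# Positions 1, 2 and 3; the stop value 4 is excluded, as with Python lists.
cols = df.columns[1:4]
df[cols] = df[cols].astype(bool)
print(df.dtypes)
```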
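To detect the 0/1 columns automatically:
# keep only the columns in which every value is 0 or 1
m = df.isin([0, 1]).all()
cols = df.columns[m]
# assigning through df[cols] replaces those columns, so their dtype becomes bool
df[cols] = df[cols].astype(bool)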
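A column such as 0, 0.5, 1 has minimum 0 and maximum 1 without being binary, so a stricter test checks that every value is 0 or 1. Wrapped in a reusable function, that could look like the sketch below; the function name binary_columns_to_bool and the sample columns 'score' and 'name' are hypothetical:

```python
from io import StringIO

import pandas as pd


def binary_columns_to_bool(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df where every column containing only 0 and 1 is bool.

    Hypothetical helper: a column qualifies only if *every* value is 0 or 1,
    so [0.5, 1.0, 0.0] and columns with NaN or text are left untouched.
    """
    out = df.copy()
    is_binary = out.isin([0, 1]).all()  # one True/False per column
    cols = out.columns[is_binary]
    out[cols] = out[cols].astype(bool)
    return out


data = """var1,var2,score,name
0,0,0.5,a
0,1,1.0,b
1,1,0.0,c
"""
df = binary_columns_to_bool(pd.read_csv(StringIO(data)))
print(df.dtypes)
```

Returning a copy keeps the original DataFrame unchanged, which makes the function safe to call on data that is still needed in its numeric form.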