I use DataFrame.set_index() to transform two columns into a MultiIndex. The problem is that the values 0 and 1 are transformed into the booleans False and True.
This is the initial table. Please see the values in idx2.
| | idx1 | idx2 | val |
|---:|:-------|:-------|------:|
| 0 | A | | 1 |
| 1 | B | False | 2 |
| 2 | B | True | 3 |
| 3 | C | 0 | 4 |
| 4 | C | 1 | 5 |
| 5 | C | 2 | 6 |
| 6 | C | 3 | 7 |
After doing df.set_index(['idx1', 'idx2']) the table looks like this. Please look at the 4th and 5th rows and note that the integers have been transformed into booleans.
| | val |
|:-------------|------:|
| ('A', '') | 1 |
| ('B', False) | 2 |
| ('B', True) | 3 |
| ('C', False) | 4 | <<<< should be ('C', 0)
| ('C', True) | 5 | <<<< should be ('C', 1)
| ('C', 2) | 6 |
| ('C', 3) | 7 |
This happens with pandas version 1.5.3.
The question is why this happens and whether there is a way to prevent it.
Here is a full MWE:
#!/usr/bin/env python3
import pandas
df = pandas.DataFrame({
    'idx1': list('ABBCCCC'),
    'idx2': ['', False, True, 0, 1, 2, 3],
    'val': range(1, 8)
})
print(df.to_markdown())
df = df.set_index(['idx1', 'idx2'])
print(df.to_markdown())
CodePudding user response:
Why does this happen?
I think it would be best to open an issue on the pandas GitHub repository.
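In the meantime, here is one plausible explanation, offered as an assumption rather than something confirmed against the pandas source. In Python, bool is a subclass of int, so 0 == False and 1 == True, and each pair has the same hash. If the MultiIndex levels are built by hash-based de-duplication, 0 and 1 would collapse onto the False and True that appear earlier in the column. The pure-Python sketch below (no pandas involved; the names unique, lookup, restored and tagged are only illustrative) reproduces the same symptom with a plain dict:

```python
# Pure-Python illustration of the suspected mechanism; pandas is not used here.
values = ['', False, True, 0, 1, 2, 3]  # the idx2 column from the question

# bool is a subclass of int, so these pairs compare equal and hash alike.
assert 0 == False and 1 == True
assert hash(0) == hash(False) and hash(1) == hash(True)

# Hash-based de-duplication keeps the first representative it encounters,
# so 0 and 1 are absorbed by the earlier False and True.
unique = list(dict.fromkeys(values))
print(unique)

# Mapping every value back to its representative reproduces the symptom:
# the original 0 and 1 come back as False and True.
lookup = {v: v for v in unique}
restored = [lookup[v] for v in values]
print(restored)

# A type-aware key (here simply repr) keeps all seven values distinct.
tagged = [repr(v) for v in values]
print(tagged)
```

If this is indeed the cause, a possible workaround is to make the column values type-unambiguous before indexing, for example by converting idx2 with df['idx2'].map(repr) or by giving the column a single consistent type. The trade-off is that the index then holds strings rather than the original objects.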