I have a DataFrame that looks like this:
+-----+------+--------+
| idx | Col1 | Col2   |
+-----+------+--------+
| 0   | A    | [1, 2] |
| 1   | B    | [3, 4] |
| 2   | C    | [5, 6] |
+-----+------+--------+
What I would like to accomplish is a new column layout like this:
+-----+------+-------------+
| idx | Col1 | Col2        |
+-----+------+------+------+
|     |      | sub1 | sub2 |
+-----+------+------+------+
| 0   | A    | 1    | 2    |
| 1   | B    | 3    | 4    |
| 2   | C    | 5    | 6    |
+-----+------+------+------+
The end goal is to be able to run a df.query() like the following:
df.query("Col2.sub1 == 3 & Col2.sub2 == 4")
to get the row at index 1.
Is this even possible with df.query()?
Edit: This is what produces the first table.
import pandas as pd
records = [{'Col1': 'A', 'Col2': [1, 2]}, {'Col1': 'B', 'Col2': [3, 4]}, {'Col1': 'C', 'Col2': [5, 6]}]
df = pd.DataFrame.from_records(records)
CodePudding user response:
First, split the lists into separate columns:
df[['sub1', 'sub2']] = pd.DataFrame(df['Col2'].tolist(), index=df.index)
df = df.drop(columns='Col2')
  Col1  sub1  sub2
0    A     1     2
1    B     3     4
2    C     5     6
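If the nested header is not strictly required, the flat frame at this point can already be filtered with a plain query() on the new column names. A minimal sketch, rebuilding the frame from the question and assuming every list in Col2 has exactly two elements (the name `result` is only for illustration):

```python
import pandas as pd

records = [
    {"Col1": "A", "Col2": [1, 2]},
    {"Col1": "B", "Col2": [3, 4]},
    {"Col1": "C", "Col2": [5, 6]},
]
df = pd.DataFrame.from_records(records)

# Expand each two-element list into two columns, keeping the
# original index so the rows stay aligned.
df[["sub1", "sub2"]] = pd.DataFrame(df["Col2"].tolist(), index=df.index)
df = df.drop(columns="Col2")

# Flat column names are valid identifiers, so no quoting is needed.
result = df.query("sub1 == 3 and sub2 == 4")
print(result)
```

This trades the two-level header for simpler query strings.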
Then create a MultiIndex for the columns (the top-level label '0' for Col1 is just a placeholder):
df.columns = pd.MultiIndex.from_arrays([['0', 'Col2', 'Col2'],
                                        df.columns.tolist()])
     0 Col2
  Col1 sub1 sub2
0    A    1    2
1    B    3    4
2    C    5    6
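To check that the two-level header behaves as intended, the sub-columns can be selected through the top-level label. A small sketch that builds the already-split frame directly (the names `sub` and `one` are only for illustration):

```python
import pandas as pd

# Start from the already-split frame shown above.
df = pd.DataFrame({"Col1": ["A", "B", "C"],
                   "sub1": [1, 3, 5],
                   "sub2": [2, 4, 6]})
df.columns = pd.MultiIndex.from_arrays([["0", "Col2", "Col2"],
                                        df.columns.tolist()])

# Selecting the top-level label returns both sub-columns as a DataFrame.
sub = df["Col2"]

# Selecting a full (top, sub) tuple returns a single Series.
one = df[("Col2", "sub1")]
print(sub)
print(one)
```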
Now you can query the MultiIndex columns by wrapping each column tuple in backticks:
df.query("`('Col2', 'sub1')` == 3 & `('Col2', 'sub2')` == 4")
     0 Col2
  Col1 sub1 sub2
1    B    3    4
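If the backtick form is awkward, or behaves differently in your pandas version, an equivalent boolean mask selects the same row without relying on how query() parses quoted names. A minimal sketch, with the frame rebuilt inline so it runs on its own (the names `mask` and `result` are only for illustration):

```python
import pandas as pd

df = pd.DataFrame(
    [["A", 1, 2], ["B", 3, 4], ["C", 5, 6]],
    columns=pd.MultiIndex.from_arrays(
        [["0", "Col2", "Col2"], ["Col1", "sub1", "sub2"]]
    ),
)

# Address each column by its (top level, sub level) tuple and
# combine the two conditions element-wise.
mask = (df[("Col2", "sub1")] == 3) & (df[("Col2", "sub2")] == 4)
result = df[mask]
print(result)
```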