Home > database >  Pandas: Select column by location and rows by value
Pandas: Select column by location and rows by value

Time:12-08

I have a dataframe where one of the column names is a variable:

xx = pd.DataFrame([{'ID':1, 'Name': 'Abe', 'HasCar':1},
      {'ID':2, 'Name': 'Ben', 'HasCar':0},
      {'ID':3, 'Name': 'Cat', 'HasCar':1}])

    ID  Name    HasCar
0   1   Abe     1
1   2   Ben     0
2   3   Cat     1

In this dummy example column 2 could be "HasCar", or "IsStaff", or some other unknowable value. I want to select all rows, where column 2 is True, whatever the column name is.

I've tried the following without success:

xx.iloc[:,[2]] ==  1

    HasCar
0   True
1   False
2   True

and then trying to use that as an index results in:

xx[xx.iloc[:,[2]] ==  1]

    ID  Name    HasCar
0   NaN NaN 1.0
1   NaN NaN NaN
2   NaN NaN 1.0

Which isn't helpful. I suppose I could go about renaming column 2 but that feels a little wrong. The issue seems to be that xx.iloc[:,[2]] returns a dataframe while xx['hasCar'] returns a series. I can't figure out how to force a (x,1) shaped dataframe into a series without knowing the column name, as described here .

Any ideas?

CodePudding user response:

It was almost correct, but you sliced in 2D, use a Series slicing instead:

xx[xx.iloc[:, 2] ==  1]

Output:

   ID Name  HasCar
0   1  Abe       1
2   3  Cat       1

difference:

# 2D slicing, this gives a DataFrame (with a single column)
xx.iloc[:,[2]]

   HasCar
0       1
1       0
2       1

# 1D slicing, as Series
xx.iloc[:,2]

0    1
1    0
2    1
Name: HasCar, dtype: int64
  • Related