Home > Software design >  Filter pandas dataframe by ROW (Not column)
Filter pandas dataframe by ROW (Not column)

Time:01-29

How do you filter a pandas dataframe by row?

Seems like there's so many answers for filtering by column but none by row.

I have a simple df.describe() and would like to keep all columns that have a std of greater than 1000.

Current code is: df.describe().sort_values(by = 'std', axis = 1).loc['std'] > 1000 but does not work as intended.

Here is just some datapoints in my dataset, obviously there is more than 3 datapoints which is why i would need this function:

{'SalePrice': {'count': 1168.0,
  'mean': 180382.8998287671,
  'std': 78838.24838038624,
  'min': 35311.0,
  '25%': 129900.0,
  '50%': 162950.0,
  '75%': 214000.0,
  'max': 755000.0},
 'LotArea': {'count': 1168.0,
  'mean': 10604.377568493152,
  'std': 10603.205552654954,
  'min': 1300.0,
  '25%': 7598.25,
  '50%': 9549.5,
  '75%': 11628.5,
  'max': 215245.0},
 'MiscVal': {'count': 1168.0,
  'mean': 47.08390410958904,
  'std': 544.0945591249799,
  'min': 0.0,
  '25%': 0.0,
  '50%': 0.0,
  '75%': 0.0,
  'max': 15500.0},

Added screen shot so you can see. Filter columns based on this row:

enter image description here

CodePudding user response:

For fun, you can make a double T (which is of course a waste of computations) :

out = df.T.loc[lambda x_: x_["std"].gt(1000)].T

And in a Pandorable way, as per @Scott Boston use boolean indexing and slice the columns :

out = df.loc[:, df.loc["std"].gt(1000)]

Output :

print(out)

           SalePrice        LotArea
count    1168.000000    1168.000000
mean   180382.899829   10604.377568
std     78838.248380   10603.205553 # <- both columns are greater than 1000
min     35311.000000    1300.000000
25%    129900.000000    7598.250000
50%    162950.000000    9549.500000
75%    214000.000000   11628.500000
max    755000.000000  215245.000000
  • Related