Home > Net >  Getting indeces of rows given query of header names and respective values in Pandas
Getting indeces of rows given query of header names and respective values in Pandas

Time:07-04

I am given a dataframe, a subset of headers and values of those columns. I am interested in finding the indices of the dataframe that contain the values of interest in the columns of interest without explicitly typing in the column name and value; i.e. using df.index[df['BoolCol'] == VALUE and df['BoolCol2' == VALUE2] as I wont know what the header and values will be, and they will change every so often. I'm not sure how to do this when you can't explicitly in the code type the column names and values, and simply using variables that contain the lists of headers, and list of values.

Code Summary/Example:

df:
    Pretreat  Setup
0        3.0    0.5
1        3.0    0.5
2        3.0    3.0
3        3.0    3.0
4        3.0    5.0
5        3.0    5.0
6        3.0    0.5
7        3.0    0.5

query_labels = ['Pretreat', 'Setup'] #querying against 2 columns, Pretreat and Setup
query_values = [(3.0, 0.5)] #looking for indeces where Pretreat == 3.0 and Setup == 0.5 (in order of query_labels)

#Expecting:
{(3.0, 0.5): [0, 1, 6, 7]}

I hope that makes sense, happy to clarify if more info is needed.

Thank you!

CodePudding user response:

You can convert to series and check equqlity of all columns:

s = pd.Series(query_values[0], index=query_labels)

df[df.eq(s).all(1)].index

Output:

Int64Index([0, 1, 6, 7], dtype='int64')

If there are many items in query_labels:

out = {k: df[df.eq(pd.Series(k, index=query_labels)).all(1)].index.to_list() 
       for k in query_values}

Output: {(3.0, 0.5): [0, 1, 6, 7]}

CodePudding user response:

I am not sure i 100% understand what you require, but here's a solution of what i think u ask for ->

import pandas as pd

t = {'Pretreat': [3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0, 3.0],
     'Setup': [0.5, 0.5, 3.0, 3.0, 5.0, 5.0, 0.5, 0.5]
     }

data = pd.DataFrame(t)
print(data)
# make a list of all rows in first column
pretest_list = list(data.iloc[:, 0]) 
# iloc is a dataframe method that slices data as specified in [$1,$2]
# $1 what rows you specify, $2 what columns you specify
satup_list = list(data.iloc[:, 1])
print(pretest_list)
print(satup_list)

i = 0
# each loop x is an element in pretest_list, y is an element in satup_list
for x, y in zip(pretest_list, satup_list):
    if(x == 3.0 and y == 0.5):
        print(i)
    i  = 1

  • Related