Home > Software engineering >  Pandas: how to index dataframe for certain value or string without knowing the column name
Pandas: how to index dataframe for certain value or string without knowing the column name

Time:02-13

Although it seems very detrimental, I am having a hard time getting the index of a dataframe for a certain string or value at a random position in the dataframe.

I made an example dataframe:

fruits = {
    'column1':["Apples","Pears","Bananas","Oranges","Strawberries"],
    'column2':[1,2,3,4,5],
    'column3':["Kiwis","Mangos","Pineapples","Grapes","Melons"]
          }

df = pd.DataFrame(fruits)

        column1  column2     column3
0        Apples        1       Kiwis
1         Pears        2      Mangos
2       Bananas        3  Pineapples
3       Oranges        4      Grapes
4  Strawberries        5      Melons

Now I want to get the position index of Mangos, without having the knowledge in which column or row it exists. So far I succeeded in getting the row index:

print(df.loc[df.isin(["Mangos"]).any(axis=1)].index)

Which results in:

Int64Index([1], dtype='int64')

But now I would also like to retrieve the column index or column name.

This thread is a very simplified version of Get column name where value is something in pandas dataframe, but I could not figure out the code using the other thread.

CodePudding user response:

You can simply do:

df.columns[df.isin(['Mangos']).any()])
Index(['column3'], dtype='object')

Or to just get the column name:

df.columns[df.isin(['Mangos']).any()][0]
# column3

To get the index of the column, try:

df.columns.get_indexer(df.columns[df.isin(['Mangos']).any()])
# [2]

CodePudding user response:

Stack the dataframe to reshape into multiindex series, then use boolean indexing to get the index

s = df.stack()
s[s == 'Mangos'].index

MultiIndex([(1, 'column3')])

CodePudding user response:

You can try a simple search yourself.

coordinates = []
indices = df.index
columns = df.columns
for index in indices: # df is the DataFrame
         for col in columns:
             if df[col][index] == 'Mangos':
                 coordinates.append((index, col))
coordinates

Output:

[(1, 'column3')]

CodePudding user response:

You can use np.where as well.

Code:

import numpy as np
[(df.index[i], df.columns[c]) for i, c in zip(*np.where(df.isin(['Mangos'])))]

Output:

[(1, 'column3')]
  • Related