I have a DataFrame whose index labels are not numbers but strings (specifically, names of countries), and they are all unique. Given the name of a country, how do I find its row number (the integer position of that label in the index)?
I tried df[df.index == 'country_name'].index, but this doesn't work: it returns the matching label, not its position.
CodePudding user response:
We can use Index.get_indexer:
df.index.get_indexer(['Peru'])
[3]
Or we can build a RangeIndex based on the size of the DataFrame, then subset that instead:
pd.RangeIndex(len(df))[df.index == 'Peru']
Int64Index([3], dtype='int64')
Since we're only looking for a single label and the index labels are "all unique", we can also use Index.get_loc:
df.index.get_loc('Peru')
3
Sample DataFrame:
import pandas as pd
df = pd.DataFrame({
'A': [1, 2, 3, 4, 5]
}, index=['Bahamas', 'Cameroon', 'Ecuador', 'Peru', 'Japan'])
df:
          A
Bahamas   1
Cameroon  2
Ecuador   3
Peru      4
Japan     5
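To tie the approaches above to the sample DataFrame, here is a minimal sketch using Index.get_loc. The label 'Atlantis' is a made-up example of a label that is not in the index; with a unique index get_loc returns a plain integer, and for an unknown label it raises KeyError.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5]},
    index=["Bahamas", "Cameroon", "Ecuador", "Peru", "Japan"],
)

# With a unique index, get_loc returns a single integer position.
pos = df.index.get_loc("Peru")

# The position can be passed straight to iloc to fetch the same row.
row = df.iloc[pos]

# A label that is not in the index raises KeyError
# ('Atlantis' is a hypothetical missing label).
try:
    df.index.get_loc("Atlantis")
    found = True
except KeyError:
    found = False
```

If the index were not unique, get_loc could return a slice or a boolean mask instead of an integer, so this shortcut relies on the "all unique" condition from the question.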
CodePudding user response:
pd.Index.get_indexer
We can use pd.Index.get_indexer to get the integer positions.
idx = df.index.get_indexer(list_of_target_labels)
# If you only have a single label, you can use unpacking here.
[idx] = df.index.get_indexer([country_name])
NB: pd.Index.get_indexer takes a list-like of labels and returns a NumPy array of integers from 0 to n - 1, each giving the position in the index that matches the corresponding target label. Target labels that are not found in the index are marked by -1.
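A short sketch of the -1 behaviour described in the note, assuming the five-country sample DataFrame from the earlier answer. The label 'Atlantis' and the variable names are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5]},
    index=["Bahamas", "Cameroon", "Ecuador", "Peru", "Japan"],
)

# 'Atlantis' is a hypothetical label that is not in the index.
labels = ["Peru", "Atlantis", "Bahamas"]

# One integer per target label; -1 means "not found".
positions = df.index.get_indexer(labels)

# Collect the labels that could not be located.
missing = [label for label, p in zip(labels, positions) if p == -1]
```

Checking for -1 before using the result matters, because -1 is also a valid position for iloc (the last row).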
np.where
You could also use np.where here:
idx = np.where(df.index == country_name)[0]
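As a sketch of how the np.where result might be handled: it returns an array of positions, so it also works when a label appears more than once. The second DataFrame with a repeated 'Peru' label is a made-up example, not something from the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5]},
    index=["Bahamas", "Cameroon", "Ecuador", "Peru", "Japan"],
)

# np.where returns a tuple of arrays; [0] is the array of matching positions.
matches = np.where(df.index == "Peru")[0]

# Pull out a scalar, falling back to -1 when nothing matched.
pos = int(matches[0]) if matches.size else -1

# Hypothetical index with a duplicated label: every match is reported.
dup = pd.DataFrame({"A": [1, 2, 3]}, index=["Peru", "Chile", "Peru"])
dup_matches = np.where(dup.index == "Peru")[0]
```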
list.index
We could also use list.index after converting the pd.Index to a list using pd.Index.tolist:
idx = df.index.tolist().index(country_name)
CodePudding user response:
Why don't you create the index with numbers instead of text? Your df can be sorted in many ways besides alphabetically, and you can then lose track of the row numbers. With a numbered index this wouldn't be a problem.
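One way this suggestion might look in practice is to move the country names into a regular column with reset_index, which leaves a default integer index. This is only a sketch; the column name 'country' is an assumption chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1, 2, 3, 4, 5]},
    index=["Bahamas", "Cameroon", "Ecuador", "Peru", "Japan"],
)

# Name the index so the new column gets a readable name (assumed: 'country').
df.index.name = "country"

# reset_index moves the labels into a column and installs 0..n-1 as the index.
numbered = df.reset_index()

# Look up the integer index label of a country by filtering on the column.
pos = numbered.index[numbered["country"] == "Peru"][0]

# After sorting, the integer labels travel with their rows.
sorted_df = numbered.sort_values("A", ascending=False)
order_after_sort = sorted_df.index.tolist()
```

Note that after sorting, the integer labels record each row's original order rather than its current position, so for positions in the sorted frame the get_loc or get_indexer approaches from the other answers are still needed.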