Can you explain the syntax: "pd.index.str.lower().str.contains()"?

Time:10-17

When I ran the following commands:

import pandas as pd

movies=pd.read_csv(r'E:\movies.csv', index_col="Title")
movies_with_dark=movies.index.str.lower().str.contains("dark")
movies[movies_with_dark]

The result was a DataFrame containing all films whose title contains the keyword "dark".

Can somebody explain the syntax movies.index.str.lower().str.contains()? In particular, why is the str accessor used again after lower()?

Answer:

I assume your index contains the movie titles as strings.

# access the index
movies.index

# make index strings lowercase
movies.index.str.lower()

# check if each string contains the word "dark"
movies.index.str.lower().str.contains("dark")
# one could also use
movies.index.str.contains("dark", case=False)
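A minimal, self-contained sketch of the same chain on a small hypothetical index (the titles are made up), printing what each step produces. It shows why .str appears twice: .str.lower() hands back a plain Index, and only the .str accessor exposes contains. It also checks that the case=False variant gives the same mask.

```python
import pandas as pd

# Hypothetical titles standing in for the "Title" index of movies.csv
idx = pd.Index(["The Dark Movie", "another darkness", "something else"], name="Title")

step1 = idx.str                 # accessor object exposing vectorized string methods
step2 = step1.lower()           # a new Index holding the lowercased strings
step3 = step2.str               # step2 is an Index again, so .str is needed once more
mask = step3.contains("dark")   # element-wise substring test, one boolean per title

print(type(step1).__name__)     # StringMethods
print(type(step2).__name__)     # Index
print([bool(m) for m in mask])  # [True, True, False]

# Case-insensitive matching without the explicit lower() step
mask_ci = idx.str.contains("dark", case=False)
print([bool(m) for m in mask_ci])  # [True, True, False]
```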

The last line returns a boolean array (for an Index, .str.contains gives a NumPy array rather than a Series), which is assigned to a variable and used to select rows from the original DataFrame via boolean indexing:

movies_with_dark=movies.index.str.lower().str.contains("dark")
movies[movies_with_dark]
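Since movies.csv is not available here, the sketch below builds a small stand-in DataFrame with hypothetical data and uses set_index("Title") to mimic index_col="Title". It shows that the mask lines up row by row with the DataFrame, so movies[mask] keeps only the rows where the mask is True; movies.loc[mask] is an equivalent, more explicit spelling.

```python
import pandas as pd

# Hypothetical stand-in for pd.read_csv(r'E:\movies.csv', index_col="Title")
movies = pd.DataFrame(
    {"Title": ["The Dark Knight", "Up", "Dark Waters"], "Year": [2008, 2009, 2019]}
).set_index("Title")

# One boolean per index label, in the same order as the rows
movies_with_dark = movies.index.str.lower().str.contains("dark")

# Boolean indexing: keep the rows where the mask is True
print(movies[movies_with_dark])

# .loc accepts the same mask and makes the row selection explicit
print(movies.loc[movies_with_dark])
```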

Example input:

                 col
The Dark Movie     A
another darkness   B
something else     C

Intermediates (as columns for clarity):

                 col             index         str.lower  str.contains("dark")
The Dark Movie     A    The Dark Movie    the dark movie                  True
another darkness   B  another darkness  another darkness                  True
something else     C    something else    something else                 False

Output:

                 col
The Dark Movie     A
another darkness   B
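The example above can be reproduced with a short sketch that stores each intermediate step as a column, as in the "Intermediates" table (column names here are chosen for illustration), and then applies the final mask.

```python
import pandas as pd

# The example input: titles as the index, one data column
df = pd.DataFrame(
    {"col": ["A", "B", "C"]},
    index=["The Dark Movie", "another darkness", "something else"],
)

# Keep each intermediate step as a column, purely for inspection
steps = df.assign(
    index=df.index,
    lower=df.index.str.lower(),
    contains_dark=df.index.str.lower().str.contains("dark"),
)
print(steps)

# Final selection: only the rows whose lowercased title contains "dark"
print(df[steps["contains_dark"]])
```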

Answer:

I would recommend reading the "Working with text data" section of the pandas user guide.

import pandas as pd

df = pd.DataFrame({'Movie': ['Dark Movie', 'Another'], 'Budget': [1, 2]}).set_index(['Movie'])

print(df.index.str.lower())

This returns an Index object, which does not have a contains method of its own:

Index(['dark movie', 'another'], dtype='object', name='Movie')

The Index object does, however, have a str accessor that exposes vectorized string methods, which is why .str appears a second time before contains.
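A quick check of this, reusing the same DataFrame: the lowercased Index has no contains attribute in current pandas (the old Index.contains was removed in pandas 1.0), but it does have the .str accessor, and going through that accessor gives the vectorized test.

```python
import pandas as pd

df = pd.DataFrame({'Movie': ['Dark Movie', 'Another'], 'Budget': [1, 2]}).set_index(['Movie'])

lowered = df.index.str.lower()        # an Index of strings, not a single string
print(hasattr(lowered, "contains"))   # False: Index has no contains method
print(hasattr(lowered, "str"))        # True: the .str accessor is available

# Going through the accessor gives the element-wise substring test
print([bool(x) for x in lowered.str.contains("dark")])  # [True, False]
```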

If you are interested, I believe the StringMethods class is defined here.
