I ran the following code:
import pandas as pd
movies=pd.read_csv(r'E:\movies.csv', index_col="Title")
movies_with_dark=movies.index.str.lower().str.contains("dark")
movies[movies_with_dark]
The result was a DataFrame containing all films with the keyword "dark" in the title.
Can somebody explain the syntax movies.index.str.lower().str.contains() to me?
In particular, why is .str needed again after lower()?
CodePudding user response:
I imagine your index contains strings with the names of movies.
# access the index
movies.index
# make index strings lowercase
movies.index.str.lower()
# check if each string contains the word "dark"
movies.index.str.lower().str.contains("dark")
# one could also use
movies.index.str.contains("dark", case=False)
The above returns an array of booleans, one per row (for an Index, pandas returns a plain boolean array rather than a Series). It is assigned to a variable and used to filter the original data with boolean indexing:
movies_with_dark=movies.index.str.lower().str.contains("dark")
movies[movies_with_dark]
Example input:
                 col
The Dark Movie     A
another darkness   B
something else     C
Intermediates (as columns for clarity):
                 col             index         str.lower  str.contains("dark")
The Dark Movie     A    The Dark Movie    the dark movie                  True
another darkness   B  another darkness  another darkness                  True
something else     C    something else    something else                 False
Output:
                 col
The Dark Movie     A
another darkness   B
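The walkthrough above can be reproduced with a small sketch. The DataFrame here is a hypothetical stand-in for movies.csv, built from the three example rows, and the names lowered, mask, mask_ci and result are illustrative only.

```python
import pandas as pd

# Hypothetical sample data mirroring the three example rows
movies = pd.DataFrame(
    {"col": ["A", "B", "C"]},
    index=["The Dark Movie", "another darkness", "something else"],
)

# Step 1: .str.lower() returns a new Index of lowercase titles
lowered = movies.index.str.lower()

# Step 2: the new Index needs .str again to reach contains()
mask = lowered.str.contains("dark")

# Alternative: a single case-insensitive call
mask_ci = movies.index.str.contains("dark", case=False)

# Step 3: boolean indexing keeps the rows where the mask is True
result = movies[mask]
print(result)
```

Both masks should select the same rows, so which form to use is mostly a matter of taste.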
CodePudding user response:
I would recommend reading the "Working with text data" section of the pandas user guide.
import pandas as pd
df = pd.DataFrame({'Movie': ['Dark Movie', 'Another'], 'Budget': [1, 2]}).set_index(['Movie'])
print(df.index.str.lower())
This returns an Index object, which does not have a contains method of its own:
Index(['dark movie', 'another'], dtype='object', name='Movie')
The Index object does, however, have a str accessor that exposes the vectorized string methods. Because lower() hands back a new Index rather than a string, .str has to be used again to reach contains.
If you are interested, I believe the StringMethods object is defined here.
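To check this yourself, here is a minimal sketch that inspects the intermediate object. It assumes a recent pandas version (1.0 or later, where Index no longer has a contains method); the variable name lowered is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {"Movie": ["Dark Movie", "Another"], "Budget": [1, 2]}
).set_index("Movie")

lowered = df.index.str.lower()

# The result of .str.lower() is an Index, not a plain string
print(type(lowered).__name__)

# Assumption: pandas >= 1.0, where Index has no contains attribute
print(hasattr(lowered, "contains"))

# The string methods live on the .str accessor instead
print(hasattr(lowered, "str"))
print(type(lowered.str).__name__)

# So contains() has to be reached through .str
print(lowered.str.contains("dark"))
```

Calling lowered.contains("dark") directly would be expected to raise an AttributeError under the same assumption.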