How do I filter my DataFrame to show only French movies, based on the 'Language' column of my dataset, when that column can contain multiple languages?
Example:
Languages column:
French
English
German,French,Spanish
Spanish,English,French
French, English, German
What I have been trying only brings back the rows that have French as the only value in the language column. Please help!
I have tried:
df.loc[df['column_name'] == some_value]
but it only returns movies that are in the French language only, not those that are in French but also in other languages.
CodePudding user response:
Use str.contains with word boundaries (\b) to avoid matching substrings (e.g. 'Abc' matching 'Abcde'):
df.loc[df['column_name'].str.contains(r'\bFrench\b', case=False)]
If you are sure that the search term cannot occur as part of a longer word (which is plausible for language names), a plain substring match is enough:
df.loc[df['column_name'].str.contains('French', case=False)]
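As a rough, self-contained sketch of the word-boundary approach, here it is applied to the sample values from the question. The column name 'Language' and the variable names are assumptions for illustration, and na=False is an optional extra so that missing values count as non-matches:

```python
import pandas as pd

# Sample values from the question; the column name 'Language' is assumed.
df = pd.DataFrame({
    "Language": [
        "French",
        "English",
        "German,French,Spanish",
        "Spanish,English,French",
        "French, English, German",
    ]
})

# \b is a regex word boundary, so 'French' must appear as a whole word.
# case=False ignores case; na=False treats missing values as non-matches.
mask = df["Language"].str.contains(r"\bFrench\b", case=False, na=False)

# Keep every row whose language list mentions French anywhere.
french_movies = df.loc[mask]
print(french_movies)
```

With this sample data, every row except the 'English' one should be kept.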
CodePudding user response:
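loc selects rows by label or by a boolean mask, and plain bracket indexing accepts the same kind of mask. Note that an equality comparison like the one below returns only the rows whose value is exactly 'value', so it will not match rows that list several languages: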
df[df['column_name'] == 'value']
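To see how an equality filter behaves on the sample values from the question, here is a minimal sketch that compares it with a token-based check. The column name 'Language' and the helper lists_french are hypothetical names chosen for illustration, and the split-on-comma step is one possible alternative, not something proposed in the thread:

```python
import pandas as pd

# Sample values from the question; the column name 'Language' is assumed.
df = pd.DataFrame({
    "Language": [
        "French",
        "English",
        "German,French,Spanish",
        "Spanish,English,French",
        "French, English, German",
    ]
})

# Equality matches only cells whose entire value is exactly 'French'.
exact = df[df["Language"] == "French"]


def lists_french(cell: str) -> bool:
    """Hypothetical helper: True if 'French' is one of the comma-separated items."""
    return "French" in [item.strip() for item in cell.split(",")]


# Token-based check: split each cell on commas and test membership.
any_french = df[df["Language"].apply(lists_french)]

print(exact)
print(any_french)
```

On this data the equality filter keeps only the first row, while the token-based check also keeps the three multi-language rows that include French.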