I want to remove a string element from a list if it matches a given criterion. The "genres" column in my dataframe contains lists of genres, and I want to remove one genre entry from every list in the dataframe.
removing = df['genres']
for row in removing:
    for j in range(len(row)):
        print(row[j])
        if row[j] == 'روايات وقصص':
            print('bingo')
            print(row)
            print(row[j])
            print(j)
            print(df['genres'].pop(j))
This code gives me the following error:
3626 # InvalidIndexError. Otherwise we fall through and re-raise
3627 # the TypeError.
3628 self._check_indexing_error(key)
Here is an example of what I want to achieve. This is what I get right now:
df['genres'][3] = [روايات وقصص, روايات رومانسية, روايات خيالية]
And this is what I want to achieve:
df['genres'][3] = [ روايات رومانسية, روايات خيالية]
This is what my dataframe looks like: (screenshot not preserved)
CodePudding user response:
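This snippet should work if each 'genres' value is a single string. Note that it drops whole rows and does not remove an element from inside a list: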
df = df[df['genres'] != 'روايات وقصص']
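If the goal is to keep every row and only drop the one genre from each list, a minimal sketch along these lines may help. It assumes each cell of 'genres' is a real Python list of strings. The sample frame, the constant UNWANTED and the helper remove_genre are made up for illustration and are not part of the original question.

```python
import pandas as pd

# Hypothetical sample data; assumes each 'genres' cell is a Python list of strings.
UNWANTED = "روايات وقصص"
df = pd.DataFrame({
    "name": ["book_A", "book_B", "book_C"],
    "genres": [
        [UNWANTED, "روايات رومانسية", "روايات خيالية"],
        ["روايات رومانسية"],
        [UNWANTED],
    ],
})


def remove_genre(genres, unwanted):
    """Return a new list without `unwanted` (hypothetical helper)."""
    return [g for g in genres if g != unwanted]


# Build a new list for every row instead of mutating lists while looping over them.
df["genres"] = df["genres"].apply(lambda genres: remove_genre(genres, UNWANTED))

print(df["genres"].tolist())
```

Building a new list per row avoids the original problem: Series.pop(j) looks up the label j in the Series itself, it does not pop position j from the list stored in a cell. A row that held only the removed genre ends up with an empty list.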
CodePudding user response:
I'd suggest a small workaround:
Example dataframe:
import pandas as pd
df = pd.DataFrame([['movie_A', 'movie_B', 'movie_C'],
                   [['action', 'comedy'], ['thriller', 'action'], ['drama']]]).T
df.columns = ['name', 'genres']
Expand your genres column to multiple columns:
df = pd.concat([df.drop(columns='genres'),
                pd.DataFrame(df['genres'].tolist(), index=df.index).add_prefix('genre_tmp')],
               axis=1)
Replace the genre you wish to exclude ('action' in this example, assuming the genre name does not occur in other columns):
df.replace({'action': None}, inplace=True)
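Generate a column containing all genres as a list:
genres_list = df[df.columns[df.columns.str.contains('genre_tmp')]].values.tolist()
# Drop every placeholder (None/NaN) in a row, not just the first one;
# shorter lists are padded with missing values when the column is expanded.
genres_list = [[g for g in entry if pd.notna(g)] for entry in genres_list]
df['genres'] = genres_list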
Finally, remove the 'genre_tmp' columns:
df = df[df.columns[~df.columns.str.contains('genre_tmp')]]
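As a possible shorter alternative to the temporary columns, the same result can be sketched with explode and groupby. This is only an illustration under a few assumptions: the index is unique, every 'genres' cell is a non-empty list, and the extra row movie_D is added here just to show what happens when a list contains nothing but the removed genre.

```python
import pandas as pd

# Example data from above plus one hypothetical row (movie_D) for the empty-list case.
df = pd.DataFrame({
    "name": ["movie_A", "movie_B", "movie_C", "movie_D"],
    "genres": [["action", "comedy"], ["thriller", "action"], ["drama"], ["action"]],
})

# One row per (movie, genre) pair; the original index label is repeated.
exploded = df["genres"].explode()

# Keep everything except the unwanted genre, then rebuild one list per original row.
kept = exploded[exploded != "action"].groupby(level=0).agg(list)

# Rows whose list held only the removed genre are missing from `kept`,
# so fall back to an empty list for them.
df["genres"] = [kept.get(i, []) for i in df.index]

print(df)
```

This keeps the other columns untouched, so it does not depend on the genre name being absent from them.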
CodePudding user response:
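Try this, which drops every row whose list contains the genre (the string must match the stored value character for character):
s = df["genres"].transform(lambda x: "روايات وقصص" in x)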
df.drop(s[s].index, inplace=True)