Can we write the code below in a more optimized way?

Time:04-21

With these lines of code I am trying to drop every row from a dataframe that doesn't contain the string '-->' in any of its column values.

try:
    for j in range(len(df)):
        flg = 0
        for i in df.columns:
            if df[i].astype(str).str.contains('-->').iloc[j]:
                flg = 1
        if flg == 0:
            df.drop(df.index[j], axis=0, inplace=True)
except:
    pass

This code works. The question is whether it can be written in a more optimized way, since it takes a long time when the dataframe has 20K or more rows.

CodePudding user response:

You could vectorize:

mask = df.astype(str).apply(lambda column : column.str.contains('-->')).any(axis=1)
df = df[mask]
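
To see what the mask looks like, here is a minimal sketch on a small made-up dataframe. The column names and values are hypothetical, and `regex=False` is an optional addition (not in the answer above) that makes `str.contains` treat '-->' as a literal substring rather than a pattern:

```python
import pandas as pd

# Hypothetical sample data; column names and values are illustrative only.
df = pd.DataFrame({
    "a": ["x --> y", "plain", "foo", "bar"],
    "b": [1, 2, 3, 4],
    "c": ["none", "p --> q", "baz", "r-->s"],
})

# Convert every column to strings, test each column for the literal
# substring, then mark rows where at least one column matched.
mask = df.astype(str).apply(
    lambda column: column.str.contains("-->", regex=False)
).any(axis=1)

# Boolean indexing keeps only the marked rows; the original df is untouched.
filtered = df[mask]
print(filtered)
```

Here the third row has no '-->' in any column, so it is the only one filtered out.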

CodePudding user response:

Select only the object columns with DataFrame.select_dtypes and test them for the substring -->, so numeric and datetime columns are not tested. Then use DataFrame.any in boolean indexing to keep all rows where at least one value matched:

df = df[df.select_dtypes(object).apply(lambda x : x.str.contains('-->')).any(axis=1)]
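
If the object columns can contain missing values, it may be safer to pass `na=False` so that those cells count as "no match". The following is a sketch under that assumption, with hypothetical sample data; the columns are created with an explicit object dtype so that `select_dtypes` picks them up:

```python
import pandas as pd

# Hypothetical sample data with missing values in the text columns.
df = pd.DataFrame({
    "text": pd.Series(["a --> b", None, "plain", "c"], dtype=object),
    "num": [1.5, 2.0, 3.0, 4.0],
    "note": pd.Series(["x", "y-->z", None, "w"], dtype=object),
})

# Only object columns are tested; the numeric column is skipped entirely.
text_cols = df.select_dtypes(include="object")

# na=False turns missing cells into False instead of propagating NaN.
mask = text_cols.apply(
    lambda x: x.str.contains("-->", regex=False, na=False)
).any(axis=1)

filtered = df[mask]
print(filtered)
```

The first two rows are kept because one of their text columns contains '-->'; the last two are dropped.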

Or convert all columns to strings. This should be slower, because all columns are tested (including numeric ones, which cannot contain the substring -->):

df = df[df.astype(str).apply(lambda x : x.str.contains('-->')).any(axis=1)]
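
To check the speed difference on your own data, something like the following rough sketch could be used. The synthetic 20K-row dataframe and the helper names `object_only` and `all_columns` are assumptions made for illustration, and the timings will vary by machine and pandas version:

```python
import timeit

import numpy as np
import pandas as pd

# Synthetic data (hypothetical): one text column and two numeric columns.
rng = np.random.default_rng(0)
n = 20_000
df = pd.DataFrame({
    "text": pd.Series(rng.choice(["a --> b", "plain", "other"], size=n), dtype=object),
    "num": rng.integers(0, 100, size=n),
    "value": rng.random(n),
})


def object_only(frame):
    """Build the mask from object columns only."""
    cols = frame.select_dtypes(include="object")
    return cols.apply(lambda x: x.str.contains("-->", regex=False)).any(axis=1)


def all_columns(frame):
    """Build the mask after converting every column to strings."""
    cols = frame.astype(str)
    return cols.apply(lambda x: x.str.contains("-->", regex=False)).any(axis=1)


# Time five runs of each approach; no particular result is claimed here.
t_obj = timeit.timeit(lambda: object_only(df), number=5)
t_all = timeit.timeit(lambda: all_columns(df), number=5)
print(f"object columns only: {t_obj:.3f}s, all columns as str: {t_all:.3f}s")
```

On this data both functions select the same rows, since the numeric columns never contain '-->'; only the amount of work differs.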