I have a dataframe such as:
EXCLUDE | WARNSIGN_DTL | EVENT_DTL | EVENT_DTL_2
1_1 | The thing happened on 2021... | according to this, this people did bla bla on 2021... | It happened on 2021....
1_2 | similar thing happened on 2012... | that was happened on 2012... |
...
1_1 | Sam did on 2012... | that was happened on 2012... | it hasn't made sense till 2012...
Note: I have simplified the original code here.
I wrote code such as:
df_check = df[['EXCLUDE','WARNSIGN_DTL','EVENT_DTL','EVENT_DTL_2']]
df_check.EXCLUDE!=1_1 & df_check.apply(lambda x: x.str.contains('2021|2012', na=False))
but I got this error:
---------------------------------------------------------------------------
MemoryError: Unable to allocate 82.1 GiB for an array with shape (104959, 104959) and data type float64
CodePudding user response:
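Wrap the first condition in parentheses, because & binds more tightly than !=. For the second condition, add DataFrame.all if you need to test whether all values are True per row, or DataFrame.any if you need to test whether at least one value is True per row: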
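# True only where every column of the row matches the pattern
mask = (df_check.EXCLUDE != '1_1') & df_check.apply(lambda x: x.str.contains('2021|2012', na=False)).all(axis=1)
# True where at least one column of the row matches the pattern
mask = (df_check.EXCLUDE != '1_1') & df_check.apply(lambda x: x.str.contains('2021|2012', na=False)).any(axis=1)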
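
To see the two variants side by side, here is a minimal sketch on a small made-up frame. The sample rows and the names text_cols, hits and not_excluded are assumptions for illustration, not part of the original data. It also assumes the year check is only meant for the three detail columns: EXCLUDE itself never contains a year, so including it would make the .all variant False for every row.

```python
import pandas as pd

# Hypothetical sample data modelled on the question's columns
df_check = pd.DataFrame({
    "EXCLUDE": ["1_1", "1_2", "1_3", "1_4"],
    "WARNSIGN_DTL": ["happened on 2021", "happened on 2012", "Sam did on 2012", "no year here"],
    "EVENT_DTL": ["did bla bla on 2021", "no year here", "happened on 2012", None],
    "EVENT_DTL_2": ["It happened on 2021", "no year here", "till 2012", "no year here"],
})

# Assumption: only the detail columns should be searched for the years
text_cols = ["WARNSIGN_DTL", "EVENT_DTL", "EVENT_DTL_2"]

# One boolean column per text column; na=False counts missing values as "no match"
hits = df_check[text_cols].apply(lambda col: col.str.contains("2021|2012", na=False))

# Parentheses are not needed here because the comparison is stored on its own line
not_excluded = df_check["EXCLUDE"] != "1_1"

mask_any = not_excluded & hits.any(axis=1)  # at least one column mentions a year
mask_all = not_excluded & hits.all(axis=1)  # every column mentions a year

print(df_check[mask_any])
print(df_check[mask_all])
```

Both masks are one-dimensional boolean Series with one entry per row, so they can be used directly for filtering with df_check[mask_any] or df_check[mask_all].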