Python drop rows containing ending characters from any column

Is there a way, other than specifying each column (i.e. df.drop(df[Col1]...), to delete rows based on a condition?

For example, can I iterate through Col1, Col2, ... through Col15 and delete all rows where a value ends with the letter "A"?

I was able to delete columns using

df.loc[:, ~df.columns.str.startswith('A')]

CodePudding user response：

IIUC, you have a pandas DataFrame and want to drop all rows that contain at least one string that ends with the letter 'A'. One fast way to accomplish this is by creating a mask via numpy:

import pandas as pd
import numpy as np

Suppose our df looks like this:

      0     1     2     3     4  5
0  ADFC  FDGA  HECH  AFAB  BHDH  0
1  AHBD  BABG  CBCA  AHDF  BCAG  1
2  HEFH  GEHH  CBEF  DGEC  DGFE  2
3  HEDE  BBHE  CCCB  DDGB  DCAG  3
4  BGEC  HACB  ACHH  GEBC  GEEG  4
5  HFCC  CHCD  FCBC  DEDF  AECB  5
6  DEFE  AHCH  CHFB  BBAA  BAGC  6
7  HFEC  DACC  FEDA  CBAG  GEDD  7

Goal: we want to get rid of rows with index 0, 1, 6, 7.

Try:

mask = np.char.endswith(df.to_numpy(dtype=str), 'A')  # 2-D boolean ndarray: True where a cell ends with 'A'
indices_true = df.index[mask.any(axis=1)]  # labels of rows with at least one match: [0, 1, 6, 7]
df.drop(indices_true, inplace=True)  # drop those rows
print(df)

out:
      0     1     2     3     4  5
2  HEFH  GEHH  CBEF  DGEC  DGFE  2
3  HEDE  BBHE  CCCB  DDGB  DCAG  3
4  BGEC  HACB  ACHH  GEBC  GEEG  4
5  HFCC  CHCD  FCBC  DEDF  AECB  5
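
For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample frame and applies the same numpy mask. The DataFrame construction and the names rows_with_match and result are my own additions for illustration; the values are copied from the table above. It returns a filtered copy instead of dropping in place.

```python
import numpy as np
import pandas as pd

# Sample data copied from the frame shown above
df = pd.DataFrame([
    ["ADFC", "FDGA", "HECH", "AFAB", "BHDH", 0],
    ["AHBD", "BABG", "CBCA", "AHDF", "BCAG", 1],
    ["HEFH", "GEHH", "CBEF", "DGEC", "DGFE", 2],
    ["HEDE", "BBHE", "CCCB", "DDGB", "DCAG", 3],
    ["BGEC", "HACB", "ACHH", "GEBC", "GEEG", 4],
    ["HFCC", "CHCD", "FCBC", "DEDF", "AECB", 5],
    ["DEFE", "AHCH", "CHFB", "BBAA", "BAGC", 6],
    ["HFEC", "DACC", "FEDA", "CBAG", "GEDD", 7],
])

# True where a cell, converted to str, ends with 'A'
mask = np.char.endswith(df.to_numpy(dtype=str), "A")

# Collapse to one boolean per row, then keep only the rows without a match
rows_with_match = mask.any(axis=1)
result = df[~rows_with_match]
print(result)
```

Reducing the 2-D mask with any(axis=1) gives exactly one boolean per row, so it can be used directly as a row filter.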

CodePudding user response：

A bit unclear on your requirements, but maybe this fits. Generate some random words in columns, some of which end in 'A'. If any string in the designated columns ends with 'A', delete the row.

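import random
import string

import pandas as pd

nb_cols = 9
nb_vals = 6

def wgen():
    # five random lowercase letters followed by one random letter from A-H
    return ''.join(random.choices(string.ascii_lowercase, k=5)) + random.choice('ABCDEFGH')

# range(1, nb_vals) yields nb_vals - 1 = 5 rows per column
df = pd.DataFrame({'Col' + str(c): [wgen() for _ in range(1, nb_vals)] for c in range(1, nb_cols + 1)})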
print(df)

     Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9
0  aawivA  qorjeA  qfjuoD  nkwgzF  auablC  aehnqE  cwuvzF  diqwaF  qlnpzG
1  aidjuH  ljalaB  ldhgsC  zaangH  mdtgkF  lypfnB  kynrxG  qlnygH  zzqyrC
2  pzqibD  jdumcF  ddufmG  xstdcH  vqpbkG  rjnqxD  ugscrA  kmvyaE  cykutE
3  gqpycH  ynaeeA  onirjE  mnbtyH  swjuzF  dyvmvC  tpxgsH  ssnhbD  spsojD
4  isptdF  qzpitH  akzwgE  klgqpH  pqpcqH  psryiD  tjaurC  daaieC  piduzE

Say that we are looking for the "ending A" in Col4-Col7. Then row with index 2 needs to be deleted:

df[~df[['Col' + str(c) for c in range(4, 7 + 1)]]
   .apply(lambda x: x.str.match('.*A$').any(), axis=1)]

     Col1    Col2    Col3    Col4    Col5    Col6    Col7    Col8    Col9
0  aawivA  qorjeA  qfjuoD  nkwgzF  auablC  aehnqE  cwuvzF  diqwaF  qlnpzG
1  aidjuH  ljalaB  ldhgsC  zaangH  mdtgkF  lypfnB  kynrxG  qlnygH  zzqyrC
3  gqpycH  ynaeeA  onirjE  mnbtyH  swjuzF  dyvmvC  tpxgsH  ssnhbD  spsojD
4  isptdF  qzpitH  akzwgE  klgqpH  pqpcqH  psryiD  tjaurC  daaieC  piduzE
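
If the row-wise apply gets slow on a larger frame, a column-wise check is one possible alternative. The sketch below is only an illustration: drop_rows_ending_with is a hypothetical helper name, not a pandas function, and the data is copied from the sample output above because the original words were randomly generated.

```python
import pandas as pd


def drop_rows_ending_with(df, suffix, columns=None):
    """Return df without rows where any selected column ends with suffix.

    Hypothetical helper for illustration; columns=None checks every column.
    """
    subset = df if columns is None else df[columns]
    # Column-wise test; astype(str) lets non-string columns through as well
    hits = subset.astype(str).apply(lambda col: col.str.endswith(suffix))
    return df[~hits.any(axis=1)]


# Rows copied from the sample output above
rows = [
    "aawivA qorjeA qfjuoD nkwgzF auablC aehnqE cwuvzF diqwaF qlnpzG",
    "aidjuH ljalaB ldhgsC zaangH mdtgkF lypfnB kynrxG qlnygH zzqyrC",
    "pzqibD jdumcF ddufmG xstdcH vqpbkG rjnqxD ugscrA kmvyaE cykutE",
    "gqpycH ynaeeA onirjE mnbtyH swjuzF dyvmvC tpxgsH ssnhbD spsojD",
    "isptdF qzpitH akzwgE klgqpH pqpcqH psryiD tjaurC daaieC piduzE",
]
df = pd.DataFrame([r.split() for r in rows],
                  columns=[f"Col{c}" for c in range(1, 10)])

cols = [f"Col{c}" for c in range(4, 8)]  # Col4 through Col7
result = drop_rows_ending_with(df, "A", cols)
print(result)
```

Calling the helper without the columns argument checks all nine columns, which covers the "any column" case from the question.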