Is there a way to delete rows based on a condition other than specifying each column individually, i.e. df.drop(df[Col1] ...)?
For example, can I iterate through Col1, Col2, ... through Col15 and delete every row in which a value ends with the letter "A"?
I was able to drop the columns whose names start with "A" using
df.loc[:, ~df.columns.str.startswith('A')]
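The same ~mask pattern also works along the row axis: build one boolean per row and pass its negation to .loc. Below is a minimal sketch of the column-by-column iteration the question describes. It uses a hypothetical three-column frame in place of Col1 ... Col15 and assumes the cells hold strings.

```python
import pandas as pd

# Hypothetical three-column frame standing in for Col1 ... Col15
df = pd.DataFrame({
    "Col1": ["xA", "xB", "xC"],
    "Col2": ["yB", "yB", "yA"],
    "Col3": ["zC", "zD", "zE"],
})

# Start with "no row matches", then OR in each column's test
drop_rows = pd.Series(False, index=df.index)
for col in ["Col1", "Col2", "Col3"]:  # e.g. [f"Col{i}" for i in range(1, 16)] for Col1..Col15
    drop_rows |= df[col].astype(str).str.endswith("A")

# Same ~mask idea as the column example, applied to rows instead of columns
result = df.loc[~drop_rows]
print(result)
```

Only the row with index 1 survives, because rows 0 and 2 each contain a value ending in "A".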
Answer:
IIUC, you have a pandas DataFrame and want to drop every row that contains at least one string ending with the letter 'A'. One fast way to do this is to build a boolean mask with NumPy:
import pandas as pd
import numpy as np
Suppose our df looks like this:
0 1 2 3 4 5
0 ADFC FDGA HECH AFAB BHDH 0
1 AHBD BABG CBCA AHDF BCAG 1
2 HEFH GEHH CBEF DGEC DGFE 2
3 HEDE BBHE CCCB DDGB DCAG 3
4 BGEC HACB ACHH GEBC GEEG 4
5 HFCC CHCD FCBC DEDF AECB 5
6 DEFE AHCH CHFB BBAA BAGC 6
7 HFEC DACC FEDA CBAG GEDD 7
Goal: we want to get rid of rows with index 0, 1, 6, 7.
Try:
# Boolean ndarray with one entry per cell: True where the cell (as a string) ends with 'A'
mask = np.char.endswith(df.to_numpy(dtype=str), 'A')
# Collapse to one flag per row: a row is dropped if any of its cells matched
indices_true = df.index[mask.any(axis=1)]  # labels 0, 1, 6, 7
df.drop(indices_true, inplace=True)
print(df)
out:
0 1 2 3 4 5
2 HEFH GEHH CBEF DGEC DGFE 2
3 HEDE BBHE CCCB DDGB DCAG 3
4 BGEC HACB ACHH GEBC GEEG 4
5 HFCC CHCD FCBC DEDF AECB 5
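Here is a self-contained version of the same approach for reference. It rebuilds the sample frame by hand from the table above, keeping the last column as integers. It also returns a filtered copy rather than dropping in place. mask.any(axis=1) reduces the cell-level mask to one flag per row.

```python
import numpy as np
import pandas as pd

# Sample data typed in from the table above; column 5 holds integers
rows = [
    ["ADFC", "FDGA", "HECH", "AFAB", "BHDH", 0],
    ["AHBD", "BABG", "CBCA", "AHDF", "BCAG", 1],
    ["HEFH", "GEHH", "CBEF", "DGEC", "DGFE", 2],
    ["HEDE", "BBHE", "CCCB", "DDGB", "DCAG", 3],
    ["BGEC", "HACB", "ACHH", "GEBC", "GEEG", 4],
    ["HFCC", "CHCD", "FCBC", "DEDF", "AECB", 5],
    ["DEFE", "AHCH", "CHFB", "BBAA", "BAGC", 6],
    ["HFEC", "DACC", "FEDA", "CBAG", "GEDD", 7],
]
df = pd.DataFrame(rows)

# Cell-level mask: True where the cell's string form ends with 'A'
cell_mask = np.char.endswith(df.to_numpy(dtype=str), "A")

# Row-level mask: keep a row only if none of its cells matched
keep = ~cell_mask.any(axis=1)
result = df[keep]
print(result)
```

Because to_numpy(dtype=str) converts every cell to a string first, non-string columns such as the integer column 5 are checked as well.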
Answer:
Your requirements are a bit unclear, but maybe this fits. First, generate some random words across several columns, some of which end in 'A'. Then, if any string in the designated columns ends with 'A', delete that row.
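import random
import string
import pandas as pd
nb_cols = 9
nb_vals = 6
def wgen():
    # five random lowercase letters followed by one uppercase letter from A-H
    return ''.join(random.choices(string.ascii_lowercase, k=5)) + random.choice('ABCDEFGH')
df = pd.DataFrame({'Col' + str(c): [wgen() for _ in range(1, nb_vals)] for c in range(1, nb_cols + 1)})
print(df)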
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
0 aawivA qorjeA qfjuoD nkwgzF auablC aehnqE cwuvzF diqwaF qlnpzG
1 aidjuH ljalaB ldhgsC zaangH mdtgkF lypfnB kynrxG qlnygH zzqyrC
2 pzqibD jdumcF ddufmG xstdcH vqpbkG rjnqxD ugscrA kmvyaE cykutE
3 gqpycH ynaeeA onirjE mnbtyH swjuzF dyvmvC tpxgsH ssnhbD spsojD
4 isptdF qzpitH akzwgE klgqpH pqpcqH psryiD tjaurC daaieC piduzE
Say we are looking for a trailing "A" in Col4 through Col7. Then the row with index 2 has to be deleted:
# keep only the rows in which no value in Col4..Col7 matches '.*A$' (i.e. ends with 'A')
df[~df[['Col' + str(c) for c in range(4, 7 + 1)]]
   .apply(lambda x: x.str.match('.*A$').any(), axis=1)]
Col1 Col2 Col3 Col4 Col5 Col6 Col7 Col8 Col9
0 aawivA qorjeA qfjuoD nkwgzF auablC aehnqE cwuvzF diqwaF qlnpzG
1 aidjuH ljalaB ldhgsC zaangH mdtgkF lypfnB kynrxG qlnygH zzqyrC
3 gqpycH ynaeeA onirjE mnbtyH swjuzF dyvmvC tpxgsH ssnhbD spsojD
4 isptdF qzpitH akzwgE klgqpH pqpcqH psryiD tjaurC daaieC piduzE
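When the columns to check form a contiguous block, df.loc[:, 'Col4':'Col7'] selects them by label, and the slice includes both endpoints. Testing each column with the vectorized Series.str.endswith then avoids a row-wise apply. Below is a minimal sketch. It uses small hypothetical data instead of random words so the result is reproducible.

```python
import pandas as pd

# Hypothetical deterministic data standing in for the random words above
df = pd.DataFrame({
    "Col1": ["abcA", "abcB", "abcC"],
    "Col4": ["defB", "defC", "defD"],
    "Col5": ["ghiC", "ghiA", "ghiE"],
    "Col7": ["jklD", "jklE", "jklF"],
    "Col8": ["mnoA", "mnoB", "mnoC"],
})

# Label slice is inclusive: picks Col4, Col5 and Col7 (everything between the two labels)
subset = df.loc[:, "Col4":"Col7"]

# Column-wise test, one vectorized str.endswith call per column
hits = subset.apply(lambda s: s.str.endswith("A"))

# Drop rows where any selected column ended with 'A'
result = df[~hits.any(axis=1)]
print(result)
```

Row 1 is removed because of "ghiA" in Col5. Row 0 is kept because its trailing "A" values sit in Col1 and Col8, outside the selected range.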