Find row(s) that contain specific letters

I have a large CSV file and I'm trying to pull out the genes (rows) whose names contain specific letters. Example table (the actual data is 5 GB and a much larger matrix):

Cell_Index	880246	13694	491094
ABCA7	1	0	0
zyg11	0	0	0
ABR	1	0	1
ACAP2	1	0	0
mtycap	0	0	0
zyg11	1	1	0

I'm trying to select the rows that contain the letters "mt" in the Cell_Index column. What I tried:

df = df.loc[df['Cell_Index'].str.startswith("mt", case=False)]

When I ran that code, it gave me the error message KeyError: 'Cell_Index'

Not sure what I did wrong here.....

CodePudding user response：

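You can subset the DataFrame directly with a boolean mask, without using .loc. This only works if Cell_Index is a regular column; if it is the index, the KeyError remains. Also note that str.startswith has no case argument, so lower-case the values first for a case-insensitive match:

df = df[df["Cell_Index"].str.lower().str.startswith("mt")]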
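
As a minimal sketch of the case-insensitive variant, assuming Cell_Index is a regular column: the toy frame below mirrors the example table, plus one hypothetical upper-case row (MT-ND1, with made-up counts) that I added only to show that the match ignores case. Both spellings anchor the match at the start of the string.

```python
import pandas as pd

# Toy frame mirroring the example table in the question.
# The last row (MT-ND1) is hypothetical, added only to exercise
# the case-insensitive match; its counts are made up.
df = pd.DataFrame(
    {
        "Cell_Index": ["ABCA7", "zyg11", "ABR", "ACAP2", "mtycap", "zyg11", "MT-ND1"],
        "880246": [1, 0, 1, 1, 0, 1, 0],
        "13694": [0, 0, 0, 0, 0, 1, 1],
        "491094": [0, 0, 1, 0, 0, 0, 0],
    }
)

names = df["Cell_Index"]

# Option 1: normalise case, then test the prefix.
# na=False treats missing names as "no match" instead of NaN.
mask_lower = names.str.lower().str.startswith("mt", na=False)

# Option 2: regex match, which is anchored at the start of the string
# and does accept a case argument.
mask_regex = names.str.match("mt", case=False, na=False)

result = df[mask_lower]
print(result)
```

If you want "mt" anywhere in the name rather than only at the start, `names.str.contains("mt", case=False, na=False)` is the usual choice.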

CodePudding user response：

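IIUC, the KeyError means Cell_Index is the index of your DataFrame rather than a column. Call reset_index() first to turn it back into a column, then filter as below.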

df = df.reset_index()
df = df.loc[df['Cell_Index'].str.startswith("mt")]
print(df)

  Cell_Index  880246  13694  491094
4     mtycap       0      0       0
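
To illustrate where the KeyError comes from, here is a small sketch. It assumes the file was read with index_col=0 (the question doesn't show the read_csv call, so that part is a guess). In that case Cell_Index is the index name, not a column, and you can also filter on the index directly without reset_index.

```python
import io

import pandas as pd

# The example table from the question, tab-separated.
raw = (
    "Cell_Index\t880246\t13694\t491094\n"
    "ABCA7\t1\t0\t0\n"
    "zyg11\t0\t0\t0\n"
    "ABR\t1\t0\t1\n"
    "ACAP2\t1\t0\t0\n"
    "mtycap\t0\t0\t0\n"
    "zyg11\t1\t1\t0\n"
)

# ASSUMPTION: the first column was used as the index when loading.
df = pd.read_csv(io.StringIO(raw), sep="\t", index_col=0)

print("Cell_Index" in df.columns)  # False
print(df.index.name)               # Cell_Index

# Selecting it as a column therefore fails with the error from the question.
try:
    df["Cell_Index"]
    raised = False
except KeyError:
    raised = True

# Filter on the index itself; no reset_index needed.
subset = df[df.index.str.startswith("mt")]
print(subset)
```

Filtering on the index keeps the gene names as row labels, which can be handy if later steps expect them there; reset_index() is the better fit if you want Cell_Index as an ordinary column.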