Find row(s) that contain specific letter

Time:07-28

I have a large CSV file and I'm trying to pull out the genes whose names contain specific letters. Example table (the actual data is 5 GB and a much larger matrix):

Cell_Index 880246 13694 491094
ABCA7 1 0 0
zyg11 0 0 0
ABR 1 0 1
ACAP2 1 0 0
mtycap 0 0 0
zyg11 1 1 0

I'm trying to select the rows whose value in the Cell_Index column starts with "mt". What I tried:

df = df.loc[df['Cell_Index'].str.startswith("mt", case=False)]

When I ran that code, it raised KeyError: 'Cell_Index'.

Not sure what I did wrong here.....

CodePudding user response:

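Just subset the data frame directly with a boolean mask, without using .loc. Note that str.startswith() does not accept a case argument, so lower-case the values first if you want a case-insensitive match. This still assumes Cell_Index is a regular column:

df = df[df["Cell_Index"].str.lower().str.startswith("mt")]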

CodePudding user response:

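IIUC, Cell_Index is the index of your data frame rather than a column, which is why df['Cell_Index'] raises a KeyError. Call reset_index() first to turn it into a column, then filter like below.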

df = df.reset_index()
df = df.loc[df['Cell_Index'].str.startswith("mt")]
print(df)

  Cell_Index  880246  13694  491094
4     mtycap       0      0       0
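
If you would rather keep Cell_Index as the index, the same filter can be applied to the index itself. The snippet below is only a sketch: it rebuilds the sample table from the question as an in-memory CSV and assumes the file was loaded with the first column as the index (index_col=0), which would explain the KeyError. The variable names raw, mask and mt_rows are made up for illustration.

```python
import io

import pandas as pd

# Sample data from the question as comma-separated text. Loading it with
# index_col=0 is an assumption about how the original file was read.
raw = """Cell_Index,880246,13694,491094
ABCA7,1,0,0
zyg11,0,0,0
ABR,1,0,1
ACAP2,1,0,0
mtycap,0,0,0
zyg11,1,1,0
"""
df = pd.read_csv(io.StringIO(raw), index_col=0)

# "Cell_Index" is now the name of the index, not a column,
# so df["Cell_Index"] would raise KeyError.
in_columns = "Cell_Index" in df.columns

# Filter on the index directly. Lower-casing first makes the prefix
# test case-insensitive, since str.startswith has no case argument.
mask = df.index.str.lower().str.startswith("mt")
mt_rows = df[mask]
print(mt_rows)
```

This keeps the gene names as row labels, so no extra integer index is created; use reset_index() as above if you need Cell_Index as an ordinary column later on.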