I have a large CSV file and I'm trying to pick out the genes whose names contain specific letters. Example table (the actual data is 5 GB and a much larger matrix):
Cell_Index | 880246 | 13694 | 491094 |
---|---|---|---|
ABCA7 | 1 | 0 | 0 |
zyg11 | 0 | 0 | 0 |
ABR | 1 | 0 | 1 |
ACAP2 | 1 | 0 | 0 |
mtycap | 0 | 0 | 0 |
zyg11 | 1 | 1 | 0 |
I'm trying to select the rows that contain the letters "mt" in the Cell_Index column. What I tried:
df = df.loc[df['Cell_Index'].str.startswith("mt", case=False)]
When I ran that code, it gave me the error message KeyError: 'Cell_Index'
I'm not sure what I did wrong here.
CodePudding user response:
Just subset the data frame directly, without using loc. Note that str.startswith takes no case argument, so lower-case the names first:
df = df[df["Cell_Index"].str.lower().str.startswith("mt")]
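For a case-insensitive prefix test, here is a minimal sketch of two options. It assumes Cell_Index is a regular column, and the toy frame (including the upper-case name "MT-CO1") is made up for illustration. As far as I know, `Series.str.startswith` has no `case` parameter, whereas `Series.str.match` does and anchors the pattern at the start of the string.

```python
import pandas as pd

# Toy data modelled on the question; "MT-CO1" is a hypothetical extra row
# added only to show the case-insensitive behaviour.
df = pd.DataFrame({
    "Cell_Index": ["ABCA7", "zyg11", "ABR", "ACAP2", "mtycap", "MT-CO1"],
    "880246": [1, 0, 1, 1, 0, 1],
})

# Option 1: normalise the case, then use a plain prefix test.
by_lower = df[df["Cell_Index"].str.lower().str.startswith("mt")]

# Option 2: str.match anchors the regex at the start and accepts case=False.
by_match = df[df["Cell_Index"].str.match("mt", case=False)]

print(by_lower)
```

Both options should select the same rows; the first avoids regular expressions entirely, which may matter if the prefix contains regex metacharacters.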
CodePudding user response:
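IIUC, Cell_Index is the index of your DataFrame rather than a column, which is why df['Cell_Index'] raises a KeyError. First call reset_index to turn it into a column, then filter as below.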
df = df.reset_index()
df = df.loc[df['Cell_Index'].str.startswith("mt")]
print(df)
Output:
  Cell_Index  880246  13694  491094
4     mtycap       0      0       0
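If Cell_Index really is the index, the filter can also be applied to the index itself, without reset_index. The sketch below assumes the CSV stores Cell_Index as its first column and is read with `index_col=0`; the inline CSV text stands in for the real file, and the `chunksize` value is arbitrary. The second half shows one possible way to process a file too large for memory in pieces.

```python
import io

import pandas as pd

# Stand-in for the real 5 GB file (assumption: comma-separated, header row first).
csv_text = """Cell_Index,880246,13694,491094
ABCA7,1,0,0
zyg11,0,0,0
ABR,1,0,1
ACAP2,1,0,0
mtycap,0,0,0
zyg11,1,1,0
"""

# index_col=0 makes Cell_Index the index, so df["Cell_Index"] would raise KeyError.
df = pd.read_csv(io.StringIO(csv_text), index_col=0)

# Filter on the index directly; lower-casing makes the prefix test case-insensitive.
mt_rows = df[df.index.str.lower().str.startswith("mt")]
print(mt_rows)

# Same filter applied chunk by chunk, keeping only the matching rows in memory.
chunks = pd.read_csv(io.StringIO(csv_text), index_col=0, chunksize=2)
mt_chunked = pd.concat(
    [chunk[chunk.index.str.lower().str.startswith("mt")] for chunk in chunks]
)
```

With the real file, `io.StringIO(csv_text)` would be replaced by the file path, and a much larger `chunksize` would usually be more practical.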