Most efficient way to subset a pd df

Time:02-28

I have a large pandas DataFrame (4 x 96103). Based on conditions on the values in the first row, I want to extract a subset of the DF into a smaller DF_subset. For a single file this can be done in many ways without much computational cost, but I need to apply the same condition and the same operation to thousands of files. What is the most efficient way to do this? Below is a snippet of what I have done:

tt = []
x1=[]
x2=[]
x3=[]
for i in range(np.shape(DF)[1]):
    if (float(DF.iloc[0, i]) > -5.0) and (float(DF.iloc[0, i]) < 15.0):
        tt.append(DF.iloc[0, i])
        x1.append(DF.iloc[1, i])
        x2.append(DF.iloc[2, i])
        x3.append(DF.iloc[3, i])
X = (np.concatenate((tt,x1,x2,x3),axis=0))
X = pd.DataFrame(np.reshape(X,(4,-1)))

The original DF looks like the following, and the zone marked in red is an example of what I want DF_subset to contain:

[image: original DF with the desired subset marked in red]

CodePudding user response:

You need to transpose the df so that the first row becomes column 0, then select rows with a boolean condition on that column. Use:

df = DF.T
df[(df[0].astype(float) > -5.0) & (df[0].astype(float) < 15.0)]

Example: input df:

[image: input df]

Output df:

[image: output df]

Example code:

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.rand(3,10))
df2 = df.T
df2[(df2[0]>.2)&(df2[0]<.7)]
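
As an alternative to transposing, a boolean mask built from the first row can select the columns directly with .loc, and wrapping it in a function makes it easy to reuse across files. This is a minimal sketch, not a benchmarked solution. The helper name subset_by_first_row, its low/high parameters, and the small sample DF are hypothetical, and pd.to_numeric is used on the assumption that the first row may hold numbers stored as strings.

```python
import pandas as pd


def subset_by_first_row(frame, low=-5.0, high=15.0):
    """Keep the columns whose first-row value lies strictly between low and high.

    Hypothetical helper for illustration; the bounds mirror the question.
    """
    # Convert the first row to numbers; values that cannot be parsed become NaN,
    # and NaN fails both comparisons, so those columns are dropped.
    first_row = pd.to_numeric(frame.iloc[0], errors="coerce")
    # One vectorized comparison replaces the Python-level loop over columns.
    mask = ((first_row > low) & (first_row < high)).to_numpy()
    # A boolean array passed to .loc selects columns by position.
    return frame.loc[:, mask]


# Small made-up frame with the same layout as in the question (4 rows, many columns).
DF = pd.DataFrame(
    [
        [-10.0, -4.0, 0.0, 14.9, 15.0, 20.0],
        [1, 2, 3, 4, 5, 6],
        [7, 8, 9, 10, 11, 12],
        [13, 14, 15, 16, 17, 18],
    ]
)
DF_subset = subset_by_first_row(DF)
print(DF_subset)
```

For many files, the same function could be called on each frame after it is loaded. How the files are read is left out here because the question does not say what format they are in.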