I have a large measurement dataset that contains 350 columns after filtering (for example A0 to A49, B0 to B49, F0 to F49) holding some random numbers. Now I want to check, for each of the columns B0 to B49, whether it has any value in a given range (say, between 20 and 30). If it does not, I want to delete that column from the measurement data.
How can I do this in Python with pandas?
Are there faster methods for this kind of filtering?
Sample data: https://docs.google.com/spreadsheets/d/17Xjc81jkjS-64B4FGZ06SzYDRnc6J27m/edit?usp=sharing&ouid=106137353367530025738&rtpof=true&sd=true
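The linked spreadsheet is not reproduced here. To experiment, a stand-in frame with the same column naming can be generated. This is a minimal sketch: the function name make_measurements, the three groups A, B and F (50 columns each, so 150 columns rather than the real 350), the value range 0 to 50 and the row count are all assumptions, not the actual measurement data.

```python
import numpy as np
import pandas as pd


def make_measurements(n_rows=100, groups=("A", "B", "F"), n_per_group=50, seed=0):
    """Hypothetical stand-in for the measurement data: columns A0..A49,
    B0..B49, F0..F49 filled with random floats in [0, 50)."""
    rng = np.random.default_rng(seed)
    columns = [f"{g}{i}" for g in groups for i in range(n_per_group)]
    values = rng.uniform(0, 50, size=(n_rows, len(columns)))
    return pd.DataFrame(values, columns=columns)


df = make_measurements()
print(df.shape)  # (100, 150)
```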
Answer:
In pandas, you can apply a function to every element of a DataFrame with the applymap method (renamed DataFrame.map in pandas 2.1, where applymap is deprecated). You can also apply aggregating operations, such as any(), to reduce a whole column to a single value. Put those two things together to get what you want.
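A small sketch of the two ingredients on a toy frame. The column names and values are made up for illustration. Because applymap is deprecated from pandas 2.1 on, the sketch uses DataFrame.map when it exists and falls back to applymap on older versions.

```python
import pandas as pd

# Toy frame with hypothetical values.
df = pd.DataFrame({"B0": [5, 25, 40], "B1": [1, 2, 3], "B2": [30, 20, 29]})

# DataFrame.map (pandas >= 2.1) replaces the deprecated applymap.
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap

# Step 1: apply a function to every element -> DataFrame of booleans.
in_range = elementwise(lambda x: 20 < x < 30)

# Step 2: aggregate each column to a single value -> one bool per column.
has_value_in_range = in_range.any()
print(has_value_in_range)  # B0 -> True, B1 -> False, B2 -> True
```

Note that the bounds are exclusive: 20 and 30 themselves do not count, which is why only 29 qualifies in B2.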
For instance, you want to know whether any value in a given set of columns (the "B" ones) falls in some range (say, between 20 and 30). So you test the values at the element level, but collect the column names as output.
You can do that with the following code. Run the lines one at a time to see what each step does.
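>>> # One bool per B column: True if any value lies strictly between 20 and 30
>>> b_cols_of_interest_indx = df.filter(regex='^B').applymap(lambda x: 20 < x < 30).any()
>>> b_cols_of_interest_indx[b_cols_of_interest_indx]
B19    True
B21    True
dtype: bool
>>> # Drop the B columns with no value in range; all other columns are kept
>>> df = df.drop(columns=b_cols_of_interest_indx[~b_cols_of_interest_indx].index)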
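On the question of speed: applymap calls a Python function once per element, which gets slow on a large frame. A vectorized comparison performs the same check in compiled code. The sketch below is one way to do that. The function name drop_cols_without_values_in_range and its parameters are hypothetical, and it keeps the exclusive bounds of the lambda above.

```python
import re

import pandas as pd


def drop_cols_without_values_in_range(df, prefix="B", low=20, high=30):
    """Drop every column whose name starts with `prefix` and that has no
    value strictly between `low` and `high`; other columns are kept as-is."""
    # Select only the columns of interest (B0..B49 in the question).
    cols = df.filter(regex="^" + re.escape(prefix))
    # Vectorized element-wise test; NaN compares False, so it never counts.
    in_range = (cols > low) & (cols < high)
    # One bool per column: does it have at least one value in range?
    keep = in_range.any()
    to_drop = keep[~keep].index
    return df.drop(columns=to_drop)


df = pd.DataFrame({
    "A0": [21, 22, 23],
    "B0": [5, 25, 40],   # has 25 -> kept
    "B1": [1, 2, 3],     # nothing in range -> dropped
    "B2": [30, 20, 29],  # has 29 -> kept
    "F0": [0, 0, 0],
})
result = drop_cols_without_values_in_range(df)
print(list(result.columns))  # ['A0', 'B0', 'B2', 'F0']
```

If the ends of the range should count, replace > and < with >= and <=. Only the columns matching the prefix are tested, so the A and F columns pass through untouched even if they contain values in range.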