I want to get the names of all columns whose correlation is greater than 0.2 and less than 0.8. Is there a way to do this?
Answer:
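Using the example from the pandas docs, we can compute the correlations, filter them with two conditions, take the index of the matching entries, and convert it to a list:
import numpy as np
import pandas as pd
# Custom correlation callable taken from the pandas DataFrame.corr docs
def histogram_intersection(a, b):
    v = np.minimum(a, b).sum().round(decimals=1)
    return v
df = pd.DataFrame([(.2, .3), (.0, .6), (.6, .0), (.2, .1)],
                  columns=['dogs', 'cats'])
# Correlation of every column with 'cats'; abs() also catches negative correlations
c = abs(df.corr(method=histogram_intersection)['cats'])
print(c)
# Keep values strictly between 0.2 and 0.8, then return their column names
print(c[(c > .2) & (c < .8)].index.tolist())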
Output
dogs 0.3
cats 1.0
Name: cats, dtype: float64
['dogs']
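The same filter works with the default Pearson correlation. Below is a minimal sketch, assuming a hypothetical DataFrame with a column named "target" that the other columns are compared against; the helper name columns_in_range and all data values are invented for illustration. The target is dropped before filtering so its self-correlation of 1.0 never counts, and abs() is optional.

```python
import pandas as pd


def columns_in_range(df, target, low=0.2, high=0.8, use_abs=True):
    """Hypothetical helper: names of columns whose Pearson correlation with
    `target` lies strictly between `low` and `high`."""
    # One column of the correlation matrix, minus the target's own 1.0
    corr = df.corr()[target].drop(target)
    if use_abs:
        # Treat strong negative correlations the same as positive ones
        corr = corr.abs()
    return corr[(corr > low) & (corr < high)].index.tolist()


# Hypothetical data chosen so the correlations are easy to reason about
df = pd.DataFrame({
    "target": [1, 2, 3, 4, 5],
    "a": [2, 4, 6, 8, 10],   # r = 1.0
    "b": [1, 3, 2, 5, 2],    # r is about 0.42
    "c": [5, 4, 3, 2, 1],    # r = -1.0
    "d": [2, 1, 2, 1, 2],    # r = 0.0
    "e": [2, 5, 3, 1, 2],    # r is about -0.42
})

print(columns_in_range(df, "target"))                 # ['b', 'e']
print(columns_in_range(df, "target", use_abs=False))  # ['b']
```

Whether to apply abs() depends on whether a moderate negative correlation should count as a match; the question does not say, so it is left as a parameter here.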
Answer:
You can index the Series corList with boolean conditions and retrieve the matching names with .index:
corList[(corList > 0.2) & (corList < 0.8)].index
Or, a possibly more readable version:
corList[corList.gt(0.2) & corList.lt(0.8)].index
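The answer above assumes corList already exists. A minimal, self-contained sketch follows, assuming corList is a Series of correlation values; the names x, y, z, w and their values are hypothetical. It checks that the operator form and the method form select the same names, and adds Series.between as a third option (its string-valued inclusive argument needs pandas 1.3 or later).

```python
import pandas as pd

# Hypothetical correlation values; in practice corList would come from
# something like df.corr()["some_column"].
corList = pd.Series({"x": 0.1, "y": 0.5, "z": 0.75, "w": 0.9})

# Operator form: the parentheses are required because & binds tighter than > and <.
by_operators = corList[(corList > 0.2) & (corList < 0.8)].index

# Method form: .gt() and .lt() sidestep the precedence pitfall.
by_methods = corList[corList.gt(0.2) & corList.lt(0.8)].index

# between() with inclusive="neither" expresses the same open interval.
by_between = corList[corList.between(0.2, 0.8, inclusive="neither")].index

print(by_operators.tolist())  # ['y', 'z']
print(by_operators.equals(by_methods) and by_operators.equals(by_between))  # True
```

Without the parentheses, corList > 0.2 & corList < 0.8 would first evaluate 0.2 & corList, which fails, so the method form is the safer choice when writing the condition inline. Call .tolist() on the result if a plain Python list is needed.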