Is there a way to find a value based on the combination of column values?
Example:
df = pd.DataFrame({
'One' : [np.random.randint(1, 10) for i in range(10)],
'Two' : [np.random.randint(1, 10) for i in range(10)],
'Three' : [np.random.randint(1, 10) for i in range(10)],
'Four' : [np.random.randint(1, 10) for i in range(10)],
})
.
In [6]: df
Out[6]:
One Two Three Four
0 8 1 7 5
1 6 3 3 3
2 4 7 5 2
3 4 2 6 9
4 1 7 1 9
5 9 8 3 8
6 4 8 4 4
7 1 9 7 1
8 4 2 6 4
9 1 3 7 7
I have the following df and I'm interested which of the columns would I need to combine to get number 9. This would be a column based combination, so each row would output a name of columns that were combined to find 9.
In example row 0 will be give the result:
['One', 'Two']
row 1: will give a combination of
'One'
and all other three.['One', 'Two'], ['One', 'Three'], ['One', 'Four']
row 2: False
row 3: ['Four']
and etc...
NOTE: DataFrame should stay immutabile.
Thanks for any future suggestions or help.
CodePudding user response:
First get all the combinations of column names for the dataframe, you can use itertools.combinations
for it, then create a function that will calculate the sum
for each of the combination of column names, and store such combinations in temporary list if the sum
equals the required sum, finally return the list of column combinations, and apply this function on dataframe, for axis=1
import itertools
cols = [j for i in [[list(col)
for col in (itertools.combinations(df.columns, i))]
for i in range(1, df.shape[1] 1)]
for j in i]
def getSubArray(row, sum_=9):
result=[]
for col in cols:
if row.loc[col].sum()==sum_:
result.append(col)
return result
result = df.apply(getSubArray, axis=1)
OUTPUT:
0 [[One, Two]]
1 [[One, Two], [One, Three], [One, Four], [Two, Three, Four]]
2 [[One, Three], [Two, Four]]
3 [[Four]]
4 [[Four], [One, Two, Three]]
5 [[One]]
6 []
7 [[Two], [One, Three, Four]]
8 []
9 []
dtype: object
You can pass any value for required sum to getSubArray
from .apply
call, like: df.apply(getSubArray, axis=1, sum_=24)
CodePudding user response:
from itertools import combinations
def find_9(x, v=9):
c = combinations(x.index, 2)
columns = []
for i,j in c:
if (x[i] x[j]) == v:
columns.append([i,j])
return columns if columns else False
df.apply(lambda x: find_9(x),axis= 1)
ouput:
0 [[One, Two]]
1 [[One, Two], [One, Three], [One, Four]]
2 [[One, Three], [Two, Four]]
3 False
4 False
5 False
6 False
7 False
8 False
9 False