Home > front end >  Find value based on a combination of columns
Find value based on a combination of columns

Time:12-04

Is there a way to find a value based on the combination of column values?

Example:

df = pd.DataFrame({
    'One' : [np.random.randint(1, 10) for i in range(10)],
    'Two' : [np.random.randint(1, 10) for i in range(10)],
    'Three' : [np.random.randint(1, 10) for i in range(10)],
    'Four' : [np.random.randint(1, 10) for i in range(10)],
})

.

In [6]: df
Out[6]:
   One  Two  Three  Four
0    8    1      7     5
1    6    3      3     3
2    4    7      5     2
3    4    2      6     9
4    1    7      1     9
5    9    8      3     8
6    4    8      4     4
7    1    9      7     1
8    4    2      6     4
9    1    3      7     7

I have the following df and I'm interested which of the columns would I need to combine to get number 9. This would be a column based combination, so each row would output a name of columns that were combined to find 9.

  • In example row 0 will be give the result: ['One', 'Two']

  • row 1: will give a combination of 'One' and all other three. ['One', 'Two'], ['One', 'Three'], ['One', 'Four']

  • row 2: False

  • row 3: ['Four']

and etc...

NOTE: DataFrame should stay immutabile.

Thanks for any future suggestions or help.

CodePudding user response:

First get all the combinations of column names for the dataframe, you can use itertools.combinations for it, then create a function that will calculate the sum for each of the combination of column names, and store such combinations in temporary list if the sum equals the required sum, finally return the list of column combinations, and apply this function on dataframe, for axis=1

import itertools
cols = [j for i in [[list(col)
                     for col in (itertools.combinations(df.columns, i))]
                    for i in range(1, df.shape[1]   1)]
        for j in i]
def getSubArray(row, sum_=9):
    result=[]
    for col in cols:
        if row.loc[col].sum()==sum_:
            result.append(col)
    return result 
result = df.apply(getSubArray, axis=1)

OUTPUT:

0                                                   [[One, Two]]
1    [[One, Two], [One, Three], [One, Four], [Two, Three, Four]]
2                                    [[One, Three], [Two, Four]]
3                                                       [[Four]]
4                                    [[Four], [One, Two, Three]]
5                                                        [[One]]
6                                                             []
7                                    [[Two], [One, Three, Four]]
8                                                             []
9                                                             []
dtype: object

You can pass any value for required sum to getSubArray from .apply call, like: df.apply(getSubArray, axis=1, sum_=24)

CodePudding user response:

from itertools import combinations
def find_9(x, v=9):
    c = combinations(x.index, 2)
    columns = []
    for i,j in c:
        if (x[i] x[j]) == v:
            columns.append([i,j])
    return columns if columns else False

df.apply(lambda x: find_9(x),axis= 1)

ouput:

0                               [[One, Two]]
1    [[One, Two], [One, Three], [One, Four]]
2                [[One, Three], [Two, Four]]
3                                      False
4                                      False
5                                      False
6                                      False
7                                      False
8                                      False
9                                      False
  • Related