Pandas: output 2 columns in a data frame using an apply function that returns a tuple / list of 2 items

I have a data frame which has 10 columns. You can use this code to generate an example frame called df.

import numpy as np
import pandas as pd

# Build the column names x1 .. x10
cols = []
for i in range(1, 11):
    cols.append(f'x{i}')

df = pd.DataFrame(np.random.randint(10, 99, size=(10, 10)), columns=cols)

The data frame will look something like this. It is randomly generated, so your figures will be different.

    x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
0   91  30  82  10  92  62  43  66  96  88
1   61  95  77  16  19  67  88  44  72  52
2   44  21  68  93  29  40  25  78  96  94
3   80  11  50  55  14  56  21  78  36  41
4   84  52  97  29  92  44  89  78  27  62
5   11  82  83  84  34  90  56  74  68  76
6   31  92  13  89  95  80  75  59  81  74
7   14  25  47  98  67  18  78  10  64  40
8   52  75  60  44  36  18  33  79  65  18
9   19  69  12  61  60  92  61  21  43  72

I want to apply a function which returns a tuple. I want to use the tuples to create 2 columns in my data frame.

def some_func(i1,i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1,o2

When I do this,

df['c1'], df['c2'] = df.apply(lambda row: some_func(row['x9'],row['x10']), axis=1)

I get this error,

ValueError: too many values to unpack (expected 2)

The output should look like this,

    x1  x2  x3  x4  x5  x6  x7  x8  x9  x10 c1          c2
0   91  30  82  10  92  62  43  66  96  88  0.458333    242.000000
1   61  95  77  16  19  67  88  44  72  52  0.361111    112.666667
2   44  21  68  93  29  40  25  78  96  94  0.489583    276.125000
3   80  11  50  55  14  56  21  78  36  41  0.569444    140.083333
4   84  52  97  29  92  44  89  78  27  62  1.148148    427.111111
5   11  82  83  84  34  90  56  74  68  76  0.558824    254.823529
6   31  92  13  89  95  80  75  59  81  74  0.456790    202.814815
7   14  25  47  98  67  18  78  10  64  40  0.312500    75.000000
8   52  75  60  44  36  18  33  79  65  18  0.138462    14.953846
9   19  69  12  61  60  92  61  21  43  72  0.837209    361.674419

If I return only 1 output and create 1 column, it works fine. How do I return 2 items (a tuple or list of 2 items) and create 2 new columns from them?

CodePudding user response：

Since you need to iterate over several columns row by row, a more efficient approach is to zip the columns in a list comprehension. This builds a list of tuples that you can assign directly to a list of new columns in the original data frame:

df[['c1', 'c2']] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]

df
   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10        c1          c2
0  20  67  76  95  28  60  82  81  90   93  0.516667  288.300000
1  94  30  97  82  51  10  54  43  36   41  0.569444  140.083333
2  50  57  85  48  67  65  41  91  48   46  0.479167  132.250000
3  61  36  44  59  18  71  42  18  56   77  0.687500  317.625000
4  11  85  34  66  45  55  21  42  77   27  0.175325   28.402597
5  20  19  86  46  97  21  84  12  86   98  0.569767  335.023256
6  24  87  65  62  22  43  26  80  15   64  2.133333  819.200000
7  38  15  23  22  89  89  19  32  21   33  0.785714  155.571429
8  82  88  64  89  92  88  15  30  85   83  0.488235  243.141176
9  96  24  91  70  96  54  57  81  59   32  0.271186   52.067797
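
To make the zip approach reproducible, here is a minimal sketch that swaps the random frame for three fixed rows copied from the question's x9 and x10 columns. The three-row frame and the name pairs are illustrative assumptions, not part of the original answer:

```python
import pandas as pd


def some_func(i1, i2):
    # Same function as in the question
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2


# Assumption: fixed values taken from the first three rows of the
# question's example frame, so the results can be checked by hand
df = pd.DataFrame({"x9": [96, 72, 96], "x10": [88, 52, 94]})

# One (o1, o2) tuple per row, built without DataFrame.apply
pairs = [some_func(a, b) for a, b in zip(df["x9"], df["x10"])]

# A list of 2-tuples is treated as a two-column block on assignment
df[["c1", "c2"]] = pairs

print(df)
```

The first row should reproduce the question's expected values for c1 and c2 (about 0.458333 and 242.0).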

CodePudding user response：

There must be several ways.

I tried to do what you want with only a few changes.

No lambda is needed because you have already defined your own function.
Pass result_type="expand" to apply() so that the returned tuple is split into multiple columns.
Assign to a DataFrame selection (a list of two column names) rather than to two separate Series, so that the expanded values land in two columns of the data frame.

import pandas as pd

df = pd.DataFrame({
    'inputcol1': [1, 2, 3, 4],
    'inputcol2': [1, 2, 3, 4]
})


def some_func(x):
    output1 = x['inputcol1'] + x['inputcol2']
    output2 = x['inputcol1'] - x['inputcol2']
    return output1, output2

print(df)

#   inputcol1  inputcol2
#0          1          1
#1          2          2
#2          3          3
#3          4          4

df[['outputcol1', 'outputcol2']] = df[['inputcol1', 'inputcol2']].apply(some_func, axis=1, result_type="expand")

print(df)

#   inputcol1  inputcol2  outputcol1  outputcol2
#0          1          1           2           0
#1          2          2           4           0
#2          3          3           6           0
#3          4          4           8           0
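
The same idea can be applied to the question's two-argument some_func by keeping the lambda and adding result_type="expand". This is a minimal sketch, assuming a small fixed frame whose values are copied from two rows of the question's x9 and x10 columns:

```python
import pandas as pd


def some_func(i1, i2):
    # Same function as in the question
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2


# Assumption: two fixed rows from the question's example frame
df = pd.DataFrame({"x9": [36, 27], "x10": [41, 62]})

# result_type="expand" turns each returned tuple into one row of a
# two-column DataFrame, which is then assigned to two new columns
df[["c1", "c2"]] = df.apply(
    lambda row: some_func(row["x9"], row["x10"]),
    axis=1,
    result_type="expand",
)

print(df)
```

Only the apply() call changes relative to the question's attempt: the left-hand side becomes a list of column names and result_type="expand" is added.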

CodePudding user response：

First generate a temporary DataFrame:

wrk = df.apply(lambda row: some_func(row['x9'],row['x10']), axis=1)\
    .apply(pd.Series, index=['c1', 'c2'])

Details:

df.apply(…) - your code - creates a tuple from each row and collects the tuples in a Series.
apply(pd.Series, index=['c1', 'c2']) - converts each element of that Series (a tuple) into a Series whose index holds the new column names. These Series objects are then collected into a DataFrame, in which those index values become the column names.

Print wrk to see the result generated so far.

Then join it to df and save the result back under df:

df = df.join(wrk)
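
The steps above can be sketched end to end as follows. The intermediate Series holds one tuple per row, so it has as many elements as the frame has rows, which is why unpacking it into just two names in the question raises "too many values to unpack". The three-row frame and the names tuples and alt are illustrative assumptions:

```python
import pandas as pd


def some_func(i1, i2):
    # Same function as in the question
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2


# Assumption: fixed values from the question's example frame
df = pd.DataFrame({"x9": [96, 72, 96], "x10": [88, 52, 94]})

# Step 1: a Series with one (o1, o2) tuple per row
tuples = df.apply(lambda row: some_func(row["x9"], row["x10"]), axis=1)

# Step 2: turn each tuple into a Series indexed by the new column names;
# the per-row Series are collected into a DataFrame
wrk = tuples.apply(pd.Series, index=["c1", "c2"])

# A possible alternative that builds the same frame from the tuples directly
alt = pd.DataFrame(tuples.tolist(), index=df.index, columns=["c1", "c2"])

# Step 3: attach the new columns to the original frame
df = df.join(wrk)

print(df)
```

Building the frame from tuples.tolist() avoids constructing a Series for every row, so it is likely to be faster on large frames, though that has not been measured here.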