Pandas output 2 column in data frame using apply function which returns a tuple / list of 2 items

Time:02-23

I have a data frame which has 10 columns. You can use this code to generate an example frame called df.

import numpy as np
import pandas as pd

cols = []
for i in range(1, 11):
    cols.append(f'x{i}')

df = pd.DataFrame(np.random.randint(10, 99, size=(10, 10)), columns=cols)

The data frame will look something like this. It is randomly generated, so your figures will be different.

    x1  x2  x3  x4  x5  x6  x7  x8  x9  x10
0   91  30  82  10  92  62  43  66  96  88
1   61  95  77  16  19  67  88  44  72  52
2   44  21  68  93  29  40  25  78  96  94
3   80  11  50  55  14  56  21  78  36  41
4   84  52  97  29  92  44  89  78  27  62
5   11  82  83  84  34  90  56  74  68  76
6   31  92  13  89  95  80  75  59  81  74
7   14  25  47  98  67  18  78  10  64  40
8   52  75  60  44  36  18  33  79  65  18
9   19  69  12  61  60  92  61  21  43  72
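If you want the same figures on every run, one option is to seed the random generator. This is a minimal sketch, assuming NumPy's `default_rng` API. Seed 0 is an arbitrary choice, and the resulting numbers will not match the table above.

```python
import numpy as np
import pandas as pd

# Column names x1 ... x10, as in the question
cols = [f'x{i}' for i in range(1, 11)]

# Seeded generator: the same numbers on every run (seed 0 is arbitrary)
rng = np.random.default_rng(0)
# integers(10, 99) draws from [10, 99), like np.random.randint(10, 99)
df = pd.DataFrame(rng.integers(10, 99, size=(10, 10)), columns=cols)
print(df.shape)  # (10, 10)
```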

I want to apply a function that returns a tuple, and use the tuples to create 2 columns in my data frame.

def some_func(i1,i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1,o2

When I tried this:

df['c1'], df['c2'] = df.apply(lambda row: some_func(row['x9'],row['x10']), axis=1)

I got this error:

ValueError: too many values to unpack (expected 2)

The output should look like this:

    x1  x2  x3  x4  x5  x6  x7  x8  x9  x10 c1          c2
0   91  30  82  10  92  62  43  66  96  88  0.458333    242.000000
1   61  95  77  16  19  67  88  44  72  52  0.361111    112.666667
2   44  21  68  93  29  40  25  78  96  94  0.489583    276.125000
3   80  11  50  55  14  56  21  78  36  41  0.569444    140.083333
4   84  52  97  29  92  44  89  78  27  62  1.148148    427.111111
5   11  82  83  84  34  90  56  74  68  76  0.558824    254.823529
6   31  92  13  89  95  80  75  59  81  74  0.456790    202.814815
7   14  25  47  98  67  18  78  10  64  40  0.312500    75.000000
8   52  75  60  44  36  18  33  79  65  18  0.138462    14.953846
9   19  69  12  61  60  92  61  21  43  72  0.837209    361.674419

If I only return 1 output and create 1 column, it works fine. How do I return 2 items (a tuple or list of 2 items) and create 2 new columns from them?
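The `ValueError` comes from how `df.apply(..., axis=1)` packages the results. It returns a single Series with one tuple per row, so `df['c1'], df['c2'] = ...` tries to unpack one element per row into 2 names. This is a minimal sketch reproducing it, assuming a hypothetical 3-row frame that holds only `x9` and `x10`. Any frame with more than 2 rows fails the same way.

```python
import pandas as pd

def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2

# Hypothetical 3-row frame holding only the two columns the function reads
df = pd.DataFrame({'x9': [96, 72, 96], 'x10': [88, 52, 94]})

# apply(axis=1) returns ONE Series whose elements are the returned tuples
res = df.apply(lambda row: some_func(row['x9'], row['x10']), axis=1)
print(type(res).__name__, len(res))  # Series 3

err_msg = None
try:
    # Unpacking iterates over the Series (3 tuples), not over the 2 outputs
    df['c1'], df['c2'] = res
except ValueError as exc:
    err_msg = str(exc)
print(err_msg)  # starts with "too many values to unpack"
```

With exactly 2 rows the same line would not raise at all. Each new column would silently receive one row's tuple, which is a harder bug to spot.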

Answer:

Since you need to loop over rows across multiple columns, a simpler and usually faster approach is to zip the columns in a list comprehension. This builds a list of tuples that you can assign directly to a list of new columns in the original data frame:

df[['c1', 'c2']] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]

df    
   x1  x2  x3  x4  x5  x6  x7  x8  x9  x10        c1          c2
0  20  67  76  95  28  60  82  81  90   93  0.516667  288.300000
1  94  30  97  82  51  10  54  43  36   41  0.569444  140.083333
2  50  57  85  48  67  65  41  91  48   46  0.479167  132.250000
3  61  36  44  59  18  71  42  18  56   77  0.687500  317.625000
4  11  85  34  66  45  55  21  42  77   27  0.175325   28.402597
5  20  19  86  46  97  21  84  12  86   98  0.569767  335.023256
6  24  87  65  62  22  43  26  80  15   64  2.133333  819.200000
7  38  15  23  22  89  89  19  32  21   33  0.785714  155.571429
8  82  88  64  89  92  88  15  30  85   83  0.488235  243.141176
9  96  24  91  70  96  54  57  81  59   32  0.271186   52.067797
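Because `some_func` uses only arithmetic operators, which pandas applies elementwise to whole columns, a row loop may not be needed at all. Calling the function once on the two Series returns a tuple of two Series, which unpacks cleanly into two columns. This is a swapped-in vectorized alternative, not part of the answer above. It only works for functions built from vectorizable operations, with no `if` on individual values and no scalar-only library calls. This sketch compares it with the `zip` version, assuming a hypothetical 3-row frame that reuses `x9`/`x10` values from the question's first rows.

```python
import pandas as pd

def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2

# Hypothetical sample: x9/x10 values from the question's rows 0-2
df = pd.DataFrame({'x9': [96, 72, 96], 'x10': [88, 52, 94]})

# Row-wise version from the answer: list of tuples -> two new columns
df[['c1', 'c2']] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]

# Vectorized version: one call on whole columns returns (Series, Series)
df['v1'], df['v2'] = some_func(df['x9'], df['x10'])

# Both approaches should agree up to floating-point rounding
print(df)
```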

Answer:

There are several ways to do this.

Here is one that needs only a few changes to your code:

  • No lambda: some_func takes the whole row and reads the columns it needs.
  • result_type="expand" in apply(), so that each returned tuple is split into multiple columns.
  • Assign to a DataFrame selection (df[['outputcol1', 'outputcol2']]) rather than to two separate Series, so the expanded result, itself a DataFrame with two columns, can be stored in one step.
import pandas as pd

df = pd.DataFrame({
    'inputcol1': [1, 2, 3, 4],
    'inputcol2': [1, 2, 3, 4]
})


def some_func(x):
    output1 = x['inputcol1'] + x['inputcol2']
    output2 = x['inputcol1'] - x['inputcol2']
    return output1, output2

print(df)

#   inputcol1  inputcol2
#0          1          1
#1          2          2
#2          3          3
#3          4          4

df[['outputcol1', 'outputcol2']] = df[['inputcol1', 'inputcol2']].apply(some_func, axis=1, result_type="expand")

print(df)

#   inputcol1  inputcol2  outputcol1  outputcol2
#0          1          1           2           0
#1          2          2           4           0
#2          3          3           6           0
#3          4          4           8           0
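The same `result_type="expand"` idea also works with the question's original two-argument `some_func` if you keep a lambda to pick out the two columns. With `"expand"`, `apply` returns a DataFrame whose columns are labelled 0 and 1. Assigning it to `df[['c1', 'c2']]` matches those columns to the new names by position. This is a sketch on a hypothetical 3-row frame.

```python
import pandas as pd

def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2

# Hypothetical sample: x9/x10 values from the question's rows 0-2
df = pd.DataFrame({'x9': [96, 72, 96], 'x10': [88, 52, 94]})

# result_type="expand" turns each returned tuple into one row of a new DataFrame
expanded = df.apply(lambda row: some_func(row['x9'], row['x10']),
                    axis=1, result_type="expand")
print(list(expanded.columns))  # [0, 1]

# Assign both columns at once; c1/c2 are matched to columns 0/1 by position
df[['c1', 'c2']] = expanded
print(df)
```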

Answer:

First generate a temporary DataFrame:

wrk = df.apply(lambda row: some_func(row['x9'],row['x10']), axis=1)\
    .apply(pd.Series, index=['c1', 'c2'])

Details:

  • df.apply(…): your original code, which creates a tuple from each row; the tuples are collected into a Series.
  • .apply(pd.Series, index=['c1', 'c2']): for each element of that Series (a tuple), create a Series whose index holds the new column names. These Series are then combined into a DataFrame, in which their index values (c1 and c2) become the column names.

Print wrk to see the result generated so far.

Then join it to df and assign the result back to df:

df = df.join(wrk)
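The two steps of this answer can be run end to end as shown below, assuming a hypothetical 3-row frame. Column `x1` is included only to show that `join` keeps the existing columns. The `wrk_alt` line is a swapped-in alternative, not from the answer. It builds the same temporary frame from the list of tuples in one constructor call, instead of creating one small Series per row. It reuses `df.index` so the join still lines up row by row.

```python
import pandas as pd

def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2

# Hypothetical 3-row frame; x1 is there only to show join keeps existing columns
df = pd.DataFrame({'x1': [91, 61, 44], 'x9': [96, 72, 96], 'x10': [88, 52, 94]})

# Step 1 (the answer's approach): Series of tuples -> one Series per row -> DataFrame
tuples = df.apply(lambda row: some_func(row['x9'], row['x10']), axis=1)
wrk = tuples.apply(pd.Series, index=['c1', 'c2'])

# Alternative (not from the answer): one constructor call on the list of tuples
wrk_alt = pd.DataFrame(tuples.tolist(), index=df.index, columns=['c1', 'c2'])

# Step 2: join on the shared index and assign back to df
df = df.join(wrk)
print(df)
```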