I have a data frame with 10 columns. You can use this code to generate an example frame called df.
import numpy as np
import pandas as pd

cols = []
for i in range(1, 11):
    cols.append(f'x{i}')
df = pd.DataFrame(np.random.randint(10, 99, size=(10, 10)), columns=cols)
The data frame will look something like this; it is randomly generated, so your figures will be different.
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10
0 91 30 82 10 92 62 43 66 96 88
1 61 95 77 16 19 67 88 44 72 52
2 44 21 68 93 29 40 25 78 96 94
3 80 11 50 55 14 56 21 78 36 41
4 84 52 97 29 92 44 89 78 27 62
5 11 82 83 84 34 90 56 74 68 76
6 31 92 13 89 95 80 75 59 81 74
7 14 25 47 98 67 18 78 10 64 40
8 52 75 60 44 36 18 33 79 65 18
9 19 69 12 61 60 92 61 21 43 72
I want to apply a function which returns a tuple. I want to use the tuples to create 2 columns in my data frame.
def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2
When I do this,
df['c1'], df['c2'] = df.apply(lambda row: some_func(row['x9'],row['x10']), axis=1)
I get this error,
ValueError: too many values to unpack (expected 2)
The output should look like this,
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 c1 c2
0 91 30 82 10 92 62 43 66 96 88 0.458333 242.000000
1 61 95 77 16 19 67 88 44 72 52 0.361111 112.666667
2 44 21 68 93 29 40 25 78 96 94 0.489583 276.125000
3 80 11 50 55 14 56 21 78 36 41 0.569444 140.083333
4 84 52 97 29 92 44 89 78 27 62 1.148148 427.111111
5 11 82 83 84 34 90 56 74 68 76 0.558824 254.823529
6 31 92 13 89 95 80 75 59 81 74 0.456790 202.814815
7 14 25 47 98 67 18 78 10 64 40 0.312500 75.000000
8 52 75 60 44 36 18 33 79 65 18 0.138462 14.953846
9 19 69 12 61 60 92 61 21 43 72 0.837209 361.674419
If I only return 1 output, and create 1 column it works fine. How do I output 2 items (tuple or list of 2 items) and create 2 new columns using this?
CodePudding user response:
Since you need to loop through multiple columns row by row, a more efficient approach is to use zip in a list comprehension to create a list of tuples, which you can assign directly to a list of new columns in the original data frame:
df[['c1', 'c2']] = [some_func(x, y) for x, y in zip(df.x9, df.x10)]
df
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10 c1 c2
0 20 67 76 95 28 60 82 81 90 93 0.516667 288.300000
1 94 30 97 82 51 10 54 43 36 41 0.569444 140.083333
2 50 57 85 48 67 65 41 91 48 46 0.479167 132.250000
3 61 36 44 59 18 71 42 18 56 77 0.687500 317.625000
4 11 85 34 66 45 55 21 42 77 27 0.175325 28.402597
5 20 19 86 46 97 21 84 12 86 98 0.569767 335.023256
6 24 87 65 62 22 43 26 80 15 64 2.133333 819.200000
7 38 15 23 22 89 89 19 32 21 33 0.785714 155.571429
8 82 88 64 89 92 88 15 30 85 83 0.488235 243.141176
9 96 24 91 70 96 54 57 81 59 32 0.271186 52.067797
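For anyone who wants to run the approach above without the random frame, here is a minimal self-contained sketch. The three-row frame is an assumption made for illustration (its values are copied from the x9 and x10 columns in the question), and `pairs` is a hypothetical name for the intermediate list.

```python
import pandas as pd

def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2

# Small fixed frame standing in for the random one (illustrative values only).
df = pd.DataFrame({"x9": [96, 72, 96], "x10": [88, 52, 94]})

# One (o1, o2) tuple per row, built by walking both columns in parallel.
pairs = [some_func(a, b) for a, b in zip(df["x9"], df["x10"])]

# A list of 2-tuples lines up with the two new column names.
df[["c1", "c2"]] = pairs
print(df)
```

The list comprehension calls some_func on plain scalar values, so no row Series is built per row as it would be with apply and axis=1.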
CodePudding user response:
There must be several ways.
I tried to do what you want with only a few changes.
- No lambda usage, because you already defined your own function.
- result_type="expand" in the apply() call, so that the return value is split into multiple columns.
- A DataFrame rather than two Series as the assignment target, so that the return values can be split into the dataframe (which consists of two Series).
import pandas as pd

df = pd.DataFrame({
    'inputcol1': [1, 2, 3, 4],
    'inputcol2': [1, 2, 3, 4]
})

def some_func(x):
    # x is one row of the frame; sum the two input columns
    output1 = x['inputcol1'] + x['inputcol2']
    # a column minus itself, so this is always 0
    output2 = x['inputcol2'] - x['inputcol2']
    return output1, output2
print(df)
# inputcol1 inputcol2
#0 1 1
#1 2 2
#2 3 3
#3 4 4
df[['outputcol1', 'outputcol2']] = df[['inputcol1', 'inputcol2']].apply(some_func, axis=1, result_type="expand")
print(df)
# inputcol1 inputcol2 outputcol1 outputcol2
#0 1 1 2 0
#1 2 2 4 0
#2 3 3 6 0
#3 4 4 8 0
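The same idea can be applied to the function from the question, which takes two scalars instead of a row. This is only a sketch: the three-row frame is an illustrative assumption, and `row_func` is a hypothetical wrapper name that is not part of the original post.

```python
import pandas as pd

def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2

# Small fixed frame standing in for the random one (illustrative values only).
df = pd.DataFrame({"x9": [96, 72, 96], "x10": [88, 52, 94]})

def row_func(row):
    # Hypothetical wrapper: pick the two inputs out of the row.
    return some_func(row["x9"], row["x10"])

# result_type="expand" turns each returned tuple into columns 0 and 1,
# which are then assigned positionally to c1 and c2.
df[["c1", "c2"]] = df.apply(row_func, axis=1, result_type="expand")
print(df)
```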
CodePudding user response:
First generate a temporary DataFrame:
wrk = df.apply(lambda row: some_func(row['x9'], row['x10']), axis=1)\
    .apply(pd.Series, index=['c1', 'c2'])
Details:
- df.apply(…) - your code - creates a tuple from each row, and the tuples are collected in a Series.
- .apply(pd.Series, index=['c1', 'c2']) - from each element of the Series generated so far (a tuple), creates a Series whose index contains the new column names. These Series objects are then collected into a DataFrame, where the source index values become column names.
Print wrk to see the result generated so far.
Then join it to df and save the result back under df:
df = df.join(wrk)
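A runnable sketch of the two steps, on a small fixed frame chosen for illustration. Note one swap: the temporary DataFrame is built here with pd.DataFrame(tuples.tolist(), ...) instead of .apply(pd.Series, ...), which should produce the same columns. `tuples` is a hypothetical name for the intermediate Series.

```python
import pandas as pd

def some_func(i1, i2):
    o1 = i2 / i1 * 0.5
    o2 = i2 * o1 * 6
    return o1, o2

# Small fixed frame standing in for the random one (illustrative values only).
df = pd.DataFrame({"x9": [96, 72, 96], "x10": [88, 52, 94]})

# Step 1: a Series holding one (o1, o2) tuple per row.
tuples = df.apply(lambda row: some_func(row["x9"], row["x10"]), axis=1)
print(tuples)

# Step 2: a temporary DataFrame with the new column names, on the same index.
wrk = pd.DataFrame(tuples.tolist(), index=tuples.index, columns=["c1", "c2"])

# Step 3: join on the index and save the result back under df.
df = df.join(wrk)
print(df)
```

Printing the intermediate Series shows it has one element per row. That is why unpacking it into two names fails with "too many values to unpack" whenever the frame has more than two rows.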