too many values to unpack (expected 2) when trying to create two columns in a dataframe

Time:09-16

Please consider the input/output below :

INPUT :

      Id   Status
0  Id001   online
1  Id002  running
2  Id002      off
3  Id003   online
4  Id003    valid
5  Id003  running
6  Id004      off
7  Id004      off

OUTPUT (expected) :

      Id   Status   Type  Values
0  Id001   online  green  yellow
1  Id002  running    NaN     NaN
2  Id002      off    red   white
3  Id003   online  green  yellow
4  Id003    valid    NaN     NaN
5  Id003  running    NaN     NaN
6  Id004      off    red   white
7  Id004      off    red   white

I need to create the two columns Type and Values using the function test_func.

import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Status': ['online', 'running', 'off', 'online', 'valid', 'running', 'off', 'off']})

def test_func(df):
    if df['Status']=='online':
        return ['green', 'yellow']
    elif df['Status']=='off':
        return ['red', 'white']
    
df['Type'], df['Values'] = df.apply(test_func, axis=1)

The code above throws an error :

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [70], in <cell line: 7>()
      4     elif df['Status']=='off':
      5         return ['red', 'white']
----> 7 df['Type'], df['Values'] = df.apply(test_func, axis=1)

ValueError: too many values to unpack (expected 2)

Unfortunately, given the format of the real dataset, I can't use np.where to create the two columns Type and Values.

Do you know how to fix the error, please?

CodePudding user response:

You can try assigning the output of df.apply to one column first and then splitting that into two.

df['Type_value'] = df.apply(test_func, axis=1)  # assign the returned lists to one temporary column
df[['Type', 'Values']] = df['Type_value'].apply(pd.Series)  # split the temporary column into two columns
df.drop(['Type_value'], axis=1, inplace=True)  # drop the temporary column

CodePudding user response:

You can use np.select in combination with a split to get the desired result:

import pandas as pd
import numpy as np

condition_list = [
    df['Status'] == 'online',
    df['Status'] == 'off'
]

choice_list = [
    'green, yellow',
    'red, white'
]

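df['List'] = np.select(condition_list, choice_list, None)
df[['Type', 'Values']] = df['List'].str.split(', ', expand=True)  # split on ', ' so the second column has no leading space
df = df.drop(columns=['List'])  # drop returns a new DataFrame, so reassign it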
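
A self-contained variant of the same idea, offered as a sketch: calling np.select once per output column avoids building a comma-separated string and splitting it afterwards. The names conditions, type_choices and value_choices are illustrative, and default=None is assumed to be an acceptable placeholder for unmatched rows.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Status': ['online', 'running', 'off', 'online', 'valid', 'running', 'off', 'off']})

# One boolean mask per status that should receive values
conditions = [df['Status'] == 'online', df['Status'] == 'off']

# Choices are listed in the same order as the conditions
type_choices = ['green', 'red']
value_choices = ['yellow', 'white']

# Rows matching no condition receive the default (None, shown as missing)
df['Type'] = np.select(conditions, type_choices, default=None)
df['Values'] = np.select(conditions, value_choices, default=None)
print(df)
```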

CodePudding user response:

A different approach:

df.assign(Type=df.Status.map({'online':'green','off':'red'}),
          Values=df.Status.map({'online':'yellow','off':'white'}))

Output:

      Id   Status   Type  Values
0  Id001   online  green  yellow
1  Id002  running    NaN     NaN
2  Id002      off    red   white
3  Id003   online  green  yellow
4  Id003    valid    NaN     NaN
5  Id003  running    NaN     NaN
6  Id004      off    red   white
7  Id004      off    red   white
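
If the number of statuses grows, keeping the mapping in one place can be easier than maintaining two dictionaries. A possible sketch, swapping map for a left merge against a small lookup table (the lookup frame is hypothetical and not part of the answer above):

```python
import pandas as pd

df = pd.DataFrame({'Id': ['Id001', 'Id002', 'Id002', 'Id003', 'Id003', 'Id003', 'Id004', 'Id004'],
                   'Status': ['online', 'running', 'off', 'online', 'valid', 'running', 'off', 'off']})

# Hypothetical lookup table: one row per status that has a Type/Values pair
lookup = pd.DataFrame({
    'Status': ['online', 'off'],
    'Type': ['green', 'red'],
    'Values': ['yellow', 'white'],
})

# A left merge keeps every row of df in its original order;
# statuses missing from the lookup get NaN in both new columns.
out = df.merge(lookup, on='Status', how='left')
print(out)
```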