Home > other >  Using lambda to return more than one column
Using lambda to return more than one column

Time:08-18

I have the following table:

                         TABLE_NAME  jace  building    equipment
0  R338_1_FAHU2_SUPAIR_TEMP          NaN       NaN        NaN
1  R1001_1_R1005_1_FAHU_1_CO2_SEN2   NaN       NaN        NaN

I wrote this sub-function that analyses one column and (should) return three.

 def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
    jace,building,equipment=re.findall('(^[RP].*?_[1-9])_*(.*?)_(F.*)',tablename)[0]
    if not len(building):
        building=re.findall('(.*)_',jace)[0]
    return jace,building,equipment

this function returns for the first row

('R338_1', 'R338', 'FAHU2_SUPAIR_TEMP')

I want to insert those into jace,building, and equipment columns above I tried this:

df[['jace','building','equipment']]=df['TABLE_NAME'].apply(lambda x: (fill(x)))

it gave me

ValueError: Must have equal len keys and value when setting with an iterable

I also tried axis=1 inside apply(), which seems to contradict with lambda for some reason

Any ideas how to make it work? (I can use fill(x)[0], fill(x)[1], which should solve the issue, but that feels like hard-coding)

CodePudding user response:

Your function should return a Series:

def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
    jace,building,equipment=re.findall('(^[RP].*?_[1-9])_*(.*?)_(F.*)',tablename)[0]
    if not len(building):
        building=re.findall('(.*)_',jace)[0]
    return pd.Series([jace,building,equipment])

df[['jace','building','equipment']] = df['TABLE_NAME'].apply(fill)

output:

                        TABLE_NAME     jace building          equipment
0         R338_1_FAHU2_SUPAIR_TEMP   R338_1     R338  FAHU2_SUPAIR_TEMP
1  R1001_1_R1005_1_FAHU_1_CO2_SEN2  R1001_1  R1005_1    FAHU_1_CO2_SEN2

Now you could also use vectorial code for efficiency:

(df['TABLE_NAME']
 .str.extract('(?P<jace>^[RP].*?_[1-9])_*(?P<building>.*?)_(?P<equipment>F.*)')
 .assign(building=lambda d: d['building'].mask(d['building'].eq(''),
                                               d['jace'].str.extract('(.*)_',
                                                                   expand=False)))
)

output:

      jace building          equipment
0   R338_1     R338  FAHU2_SUPAIR_TEMP
1  R1001_1  R1005_1    FAHU_1_CO2_SEN2

CodePudding user response:

You can use result_type="expand"

Can you try the following:

def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
    jace,building,equipment = re.findall(
        '(^[RP].*?_[1-9])_*(.*?)_(F.*)',
        tablename['TABLE_NAME'])[0]
    if not len(building):
        building=re.findall('(.*)_',jace)[0]
    return jace, building, equipment

df[['jace','building','equipment']] = df[['TABLE_NAME']].apply(
    fill, axis=1, result_type="expand")
  • Related