I have the following table:
TABLE_NAME jace building equipment
0 R338_1_FAHU2_SUPAIR_TEMP NaN NaN NaN
1 R1001_1_R1005_1_FAHU_1_CO2_SEN2 NaN NaN NaN
I wrote this sub-function that analyses one column and (should) return three.
def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
jace,building,equipment=re.findall('(^[RP].*?_[1-9])_*(.*?)_(F.*)',tablename)[0]
if not len(building):
building=re.findall('(.*)_',jace)[0]
return jace,building,equipment
this function returns for the first row
('R338_1', 'R338', 'FAHU2_SUPAIR_TEMP')
I want to insert those into jace,building, and equipment columns above I tried this:
df[['jace','building','equipment']]=df['TABLE_NAME'].apply(lambda x: (fill(x)))
it gave me
ValueError: Must have equal len keys and value when setting with an iterable
I also tried axis=1
inside apply()
, which seems to contradict with lambda for some reason
Any ideas how to make it work?
(I can use fill(x)[0], fill(x)[1]
, which should solve the issue, but that feels like hard-coding)
CodePudding user response:
Your function should return a Series:
def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
jace,building,equipment=re.findall('(^[RP].*?_[1-9])_*(.*?)_(F.*)',tablename)[0]
if not len(building):
building=re.findall('(.*)_',jace)[0]
return pd.Series([jace,building,equipment])
df[['jace','building','equipment']] = df['TABLE_NAME'].apply(fill)
output:
TABLE_NAME jace building equipment
0 R338_1_FAHU2_SUPAIR_TEMP R338_1 R338 FAHU2_SUPAIR_TEMP
1 R1001_1_R1005_1_FAHU_1_CO2_SEN2 R1001_1 R1005_1 FAHU_1_CO2_SEN2
Now you could also use vectorial code for efficiency:
(df['TABLE_NAME']
.str.extract('(?P<jace>^[RP].*?_[1-9])_*(?P<building>.*?)_(?P<equipment>F.*)')
.assign(building=lambda d: d['building'].mask(d['building'].eq(''),
d['jace'].str.extract('(.*)_',
expand=False)))
)
output:
jace building equipment
0 R338_1 R338 FAHU2_SUPAIR_TEMP
1 R1001_1 R1005_1 FAHU_1_CO2_SEN2
CodePudding user response:
You can use result_type="expand"
Can you try the following:
def fill(tablename='R338_1_FAHU2_SUPAIR_TEMP'):
jace,building,equipment = re.findall(
'(^[RP].*?_[1-9])_*(.*?)_(F.*)',
tablename['TABLE_NAME'])[0]
if not len(building):
building=re.findall('(.*)_',jace)[0]
return jace, building, equipment
df[['jace','building','equipment']] = df[['TABLE_NAME']].apply(
fill, axis=1, result_type="expand")