Getting "AttributeError: Can only use .str accessor with string values" while transposing-CodePudding

I have one source file with structure as below:

ID|RcrdId|SrcId|Name|Address
row1|r1|src1#src2|val1#val2|val4#val5
row2|r2|src2#src1|val11#val12|val14#val15
row3|r3|src1|val44|val23

I need to include values in Name and Address fields only for Src2 value which is present in SrcID column, in case there is no value for SRC2, then null values should be populated in the Name and address field like below(RecordID=r3):

ID|RcrdId|SrcId|Name|Address
row1|r1|src2|val2|val5
row2|r2|src2|val11|val14
row3|r3|||

I have this sample code, but I am not able to handle the record r3 where src2 is not present and I am getting following error : AttributeError: Can only use .str accessor with string values!

from itertools import zip_longest

cols = ['SrcId', 'Name', 'Address']

df[cols] = (df[cols].fillna(#)
             .apply(lambda c: c.str.split('#'))
             .apply(lambda r: next(filter(lambda x: x[0]=='src2',
                                          zip_longest(*r)), float('nan')),
                            axis=1, result_type='expand')
            )

CodePudding user response：

Split your values to get a list then explode it into multiple rows. Filter out your rows and fill missing values with original ones:

cols = ['SrcId', 'Name', 'Address']

df[cols] = df[cols].apply(lambda x: x.str.split('#')).explode(cols) \
                   .query("SrcId == 'src2'").reindex(df.index).fillna(df[cols])

print(df)

# Output
     ID RcrdId SrcId   Name Address
0  row1     r1  src2   val2    val5
1  row2     r2  src2  val11   val14
2  row3     r3  src1  val44   val23

Step-by-step:

>>> out = df[cols].apply(lambda x: x.str.split('#'))
          SrcId            Name         Address
0  [src1, src2]    [val1, val2]    [val4, val5]
1  [src2, src1]  [val11, val12]  [val14, val15]
2        [src1]         [val44]         [val23]

>>> out = out.explode(cols)
  SrcId   Name Address
0  src1   val1    val4
0  src2   val2    val5
1  src2  val11   val14
1  src1  val12   val15
2  src1  val44   val23

>>> out = out.query("SrcId == 'src2'")
  SrcId   Name Address
0  src2   val2    val5
1  src2  val11   val14

>>> out = out.reindex(df.index)
  SrcId   Name Address
0  src2   val2    val5
1  src2  val11   val14
2   NaN    NaN     NaN

>>> out = out.fillna(df[cols])
  SrcId   Name Address
0  src2   val2    val5
1  src2  val11   val14
2  src1  val44   val23