Most efficient way to create dataframe based on another one

Time:03-04

I have a dataframe like this:

A     Type    B    C    D 
Train   X     23   230  22
Car     Y     0    2    500
Judge   Z     222  1    600

Is it possible to create a new DataFrame based on the values in each row?

I have the following function:

def quant(x):
    if x>0:
        return 1
    else:
        return 0

which I then want to apply to some columns of the df:

df.apply(lambda row: quant(row[['B','C', 'D']]), axis=1, result_type='expand')

to create new columns holding the values mapped by the function:

A      Type  B_mapped C_mapped D_mapped
Train    X    1    1     1
Car      Y    0    1     1
Judge    Z    1    1     1

However, my code raises the following error:

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

What is the most efficient way to map columns like this?
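The error comes from `if x>0:` inside `quant`. `row[['B','C', 'D']]` is a Series, so `x>0` is a boolean Series, and `if` has to reduce it to a single `True`/`False`, which pandas refuses to do. The sketch below reproduces this on a hand-built copy of the sample frame; `quant_vectorized` is a hypothetical name for a variant that compares element-wise instead of branching, which makes the row-wise `apply` work.

```python
import pandas as pd

# Rebuild the sample frame from the question.
df = pd.DataFrame({
    "A": ["Train", "Car", "Judge"],
    "Type": ["X", "Y", "Z"],
    "B": [23, 0, 222],
    "C": [230, 2, 1],
    "D": [22, 500, 600],
})


def quant(x):
    # Scalar-only version from the question.
    if x > 0:
        return 1
    else:
        return 0


def quant_vectorized(x):
    # Hypothetical Series-aware variant: compare element-wise, then cast bools to 0/1.
    return (x > 0).astype(int)


row = df[["B", "C", "D"]].iloc[1]  # a Series (the "Car" row), not a scalar

try:
    quant(row)
    raised = False
except ValueError:
    # `if` calls bool() on the boolean Series, which is ambiguous.
    raised = True

print(raised)                          # True
print(quant_vectorized(row).tolist())  # [0, 1, 1]

# Applied row by row, the vectorized variant returns a DataFrame of 0/1 values.
mapped = df[["B", "C", "D"]].apply(quant_vectorized, axis=1)
print(mapped)
```

Row-wise `apply` still calls Python once per row, so the column-wise approaches in the answers are usually faster on large frames.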

Answer:

You could do:

df.set_index(['A', 'Type']).applymap(quant).add_suffix('_mapped').reset_index()

or, in this particular case, skip the custom function entirely:

df.set_index(['A', 'Type']).gt(0).astype(int).add_suffix('_mapped').reset_index()

output:

       A Type  B_mapped  C_mapped  D_mapped
0  Train    X         1         1         1
1    Car    Y         0         1         1
2  Judge    Z         1         1         1
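A self-contained check that the two `set_index` one-liners give the same result on the sample data. One thing worth flagging: pandas 2.1 deprecated `DataFrame.applymap` in favour of `DataFrame.map` (same element-wise behaviour), so this sketch uses whichever method the installed version provides.

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Train", "Car", "Judge"],
    "Type": ["X", "Y", "Z"],
    "B": [23, 0, 222],
    "C": [230, 2, 1],
    "D": [22, 500, 600],
})


def quant(x):
    return 1 if x > 0 else 0


# Move the label columns into the index so only B, C, D get transformed.
indexed = df.set_index(["A", "Type"])

# DataFrame.map exists from pandas 2.1; older versions only have applymap.
elementwise = indexed.map if hasattr(indexed, "map") else indexed.applymap

with_func = elementwise(quant).add_suffix("_mapped").reset_index()
vectorized = indexed.gt(0).astype(int).add_suffix("_mapped").reset_index()

# Raises if the frames differ; dtypes are ignored because astype(int)
# may pick a platform-specific integer width.
pd.testing.assert_frame_equal(with_func, vectorized, check_dtype=False)
print(with_func)
```

The `gt(0)` version runs as a single vectorized comparison, while `applymap`/`map` calls `quant` once per cell, so the former is generally the faster choice.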

Another approach, using join:

cols = ['A', 'Type']

df[cols].join(df.drop(cols, axis=1).applymap(quant).add_suffix('_mapped'))
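The `join` variant keeps `A` and `Type` as ordinary columns throughout, so no `reset_index` is needed. It lines up correctly because `df[cols]` and the transformed frame share `df`'s index. A runnable sketch, swapping `applymap(quant)` for the vectorized `gt(0).astype(int)` from the answer above so it avoids the `applymap` deprecation warning on newer pandas:

```python
import pandas as pd

df = pd.DataFrame({
    "A": ["Train", "Car", "Judge"],
    "Type": ["X", "Y", "Z"],
    "B": [23, 0, 222],
    "C": [230, 2, 1],
    "D": [22, 500, 600],
})

cols = ["A", "Type"]

# Everything except the label columns gets transformed.
numeric = df.drop(columns=cols)

# join aligns on the shared RangeIndex, so each mapped row stays with its labels.
out = df[cols].join(numeric.gt(0).astype(int).add_suffix("_mapped"))
print(out)
```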

Answer:

If the values are non-negative integers, as in the sample data, you can use clip:

cols = ['B', 'C', 'D']
df = df.drop(columns=cols).join(df[cols].clip(upper=1).add_suffix('_mapped'))
print(df)

# Output
       A Type  B_mapped  C_mapped  D_mapped
0  Train    X         1         1         1
1    Car    Y         0         1         1
2  Judge    Z         1         1         1
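`clip(upper=1)` only caps values from above, so it reproduces `quant` exactly when every input is a non-negative integer. With negative numbers or fractions the two diverge. A quick comparison on hypothetical values that do not appear in the question:

```python
import pandas as pd

# Hypothetical values outside the question's sample: a negative and a fraction.
s = pd.Series([-5, 0, 0.5, 3])

clipped = s.clip(upper=1)          # caps at 1 but leaves -5 and 0.5 unchanged
thresholded = s.gt(0).astype(int)  # same rule as quant: 1 if > 0 else 0

print(clipped.tolist())      # [-5.0, 0.0, 0.5, 1.0]
print(thresholded.tolist())  # [0, 0, 1, 1]

# On non-negative integers the two agree.
ints = pd.Series([0, 2, 222])
print(ints.clip(upper=1).tolist() == ints.gt(0).astype(int).tolist())  # True
```

If the data might contain negatives or fractions, `gt(0).astype(int)` is the safer general replacement for `quant`.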