Home > Back-end >  python pandas dataframe multiply columns matching index or row name
python pandas dataframe multiply columns matching index or row name

Time:07-17

I have two dataframes,

df1:

hash  a  b  c
ABC   1  2  3
def   5  3  4
Xyz   3  2 -1

df2:

hash  v
Xyz   3
def   5

I want to make

df:
hash  a  b  c
ABC   1  2  3 (= as is, because no matching 'ABC' in df2)
def  25 15 20 (= 5*5 3*5 4*5)
Xyz   9  6 -3 (= 3*3 2*3 -1*3)

as like above,

I want to make a dataframe with values of multiplying df1 and df2 according to their index (or first column name) matched. As df2 only has one column (v), all df1's columns except for the first one (index) should be affected.

Is there any neat Pythonic and Panda's way to achieve it?

df1.set_index(['hash']).mul(df2.set_index(['hash'])) or similar things seem not work..

CodePudding user response:

One approach:

df1 = df1.set_index("hash")
df2 = df2.set_index("hash")["v"]

res = df1.mul(df2, axis=0).combine_first(df1)
print(res)

Output

         a     b     c
hash                  
ABC    1.0   2.0   3.0
Xyz    9.0   6.0  -3.0
def   25.0  15.0  20.0

CodePudding user response:

One Method:

# We'll make this for convenience
cols = ['a', 'b', 'c']

# Merge the DataFrames, keeping everything from df
df = df1.merge(df2, 'left').fillna(1)

# We'll make the v column integers again since it's been filled.
df.v = df.v.astype(int)

# Broadcast the multiplication across axis 0
df[cols] = df[cols].mul(df.v, axis=0)

# Drop the no-longer needed column:
df = df.drop('v', axis=1)

print(df)

Output:

  hash   a   b   c
0  ABC   1   2   3
1  def  25  15  20
2  Xyz   9   6  -3

Alternative Method:

# Set indices
df1 = df1.set_index('hash')
df2 = df2.set_index('hash')

# Apply multiplication and fill values
df = (df1.mul(df2.v, axis=0)
        .fillna(df1)
        .astype(int)
        .reset_index())

# Output:

  hash   a   b   c
0  ABC   1   2   3
1  Xyz   9   6  -3
2  def  25  15  20

CodePudding user response:

The function you are looking for is actually multiply.

Here's how I have done it:

>>> df
  hash  a  b
0  ABC  1  2
1  DEF  5  3
2  XYZ  3 -1

>>> df2
  hash  v
0  XYZ  4
1  ABC  8

df = df.merge(df2, on='hash', how='left').fillna(1)
>>> df
  hash  a  b    v
0  ABC  1  2  8.0
1  DEF  5  3  1.0
2  XYZ  3 -1  4.0


df[['a','b']] = df[['a','b']].multiply(df['v'], axis='index')

>>>df
  hash     a     b    v
0  ABC   8.0  16.0  8.0
1  DEF   5.0   3.0  1.0
2  XYZ  12.0  -4.0  4.0


You can actually drop v at the end if you don't need it.

  • Related