How to add two dataframes with tuples

Time:10-13

I am extracting data from a Databank and storing it in a dictionary, which I then convert into a DataFrame. I end up with two DataFrames that I'd like to add element-wise, but each value is stored in a one-element tuple.

Both DataFrames are really big (66 rows x 8497 columns) but look something like this:

df1

        0         1         2         3
P00001  (-17.5,)  (-16.2,)  (-15.9,)  (-14.3,)
P00002  (-11.3,)  (-13.1,)  (-13.8,)  (-10.4,)
P00003  (-17.0,)  (-18.0,)  (-17.6,)  (-13.6,)
P00004  None      None      None      None

df2

        0       1       2       3
P00001  (3.3,)  (3.8,)  (5.6,)  (7.5,)
P00002  (4.2,)  (2.3,)  (1.5,)  (5.3,)
P00003  (0.0,)  (0.0,)  (0.0,)  (0.0,)
P00004  (2.8,)  (3.7,)  (4.8,)  (3.9,)
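For anyone trying to reproduce this, the sample frames can be rebuilt from plain dicts. This is a sketch only: the question does not show the real dictionary, so the layout here (row label mapped to a list of cells, each cell a one-element tuple or None) is an assumption chosen to match the printed frames.

```python
import pandas as pd

# Assumed layout (hypothetical): {row label: [cell for column 0, 1, 2, 3]},
# where each cell is a one-element tuple or None, as in the printed frames.
data1 = {
    "P00001": [(-17.5,), (-16.2,), (-15.9,), (-14.3,)],
    "P00002": [(-11.3,), (-13.1,), (-13.8,), (-10.4,)],
    "P00003": [(-17.0,), (-18.0,), (-17.6,), (-13.6,)],
    "P00004": [None, None, None, None],
}
data2 = {
    "P00001": [(3.3,), (3.8,), (5.6,), (7.5,)],
    "P00002": [(4.2,), (2.3,), (1.5,), (5.3,)],
    "P00003": [(0.0,), (0.0,), (0.0,), (0.0,)],
    "P00004": [(2.8,), (3.7,), (4.8,), (3.9,)],
}

# orient="index" turns each dict key into a row; the columns become 0..3.
df1 = pd.DataFrame.from_dict(data1, orient="index")
df2 = pd.DataFrame.from_dict(data2, orient="index")
print(df1)
```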

For example, I'd like to add the value at (P00001, 0) in df1, which is -17.5, to the value at (P00001, 0) in df2, which is 3.3, and so on for every cell, so that the result looks like this:

        0      1      2      3
P00001  -14.2  -12.4  -10.3  -6.8
P00002  -7.1   -10.8  -12.3  -5.1
P00003  -17.0  -18.0  -17.6  -13.6
P00004  2.8    3.7    4.8    3.9

I have tried:

df_add = df1.add(df2, fill_value=0)

tuple(np.add(df1,df2))

tuple(map(sum,zip(df1,df2)))
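Two plain Python facts explain why attempts like these cannot produce -14.2. First, `+` applied to two tuples concatenates them instead of adding the numbers inside. Second, iterating over a DataFrame yields its column labels, not its cells, so `zip(df1, df2)` pairs up labels. The two-cell frames below are hypothetical stand-ins for the real data:

```python
import pandas as pd

# Hypothetical two-cell stand-ins for df1 and df2 (one row, columns 0 and 1).
df1 = pd.DataFrame({0: [(-17.5,)], 1: [(-16.2,)]}, index=["P00001"])
df2 = pd.DataFrame({0: [(3.3,)], 1: [(3.8,)]}, index=["P00001"])

# `+` on two tuples concatenates them; it does not add the numbers inside.
concatenated = df1.loc["P00001", 0] + df2.loc["P00001", 0]
print(concatenated)  # (-17.5, 3.3)

# Iterating a DataFrame yields column labels, so zip(df1, df2) gives
# (0, 0) and (1, 1), and sum() adds those labels together.
label_sums = tuple(map(sum, zip(df1, df2)))
print(label_sums)  # (0, 2)
```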

I also tried converting the DataFrame to int, but that didn't work either:

df1_new = df1[:].astype(int)

df_new = df1.convert_dtypes(int)

df_new = df1.apply(pd.to_numeric, errors='ignore')
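Most likely these conversions stumble on the cell contents: each cell is a tuple (or None), not a number, and Python will not convert a tuple to a number directly. Even after unpacking, int would truncate -17.5 to -17. As for `pd.to_numeric`, `errors='ignore'` returns the input unchanged when parsing fails, so that line silently does nothing. A plain-Python check on a single cell:

```python
cell = (-17.5,)  # one cell from df1

# A tuple cannot be turned into a number directly.
try:
    int(cell)
    direct_conversion_ok = True
except TypeError:
    direct_conversion_ok = False

value = cell[0]          # unpack the tuple first: -17.5
as_float = float(value)  # keeps the decimals: -17.5
as_int = int(value)      # truncates toward zero: -17
print(direct_conversion_ok, as_float, as_int)  # False -17.5 -17
```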

I am a beginner, please let me know if you need more information.

Answer:

Unpacking each tuple into a plain number (a float, since the values have decimals) is indeed an option:

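import numpy as np
import pandas as pd

def tuple2float(x):
    # Each cell is a one-element tuple such as (-17.5,) or None
    try:
        return x[0]
    except (TypeError, IndexError):
        # None (or an empty tuple) counts as 0
        return 0.0

# otypes=[float] fixes the output type; without it, np.vectorize infers
# the type from the first call and could cast every value to int
to_float = np.vectorize(tuple2float, otypes=[float])
df1 = pd.DataFrame(to_float(df1), index=df1.index, columns=df1.columns)
df2 = pd.DataFrame(to_float(df2), index=df2.index, columns=df2.columns)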

Then add the data frames as you suggested:

df_add = df1.add(df2, fill_value=0)
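A variant of the same idea that stays within pandas is sketched here, under a few assumptions: the helper names first_or_nan and unpack are hypothetical, and the two-row frames are a cut-down copy of the question's data. Missing cells become NaN instead of 0, so fill_value=0 in add() does the filling, and a cell missing from both frames stays NaN rather than silently becoming 0.

```python
import numpy as np
import pandas as pd

def first_or_nan(cell):
    """Hypothetical helper: the number inside a 1-tuple like (-17.5,), else NaN."""
    if isinstance(cell, tuple) and len(cell) > 0:
        return float(cell[0])
    return np.nan  # None, or anything that is not a non-empty tuple

def unpack(df):
    """Hypothetical helper: apply first_or_nan to every cell, column by column."""
    return df.apply(lambda col: col.map(first_or_nan)).astype(float)

# Two rows of the question's sample data.
df1 = pd.DataFrame({0: [(-17.5,), None], 1: [(-16.2,), None]},
                   index=["P00001", "P00004"])
df2 = pd.DataFrame({0: [(3.3,), (2.8,)], 1: [(3.8,), (3.7,)]},
                   index=["P00001", "P00004"])

# fill_value=0 treats a NaN on one side as 0, so row P00004 keeps df2's values.
df_add = unpack(df1).add(unpack(df2), fill_value=0)
print(df_add)
```

Whether a cell missing from both frames should end up as 0 or NaN depends on what the data means; with the np.vectorize version above it becomes 0.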