I am extracting data from a database and storing it in a dictionary, which I then convert into a DataFrame. I end up with two DataFrames that I'd like to add together, but every value is stored in a one-element tuple.
Both DataFrames are really big (66 rows x 8497 columns) but look something like this:
df1
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P00001 | (-17.5,) | (-16.2,) | (-15.9,) | (-14.3,) |
| P00002 | (-11.3,) | (-13.1,) | (-13.8,) | (-10.4,) |
| P00003 | (-17.0,) | (-18.0,) | (-17.6,) | (-13.6,) |
| P00004 | None | None | None | None |
df2
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P00001 | (3.3,) | (3.8,) | (5.6,) | (7.5,) |
| P00002 | (4.2,) | (2.3,) | (1.5,) | (5.3,) |
| P00003 | (0.0,) | (0.0,) | (0.0,) | (0.0,) |
| P00004 | (2.8,) | (3.7,) | (4.8,) | (3.9,) |
For example, I'd like to add the value at (P00001, 0) in df1, which is -17.5, to the value at (P00001, 0) in df2, which is 3.3, and so on for every cell, so that the result looks like this:
| | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| P00001 | -14.2 | -12.4 | -10.3 | -6.8 |
| P00002 | -7.1 | -10.8 | -12.3 | -5.1 |
| P00003 | -17.0 | -18.0 | -17.6 | -13.6 |
| P00004 | 2.8 | 3.7 | 4.8 | 3.9 |
I have tried:
df_add = df1.add(df2, fill_value=0)
tuple(np.add(df1,df2))
tuple(map(sum,zip(df1,df2)))
I also tried converting the DataFrame to a numeric dtype, but that didn't work either:
df1_new = df1[:].astype(int)
df_new = df1.convert_dtypes(int)
df_new = df1.apply(pd.to_numeric, errors='ignore')
I am a beginner, please let me know if you need more information.
Answer:
Unpacking the tuples into plain numbers is indeed an option:
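import numpy as np

def tuple2int(x):
    # Unpack a one-element tuple such as (-17.5,).
    # None (TypeError) or an empty tuple (IndexError) becomes 0.0.
    try:
        return x[0]
    except (TypeError, IndexError):
        return 0.0

# otypes=[float] keeps the output float even when the first cell is None;
# otherwise np.vectorize infers the dtype from the first result.
df1[:] = np.vectorize(tuple2int, otypes=[float])(df1)
df2[:] = np.vectorize(tuple2int, otypes=[float])(df2)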
Then add the data frames as you suggested:
df_add = df1.add(df2, fill_value=0)
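To check the approach end to end, here is a minimal, self-contained sketch that rebuilds the first two columns of the sample data from the question. The names `unpack` and `to_numeric_frame` are hypothetical helpers introduced only for this illustration. Unlike the snippet above, it maps None to NaN rather than 0 and builds new float DataFrames instead of assigning in place, on the assumption that you want `fill_value=0` to handle the missing cells.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (first two columns only).
index = ["P00001", "P00002", "P00003", "P00004"]
df1 = pd.DataFrame(
    {0: [(-17.5,), (-11.3,), (-17.0,), None],
     1: [(-16.2,), (-13.1,), (-18.0,), None]},
    index=index,
)
df2 = pd.DataFrame(
    {0: [(3.3,), (4.2,), (0.0,), (2.8,)],
     1: [(3.8,), (2.3,), (0.0,), (3.7,)]},
    index=index,
)


def unpack(cell):
    """Return the number inside a non-empty tuple, or NaN for anything else."""
    return cell[0] if isinstance(cell, tuple) and cell else np.nan


def to_numeric_frame(df):
    """Hypothetical helper: unpack every cell into a new float DataFrame."""
    values = np.vectorize(unpack, otypes=[float])(df.to_numpy(dtype=object))
    return pd.DataFrame(values, index=df.index, columns=df.columns)


# fill_value=0 treats a NaN on one side as 0, so P00004 keeps df2's values.
df_add = to_numeric_frame(df1).add(to_numeric_frame(df2), fill_value=0)
print(df_add.round(1))
```

Because the result has a float dtype, later arithmetic and aggregation behave as usual. Note that a cell that is missing in both frames would stay NaN with this variant.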