concatenated dataframe sum of a column, or difference between two columns in the same dataframe

Time:07-28

I have created a concatenated dataframe from two different dataframes. The layout of the concatenated dataframe looks fine, but when I try to sum one column I get an unexpected result. Below is the code I use to sum the column:

TotalVolumeAsk = concatenated_df['VolumeAsk'].sum()
print("Column Volume Ask sum:", TotalVolumeAsk)

the output is

Column Volume Ask sum: 0.010000000.023380000.009390000.061090000.004690000.010000000.011250000.070510000.004000000.041750000.007460000.000530000.004000000.001060000.008000000.052200000.010000000.004000000.001000000.020000000.069790002.390000000.00401000

I expected a single value for the column, but the values are joined together instead. How can I solve this problem?

Also, if I use the code below to calculate the difference between two columns, I get an error.

Code used for the difference between the two columns:

concatenated_df['spread']=concatenated_df['b']-concatenated_df['c']

error:

TypeError: unsupported operand type(s) for -: 'str' and 'str'

Thank you for the help.

CodePudding user response:

I created my own dataframe to reproduce this. If you remove the line that converts the columns to float, the same problems occur.

import pandas as pd

df = pd.DataFrame({'bid': ['1.2', '1.4', '1.5'], 'ask': ['1.5', '1.7', '2.1']})
# If you remove this conversion, the same problems occur
df[['bid', 'ask']] = df[['bid', 'ask']].astype(float)
df['spread'] = df['ask'] - df['bid']

TotalAsk = df['ask'].sum()

print("Column Ask sum:", TotalAsk)
print(df)

Output

Column Ask sum: 5.300000000000001

   bid  ask  spread
0  1.2  1.5     0.3
1  1.4  1.7     0.3
2  1.5  2.1     0.6
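To see the failure mode itself, here is a minimal sketch that builds the same sample data but skips the float conversion. The name `df_str` and the explicit `dtype=object` are my own choices for illustration (the latter just pins the column type to what the question shows); this is not taken from the asker's code.

```python
import pandas as pd

# Same sample values as above, deliberately left as strings (no astype(float)).
# dtype=object is an assumption made here to match the dtype seen in the question.
df_str = pd.DataFrame(
    {'bid': ['1.2', '1.4', '1.5'], 'ask': ['1.5', '1.7', '2.1']},
    dtype=object,
)

# On an object column of strings, sum() joins the strings end to end,
# which is what produces the long run of digits in the question.
concatenated_sum = df_str['ask'].sum()
print("Column Ask sum:", concatenated_sum)

# Subtracting two string columns is not defined, so pandas raises TypeError.
try:
    df_str['ask'] - df_str['bid']
    subtraction_failed = False
except TypeError as exc:
    subtraction_failed = True
    print("TypeError:", exc)
```

Both symptoms from the question show up here: the "sum" is one long string, and the subtraction raises a TypeError.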

If you print the column types before the conversion:

df = pd.DataFrame({'bid':['1.2', '1.4', '1.5'], 'ask':['1.5', '1.7', '2.1']})
print(df.dtypes)

Output

bid    object
ask    object

After the conversion:

df[['bid', 'ask']] = df[['bid', 'ask']].astype(float)  # the conversion to float
print(df.dtypes)

Output

bid    float64
ask    float64

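That's why I was asking about the data type of your columns. Yours have dtype object, which means the values are strings, not numbers. On string columns, sum() joins the values together instead of adding them, and subtraction raises the TypeError you saw. Convert the columns to a numeric type first.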
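Applied to a concatenated dataframe, the conversion might look like the sketch below. The two source frames (`df_a`, `df_b`) and their values are hypothetical stand-ins, since the question does not show the real data. Only the column names `VolumeAsk`, `b` and `c` come from the question. I use `pd.to_numeric` with `errors='coerce'` instead of `astype(float)` on the assumption that real data may contain entries that are not valid numbers; those become NaN rather than raising.

```python
import pandas as pd

# Hypothetical source frames; every value is a string, as in the question.
df_a = pd.DataFrame({'VolumeAsk': ['0.01000000', '0.02338000'],
                     'b': ['1.5', '1.7'],
                     'c': ['1.2', '1.4']})
df_b = pd.DataFrame({'VolumeAsk': ['0.00939000', 'n/a'],  # 'n/a' is a made-up bad entry
                     'b': ['2.1', '2.0'],
                     'c': ['1.5', '1.9']})

concatenated_df = pd.concat([df_a, df_b], ignore_index=True)

# Convert every column that should be numeric.
# errors='coerce' turns values that cannot be parsed into NaN.
numeric_cols = ['VolumeAsk', 'b', 'c']
concatenated_df[numeric_cols] = concatenated_df[numeric_cols].apply(
    pd.to_numeric, errors='coerce'
)

# Both operations from the question now work on floats.
concatenated_df['spread'] = concatenated_df['b'] - concatenated_df['c']
total_volume_ask = concatenated_df['VolumeAsk'].sum()  # NaN is skipped by default

print(concatenated_df.dtypes)
print("Column Volume Ask sum:", total_volume_ask)
```

If you are sure every value is a valid number, `astype(float)` as shown earlier is enough. If the strings come from reading a file, it may also be worth checking why the columns were read as text in the first place.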
