Summing the values of a column

I have an input file in which the first-column values repeat from block to block while the other columns differ. I want to sum the third-column values across the blocks (first row with first row, and so on) and keep each sum next to its shared first-column value.

input

> > > >
1.000 3 0.002
2.010 4 -0.001
3.020 5 -0.001
> > > >
1.000 5 0.003
2.010 6 0.005
3.020 6 0.002

expected output

1.000 0.005
2.010 0.004
3.020 0.001

My script

import numpy as np
data=np.load("input.txt", allow_pickle=True)
summ=np.sum(data[:,2])

Error: Failed to interpret file 'input' as a pickle. Additionally, any NumPy-based solution will be highly appreciated. Thanks.

CodePudding user response：

IIUC, you can read only the first and last columns, reshape the array into groups, and then sum. The sum also adds up the repeated first-column values, so dividing by [length, 1] restores them:

import numpy as np
data = np.loadtxt("input.txt", comments='>', usecols=(0,-1))  # lines starting with '>' are skipped
length = len(data) // len(np.unique(data[:, 0]))  # number of blocks
out = data.reshape(length, -1, 2).sum(axis=0) / [length, 1]

Output:

array([[1.00e+00, 5.00e-03],
       [2.01e+00, 4.00e-03],
       [3.02e+00, 1.00e-03]])

Intermediate step:

>>> data.reshape(length, -1, 2)
array([[[ 1.00e+00,  2.00e-03],
        [ 2.01e+00, -1.00e-03],
        [ 3.02e+00, -1.00e-03]],

       [[ 1.00e+00,  3.00e-03],
        [ 2.01e+00,  5.00e-03],
        [ 3.02e+00,  2.00e-03]]])
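
For anyone who wants to try the reshape idea without a file on disk, here is a minimal self-contained sketch. It assumes the sample data from the question, fed in through io.StringIO as a stand-in for input.txt, and it assumes every block lists the same first-column values in the same order. The names text, n_keys, n_blocks and blocks are just illustrative. Instead of dividing the summed first column, it takes the keys from the first block, which gives the same result under that assumption.

```python
import io
import numpy as np

# Stand-in for input.txt (sample data from the question).
text = """> > > >
1.000 3 0.002
2.010 4 -0.001
3.020 5 -0.001
> > > >
1.000 5 0.003
2.010 6 0.005
3.020 6 0.002
"""

# Lines starting with '>' are treated as comments and skipped;
# only the first and third columns are read.
data = np.loadtxt(io.StringIO(text), comments='>', usecols=(0, 2))

n_keys = len(np.unique(data[:, 0]))  # distinct first-column values
n_blocks = len(data) // n_keys       # number of '>'-separated blocks

# Shape (n_blocks, n_keys, 2): axis 0 indexes the blocks.
blocks = data.reshape(n_blocks, n_keys, 2)

# Keys come from the first block; the last column is summed across blocks.
out = np.column_stack((blocks[0, :, 0], blocks[:, :, 1].sum(axis=0)))
print(out)
```

If the blocks can differ in length or row order, the reshape no longer lines the rows up, so this sketch only fits input shaped like the sample.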

That said, the whole thing could be a one-liner in pandas:

out = pd.read_csv('input.txt', sep=' ').pipe(lambda x: x[x.ne('>')].dropna(axis=1, how='all').dropna()).astype('float').groupby('>').sum().reset_index().iloc[:, [0,-1]].to_numpy()
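
If you would rather stay in NumPy and not depend on the blocks being equally sized or identically ordered, one possible alternative (not part of the approach above) is to group by the first column with np.unique and np.bincount. This is only a sketch on the question's sample data, again read through io.StringIO in place of input.txt. It matches keys by exact float equality, which is an assumption that holds here because equal keys are parsed from identical text.

```python
import io
import numpy as np

# Stand-in for input.txt (sample data from the question).
text = """> > > >
1.000 3 0.002
2.010 4 -0.001
3.020 5 -0.001
> > > >
1.000 5 0.003
2.010 6 0.005
3.020 6 0.002
"""

data = np.loadtxt(io.StringIO(text), comments='>', usecols=(0, 2))

# keys: sorted distinct first-column values;
# inverse: for each row, the index of its key in `keys`.
keys, inverse = np.unique(data[:, 0], return_inverse=True)

# Sum the third-column values that share the same key.
sums = np.bincount(inverse, weights=data[:, 1])

out = np.column_stack((keys, sums))
print(out)
```

Note that np.unique returns the keys sorted, so the output rows come out in ascending key order rather than in file order.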

CodePudding user response：

(pd.read_table('input.txt', comment = '>', header = None, sep = ' ').
   groupby(0)[2].sum().reset_index().values)

array([[1.00e+00, 5.00e-03],
       [2.01e+00, 4.00e-03],
       [3.02e+00, 1.00e-03]])