I have an input file made of several blocks. The first-column values are the same in every block, while the other columns differ.
I want to sum the third-column values across the blocks (first row with first row, second with second, and so on) and keep each sum next to its common first-column value.
input
> > > >
1.000 3 0.002
2.010 4 -0.001
3.020 5 -0.001
> > > >
1.000 5 0.003
2.010 6 0.005
3.020 6 0.002
expected output
1.000 0.005
2.010 0.004
3.020 0.001
My script
import numpy as np
data=np.load("input.txt", allow_pickle=True)
summ=np.sum(data[:,2])
Error: Failed to interpret file 'input' as a pickle.
Any NumPy-based solution would be highly appreciated. Thanks.
CodePudding user response:
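IIUC, you can read only the first and last columns, reshape the array into one group per block, and sum across the groups. The sum also adds up the first column, so divide that column by the number of groups to restore the original keys: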
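import numpy as np
# comments='>' skips the separator lines; usecols keeps the first and last columns
data = np.loadtxt("input.txt", comments='>', usecols=(0,-1))
length = len(data) // len(np.unique(data[:, 0]))  # number of blocks
# sum across blocks, then divide column 0 by the block count to undo its summing
out = data.reshape(length, -1, 2).sum(axis=0) / [length, 1]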
Output:
array([[1.00e+00, 5.00e-03],
       [2.01e+00, 4.00e-03],
       [3.02e+00, 1.00e-03]])
Intermediate step:
>>> data.reshape(length, -1, 2)
array([[[ 1.00e+00,  2.00e-03],
        [ 2.01e+00, -1.00e-03],
        [ 3.02e+00, -1.00e-03]],
       [[ 1.00e+00,  3.00e-03],
        [ 2.01e+00,  5.00e-03],
        [ 3.02e+00,  2.00e-03]]])
That said, the whole thing could be a one-liner in pandas:
out = pd.read_csv('input.txt', sep=' ').pipe(lambda x: x[x.ne('>')].dropna(axis=1, how='all').dropna()).astype('float').groupby('>').sum().reset_index().iloc[:, [0,-1]].to_numpy()
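Note that the reshape approach assumes every block lists the same first-column values in the same order. If that might not hold, here is a minimal NumPy-only sketch that groups by key instead, using np.unique and np.bincount. The io.StringIO wrapper and the `text` variable are only stand-ins for the real input.txt so the snippet runs on its own.

```python
import io
import numpy as np

# Sample data copied from the question; io.StringIO stands in for "input.txt".
text = """> > > >
1.000 3 0.002
2.010 4 -0.001
3.020 5 -0.001
> > > >
1.000 5 0.003
2.010 6 0.005
3.020 6 0.002
"""

# comments='>' skips the separator lines; keep only columns 0 and 2.
data = np.loadtxt(io.StringIO(text), comments=">", usecols=(0, 2))

# keys: sorted unique first-column values.
# inverse: for each row, the index of its key inside `keys`.
keys, inverse = np.unique(data[:, 0], return_inverse=True)

# Add up the third-column values that share the same key.
sums = np.bincount(inverse, weights=data[:, 1])

out = np.column_stack((keys, sums))
print(out)
```

Because rows are matched by their key rather than by position, this should also cope with blocks of different lengths or a different row order. It still compares floats for exact equality, so keys that differ only by rounding would end up in separate groups.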
CodePudding user response:
(pd.read_table('input.txt', comment = '>', header = None, sep = ' ').
groupby(0)[2].sum().reset_index().values)
array([[1.00e+00, 5.00e-03],
       [2.01e+00, 4.00e-03],
       [3.02e+00, 1.00e-03]])
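For anyone who wants to try the groupby idea without creating a file, here is a self-contained sketch of the same steps. The `text` string and io.StringIO are stand-ins for input.txt, and the intermediate names (`df`, `result`) are just for illustration.

```python
import io
import pandas as pd

# Sample data copied from the question; io.StringIO stands in for "input.txt".
text = """> > > >
1.000 3 0.002
2.010 4 -0.001
3.020 5 -0.001
> > > >
1.000 5 0.003
2.010 6 0.005
3.020 6 0.002
"""

# comment='>' makes pandas ignore the separator lines;
# header=None gives the columns the integer labels 0, 1, 2.
df = pd.read_csv(io.StringIO(text), sep=" ", comment=">", header=None)

# Group rows by the first column and sum the third column within each group.
result = df.groupby(0)[2].sum().reset_index().to_numpy()
print(result)
```

Like the answer above, this groups on the values themselves, so it does not depend on the blocks having the same length or row order.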