Numpy applying a calculation to each element using the previous one

Time:04-09

I'm learning numpy and have this csv file:

,August,September,October,November,December,January
Johney,84,81.3,82.8,80.1,77.4,75.2
Miki,79.6,75.2,75,74.3,72.8,71.4
Ali,67.5,66.5,65.3,65.9,65.6,64
Bob,110.7,108.2,104.1,101,98.3,95.5

I need to turn it into a table that shows each month's weight change relative to the previous month, like this:

[[ 0., -0.03214286, 0.01845018, -0.0326087 , -0.03370787, -0.02842377], 
[ 0., -0.05527638, -0.00265957, -0.00933333, -0.02018843, -0.01923077], 
[ 0., -0.01481481, -0.01804511, 0.00918836, -0.00455235, -0.02439024], 
[ 0., -0.02258356, -0.03789279, -0.02977906, -0.02673267, -0.02848423]]

I had some other questions with this file and my code looks like this:

import numpy as np

def load_training_data(filename):
    data = np.genfromtxt(filename, delimiter=',',skip_header=1)
    data = data[:, 1:]
    with open(filename,'r') as file:
        header = ((file.readline()).rstrip()).split(',')[1:]
        row = [row.split(',')[0] for row in file]
    column_names = np.array(header)
    row_names = np.array(row)

    return data,column_names,row_names
def get_diff_data(data, column_names, row_names):
    # find the difference between consecutive month columns
    d = np.diff(data)
    # create a column of zeros
    z = np.zeros((len(row_names),1))
    # prepend the zero column to the matrix
    t = np.hstack((z,d))
    return t

I managed to calculate the first value:

def get_relative_diff_table(data, column_names, row_names):
    dif = get_diff_data(data, column_names, row_names)
    calc = (data[0:1,1] - data[0:1,0])/data[0:1,0]

but I'm struggling to apply this calculation to all the other elements without writing them out one by one.

CodePudding user response:

With data:

In [115]: data = np.genfromtxt(txt, delimiter=',', skip_header=1)
In [116]: data
Out[116]: 
array([[  nan,  84. ,  81.3,  82.8,  80.1,  77.4,  75.2],
       [  nan,  79.6,  75.2,  75. ,  74.3,  72.8,  71.4],
       [  nan,  67.5,  66.5,  65.3,  65.9,  65.6,  64. ],
       [  nan, 110.7, 108.2, 104.1, 101. ,  98.3,  95.5]])

The nan values come from the column of names, which are strings and can't be parsed as floats. We don't need that column here (you could also pass usecols to load just the numeric columns).
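As a rough sketch of the usecols route, assuming the CSV from the question is held in a string (the names csv_text and weights are made up for this illustration, not part of the session above):

```python
import io
import numpy as np

# The CSV from the question, inlined so the example is self-contained.
csv_text = """,August,September,October,November,December,January
Johney,84,81.3,82.8,80.1,77.4,75.2
Miki,79.6,75.2,75,74.3,72.8,71.4
Ali,67.5,66.5,65.3,65.9,65.6,64
Bob,110.7,108.2,104.1,101,98.3,95.5
"""

# Skip the header row and read only columns 1..6, so the name column
# (which would otherwise be parsed as nan) is never loaded.
weights = np.genfromtxt(
    io.StringIO(csv_text),
    delimiter=',',
    skip_header=1,
    usecols=range(1, 7),
)
print(weights.shape)
```

This hard-codes six month columns; for a file with a different number of months the range would need to change.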

In [117]: data = data[:,1:]
In [118]: np.diff(data, 1)/data[:,:-1]
Out[118]: 
array([[-0.03214286,  0.01845018, -0.0326087 , -0.03370787, -0.02842377],
       [-0.05527638, -0.00265957, -0.00933333, -0.02018843, -0.01923077],
       [-0.01481481, -0.01804511,  0.00918836, -0.00455235, -0.02439024],
       [-0.02258356, -0.03789279, -0.02977906, -0.02673267, -0.02848423]])
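
The result above has one column fewer than the table asked for in the question, because the first month has no previous month to compare against. A minimal sketch of a function that also prepends the zero column might look like this (it reuses the question's function name but drops the unused column_names and row_names arguments, which is my assumption about what is wanted):

```python
import numpy as np

def get_relative_diff_table(data):
    """Change of each month relative to the previous one, with zeros for the first month."""
    data = np.asarray(data, dtype=float)
    # (current - previous) / previous, computed for every row at once
    rel = np.diff(data, axis=1) / data[:, :-1]
    # the first month has nothing to compare against, so it gets a 0
    zeros = np.zeros((data.shape[0], 1))
    return np.hstack((zeros, rel))

# Numeric part of the table from the question
data = np.array([
    [84, 81.3, 82.8, 80.1, 77.4, 75.2],
    [79.6, 75.2, 75, 74.3, 72.8, 71.4],
    [67.5, 66.5, 65.3, 65.9, 65.6, 64],
    [110.7, 108.2, 104.1, 101, 98.3, 95.5],
])
table = get_relative_diff_table(data)
print(table)
```

Dividing by data[:, :-1] lines each difference up with the value it started from, so no loop over the elements is needed.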

You could get the first column as strings with:

In [121]: names = np.genfromtxt(txt, delimiter=',', usecols=[0], dtype=str, skip_header=1)
In [122]: names
Out[122]: array(['Johney', 'Miki', 'Ali', 'Bob'], dtype='<U6')