I'm learning numpy and have this csv file:
,August,September,October,November,December,January
Johney,84,81.3,82.8,80.1,77.4,75.2
Miki,79.6,75.2,75,74.3,72.8,71.4
Ali,67.5,66.5,65.3,65.9,65.6,64
Bob,110.7,108.2,104.1,101,98.3,95.5
I need to turn it into a table that shows each month's weight change relative to the previous month, like this:
[[ 0., -0.03214286, 0.01845018, -0.0326087 , -0.03370787, -0.02842377],
[ 0., -0.05527638, -0.00265957, -0.00933333, -0.02018843, -0.01923077],
[ 0., -0.01481481, -0.01804511, 0.00918836, -0.00455235, -0.02439024],
[ 0., -0.02258356, -0.03789279, -0.02977906, -0.02673267, -0.02848423]]
I have already worked through some other questions on this file, and my code so far looks like this:
import numpy as np

def load_training_data(filename):
    # numeric values only; the first (names) column is read as nan and dropped
    data = np.genfromtxt(filename, delimiter=',', skip_header=1)
    data = data[:, 1:]
    with open(filename, 'r') as file:
        # month names from the header line, skipping the empty first field
        header = file.readline().rstrip().split(',')[1:]
        # first field of every remaining line is the row name
        row = [row.split(',')[0] for row in file]
    column_names = np.array(header)
    row_names = np.array(row)
    return data, column_names, row_names

def get_diff_data(data, column_names, row_names):
    # find the difference between consecutive month columns
    d = np.diff(data)
    # create a column of zeros
    z = np.zeros((len(row_names), 1))
    # add the zero column to the front of the matrix
    t = np.hstack((z, d))
    return t
I managed to calculate the first one:
def get_relative_diff_table(data, column_names, row_names):
    dif = get_diff_data(data, column_names, row_names)
    calc = (data[0:1,1] - data[0:1,0])/data[0:1,0]
but I am struggling to apply this calculation to all the other values without writing them out one by one.
Answer:
With data:
In [115]: data = np.genfromtxt(txt, delimiter=',', skip_header=1,)
In [116]: data
Out[116]:
array([[ nan, 84. , 81.3, 82.8, 80.1, 77.4, 75.2],
[ nan, 79.6, 75.2, 75. , 74.3, 72.8, 71.4],
[ nan, 67.5, 66.5, 65.3, 65.9, 65.6, 64. ],
[ nan, 110.7, 108.2, 104.1, 101. , 98.3, 95.5]])
The nan column is where the strings (the names) were; we don't need it here (usecols could also be used to load just the numeric columns).
In [117]: data = data[:,1:]
In [118]: np.diff(data, 1)/data[:,:-1]
Out[118]:
array([[-0.03214286, 0.01845018, -0.0326087 , -0.03370787, -0.02842377],
[-0.05527638, -0.00265957, -0.00933333, -0.02018843, -0.01923077],
[-0.01481481, -0.01804511, 0.00918836, -0.00455235, -0.02439024],
[-0.02258356, -0.03789279, -0.02977906, -0.02673267, -0.02848423]])
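To get the exact layout asked for in the question, with a leading column of zeros for the first month, the same expression can be wrapped in a function. This is only a sketch: the name get_relative_diff_table is taken from the question, the sample array is typed in from the CSV shown there, and dropping the unused column_names and row_names parameters is my own simplification.

```python
import numpy as np

def get_relative_diff_table(data):
    """Change relative to the previous month; the first month is set to 0."""
    # (current - previous) / previous for every adjacent pair of columns
    rel = np.diff(data, axis=1) / data[:, :-1]
    # the first month has no previous month, so prepend a column of zeros
    zeros = np.zeros((data.shape[0], 1))
    return np.hstack((zeros, rel))

# sample values copied from the CSV in the question (names column removed)
data = np.array([
    [84.0, 81.3, 82.8, 80.1, 77.4, 75.2],
    [79.6, 75.2, 75.0, 74.3, 72.8, 71.4],
    [67.5, 66.5, 65.3, 65.9, 65.6, 64.0],
    [110.7, 108.2, 104.1, 101.0, 98.3, 95.5],
])

table = get_relative_diff_table(data)
print(table)
```

No loop is needed because the division is element-wise: np.diff(data, axis=1) and data[:, :-1] both have shape (4, 5), so each difference is divided by the value of the month before it.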
You could get the first column as strings with:
In [121]: names = np.genfromtxt(txt, delimiter=',', usecols=[0], dtype=str, skip_header=1)
In [122]: names
Out[122]: array(['Johney', 'Miki', 'Ali', 'Bob'], dtype='<U6')
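The usecols idea can be applied to the numeric columns as well, so no nan column has to be sliced off afterwards. The following is a small sketch, assuming the CSV text is held in a string (csv_text is a stand-in for the real file, and n_months is a name introduced here for illustration):

```python
import io
import numpy as np

# stand-in for the real file: the CSV content from the question
csv_text = """,August,September,October,November,December,January
Johney,84,81.3,82.8,80.1,77.4,75.2
Miki,79.6,75.2,75,74.3,72.8,71.4
Ali,67.5,66.5,65.3,65.9,65.6,64
Bob,110.7,108.2,104.1,101,98.3,95.5
"""

n_months = 6  # assumption: six month columns follow the names column

# column 0 only, read as strings
names = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                      usecols=[0], dtype=str, skip_header=1)

# columns 1..6 only, read as floats
data = np.genfromtxt(io.StringIO(csv_text), delimiter=',',
                     usecols=list(range(1, n_months + 1)), skip_header=1)

print(names)
print(data.shape)
```

With a file on disk, the io.StringIO(csv_text) argument would be replaced by the filename.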