First of all, thank you for all the feedback so far; it is much appreciated. I have included the rest of my code from the assignment and added some details to give a better idea of what I am trying to achieve:
I have this Python code that I am trying to modify so that it creates a new percentage-change column for each existing column, instead of overwriting the existing values. How could one do this effectively?
I should add that this is using 1 min trading data for a few select cryptocurrencies with [ time, low, high, open, close ] as the row values.
When I tried adding a new column like so:
df[col] = df[col + 'pctchg'].pct_change()  # calculate pct change
I get an error message. Am I missing some obvious syntax issue?
import pandas as pd
from collections import deque
import random
import numpy as np
import time
from sklearn import preprocessing
pd.set_option('display.max_rows', 500) #increase the display size for dataframes
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 150)
def classify(current, future):
    if float(future) > float(current):  # if the future price is higher than the current, that's a buy, or a 1
        return 1
    else:  # else a sell
        return 0
def preprocess_df(df):
    # df = df.drop("future", 1)  # don't need this anymore.
    for col in df.columns:  # go through all of the columns
        if col != "target":  # do not adjust the target
            df[col] = df[col].pct_change()  # calculate pct change
            df.dropna(inplace=True)  # remove the nas created by pct_change
            df[col] = preprocessing.scale(df[col].values)  # scale the data
main_df = pd.DataFrame()  # begin empty
ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]  # the 4 ratios to consider
for ratio in ratios:  # begin iteration
    ratio = ratio.split('.csv')[0]  # split away the ticker from the file-name
    print(ratio)
    dataset = f'crypto_data/{ratio}.csv'  # get the full path to the file
    df = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', 'close', 'volume'])  # read in specific file
    # rename volume and close to include the ticker so we can still tell which close/volume is which:
    df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)
    df.set_index("time", inplace=True)  # set time as index so we can join them on this shared time
    df = df[[f"{ratio}_close", f"{ratio}_volume"]]  # ignore the other columns besides price and volume
    if len(main_df) == 0:  # if the dataframe is empty
        main_df = df  # then it's just the current df
    else:  # otherwise, join this data to the main one
        main_df = main_df.join(df)
preprocess_df(main_df)
print(main_df)
When I run the code as is, I get the following output:
How would I create the same dataframe, but retain my original values and create new columns with the percentage change?
CodePudding user response:
Sorry for posting this as an answer, but I cannot comment.
You can try this:
# create a new list of column names out of the columns you wish to modify
new_cols = [col + '_pct' for col in df.drop(columns=['target']).columns]
# then calculate the pct change on the desired columns and add them to the df
df[new_cols] = df.drop(columns=['target']).pct_change()
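For reference, here is a minimal, self-contained sketch of the same vectorized idea on a small made-up frame. The toy values, the helper name `add_pct_change_columns` and its `suffix`/`skip` parameters are assumptions for illustration, not part of the original code. It only relies on `DataFrame.pct_change`, `DataFrame.add_suffix` and `DataFrame.join`, and it passes `errors="ignore"` so it also works when there is no `target` column yet (as with `main_df` in the question):

```python
import pandas as pd

# Toy data standing in for the joined close/volume frame (values are made up).
df = pd.DataFrame(
    {
        "BTC-USD_close": [100.0, 110.0, 99.0],
        "BTC-USD_volume": [10.0, 20.0, 30.0],
        "target": [1, 0, 1],
    }
)


def add_pct_change_columns(frame, suffix="_pct", skip=("target",)):
    """Return a new frame with one extra pct-change column per feature column."""
    # Leave out columns that should not be transformed; ignore them if absent.
    features = frame.drop(columns=list(skip), errors="ignore")
    # pct_change keeps the index, so the suffixed columns line up on join.
    pct = features.pct_change().add_suffix(suffix)
    return frame.join(pct)


result = add_pct_change_columns(df)
print(result)
```

The original columns are left untouched, and the first row of each new column is NaN because there is no previous value to compare against.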
As @Tim mentioned above, you can also try:
def preprocess_df(df):
    for col in df.columns:  # go through all of the columns
        if col != "target":  # don't modify target
            df[col + 'new'] = df[col].pct_change()  # <------
    df.dropna(inplace=True)
CodePudding user response:
I ended up using the following code, which achieved the goal I was after:
for col in df.columns:  # go through all of the columns
    if col != "target":  # do not adjust the target
        df[col + 'pctchg'] = df[col].pct_change()  # calculate pct change
        df.dropna(inplace=True)  # remove the nas created by pct_change
        df[col + 'pctchg'] = preprocessing.scale(df[col + 'pctchg'].values)  # scale the data
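One thing to be aware of with the loop above: `dropna` runs once per column, and each new `pct_change` column has a NaN in its first remaining row, so one more row is dropped on every iteration. A possible variation, sketched here under assumptions, computes all the percentage changes first, drops NaN rows once, and then standardizes. The function name `add_scaled_pct_change`, its parameters and the toy data are made up for illustration, and the manual standardization (zero mean, unit variance with `ddof=0`) is a stand-in for `preprocessing.scale` with its default settings:

```python
import pandas as pd


def add_scaled_pct_change(frame, suffix="pctchg", skip=("target",)):
    """Add standardized pct-change columns, dropping NaN rows only once."""
    out = frame.copy()  # leave the caller's frame untouched
    feature_cols = [c for c in out.columns if c not in skip]

    # First pass: add every pct-change column before dropping anything.
    for col in feature_cols:
        out[col + suffix] = out[col].pct_change()

    # Drop rows containing NaN (including the first pct-change row) in one go.
    out = out.dropna()

    # Second pass: standardize each new column (population std, ddof=0).
    for col in feature_cols:
        new_col = col + suffix
        centered = out[new_col] - out[new_col].mean()
        out[new_col] = centered / out[new_col].std(ddof=0)
    return out


# Toy data (made up) to show the shape of the result.
prices = pd.DataFrame(
    {"close": [100.0, 110.0, 99.0, 108.9], "volume": [10.0, 20.0, 30.0, 15.0]}
)
scaled = add_scaled_pct_change(prices)
print(scaled)
```

With four input rows this keeps three rows regardless of how many columns are processed, whereas the per-column `dropna` would lose one row per column.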