How would I make this function create a new column instead of replacing the contents of the existing column?

First of all, thank you for all the feedback so far; it is much appreciated. I have included the rest of my code from the assignment and added some details to give a better idea of what I am trying to achieve:

So I have this Python code that I am trying to modify so that it creates a new column with the percentage change for each existing column, instead of overwriting the existing values. How could I do this effectively?

I should add that this uses 1-minute trading data for a few select cryptocurrencies, with each row holding [time, low, high, open, close, volume].

When I tried adding a new column like so:

    df[col] = df[col + 'pctchg'].pct_change()  # calculate pct change

I get an error message. Am I missing some obvious syntax issue?
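A minimal sketch of the two problems, using a made-up single-column frame (the name `BTC-USD_close` and the prices are illustrative only). Placing a variable directly next to a string, as in `col 'pctchg'`, is a syntax error in Python; the two must be joined with `+`. Even with the `+`, reading `df[col + 'pctchg']` on the right-hand side raises a `KeyError`, because that column does not exist yet. The new name belongs on the left and the existing column on the right:

```python
import pandas as pd

# Made-up close prices for one ticker; the column name is illustrative.
df = pd.DataFrame({"BTC-USD_close": [100.0, 110.0, 99.0]})
col = "BTC-USD_close"

# Reading the new name on the right-hand side fails: the column doesn't exist yet.
try:
    df[col] = df[col + "pctchg"].pct_change()
except KeyError as err:
    print("KeyError:", err)

# New name on the left, existing column on the right; the original column is kept.
df[col + "pctchg"] = df[col].pct_change()
print(df)
```

The original column keeps its values, and the new column starts with `NaN` because the first row has no previous value to compare against.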

    import pandas as pd
    from collections import deque
    import random
    import numpy as np
    import time
    from sklearn import preprocessing
    
    pd.set_option('display.max_rows', 500) #increase the display size for dataframes
    pd.set_option('display.max_columns', 500)
    pd.set_option('display.width', 150)
    
    def classify(current, future):
        if float(future) > float(current):  # if the future price is higher than the current, that's a buy, or a 1
            return 1
        else:  # else a sell
            return 0
    
    
    def preprocess_df(df):
        #   df = df.drop("future", 1)  # don't need this anymore.
    
        for col in df.columns:  # go through all of the columns
            if col != "target":  # do not adjust the target
                df[col] = df[col].pct_change()  #calculate pct change
                df.dropna(inplace=True)  # remove the nas created by pct_change
                df[col] = preprocessing.scale(df[col].values)  # scale the data
    
    main_df = pd.DataFrame() # begin empty
    
    ratios = ["BTC-USD", "LTC-USD", "BCH-USD", "ETH-USD"]  # the 4 ratios to consider
    for ratio in ratios:  # begin iteration
    
        ratio = ratio.split('.csv')[0]  # split away the ticker from the file-name
        print(ratio)
        dataset = f'crypto_data/{ratio}.csv'  # get the full path to the file
        df = pd.read_csv(dataset, names=['time', 'low', 'high', 'open', 'close', 'volume'])  # read in specific file
    
        # rename volume and close to include the ticker so we can still tell which close/volume is which:
        df.rename(columns={"close": f"{ratio}_close", "volume": f"{ratio}_volume"}, inplace=True)
    
        df.set_index("time", inplace=True)  # set time as index so we can join them on this shared time
        df = df[[f"{ratio}_close", f"{ratio}_volume"]]  # ignore the other columns besides price and volume
    
        if len(main_df)==0:  # if the dataframe is empty
            main_df = df  # then it's just the current df
        else:  # otherwise, join this data to the main one
            main_df = main_df.join(df)
    
    preprocess_df(main_df)
    
    print(main_df)

When I run the code as is, I get the following output:

[Screenshot: DataFrame output]

How would I create the same dataframe, but retain my original values and create new columns with the percentage change?
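One way to do this without a per-column loop, sketched under the assumption that the frame looks like `main_df` above (a `time` index with one column per ticker/feature) and that a `target` column, if present, should be skipped: compute `pct_change()` on the feature columns in one go, rename the results with `add_suffix`, and `join` them back onto the untouched original. The function name `add_pct_change_columns` and the `_pctchg` suffix are illustrative choices, not part of the assignment:

```python
import pandas as pd


def add_pct_change_columns(df: pd.DataFrame, exclude=("target",)) -> pd.DataFrame:
    """Return a new frame with the original columns plus one '<col>_pctchg' column each.

    Columns named in `exclude` (e.g. a 'target' label) are kept but get no pct-change column.
    """
    feature_cols = [c for c in df.columns if c not in exclude]
    # percentage change of every feature column at once, renamed with a suffix
    pct = df[feature_cols].pct_change().add_suffix("_pctchg")
    # join aligns on the shared index; one dropna() removes the first row,
    # where the percentage change is undefined
    return df.join(pct).dropna()


# Made-up sample shaped like main_df: a 'time' index and per-ticker columns.
main_df = pd.DataFrame(
    {"BTC-USD_close": [100.0, 110.0, 99.0], "BTC-USD_volume": [5.0, 10.0, 5.0]},
    index=pd.Index([1, 2, 3], name="time"),
)
out = add_pct_change_columns(main_df)
print(out)
```

Because `join` aligns on the index, each new column lines up with its original row. Note that `dropna()` also drops any row that already contained a NaN in some column (for example, a minute missing for one ticker after the joins).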

Answer:

Sorry for posting this as an answer, but I cannot comment.

You can try this:

    # create a list of new column names from the columns you wish to modify
    # (errors='ignore' keeps this working if there is no 'target' column yet)
    new_cols = [col + '_pct' for col in df.drop(columns=['target'], errors='ignore').columns]
    # then calculate the pct change on the desired columns and add them to the df
    df[new_cols] = df.drop(columns=['target'], errors='ignore').pct_change()

As @Tim mentioned above, you can also try:

    def preprocess_df(df):
        for col in df.columns:  # go through all of the columns
            if col != "target": # don't modify target
                df[col + 'new'] = df[col].pct_change()  # <------
                df.dropna(inplace=True)
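One thing to watch with this loop: `df.dropna(inplace=True)` runs once per column, and each new `pct_change()` column starts with a NaN in whatever row is first at that point, so the frame loses one row per processed column rather than one row overall. A small comparison on made-up data (the helper names `pct_in_loop` and `pct_then_dropna` are hypothetical):

```python
import pandas as pd


def pct_in_loop(df: pd.DataFrame) -> pd.DataFrame:
    # Pattern from the answer above: dropna() runs once per processed column.
    df = df.copy()
    for col in list(df.columns):  # snapshot of the original columns
        if col != "target":
            df[col + "new"] = df[col].pct_change()
            df.dropna(inplace=True)
    return df


def pct_then_dropna(df: pd.DataFrame) -> pd.DataFrame:
    # Same new columns, but the NaN rows are dropped once, after the loop.
    df = df.copy()
    for col in list(df.columns):
        if col != "target":
            df[col + "new"] = df[col].pct_change()
    return df.dropna()


# Made-up data: 5 rows, 3 feature columns.
sample = pd.DataFrame({
    "a": [1.0, 2.0, 4.0, 8.0, 16.0],
    "b": [1.0, 3.0, 9.0, 27.0, 81.0],
    "c": [2.0, 4.0, 6.0, 8.0, 10.0],
})
print(len(pct_in_loop(sample)))      # 2 rows left: one lost per column
print(len(pct_then_dropna(sample)))  # 4 rows left: only the first row lost
```

With the 4 tickers × 2 columns in `main_df`, the in-loop version would drop 8 rows instead of 1; on 1-minute data that is minor, but moving `dropna()` after the loop avoids it.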

Answer:

I ended up using the following code, which achieved the goal I was after:

    from sklearn import preprocessing

    def preprocess_df(df):
        for col in df.columns:  # go through all of the original columns
            if col != "target":  # do not adjust the target
                df[col + 'pctchg'] = df[col].pct_change()  # calculate pct change into a new column
                df.dropna(inplace=True)  # remove the NaNs created by pct_change
                df[col + 'pctchg'] = preprocessing.scale(df[col + 'pctchg'].values)  # scale the new column
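If the scaling step is kept, a variant of the same idea is to drop the NaN row once, after all the new columns exist, and then standardize only the new columns. The sketch below (the function name `preprocess_keep_originals` is hypothetical) does the standardization with plain pandas instead of `preprocessing.scale` to stay dependency-free; for a non-constant column both give zero mean and unit standard deviation computed with `ddof=0`:

```python
import pandas as pd


def preprocess_keep_originals(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant: keep the originals, add scaled '<col>pctchg' columns."""
    df = df.copy()
    new_cols = []
    for col in list(df.columns):  # snapshot of the original columns
        if col != "target":  # do not adjust the target
            df[col + "pctchg"] = df[col].pct_change()  # new column, original untouched
            new_cols.append(col + "pctchg")
    # drop the NaN row(s) once; copy so the writes below go to a fresh frame
    df = df.dropna().copy()
    for col in new_cols:
        s = df[col]
        # standardize: mean 0, std 1 with ddof=0 (the convention preprocessing.scale uses)
        df[col] = (s - s.mean()) / s.std(ddof=0)
    return df


# Made-up sample: one price column and a 'target' label column.
sample = pd.DataFrame({
    "BTC-USD_close": [100.0, 110.0, 99.0, 108.9],
    "target": [1, 0, 1, 0],
})
out = preprocess_keep_originals(sample)
print(out)
```

Returning a new frame instead of modifying `main_df` in place makes it easier to compare the original and transformed data side by side.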