Python: How to parse variables from several pandas dataframes?

Time:02-14

I want to extract the X and y variables from several pandas DataFrames before proceeding to the next steps. I load each tab-delimited .txt file first and then try to extract the information. The error raised is ValueError: too many values to unpack (expected 2).

import pandas as pd

class DataProcessing:

    def __init__(self, data):
        self.df = pd.read_csv(data, sep="\t")
        X, y = self.df.iloc[1:, 1:]
        return X, y

dp_linear_cna = DataProcessing("data_linear_cna.txt")
dp_mrna_seq_v2_rsem = DataProcessing("data_mrna_seq_v2_rsem.txt")

dp_linear_cna.extract_info()
dp_mrna_seq_v2_rsem.extract_info()

Traceback:

ValueError: too many values to unpack (expected 2)

CodePudding user response:

  1. sep="/t" should be sep="\t".
  2. Don't iterate over rows or columns; select data by index, e.g. to select a column: df['some_column_name']
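To illustrate point 2, here is a minimal sketch of selecting by column label instead of looping. The toy DataFrame and its column names (gene, s1, s2) are made up for the example and are not taken from the asker's files.

```python
import pandas as pd

# Hypothetical toy data; the column names are invented for illustration.
df = pd.DataFrame(
    {"gene": ["A", "B", "C"], "s1": [0.1, 0.2, 0.3], "s2": [1.0, 2.0, 3.0]}
)

one_column = df["s1"]            # a single label returns a Series
two_columns = df[["s1", "s2"]]   # a list of labels returns a DataFrame
column_mean = df["s1"].mean()    # vectorized; no explicit loop over rows
```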

CodePudding user response:

Your coding style is quite bad. First of all, don't return anything from __init__. It's a constructor, and Python raises a TypeError if it returns anything other than None. Make another method instead.

class DataProcessing:
    def __init__(self, data):
        self.df = pd.read_csv(data, sep="\t")

    def split_data(self):
        X = self.df.iloc[:, :-1]
        y = self.df.iloc[:, -1]
        return X, y

Call your DataProcessing like this:

def main():
    dp = DataProcessing('data_linear_cna.txt')
    X, y = dp.split_data()
    print(X)
    print()
    print(y)

if __name__ == "__main__":
    main()
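Since the question is about several DataFrames, the same class can be reused in a loop. This is only a sketch: the in-memory io.StringIO objects stand in for the two .txt files, their contents are invented, and it assumes (as split_data does) that the last column is the target y.

```python
import io
import pandas as pd


class DataProcessing:
    def __init__(self, data):
        # `data` can be a path or any file-like object pd.read_csv accepts.
        self.df = pd.read_csv(data, sep="\t")

    def split_data(self):
        X = self.df.iloc[:, :-1]   # every column except the last
        y = self.df.iloc[:, -1]    # assumption: the last column is the target
        return X, y


# Hypothetical stand-ins for the tab-delimited files; contents are made up.
sources = {
    "linear_cna": io.StringIO("id\tf1\ttarget\na\t1\t0\nb\t2\t1\n"),
    "mrna": io.StringIO("id\tf1\tf2\ttarget\na\t1.5\t2.5\t0\nb\t3.5\t4.5\t1\n"),
}

# One (X, y) pair per dataset, keyed by name.
splits = {name: DataProcessing(src).split_data() for name, src in sources.items()}

X_cna, y_cna = splits["linear_cna"]
```

Unpacking works here because split_data returns a real two-element tuple.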

The main point here is selection by position via df.iloc[rows, columns].

X, y = self.df.iloc[1:, 1:]

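This statement does not do what you want. df.iloc[1:, 1:] returns another pandas.DataFrame, not a tuple. Unpacking a DataFrame iterates over its column labels, so with more than two columns you get ValueError: too many values to unpack (expected 2).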
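A small sketch that reproduces the error on a toy DataFrame may make this clearer. The three columns a, b, c are invented for the example, and treating the last one as y is an assumption.

```python
import pandas as pd

# Hypothetical three-column frame; any frame with more than two columns behaves the same.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Iterating over a DataFrame yields its column labels, not rows or sub-frames.
labels = list(df.iloc[:, :])

try:
    X, y = df.iloc[:, :]       # three labels cannot be unpacked into two names
except ValueError as exc:
    message = str(exc)

# Two separate positional selections give the intended split.
X = df.iloc[:, :-1]
y = df.iloc[:, -1]
```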

Indexing both axes

You can mix the indexer types for the index and columns. Use : to select the entire axis.
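As a rough illustration of mixing indexer types with iloc, the following uses a made-up 3x3 frame. The last line is the slice from the question, which drops the first row and the first column rather than splitting anything.

```python
import pandas as pd

# Hypothetical 3x3 frame for illustration only.
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})

cell = df.iloc[0, 1]            # integer and integer: a single value
block = df.iloc[[0, 2], 1:]     # list of row positions and a column slice
all_rows = df.iloc[:, [0, 2]]   # ':' keeps the entire row axis
trimmed = df.iloc[1:, 1:]       # the question's slice: one DataFrame, not (X, y)
```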
