I want to extract the X and y variables from several pandas DataFrames (before proceeding to the next steps). I load the tab-delimited .txt file when initializing the class, before I extract the information.
The error raised is ValueError: too many values to unpack (expected 2).
import pandas as pd

class DataProcessing:
    def __init__(self, data):
        self.df = pd.read_csv(data, sep="\t")
        X, y = self.df.iloc[1:, 1:]
        return X, y

dp_linear_cna = DataProcessing("data_linear_cna.txt")
dp_mrna_seq_v2_rsem = DataProcessing("data_mrna_seq_v2_rsem.txt")
dp_linear_cna.extract_info()
dp_mrna_seq_v2_rsem.extract_info()
Traceback:
ValueError: too many values to unpack (expected 2)
CodePudding user response:
- The sep="/t" is supposed to be sep="\t".
- Never iterate over rows/columns; select data using the index, e.g. selecting a column: df['some_column_name']
CodePudding user response:
Your coding style is quite bad. First of all, don't return anything from __init__. It's a constructor, and Python raises a TypeError if it returns anything other than None. Make another method instead.
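class DataProcessing:
    def __init__(self, data):
        # Only load the file here; __init__ must not return a value.
        self.df = pd.read_csv(data, sep="\t")

    def split_data(self):
        X = self.df.iloc[:, :-1]  # every column except the last
        y = self.df.iloc[:, -1]   # the last column only
        return X, y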
Call your DataProcessing class like this:
def main():
    dp = DataProcessing('data_linear_cna.txt')
    X, y = dp.split_data()
    print(X)
    print()
    print(y)

if __name__ == "__main__":
    main()
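To try the class without the original files, here is a minimal sketch that feeds it an in-memory tab-delimited table. The column names and values are made up for illustration, and treating the last column as y is the assumption made in this answer, not something known about data_linear_cna.txt.

```python
import io

import pandas as pd


class DataProcessing:
    def __init__(self, data):
        # pd.read_csv accepts a path or a file-like object.
        self.df = pd.read_csv(data, sep="\t")

    def split_data(self):
        X = self.df.iloc[:, :-1]  # every column except the last
        y = self.df.iloc[:, -1]   # the last column only
        return X, y


# Hypothetical tab-delimited content standing in for a real file.
sample = io.StringIO("gene_a\tgene_b\tlabel\n0.1\t1.2\t0\n0.3\t0.9\t1\n")

dp = DataProcessing(sample)
X, y = dp.split_data()
print(X.shape)     # (2, 2)
print(y.tolist())  # [0, 1]
```

If the target is not the last column in your files, adjust the two iloc slices in split_data accordingly.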
The main point here is selection by position via df.iloc[rows, columns].
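    X, y = self.df.iloc[1:, 1:]
is not a valid statement here. pandas.DataFrame.iloc returns another pandas.DataFrame, not a tuple, so you can't unpack it into X and y. Unpacking a DataFrame iterates over its column labels, which is why a frame with more than two columns raises ValueError: too many values to unpack (expected 2).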
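A small sketch of why the unpacking fails, using a made-up three-column frame (the names a, b, c are hypothetical). Iterating a DataFrame yields its column labels, so Python tries to fit three labels into two names.

```python
import pandas as pd

# Hypothetical data with three columns.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Iterating over a DataFrame yields column labels, not rows or sub-frames.
labels = list(df.iloc[:, :])
print(labels)  # ['a', 'b', 'c']

# Three labels cannot be unpacked into two names.
try:
    X, y = df.iloc[:, :]
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# With exactly two columns the unpacking "succeeds", but it only
# gives back the two column labels, not feature and target data.
first, second = df.iloc[:, :2]
print(first, second)  # a b
```

The last case is worth noting: a two-column frame would not raise at all, yet X and y would silently be strings.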
Indexing both axes
You can mix the indexer types for the index and columns. Use : to select the entire axis.
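As an illustration of mixing indexer types, here is a sketch on a made-up frame (all column names and values are hypothetical). Each call passes a row indexer and a column indexer to iloc.

```python
import pandas as pd

# Hypothetical frame: an id column, two features, and a target.
df = pd.DataFrame(
    {
        "id": [10, 11, 12],
        "f1": [0.1, 0.2, 0.3],
        "f2": [1.0, 2.0, 3.0],
        "target": [0, 1, 0],
    }
)

last_col = df.iloc[:, -1]      # all rows, last column -> Series
skip_first = df.iloc[1:, 1:]   # drop first row and first column -> DataFrame
picked = df.iloc[[0, 2], 1:3]  # list of rows, slice of columns -> DataFrame
single = df.iloc[1, 2]         # one row, one column -> scalar

print(last_col.tolist())     # [0, 1, 0]
print(skip_first.shape)      # (2, 3)
print(list(picked.columns))  # ['f1', 'f2']
print(single)                # 2.0
```

Note that df.iloc[1:, 1:] from the question is a single DataFrame; it drops the first data row and the first column but does not separate features from the target.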