I want to use df["ID"] in the dataset_csv function and then call dataset_csv with dataset = RawToCSV.dataset_csv(input_path). df["ID"] is defined in the raw_file_processing function.
My code raises this error: TypeError: __init__() missing 1 required positional argument: 'df'
import re
import pandas as pd
import os
import numpy as np

input_path = "../input_data"


class RawToCSV:
    def __init__(self, path_, df):
        self.measurement_df = None
        self.cls = None
        self.path_ = path_
        self.df = df

    def raw_file_processing(self, path_):
        # Open all the subfolders within path
        for root, dirs, files in os.walk(path_):
            for file in files:
                with open(os.path.join(root, file), "r") as data:
                    self.df = pd.read_csv(data)
                    # 'Class' refers to the independent variable
                    cls_info = self.df.iloc[2]
                    # Dummy-code the classes
                    cls = pd.get_dummies(cls_info)
                    # Create the ID series by concatenating columns 1-3
                    self.df = self.df.assign(
                        ID=self.df[['cell_id:cell_id', 'region:region', 'tile_num:tile_num']].apply(
                            lambda row: '_'.join([str(each) for each in row]), axis=1))
                    self.df = self.df.drop(columns=['cell_id:cell_id', 'region:region', 'tile_num:tile_num'])
                    # Obtain measurement info
                    # Normalize data against blank/empty columns
                    # log-transform the data
                    for col in self.df[9:]:
                        if re.findall(r"Blank|Empty", col):
                            background = col
                        else:
                            line = col.readline()
                            for dat in line:
                                norm_data = dat / background
                                self.measurement_df = np.log2(norm_data)
        return self.df["ID"], cls, self.measurement_df

    def dataset_csv(self):
        """Col 1: ID
        Col 2: class
        Col 3-n: measurements"""
        ids = self.df["ID"]
        id_col = ids.to_frame()
        cls_col = self.cls.to_frame()
        frames = [id_col, cls_col, self.measurement_df]
        dataset_df = pd.concat(frames)
        data_csv = dataset_df.to_csv("../input_data/dataset.csv")
        return data_csv


raw = RawToCSV(input_path)
three_tuple = raw.raw_file_processing(input_path)
dataset = raw.data_csv()
Traceback:
> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> /tmp/ipykernel_136/323215226.py in <module>
> ----> 1 raw = RawToCSV(input_path)
>       2 three_tuple = raw.raw_file_processing(input_path)
>
> TypeError: __init__() missing 1 required positional argument: 'df'
CodePudding user response:
In this part of the code:
dataset = RawToCSV.dataset_csv(input_path)
you are calling the method on the class itself. You should first create an instance of RawToCSV, like this:
rawToCSV = RawToCSV(input_path)
dataset = rawToCSV.dataset_csv()
There is another mistake as well. In the constructor, __init__, you initialize self.df with self.df, which has not been defined yet.
So this part of the code raises another error (AttributeError: 'RawToCSV' object has no attribute 'df'):
def __init__(self, path_):
    self.measurement_df = None
    self.cls = None
    self.path_ = path_
    self.df = self.df  # <-----
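To make the difference concrete, here is a minimal sketch using two hypothetical classes, Broken and Fixed (neither is part of the original code). It assumes the intent is simply to have a df attribute that gets filled in later, so the placeholder is None:

```python
class Broken:
    def __init__(self, path_):
        self.path_ = path_
        # Reads self.df before it has ever been assigned
        self.df = self.df


class Fixed:
    def __init__(self, path_):
        self.path_ = path_
        # Placeholder until a file is actually loaded
        self.df = None


try:
    Broken("../input_data")
except AttributeError as exc:
    broken_error = str(exc)
    print(broken_error)

fixed = Fixed("../input_data")
print(fixed.df)
```

With a placeholder like this, the constructor only needs the path, and a later method can assign the real DataFrame to self.df.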
CodePudding user response:
On this line:
dataset = RawToCSV.dataset_csv(input_path)
you're calling dataset_csv as if it were a static method (calling it on the class, not on an instance). You are passing in input_path, which I assume is a string. Because you're calling the method on the class, Python does not implicitly pass an instance as self (there has to be an object for it to pass).
This means that the one parameter of dataset_csv, which you named self, receives the (string) value of input_path.
The error message is telling you that the string input_path has no member .df, because it doesn't.
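A small sketch of that behaviour, using a hypothetical stand-in class named Demo (not the original RawToCSV) whose df is just a string for illustration:

```python
class Demo:
    def __init__(self):
        self.df = "frame"  # stand-in for a real DataFrame

    def dataset_csv(self):
        return self.df


# Called on the class: the string is bound to the `self` parameter
try:
    Demo.dataset_csv("../input_data")
except AttributeError as exc:
    message = str(exc)
    print(message)

# Called on an instance: Python passes the instance as `self`
result = Demo().dataset_csv()
print(result)
```

The first call fails because a str has no df attribute; the second works because the instance created by Demo() carries it.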
With the way your class and its methods are currently set up, you'll need your entry point code at the bottom to be something like this:
raw = RawToCSV(input_path)
three_tuple = raw.raw_file_processing(input_path)
dataset = raw.dataset_csv()
That said, you may want to restructure your class and its methods so that the constructor takes only the path and the DataFrame is created during processing.
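One possible restructuring, as a rough sketch rather than a drop-in replacement: the constructor takes only the path, raw_file_processing stores its result on self, and dataset_csv refuses to run until processing has happened. The assumptions here are mine: every file under the path is a CSV with the three identifier columns, the ID_COLUMNS constant and the out_path parameter are hypothetical additions, and the class dummy-coding and normalization steps are left out.

```python
import os
import tempfile

import pandas as pd

# Identifier columns taken from the question's code
ID_COLUMNS = ["cell_id:cell_id", "region:region", "tile_num:tile_num"]


class RawToCSV:
    def __init__(self, path_):
        self.path_ = path_
        self.df = None  # filled in by raw_file_processing()

    def raw_file_processing(self):
        # Read every CSV found under the input path
        frames = [
            pd.read_csv(os.path.join(root, name))
            for root, _dirs, files in os.walk(self.path_)
            for name in sorted(files)
            if name.endswith(".csv")
        ]
        if not frames:
            raise FileNotFoundError(f"no CSV files under {self.path_}")
        df = pd.concat(frames, ignore_index=True)
        # Build the ID by joining the identifier columns with "_"
        df["ID"] = df[ID_COLUMNS].astype(str).apply(
            lambda row: "_".join(row), axis=1)
        self.df = df.drop(columns=ID_COLUMNS)
        return self.df

    def dataset_csv(self, out_path):
        if self.df is None:
            raise RuntimeError("call raw_file_processing() first")
        # Put the ID column first, then everything else
        columns = ["ID"] + [c for c in self.df.columns if c != "ID"]
        self.df[columns].to_csv(out_path, index=False)
        return out_path


# Usage with a throwaway directory and made-up data
with tempfile.TemporaryDirectory() as tmp:
    pd.DataFrame({
        "cell_id:cell_id": [1, 2],
        "region:region": ["A", "B"],
        "tile_num:tile_num": [7, 8],
        "marker": [0.5, 1.5],
    }).to_csv(os.path.join(tmp, "raw.csv"), index=False)

    raw = RawToCSV(tmp)
    raw.raw_file_processing()
    out_file = raw.dataset_csv(os.path.join(tmp, "dataset.csv"))
    written = pd.read_csv(out_file)
    print(written)
```

Keeping the DataFrame out of the constructor signature is what removes the "missing 1 required positional argument: 'df'" error, since RawToCSV(input_path) then matches __init__.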