TypeError: __init__() missing 1 required positional argument: 'df'

I want to use df["ID"] inside the dataset_csv method and then call that method with dataset = RawToCSV.dataset_csv(input_path). df["ID"] is created in the raw_file_processing method. My code raises TypeError: __init__() missing 1 required positional argument: 'df'.

import re
import pandas as pd
import os
import numpy as np

input_path = "../input_data"


class RawToCSV:

    def __init__(self, path_, df):
        self.measurement_df = None
        self.cls = None
        self.path_ = path_
        self.df = df

    def raw_file_processing(self, path_):

        # Open all the subfolders within path
        for root, dirs, files in os.walk(path_):
            for file in files:
                with open(os.path.join(root, file), "r") as data:
                    self.df = pd.read_csv(data)

                    # 'Class' refers to the independent variable
                    cls_info = self.df.iloc[2]

                    # Dummy-code the classes
                    cls = pd.get_dummies(cls_info)

                    # Create the ID series by concatenating columns 1-3
                    self.df = self.df.assign(
                        ID=self.df[['cell_id:cell_id', 'region:region', 'tile_num:tile_num']].apply(
                            lambda row: '_'.join([str(each) for each in row]), axis=1))
                    self.df = self.df.drop(columns=['cell_id:cell_id', 'region:region', 'tile_num:tile_num'])

                    # Obtain measurement info
                    # Normalize data against blank/empty columns
                    # log-transform the data
                    for col in self.df[9:]:
                        if re.findall(r"Blank|Empty", col):
                            background = col
                        else:
                            line = col.readline()
                            for dat in line:
                                norm_data = dat / background
                                self.measurement_df = np.log2(norm_data)

        return self.df["ID"], cls, self.measurement_df

    def dataset_csv(self):
        """Col 1: ID
        Col 2: class
        Col 3-n: measurements"""
        ids = self.df["ID"]
        id_col = ids.to_frame()

        cls_col = self.cls.to_frame()
        frames = [id_col, cls_col, self.measurement_df]
        dataset_df = pd.concat(frames)
        data_csv = dataset_df.to_csv("../input_data/dataset.csv")

        return data_csv

raw = RawToCSV(input_path)
three_tuple = raw.raw_file_processing(input_path)
dataset = raw.data_csv()

Traceback:

> ---------------------------------------------------------------------------
> TypeError                                 Traceback (most recent call last)
> /tmp/ipykernel_136/323215226.py in <module>
> ----> 1 raw = RawToCSV(input_path)
>       2 three_tuple = raw.raw_file_processing(input_path)
> 
> TypeError: __init__() missing 1 required positional argument: 'df'

CodePudding user response:

In this part of code:

dataset = RawToCSV.dataset_csv(input_path)

You are calling the method on the class itself. You should first create an instance of RawToCSV and then call the method on that instance, like this:

rawToCSV = RawToCSV(input_path)
dataset = rawToCSV.dataset_csv()

There is another mistake, though. In the constructor, __init__, you assign self.df to itself, but self.df has not been defined at that point.
So this part of the code raises another error (AttributeError: 'RawToCSV' object has no attribute 'df'):

def __init__(self, path_):
    self.measurement_df = None
    self.cls = None
    self.path_ = path_
    self.df = self.df     #  <-----
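One possible fix, sketched here under assumptions, is to give df a default of None so the constructor can be called with only a path, and let raw_file_processing fill in self.df later. The class names Strict and Relaxed are hypothetical and only contrast the two signatures; they are not part of the original code.

```python
class Strict:
    # Mirrors the question's signature: df is a required argument.
    def __init__(self, path_, df):
        self.path_ = path_
        self.df = df


class Relaxed:
    # Assumed fix: df is optional and stays None until data is loaded.
    def __init__(self, path_, df=None):
        self.path_ = path_
        self.df = df


try:
    Strict("../input_data")  # only one argument supplied
except TypeError as exc:
    strict_error = str(exc)  # message names the missing 'df' argument

relaxed = Relaxed("../input_data")  # succeeds; relaxed.df is None for now
```

With the optional parameter, RawToCSV(input_path) no longer fails at construction, but any method that reads self.df still has to run after the data has been loaded.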

CodePudding user response:

On this line:

dataset = RawToCSV.dataset_csv(input_path)

you're calling dataset_csv as if it were a static method (on the class, not on an instance). You are passing in input_path, which I assume is a string. Because the call goes through the class, Python does not implicitly pass an instance as self (there has to be an object for it to pass).

This means that your one parameter of dataset_csv, which you named self, is receiving the (string) value of input_path.

The error message is telling you that the string input_path has no attribute df, which is true.
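A stripped-down sketch of that behaviour, using a hypothetical Demo class in place of RawToCSV (the dictionary standing in for the DataFrame is also an assumption made to keep the example small):

```python
class Demo:
    def __init__(self, path_):
        self.path_ = path_
        self.df = {"ID": [1, 2, 3]}  # stand-in for a real DataFrame

    def dataset_csv(self):
        return self.df["ID"]


# Called on the class: the string is bound to the parameter named self.
try:
    Demo.dataset_csv("../input_data")
except AttributeError as exc:
    class_call_error = str(exc)  # a str has no attribute 'df'

# Called on an instance: Python passes the instance as self.
ids = Demo("../input_data").dataset_csv()
```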

With the way your class and its methods are currently set up, you'll need your entry point code at the bottom to be something like this:

raw = RawToCSV(input_path)
three_tuple = raw.raw_file_processing(input_path)
dataset = raw.dataset_csv()

Though you may want to restructure your class and its methods.
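As a rough illustration of one such restructuring (not the only option), the sketch below has the constructor take only the path, lets raw_file_processing build self.df, and has dataset_csv write it out. The ID_COLS constant, the out_path parameter, and the .csv filter are assumptions added for the example. The class-encoding and normalization steps from the question are left out.

```python
import os

import pandas as pd


class RawToCSV:
    # Identifier columns named in the question; joined into a single ID.
    ID_COLS = ["cell_id:cell_id", "region:region", "tile_num:tile_num"]

    def __init__(self, path_):
        self.path_ = path_
        self.df = None  # filled in by raw_file_processing()

    def raw_file_processing(self):
        # Read every .csv file under path_ and stack the rows in one frame.
        frames = []
        for root, _dirs, files in os.walk(self.path_):
            for name in sorted(files):
                if name.endswith(".csv"):
                    frames.append(pd.read_csv(os.path.join(root, name)))
        df = pd.concat(frames, ignore_index=True)
        # Build the ID column by joining the identifier columns with "_".
        df["ID"] = df[self.ID_COLS].astype(str).apply("_".join, axis=1)
        self.df = df.drop(columns=self.ID_COLS)
        return self.df

    def dataset_csv(self, out_path):
        # Load the data on demand so the call order cannot go wrong.
        if self.df is None:
            self.raw_file_processing()
        # Put ID first, then write the file without the pandas index.
        cols = ["ID"] + [c for c in self.df.columns if c != "ID"]
        self.df[cols].to_csv(out_path, index=False)
        return out_path
```

Usage would then be raw = RawToCSV(input_path) followed by raw.dataset_csv(some_output_path). Writing the output outside the input folder avoids reading dataset.csv back in on the next run.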