Calling class to create new columns for dataframe

Time:10-12

I wrote functions to manipulate the URL strings in my dataframe and create new columns from the functions' outputs.

I define my class as:

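import math

import numpy as np
from pyquery import PyQuery
from requests import get  # assumed: `get` is requests.get


class URL(object):
    def __init__(self, url):
        self.url = url
        self.domain = url.split('//')[-1].split('/')[0]
        self.response = get(self.url)  # fetches the page when the instance is created
        self.pq = PyQuery(self.response.text)

    def entropy(self):
        # Shannon entropy (in bits) of the characters in the URL
        string = self.url.strip()
        prob = [float(string.count(c)) / len(string) for c in dict.fromkeys(list(string))]
        # leading minus sign: entropy is -sum(p * log2(p)), so the result is non-negative
        entropy = -sum([(p * math.log(p) / math.log(2.0)) for p in prob])
        return entropy

    def bodyLength(self):
        if self.pq is not None:
            return len(self.pq('html').text())
        else:
            return 0

    def run(self, df):
        # does not work as written: entropy() takes no argument,
        # and each instance wraps a single URL, not a whole column
        df['entropy'] = np.vectorize(self.entropy)(df['url_without_parameters'])
        return df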

But I'm stuck and can't figure out how to call my class to create the new columns.

Answer:

If I understood correctly: first create a column of URL instances from the 'url_without_parameters' column, then create a second column by calling the entropy method for each instance. Both actions can be done with the apply method:

urls = df['url_without_parameters'].apply(URL)
df['entropy'] = urls.apply(lambda url: url.entropy())

Or in a single line:

df['entropy'] = df['url_without_parameters'].apply(lambda url_string: URL(url_string).entropy())
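
For reference, here is a minimal offline sketch of the same pattern. It assumes a stripped-down stand-in class (the name `SimpleURL` is hypothetical) that skips the HTTP request so it can run without network access, and the sample URLs are made up for illustration:

```python
import math

import pandas as pd


class SimpleURL:
    """Hypothetical, network-free stand-in for the URL class in the question."""

    def __init__(self, url):
        self.url = url
        self.domain = url.split('//')[-1].split('/')[0]

    def entropy(self):
        # Shannon entropy (in bits) of the characters in the URL.
        string = self.url.strip()
        if not string:
            return 0.0
        probs = [string.count(c) / len(string) for c in set(string)]
        return -sum(p * math.log2(p) for p in probs)


# Made-up sample data for illustration.
df = pd.DataFrame(
    {'url_without_parameters': ['http://aa.bb/ab', 'https://example.com/path']}
)

# Build one instance per row, then derive each new column from the instances.
urls = df['url_without_parameters'].apply(SimpleURL)
df['domain'] = urls.apply(lambda u: u.domain)
df['entropy'] = urls.apply(lambda u: u.entropy())
```

Keeping the instances in `urls` and reusing them means each object is constructed only once, which matters if the constructor performs an HTTP request as in the question's class.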