Counting symbols (pandas dataframe)

Time:03-28

I am not able to spot the issue in the code below:

class features_extract:

    def countdots(name):
        return name.count('.')

    def countminus(name):
        return name.count('-')

def get_features(name, label): 
    
    result = []
    name = str(name)
    result.append(name)

    result.append(features_extract.countdots(name))
    result.append(features_extract.countminus(name))

    return result

featureSet = pd.DataFrame(columns=('Name','number_dots','number_minus'))


for i in range(len(result)):
    features = get_features(result["Name"].loc[i], result["Label"].loc[i])    
    featureSet.loc[i] = features      

The code above should count the number of dots (.) and hyphens (-) in each name. The original dataset is:

Name           Label  (other columns not relevant for this task)
read.req       1
re-qwf.as.fda  1
error          0
rqsfa.asa.as.  0

The code above is currently returning only 0s. Expected output:

Name           Label  (other columns not relevant for this task)  Number_dots  Number_minus
read.req       1                                                  1            0
re-qwf.as.fda  1                                                  2            1
error          0                                                  0            0
rqsfa.asa.as.  0                                                  2            0

CodePudding user response:

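Here is the complete test case I ran with your code. It works as intended; note that rqsfa.asa.as. contains three dots, so number_dots for that row is 3 rather than the 2 in your expected output. The test case: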

import pandas as pd

data = [
    ['read.req', 1],
    ['re-qwf.as.fda', 1],
    ['error', 0],
    ['rqsfa.asa.as.', 0]
]

df = pd.DataFrame(data, columns=['Name','Label'])
print(df)

class features_extract:

    def countdots(name):
        return name.count('.')

    def countminus(name):
        return name.count('-')

def get_features(name, label): 
    
    result = []
    name = str(name)
    result.append(name)

    result.append(features_extract.countdots(name))
    result.append(features_extract.countminus(name))

    return result

featureSet = pd.DataFrame(columns=('Name','number_dots','number_minus'))

for i in range(len(df)):
    features = get_features(df["Name"].loc[i], df["Label"].loc[i])    
    featureSet.loc[i] = features      

print(featureSet)

Output:

            Name  Label
0       read.req      1
1  re-qwf.as.fda      1
2          error      0
3  rqsfa.asa.as.      0
            Name number_dots number_minus
0       read.req           1            0
1  re-qwf.as.fda           2            1
2          error           0            0
3  rqsfa.asa.as.           3            0
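
As a side note, the same counts can probably be computed without the row-by-row loop by using the pandas string accessor. The following is only a sketch under the assumption that the data looks like the sample in the question; the column names number_dots and number_minus are taken from the code above. One thing to watch: Series.str.count treats its pattern as a regular expression, so the dot has to be escaped, otherwise it matches every character.

```python
import pandas as pd

# Sample data copied from the question; the real dataset may differ.
df = pd.DataFrame(
    {
        "Name": ["read.req", "re-qwf.as.fda", "error", "rqsfa.asa.as."],
        "Label": [1, 1, 0, 0],
    }
)

# Cast to str first, mirroring str(name) in get_features.
names = df["Name"].astype(str)

# str.count takes a regex: r"\." matches a literal dot,
# while an unescaped "." would match any character.
df["number_dots"] = names.str.count(r"\.")
df["number_minus"] = names.str.count("-")

print(df)
```

This adds the two count columns directly to the original frame, so the Label column and any other columns are kept alongside the counts.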