I am not able to spot the issue in the code below:

class features_extract:
    def countdots(name):
        return name.count('.')

    def countminus(name):
        return name.count('-')

def get_features(name, label):
    result = []
    name = str(name)
    result.append(name)
    result.append(features_extract.countdots(name))
    result.append(features_extract.countminus(name))
    return result

featureSet = pd.DataFrame(columns=('Name','number_dots','number_minus'))
for i in range(len(result)):
    features = get_features(result["Name"].loc[i], result["Label"].loc[i])
    featureSet.loc[i] = features

The code above should count the number of '.' and '-' characters in each name. The original dataset is:

Name            Label   (other columns not relevant for this task)
read.req        1
re-qwf.as.fda   1
error           0
rqsfa.asa.as.   0

The code above is currently returning only 0s. Expected output:

Name            Label   (other columns not relevant for this task)   Number_dots   Number_minus
read.req        1                                                    1             0
re-qwf.as.fda   1                                                    2             1
error           0                                                    0             0
rqsfa.asa.as.   0                                                    3             0

CodePudding user response:
Here is the complete test case I ran with your code, showing that it is working exactly as you want:

import pandas as pd

data = [
    ['read.req', 1],
    ['re-qwf.as.fda', 1],
    ['error', 0],
    ['rqsfa.asa.as.', 0]
]
df = pd.DataFrame(data, columns=['Name','Label'])
print(df)

class features_extract:
    def countdots(name):
        return name.count('.')

    def countminus(name):
        return name.count('-')

def get_features(name, label):
    result = []
    name = str(name)
    result.append(name)
    result.append(features_extract.countdots(name))
    result.append(features_extract.countminus(name))
    return result

featureSet = pd.DataFrame(columns=('Name','number_dots','number_minus'))
for i in range(len(df)):
    features = get_features(df["Name"].loc[i], df["Label"].loc[i])
    featureSet.loc[i] = features
print(featureSet)

Output:

            Name  Label
0       read.req      1
1  re-qwf.as.fda      1
2          error      0
3  rqsfa.asa.as.      0

            Name  number_dots  number_minus
0       read.req            1             0
1  re-qwf.as.fda            2             1
2          error            0             0
3  rqsfa.asa.as.            3             0
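
As a side note, the same counts can probably be obtained without the helper class or the row-by-row loop. The following is only a minimal sketch, assuming the data lives in a DataFrame called df with a Name column as in the test case above. It uses the pandas Series.str.count method, which interprets its argument as a regular expression, so the dot has to be escaped.

```python
import pandas as pd

# Sample data copied from the question; `df` is an assumed variable name.
df = pd.DataFrame(
    {
        "Name": ["read.req", "re-qwf.as.fda", "error", "rqsfa.asa.as."],
        "Label": [1, 1, 0, 0],
    }
)

# Convert to str first, mirroring the str(name) call in get_features.
names = df["Name"].astype(str)

# Series.str.count treats the pattern as a regex, so "." must be escaped
# to match a literal dot instead of any character.
df["number_dots"] = names.str.count(r"\.")
df["number_minus"] = names.str.count("-")

print(df)
```

This adds the two count columns directly to the existing DataFrame, so the other columns are kept and no second DataFrame has to be filled one row at a time.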