I have a list of strings that I want to process, appending each result as a row in a dataframe. The operations work fine on a single string, but I am having trouble looping over the list. The code below returns an empty dataframe and I am not sure why.
    col_names = ["Version Available", "Newer Version Available"]

    def my_function(item):
        for x in item:
            querywords = x.split()
            resultwords = [word for word in querywords if word not in stopwords]
            result = ' '.join(resultwords)
            line = re.findall(r'\bNewer.*(?=\sVersion\b)', result)
            line = "".join(line)
            line = line.replace("Newer Version Available :", "")
            line2 = re.findall(r'(Version.*){2}(?=\sSource\b)', result)
            line2 = "".join(line2)
            line2 = line2.replace("Version Available :", "")
            s = [[line] [line2]]
            data = pd.DataFrame(s)
            data.columns = col_names

    df = my_function(my_list)
CodePudding user response:
You are creating the data frame inside the for loop, so every iteration re-creates it and discards the one built on the previous pass. Creating it once, outside the for loop, should help.
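As a rough illustration of that advice, one common pattern is to collect the rows in a plain list inside the loop and build the data frame once afterwards. This is only a sketch: `build_frame` and the comma-splitting step are hypothetical stand-ins for the regex processing in the question, and the sample strings are made up. Note that the function also returns the frame, which the code in the question does not do.

```python
import pandas as pd

col_names = ["Version Available", "Newer Version Available"]

def build_frame(items):
    rows = []  # one [first, second] pair per input string
    for x in items:
        # Placeholder processing (hypothetical): split on the first comma.
        first, _, second = x.partition(",")
        rows.append([first.strip(), second.strip()])
    # Build the data frame once, after the loop, and return it.
    return pd.DataFrame(rows, columns=col_names)

df = build_frame(["1.0, 2.0", "3.1, 3.2"])  # made-up sample input
print(df)
```

Building the frame from a list avoids re-creating or resizing it on every iteration.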
CodePudding user response:
Is this what you are looking for?
    import re
    import pandas as pd

    col_names = ["Version Available", "Newer Version Available"]

    def my_function(item):
        # stopwords and my_list are assumed to be defined elsewhere, as in the question
        df = pd.DataFrame(columns=col_names)  # initialize the data frame once, before the loop
        for x in item:
            # drop the stopwords from the string
            querywords = x.split()
            resultwords = [word for word in querywords if word not in stopwords]
            result = ' '.join(resultwords)
            line = re.findall(r'\bNewer.*(?=\sVersion\b)', result)
            line = "".join(line)
            line = line.replace("Newer Version Available :", "")
            line2 = re.findall(r'(Version.*){2}(?=\sSource\b)', result)
            line2 = "".join(line2)
            line2 = line2.replace("Version Available :", "")
            df.loc[len(df.index)] = [line, line2]  # add one row per string
        return df  # return the filled data frame after the loop

    df = my_function(my_list)
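To see the row-append line on its own, here is a minimal sketch with made-up values in place of the regex results. Assigning to `df.loc[len(df.index)]` adds a new row because that label does not exist yet, assuming the frame still has its default 0, 1, 2, ... index.

```python
import pandas as pd

df = pd.DataFrame(columns=["Version Available", "Newer Version Available"])

# Hypothetical values standing in for [line, line2] from the function above.
for pair in [["1.0", "2.0"], ["3.1", "3.2"]]:
    # len(df.index) is the next unused integer label, so this appends a row.
    df.loc[len(df.index)] = pair

print(df)
```

This is convenient for short lists; for long ones, growing the frame one row at a time can be slow.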