I have extracted the item and price from a receipt in Python. When I create a DataFrame with pandas, each one has a RangeIndex of 1 entry, 0 to 0. I have searched online for quite a while and tried everything I could find, but nothing changes the row index. My code and output are below.
This is my code:
res = re.sub('[^a-zA-z]+', ' ', line)
r = ' '.join([w for w in res.split() if len(w)>1])
dec = re.findall('\d+\.\d+',line)
for item in dec:
    df = pd.DataFrame({'Item': [r], 'Price': [item]})
    df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
    print(df)
    print(df.info())
How do I convert this:
Item Price
0 BAGGED KALE 2.94
Item Price
0 ORG PARSLEY 1.98
Item Price
0 ORG BASIL 1.98
Item Price
0 ORG BASIL 1.98
Item Price
0 ORG BAY LEAV 1.98
Item Price
0 GV ZUC BLND 1.48
to this:
Item Price
0 BAGGED KALE 2.94
1 ORG PARSLEY 1.98
2 ORG BASIL 1.98
3 ORG BASIL 1.98
4 ORG BAY LEAV 1.98
5 GV ZUC BLND 1.48
CodePudding user response:
Wouldn't passing the lists straight to the DataFrame constructor solve this? That is:
df = pd.DataFrame({'Item':r, 'Price':dec})
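Note that every frame built inside the loop holds a single row, which is why each one prints with index 0. If you want to keep building one frame per receipt line, one option is to collect the frames and join them with `pd.concat(..., ignore_index=True)`. This is only a minimal sketch: it assumes the receipt text is available as a list of strings, and the `lines` sample and the `frames` and `name` variables are made up for illustration.

```python
import re
import pandas as pd

# Hypothetical receipt lines, modelled on the output shown in the question.
lines = [
    "BAGGED KALE 2.94 N",
    "ORG PARSLEY 1.98 N",
    "GV ZUC BLND 1.48 F",
]

frames = []
for line in lines:
    # Keep only alphabetic words longer than one character as the item name.
    words = re.sub(r"[^a-zA-Z]+", " ", line).split()
    name = " ".join(w for w in words if len(w) > 1)
    prices = re.findall(r"\d+\.\d+", line)
    # The scalar `name` is broadcast against the list `prices`.
    frames.append(pd.DataFrame({"Item": name, "Price": prices}))

# ignore_index=True drops each frame's own 0-based index and renumbers the rows.
df = pd.concat(frames, ignore_index=True)
df["Price"] = pd.to_numeric(df["Price"], errors="coerce")
print(df)
```

Calling `pd.concat` once after the loop is generally cheaper than growing a DataFrame row by row inside it.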
CodePudding user response:
import re
import pandas as pd

full_list = []  # one (Item, Price) row per price found, across all lines
for line in lines:  # assumption: `lines` holds the receipt's text lines
    res = re.sub(r'[^a-zA-Z]+', ' ', line)
    r = ' '.join(w for w in res.split() if len(w) > 1)
    dec = re.findall(r'\d+\.\d+', line)
    for item in dec:
        full_list.append((r, item))

# Build the DataFrame once, after the loop, so the index runs 0..n-1
df = pd.DataFrame(full_list, columns=['Item', 'Price'])
df['Price'] = pd.to_numeric(df['Price'], errors='coerce')
print(df)
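A side note on the pattern: the question writes the character class as `a-zA-z`, with a lowercase final `z`. That range runs from `A` to `z` in ASCII, so it also lets through the punctuation that sits between `Z` and `a`. A small sketch of the difference, using a made-up line (the `line`, `loose` and `strict` names are only for illustration):

```python
import re

# Hypothetical line containing characters that sit between 'Z' and 'a' in ASCII.
line = "GV_ZUC [BLND] 1.48"

# 'A-z' spans ASCII 65..122, which also covers [ \ ] ^ _ and the backtick.
loose = re.sub(r"[^a-zA-z]+", " ", line).strip()

# 'A-Z' restricts the class to letters only.
strict = re.sub(r"[^a-zA-Z]+", " ", line).strip()

print(repr(loose))   # 'GV_ZUC [BLND]'
print(repr(strict))  # 'GV ZUC BLND'
```

For receipts that only contain letters, digits and spaces the two patterns behave the same, so this matters only if such punctuation can appear in the item text.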