I am working on a project to convert some invoices from PDF to .xlsx for comparison purposes, but I ran into a problem: during conversion, the program separated the minus sign from negative numbers and left it in the name column. What I am trying to do is iterate through the name column (where the minus sign ended up) and match each row against a regex; if it matches, either multiply the value column by -1 or concatenate a minus sign in front of the number. I tried both approaches, but neither of them changed the value column.
Here's the DataFrame:
date name value
519 25/02/2022 LOREM IPSUM 598,72
520 25/02/2022 LOREM IPSUM 656,56
523 25/02/2022 LOREM IPSUM - 220,32
524 25/02/2022 LOREM IPSUM - 339,76
The result I expect is
date name value
519 25/02/2022 LOREM IPSUM 598,72
520 25/02/2022 LOREM IPSUM 656,56
523 25/02/2022 LOREM IPSUM - -220,32
524 25/02/2022 LOREM IPSUM - -339,76
I tried using
r1 = re.compile(r"- $|-$")
for item in diference["name"]:
    if r1.match(item):
        diference["value"] = diference["value"]*(-1)
And
r1 = re.compile(r"- $|-$")
for item in diference["name"]:
    if r1.match(item):
        diference["value"] = "-" + diference["value"]
But as I said, neither of them raised an error or changed anything.
Answer:
You can use
df['value'] = pd.to_numeric(df['value'].str.replace(',', '.'))
df.loc[df['name'].str.endswith('-'), 'value'] *= -1
Details
df['value'] = pd.to_numeric(df['value'].str.replace(',', '.')) converts the string numbers to numbers.
df.loc[df['name'].str.endswith('-'), 'value'] *= -1 multiplies by -1 all values in the value column where the name column ends with a -.
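The endswith('-') check assumes the name ends exactly with the minus sign. The pattern in the question (- $|-$) suggests that some names may carry a trailing space after it. A minimal sketch, assuming that is the case, which tolerates trailing whitespace by using a regex instead; the sample rows below are illustrative and not taken from the real invoices:

```python
import pandas as pd

# Illustrative data: the second name has a trailing space after the minus sign.
df = pd.DataFrame({
    "name": ["LOREM IPSUM", "LOREM IPSUM - ", "LOREM IPSUM -"],
    "value": ["598,72", "220,32", "339,76"],
})

# Convert decimal-comma strings to floats.
df["value"] = pd.to_numeric(df["value"].str.replace(",", ".", regex=False))

# True where the name ends with "-" optionally followed by whitespace.
is_negative = df["name"].str.contains(r"-\s*$", regex=True)

# Flip the sign only on the matching rows.
df.loc[is_negative, "value"] *= -1
```

Building the boolean mask first and passing it to df.loc keeps the change limited to the matching rows, which is the same idea as the one-liner above.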
See a Pandas test:
import pandas as pd
df = pd.DataFrame({'name': ['LOREM IPSUM ', 'LOREM IPSUM -'], 'value': ['598,72', '339,76']})
df['value'] = pd.to_numeric(df['value'].str.replace(',', '.'))
df.loc[df['name'].str.endswith('-'), 'value'] *= -1
Output:
>>> df
name value
0 LOREM IPSUM 598.72
1 LOREM IPSUM - -339.76
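As for why the loop in the question likely changed nothing: re.match only matches at the beginning of the string, so a pattern ending in $ such as - $|-$ can only succeed when the whole name is just the minus sign. re.search scans the entire string instead. Separately, the assignment inside the loop targets the whole value column rather than the current row, so even with a match it would have flipped every value. A small check with the standard re module, using a name from the question:

```python
import re

r1 = re.compile(r"- $|-$")
name = "LOREM IPSUM -"

# match() anchors at position 0, where "L" is not "-", so it returns None.
matched_at_start = r1.match(name) is not None

# search() scans the string and finds the trailing "-".
found_anywhere = r1.search(name) is not None

print(matched_at_start, found_anywhere)  # False True
```

Because the condition was never true, the loop body never ran, which would explain why there was neither an error nor a change.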