How to create a Python function that performs multiple checks on a dataframe?

I have multiple inventory tables like so:

line no	-1 qty	-2 qty
1	-	3
2	42.1 FT	-
3	5	-
4	-	10 FT
5	2	1
6	6.7	-

line no	qty
1	2
2	4.5 KG
3	5
4
5	13
6	AR

I want to create a logic check for the quantity column using Python. (A table may have more than one qty column, and I need to be able to check all of them. In both examples, the tables are already formatted as dataframes.)

Acceptance criteria:

integer with or without "EA" (meaning each)
"AR" (as required)
integer or float with unit of measure
if multiple QTY columns, then "-" is also accepted (first table)

I want to return a list per page, containing the line no. corresponding to rows where quantity value is missing (line 4, second table) or does not meet acceptance criteria (line 6, table 1). If the line passes the checks, then return True.

I have tried:

qty_col = [col for col in df.columns if 'qty' in col]
df['corr_qty'] = np.where(qty_col.isnull(), False, df['line_no'])

but this creates the quantity columns as a list and yields the following AttributeError: 'list' object has no attribute 'isnull'
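
The error arises because qty_col is a plain Python list of column names, not a pandas object. A minimal sketch of the difference, assuming a dataframe shaped like the second table (column names "line no" and "qty", with the empty cell stored as NaN); the sample data and the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical data mirroring the second table; the empty cell is NaN.
df = pd.DataFrame({
    "line no": [1, 2, 3, 4, 5, 6],
    "qty": ["2", "4.5 KG", "5", np.nan, "13", "AR"],
})

# This is a list of column *names*, so it has no .isnull() method.
qty_cols = [col for col in df.columns if "qty" in col]

# Indexing the dataframe with that list returns a DataFrame, which does.
# any(axis=1) flags a row when at least one qty column is missing.
missing = df[qty_cols].isnull().any(axis=1)

# Line numbers of the rows with a missing quantity.
missing_lines = df.loc[missing, "line no"].tolist()
print(missing_lines)  # prints [4]
```

This only covers the missing-value check; the acceptance criteria need a separate test on the cell contents.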

CodePudding user response:

Intro and Suggestions:

Welcome to StackOverflow. A general tip when asking questions on S.O.: include as much information as possible. In addition, always identify the libraries you want to use and the approach you would accept, since there can be multiple solutions to the same problem. It looks like you've done that.

It is also best to share all, or at least most, of your attempted solutions so others can follow your thought process and work out the best approach to a potential solution.

The Solution:

It wasn't clear if the solution you are looking for required that you read the PDF to create the dataframe or if converting the PDF to a CSV and processing the data using the CSV was sufficient. I took the latter approach.

import tabula as tb
import pandas as pd

# PDF file path
input_file_path = "/home/hackernumber7/Projects/python/resources/Pandas_Sample_Data.pdf"

# CSV file path
output_file_path = "/home/hackernumber7/Projects/python/resources/Pandas_Sample_Data.csv"

# Alternative: read the PDF directly into a list of DataFrames
# tables = tb.read_pdf(input_file_path, pages="all")

# Convert the PDF to CSV; this writes the file to output_file_path
tb.convert_into(input_file_path, output_file_path, output_format="csv", pages="all")

# Read the initial data (named initial_data to avoid shadowing the built-in id)
initial_data = pd.read_csv(output_file_path, delimiter=",")

# Print the initial data
print(initial_data)

# Keep only the 'qty' column as a new DataFrame
df = pd.DataFrame(initial_data, columns=["qty"])

# Print a boolean mask: True where a quantity is present, False where it is missing
print(df.notna())
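
The notna() call above only detects missing values. A rough sketch of how the full acceptance criteria from the question could be checked, under several assumptions that are mine rather than the asker's: cells are strings (for example read with dtype=str), a unit of measure is a run of letters, "-" is accepted in any cell when the table has more than one qty column, and the function returns True only when no line fails. The names QTY_PATTERN, is_valid_qty and check_quantities are hypothetical.

```python
import re
import pandas as pd

# Assumed reading of the acceptance criteria (a unit is taken to be letters only).
QTY_PATTERN = re.compile(
    r"""
      \d+(?:\s*EA)?               # integer, with or without EA
    | AR                          # as required
    | \d+(?:\.\d+)?\s*[A-Za-z]+   # integer or float followed by a unit
    """,
    re.VERBOSE | re.IGNORECASE,
)

def is_valid_qty(value, allow_dash):
    """Return True if a single cell meets the acceptance criteria."""
    if pd.isna(value):
        return False
    text = str(value).strip()
    if allow_dash and text == "-":
        return True
    return bool(QTY_PATTERN.fullmatch(text))

def check_quantities(df, line_col="line no"):
    """Return the failing line numbers, or True if every row passes."""
    qty_cols = [col for col in df.columns if "qty" in col.lower()]
    # "-" is only accepted when the table has several qty columns.
    allow_dash = len(qty_cols) > 1
    # Check every cell, then require all qty cells in a row to pass.
    valid = df[qty_cols].apply(
        lambda col: col.map(lambda v: is_valid_qty(v, allow_dash))
    )
    row_ok = valid.all(axis=1)
    failed = df.loc[~row_ok, line_col].tolist()
    return failed if failed else True

# The two tables from the question, typed in by hand for illustration.
table_1 = pd.DataFrame({
    "line no": [1, 2, 3, 4, 5, 6],
    "-1 qty": ["-", "42.1 FT", "5", "-", "2", "6.7"],
    "-2 qty": ["3", "-", "-", "10 FT", "1", "-"],
})
table_2 = pd.DataFrame({
    "line no": [1, 2, 3, 4, 5, 6],
    "qty": ["2", "4.5 KG", "5", None, "13", "AR"],
})

print(check_quantities(table_1))  # prints [6]
print(check_quantities(table_2))  # prints [4]
```

Line 6 of the first table fails because 6.7 is a float without a unit, and line 4 of the second table fails because the quantity is missing. A row in which every qty column holds "-" would pass under this sketch; if that should count as a failure, an extra row-level check would be needed.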