I have a data frame with the following columns:
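import pandas as pd
# part_no holds strings: an integer literal with a leading zero (01345678) is a SyntaxError in Python 3
d = {'lot_no': [1, 2, 3, 4],
     'part_no': ['01345678', '01234567', '01123456', '10123456'],
     'zip_code': [32835, 32835, 32808, 32835]}
df = pd.DataFrame(data=d)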
First, I want to check that every row with 32835 in the "zip_code" column has a "part_no" matching the pattern 01xxxxxx, where each x is a digit. Then, I want to make sure every 01xxxxxx "part_no" corresponds to a "zip_code" of 32835. If any rows fail either check, I would like to return a list of their "lot_no" values; if the whole dataframe passes, I would like to return True.
In this example, the output should be [3, 4].
CodePudding user response:
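Use boolean masks. Compare "zip_code" with the integer 32835 (comparing with the string '32835' matches nothing in an integer column), and keep "part_no" as strings so the .str accessor works and the leading zero is preserved. A row fails when exactly one of the two conditions holds, so combine the masks with XOR:
m1 = df['zip_code'].eq(32835)
m2 = df['part_no'].str.startswith('01')
lot_no = df.loc[m1 ^ m2, 'lot_no'].tolist()
print(lot_no)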
# Output
[3, 4]
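If you also want the strict 01xxxxxx check (exactly eight digits) and a True result when everything passes, something like the following might work. This is only a sketch: the function name check_lots and its parameters are my own invention, not part of pandas, and it assumes "part_no" is stored as strings and that your pandas version has Series.str.fullmatch (1.1 or later).

```python
import pandas as pd

def check_lots(df, zip_code=32835, pattern=r"01\d{6}"):
    """Return failing lot_no values as a list, or True if every row passes.

    Hypothetical helper: a row fails when exactly one of the two
    conditions (zip code matches, part number matches) holds.
    """
    is_zip = df["zip_code"].eq(zip_code)
    # fullmatch requires the whole string to be "01" followed by six digits
    is_part = df["part_no"].astype(str).str.fullmatch(pattern)
    failed = df.loc[is_zip ^ is_part, "lot_no"].tolist()
    return failed if failed else True

df = pd.DataFrame({
    "lot_no": [1, 2, 3, 4],
    "part_no": ["01345678", "01234567", "01123456", "10123456"],
    "zip_code": [32835, 32835, 32808, 32835],
})
result = check_lots(df)
print(result)  # [3, 4]
```

Rows where neither condition holds (a different zip code with a non-01 part number) are treated as passing, which is how I read the question.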