Home > Enterprise >  Pandas dataframe check if left part of a string matches another entry in a column
Pandas dataframe check if left part of a string matches another entry in a column

Time:01-06

I have the following csv file:

Column1;Column2;Column3;Column4
A;B;6;6200
B;D;5;5000
E;F;6;5100
F;F;6;6200

Now I want to check, if column4 always begins with a '6', if column3 has an entry 6. In case, where this does not match I want to print a message. Same holds for the case vice versa: In case column4 does not begin with a '6', but column3 has an entry 6. Both columns are string.

I tried:

if ((df[df["Column3"] == "6"] and df['Column4'].str[0:1] <> "6") or (df[df["Column3"] <> "6"] and df['Column4'].str[0:1] == "6")):
    print("Error")

But doesn't work. What did I miss?

CodePudding user response:

You can use two boolean masks and check their identity:

out = df[df['Column3'].eq('6') == df['Column4'].str.startswith('6')]

NB. assuming you have strings. If not, add `.astype(str) right after slicing.

out = df[df['Column3'].astype(str).eq('6') == df['Column4'].astype(str).str.startswith('6')]

Output:

0       A       B        6     6200
1       B       D        5     5000
3       F       F        6     6200

printing a message:

m = df['Column3'].eq('6') != df['Column4'].str.startswith('6')
if m.any():
    print(f'Error: row(s) {",".join(m[m].index.astype(str))} are invalid')

Output:

Error: row(s) 2 are invalid

CodePudding user response:

Try this:

import pandas as pd
df = pd.DataFrame({'Column1':['A','B','E','F'],'Column2':['B','D','F','F'], 'Column3':['6','5','6','6'], 'Column4':['6200','5000','5100','6200']})
df.loc[(df['Column3'].str.slice(stop=1)!=df['Column4'].str.slice(stop=1))] #Select rows where first char is not the same in column3 vs 4

#  Column1 Column2 Column3 Column4
#       E       F       6    5100

CodePudding user response:

import pandas as pd

df = pd.read_clipboard(sep=';').astype(str) # Your frame here

out = df.loc[df["Column3"] == "6", "Column4"].str[0].eq("6").all()
# out is True if all values in column 4 start with 6 when the corresponding cell in column 3 is "6", False otherwise.

# Example usage:
if not out:
    print("Caution!")
  • Related