I have the following csv file:
Column1;Column2;Column3;Column4
A;B;6;6200
B;D;5;5000
E;F;6;5100
F;F;6;6200
Now I want to check, if column4 always begins with a '6', if column3 has an entry 6. In case, where this does not match I want to print a message. Same holds for the case vice versa: In case column4 does not begin with a '6', but column3 has an entry 6. Both columns are string.
I tried:
if ((df[df["Column3"] == "6"] and df['Column4'].str[0:1] <> "6") or (df[df["Column3"] <> "6"] and df['Column4'].str[0:1] == "6")):
print("Error")
But doesn't work. What did I miss?
CodePudding user response:
You can use two boolean masks and check their identity:
out = df[df['Column3'].eq('6') == df['Column4'].str.startswith('6')]
NB. assuming you have strings. If not, add `.astype(str) right after slicing.
out = df[df['Column3'].astype(str).eq('6') == df['Column4'].astype(str).str.startswith('6')]
Output:
0 A B 6 6200
1 B D 5 5000
3 F F 6 6200
printing a message:
m = df['Column3'].eq('6') != df['Column4'].str.startswith('6')
if m.any():
print(f'Error: row(s) {",".join(m[m].index.astype(str))} are invalid')
Output:
Error: row(s) 2 are invalid
CodePudding user response:
Try this:
import pandas as pd
df = pd.DataFrame({'Column1':['A','B','E','F'],'Column2':['B','D','F','F'], 'Column3':['6','5','6','6'], 'Column4':['6200','5000','5100','6200']})
df.loc[(df['Column3'].str.slice(stop=1)!=df['Column4'].str.slice(stop=1))] #Select rows where first char is not the same in column3 vs 4
# Column1 Column2 Column3 Column4
# E F 6 5100
CodePudding user response:
import pandas as pd
df = pd.read_clipboard(sep=';').astype(str) # Your frame here
out = df.loc[df["Column3"] == "6", "Column4"].str[0].eq("6").all()
# out is True if all values in column 4 start with 6 when the corresponding cell in column 3 is "6", False otherwise.
# Example usage:
if not out:
print("Caution!")