Code to Extract Data Based on Criteria in CSV

I have a comma-delimited text file containing 2M records with multiple columns. I would like a way to extract only the columns and rows I need, based on the values in other columns.

Criteria:
ODBIC = YES
NAM = 1 OR 2

Keep columns:
ACC | NUM | NAM | ODBIC (and remove all the rest)

Sample data below:

INDEX,ACC,NUM,SUBSCRIBED,PN,PDP,NAM,ODBIC
1,37412900,1221222121,0,-1,-1,1,YES
1,37412911,2323232323,0,-1,-1,2,YES
1,374123434,3434343434,0,-1,-1,343,1
1,374129232,-1,0,-1,-1,434,YES

End result:

ACC	NUM	NAM	ODBIC
37412900	1221222121	1	YES
37412911	2323232323	2	YES

With 2M records, doing this in Excel is tedious and time-consuming. I came across recommendations to do it in Python, but I'm not sure how to write the code. Appreciate the help!
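For reference, the filtering described above can also be done with only Python's standard-library csv module, which reads the file one row at a time and so never holds all 2M records in memory. This is a minimal sketch, not the accepted answer's method; it assumes the header names from the sample data and that NAM values appear exactly as "1" or "2" (no padding or leading zeros). The function name filter_csv is hypothetical.

```python
import csv

# Columns to keep, in output order (taken from the question).
KEEP = ["ACC", "NUM", "NAM", "ODBIC"]


def filter_csv(src_path, dst_path):
    """Copy rows where ODBIC == "YES" and NAM is 1 or 2, keeping only the KEEP columns."""
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        # extrasaction="ignore" drops the columns that are not listed in KEEP.
        writer = csv.DictWriter(dst, fieldnames=KEEP, extrasaction="ignore")
        writer.writeheader()
        for row in reader:
            # csv yields every value as a string, so NAM is compared to "1" and "2".
            if row["ODBIC"] == "YES" and row["NAM"] in ("1", "2"):
                writer.writerow(row)
```

Usage: `filter_csv("path_to_csv_file", "result.csv")`. Because rows are streamed, memory use stays flat regardless of file size.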

Answer:

Install the pandas module:

pip install pandas

CODE

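import pandas as pd

file_path = "path_to_csv_file"
data = pd.read_csv(file_path)

# Keep rows where ODBIC is "YES" and NAM is 1 or 2
data = data[(data["ODBIC"] == "YES") & (data["NAM"].isin([1, 2]))]
# Keep only the required columns, in the required order
data = data[["ACC", "NUM", "NAM", "ODBIC"]]

# index=False stops pandas from writing the row index as an extra column
data.to_csv("result.csv", index=False)
print(data)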

OUTPUT

        ACC         NUM  NAM ODBIC
0  37412900  1221222121    1   YES
1  37412911  2323232323    2   YES

The filtered rows are also written to result.csv in the current working directory.
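If loading the whole 2M-row file at once is slow or memory-hungry, pandas can read only the needed columns (usecols) and process the file in pieces (chunksize). The sketch below assumes the tab-separated layout shown in the question's "End result" is what's wanted and reads every column as text (dtype=str) so ACC and NUM are copied unchanged; the function name filter_csv_chunked and the chunk size are illustrative choices, not part of the answer above.

```python
import pandas as pd

KEEP = ["ACC", "NUM", "NAM", "ODBIC"]


def filter_csv_chunked(src_path, dst_path, chunksize=200_000):
    """Filter a large CSV chunk by chunk, writing tab-separated output without the index."""
    first = True
    # usecols loads only the needed columns; chunksize makes read_csv yield DataFrames lazily.
    for chunk in pd.read_csv(src_path, usecols=KEEP, dtype=str, chunksize=chunksize):
        # With dtype=str, NAM holds strings, so compare against "1" and "2".
        mask = (chunk["ODBIC"] == "YES") & chunk["NAM"].isin(["1", "2"])
        chunk.loc[mask, KEEP].to_csv(
            dst_path,
            sep="\t",
            index=False,
            mode="w" if first else "a",  # overwrite on the first chunk, then append
            header=first,                # write the header only once
        )
        first = False
```

Usage: `filter_csv_chunked("path_to_csv_file", "result.tsv")`. Drop `sep="\t"` to keep comma-separated output.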