I have a CSV file with 10k lines. I want to first search for the rows with the required info, and then edit those rows.
Example below:
jobs = [
'X01_TEST1_C',
'P01_TEST3_B'
]
headers = ['job', 'name', 'date', 'extrainfo']
data = [
['X01_TEST1_C', 'NAME', 'DATE', 'EXTRADATA'],
['P01_TEST3_C', 'NAME', 'DATE', 'EXTRADATA'],
['X01_TEST1002_C', 'NAME', 'DATE', 'EXTRADATA'],
['X01_TEST4231_C', 'NAME', 'DATE', 'EXTRAP01_TEST3_BDATA']
]
I can load this into pandas and then search for a single item as below:
import pandas as pd

df = pd.read_csv("filename", sep=",", encoding='cp1252')
df1 = df[(df['job'].str.contains("X01_TEST1_C", na=False))]
print(df1)
which would print
['X01_TEST1_C', 'NAME', 'DATE', 'EXTRADATA']
How can I search for multiple values at once via pandas? I want something like:
df1 = df[(df['job'].str.contains(jobs, na=False))]
But I get the error: TypeError: first argument must be string or compiled pattern
Once I get past this part, I want to update some jobs from X01_TEST1_C to X01_NEW_TEST1_C. I'm adding this in case it's easier to do the whole thing at once.
Is pandas good for this, or do I need to try a different method, like the built-in csv module?
Thanks for any help.
CodePudding user response:
Try:
jobs = [
'X01_TEST1_C',
'P01_TEST3_B'
]
df1 = df[df['job'].str.contains('|'.join(jobs), na=False)]  # the default is regex=True, so no need to pass it

# this is equivalent to:
df1 = df[df['job'].str.contains('X01_TEST1_C|P01_TEST3_B', na=False)]
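For the second part of your question (renaming X01_TEST1_C to X01_NEW_TEST1_C), here is a minimal sketch, assuming you want exact whole-value renames in the job column; the renames dict is a hypothetical mapping you would fill in yourself:

# map old job names to new ones, then replace exact matches in the 'job' column
renames = {'X01_TEST1_C': 'X01_NEW_TEST1_C'}
df['job'] = df['job'].replace(renames)

# write the edited table back out, keeping the original encoding
df.to_csv("filename", sep=",", encoding='cp1252', index=False)

Series.replace with a dict only touches cells that equal a key exactly, so X01_TEST1002_C is left alone.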
CodePudding user response:
The simplest might be to concatenate your jobs into a single regexp and use that:
import re

jobs_re = "|".join(re.escape(job) for job in jobs)
df1 = df[df['job'].str.contains(jobs_re, regex=True, na=False)]
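For example, against the sample data above (a quick check; the DataFrame here is just rebuilt from the question's lists rather than read from the CSV):

import re
import pandas as pd

headers = ['job', 'name', 'date', 'extrainfo']
data = [
    ['X01_TEST1_C', 'NAME', 'DATE', 'EXTRADATA'],
    ['P01_TEST3_C', 'NAME', 'DATE', 'EXTRADATA'],
    ['X01_TEST1002_C', 'NAME', 'DATE', 'EXTRADATA'],
    ['X01_TEST4231_C', 'NAME', 'DATE', 'EXTRAP01_TEST3_BDATA'],
]
df = pd.DataFrame(data, columns=headers)

jobs = ['X01_TEST1_C', 'P01_TEST3_B']
jobs_re = "|".join(re.escape(job) for job in jobs)  # escape any regex metacharacters in the names
df1 = df[df['job'].str.contains(jobs_re, regex=True, na=False)]
print(df1)
# only the X01_TEST1_C row matches: P01_TEST3_B never appears in the 'job'
# column, and the search ignores the EXTRAP01_TEST3_BDATA in 'extrainfo'

re.escape matters once job names can contain characters like '.' or '+', which would otherwise be treated as regex operators.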