Search through a dataframe using Regex in a for loop to pull out a value associated with the Regex

Time:12-09

I have a subset dataframe taken from a much larger dataframe. I need a for loop that searches through the dataframe and pulls out the data corresponding to the correct name.

import pandas as pd
import numpy as np
import re

data = {'Name': ['CH_1', 'CH_2', 'CH_3', 'FV_1', 'FV_2', 'FV_3'],
        'Value': [1, 2, 3, 4, 5, 6]
            }

df = pd.DataFrame(data)

FL = [17.7, 60.0]
CH = [20, 81.4]

tol = 8
time1 = FL[0] + tol
time2 = FL[1] + tol
time3 = CH[0] + tol
time4 = CH[1] + tol
FH_mon = df['Values'] *5
workpercent = [.7, .92, .94]
mhpy = [2087, 2503, 3128.75]
list1 = list()
list2 = list()

for x in df['Name']:
    if x == [(re.search('FV_', s)) for s in df['Name'].values]:
        y = np.select([FH_mon < time1 , (FH_mon >= time1) and (FH_mon < time2), FH_mon > time2], [workpercent[0],workpercent[1],workpercent[2]])
        z = np.select([FH_mon < time1 , (FH_mon >= time1) and (FH_mon < time2), FH_mon > time2], [mhpy[0],mhpy[1],mhpy[2]])   
    if x == [(re.search('CH_', s)) for s in df['Name'].values]:
       y = np.select([FH_mon < time3, (FH_mon >= time3) and (FH_mon < time4)],  [workpercent[0],workpercent[1]])
       z = np.select([FH_mon < time3, (FH_mon >= time3) and (FH_mon < time4)],  [mhpy[0],mhpy[1]])

list1.append(y)
list2.append(z)

I posted a simpler version of this earlier, where I was just adding a couple of numbers, and I got really helpful answers to that question. Here is the more complex version. I need to search through the Name column so that any time there is an FV in the name, the if block runs and uses the data from the rows whose Name contains FV. The same goes for CH. I have the lists to keep track of each value as the loop goes through the Name column. If there is a simpler way I would really appreciate seeing it. Right now this seems like the cleanest approach, but I am receiving errors or the loop does not work properly.

CodePudding user response:

If the "Name" column only has values starting with "FV_" or "CH_", use where:

df["Value"] = df["Value"].add(2).where(df["Name"].str.startswith("FV_"), df["Value"].add(4))

If you might have other values in "Name", use numpy.select:

import numpy as np

df["Value"] = np.select([df["Name"].str.startswith("FV_"), df["Name"].str.startswith("CH_")], [df["Value"].add(2), df["Value"].add(4)])
Output:
>>> df
   Name  Value
0  CH_1      5
1  CH_2      6
2  CH_3      7
3  FV_1      6
4  FV_2      7
5  FV_3      8
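
The same np.select idea can be carried over to the thresholds in the question. The following is only a sketch under a few assumptions that the question does not confirm: tol is added to each threshold, every name starts with either "FV_" or "CH_", and any row at or above the upper threshold gets the third entry of workpercent and mhpy. The column names workpercent and mhpy on the dataframe are made up for illustration. Note that conditions on a Series have to be combined with & rather than and, since and raises a ValueError on a Series.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Name": ["CH_1", "CH_2", "CH_3", "FV_1", "FV_2", "FV_3"],
    "Value": [1, 2, 3, 4, 5, 6],
})

FL = [17.7, 60.0]
CH = [20, 81.4]
tol = 8  # assumption: the tolerance is added to each threshold

workpercent = [0.7, 0.92, 0.94]
mhpy = [2087, 2503, 3128.75]

fh_mon = df["Value"] * 5

# Assumption: every name starts with "FV_" or "CH_", so non-FV rows use the CH thresholds.
is_fv = df["Name"].str.startswith("FV_")
lower = np.where(is_fv, FL[0] + tol, CH[0] + tol)  # per-row lower threshold
upper = np.where(is_fv, FL[1] + tol, CH[1] + tol)  # per-row upper threshold

# Element-wise conditions: use & (not `and`) to combine boolean Series.
conditions = [fh_mon < lower, (fh_mon >= lower) & (fh_mon < upper)]

# Assumption: rows matching neither condition fall back to the third entry.
df["workpercent"] = np.select(conditions, workpercent[:2], default=workpercent[2])
df["mhpy"] = np.select(conditions, mhpy[:2], default=mhpy[2])

print(df)
```

With this layout no explicit loop or list bookkeeping is needed, because each row picks its own thresholds from its name prefix.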

CodePudding user response:

This should be what you want:

for index, row in df.iterrows():
    if re.search("FV_", row["Name"]):
        df.loc[index, "Value"] += 2
    elif re.search("CH_", row["Name"]):
        df.loc[index, "Value"] += 4
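
If the regex matching should be kept but the row-by-row loop avoided, a boolean mask built with Series.str.contains could do the same job. This is a sketch, not a drop-in replacement: the patterns are anchored with ^ here (an assumption), whereas re.search("FV_", ...) in the loop matches anywhere in the name.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ["CH_1", "CH_2", "CH_3", "FV_1", "FV_2", "FV_3"],
    "Value": [1, 2, 3, 4, 5, 6],
})

# Boolean masks from regex matches; ^ anchors the pattern to the start of the name.
fv_mask = df["Name"].str.contains(r"^FV_", regex=True)
ch_mask = df["Name"].str.contains(r"^CH_", regex=True)

# Update only the matching rows in place.
df.loc[fv_mask, "Value"] += 2
df.loc[ch_mask, "Value"] += 4

print(df)
```

Because the two anchored patterns cannot match the same name, the order of the two updates does not matter here.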