Extract several values from a row when a certain value is found using a list-CodePudding

I have a .csv file with 29 columns and 1692 rows. The columns D_INT_1 and D_INT_2 are just dates. I want to check for these 2 columns if there is dates between :>= "2022-03-01" and <= "2024-12-31. And, if a value is found, I want to display the date found the value of the column "NAME" that is located on the same row of said found value.

This is what I did right now, but it only grab the dates and not the adjacent value ('NAME').

# importing module
import pandas as pd
# reading CV file
df = pd.read_csv("scratch_2.csv")

# converting column data to list
D_INT_1 = df['D_INT_1'].tolist()
D_INT_2 = df['D_INT_2'].tolist()



ext = []

ext = [i for i in D_INT_1   D_INT_2 if i >= "2022-03-01" and i <= "2024-12-31"]

print(*ext, sep="\n")

This is what I would like to get:

Example of DF:

NAME, ADDRESS, D_INT_1, D_INT_2
Mark, H4N1V8, 2023-01-02, 2019,-01-01

Expected output:

MARK, 2023-01-02

CodePudding user response：

First for performance dont use loops, because exist vectorized alternatives unpivot by DataFrame.melt and filter by Series.between with DataFrame.loc:

df = df.melt(id_vars='NAME', value_vars=['D_INT_1','D_INT_2'], value_name='Date')

df1 = df.loc[df['Date'].between("2022-03-01","2024-12-31"), ['NAME','Date']]

print (df1)
   NAME       Date
0  Mark 2023-01-02

Or filter original DataFrame and last join in concat:

df1 = df.loc[df['D_INT_1'].between("2022-03-01","2024-12-31"), ['NAME','D_INT_1']]
df2 = df.loc[df['D_INT_2'].between("2022-03-01","2024-12-31"), ['NAME','D_INT_2']]

df = pd.concat([df1.rename(columns={'D_INT_1':'date'}), 
                df2.rename(columns={'D_INT_2':'date'})])

print (df)
   NAME       date
0  Mark 2023-01-02

Last if need loops output with print:

for i in df.itertuples():
    print (i.NAME, i.Date)
    
Mark 2023-01-02 00:00:00
Mark 2019-01-01 00:00:00

CodePudding user response：

Lots of times the compact [for in] syntax can be used efficiently for simple code, but in this case I wouldn't recommend it. I suggest you use a normal for. Here's an example:

# importing module
import pandas as pd
# reading CV file
df = pd.read_csv("scratch_2.csv")

# converting column data to list
D_INT_1 = df['D_INT_1'].tolist()
D_INT_2 = df['D_INT_2'].tolist()
NAMES = df['NAME'].tolist()

# loop for every row in the data
# (i will start as 0 and increase by 1 every iteration)
for i in range(0, len(D_INT_1)):
    if D_INT_1[i] >= "2022-03-01" and D_INT_1[i] <= "2024-12-31":
        print(NAME[i], D_INT_1[i])

    if D_INT_2[i] >= "2022-03-01" and D_INT_2[i] <= "2024-12-31":
        print(NAME[i], D_INT_2[i])

CodePudding user response：

So there a few things to be of note here:

In this case, you are better off probably with a normal for-loop since it can be a bit more complicated.
To do what you want, you want to first:

Load the names:

D_INT_1 = df['D_INT_1'].tolist()
D_INT_2 = df['D_INT_2'].tolist()
NAMES = df['NAME'].tolist()

Use enumerate since we know all lists are aligned the same in your loop, keep in mind that enumerate gets both value and index, but I am getting the value manually just for cleaner (and clearer) code:

ext = []
for i,_ in enumerate(D_INT_1):
   if D_INT_1[i] >= "2022-03-01" and D_INT_1[i] <= "2024-12-31":
      ext.append((D_INT_1[i],NAMES[i]))
   if D_INT_2[i] >= "2022-03-01" and D_INT_1[i] <= "2024-12-31":
      ext.append((D_INT_2[i],NAMES[i]))

Of course, you can use a list comprehension (or in this case, two), but this form should be easier to understand for this answer.

To do so, you will need to still load the names like in the first step, then use enumerate in the list comprehension, while adding the name after i in a tuple, perhaps something like this:

ext = [(i,NAMES[ind]) for ind,i in enumerate(D_INT_1   D_INT_2) if i >= "2022-03-01" and i <= "2024-12-31"]

Keep in mind that I didn't test the above code since I have no access to the original csv, but it should be a good starting point.