Home > database >  how to extract entire row when a value is found
how to extract entire row when a value is found

Time:11-09

I have a code that check in different columns for all the dates that are >= "2022-03-01" and i <= "2024-12-31 then it append it to a list ext=[].

What I would like is to be able to extract the more information about located on the same row.

My code:

from pandas import *

data = read_csv("Book1.csv")

# converting column data to list
D_EXT_1 = data['D_EXT_1'].tolist()
D_INT_1 = data['D_INT_1'].tolist()
D_EXT_2 = data['D_EXT_2'].tolist()
D_INT_2 = data['D_INT_2'].tolist()
D_EXT_3 = data['D_EXT_3'].tolist()
D_INT_3 = data['D_INT_3'].tolist()
D_EXT_4 = data['D_EXT_4'].tolist()
D_INT_4 = data['D_INT_4'].tolist()
D_EXT_5 = data['D_EXT_5'].tolist()
D_INT_5 = data['D_INT_5'].toList()
D_EXT_6 = data['D_EXT_6'].toList()
D_INT_6 = data['D_INT_6'].toList()

ext = []

ext = [i for i in D_INT_1   D_INT_2   D_INT_3   D_INT_4   D_INT_5   D_INT_6 if i >= "2022-03-01" and i <= "2024-12-31"]

print(*ext, sep="\n")

Example of data:

NAME,ADRESS,D_INT_1,D_EXT_1,D_INT_2,D_EXT_2
ALEX,h4n1p8,2020-01-01,2024-01-01,2023-02-02,2020-01-01

What my code will print with that data:

2024-01-01 

DESIRED OUTPUT:

Alex, 2024-01-01 

As requested by not_speshal -> data.head().to_dict()

{'EMPL. NO': {0: 5}, "NOM A L'EMPLACEMENT": {0: 'C010 - HOPITAL REGIONAL DE RIMOUSKI/CENTRE SERVEUR OPTILAB'}, 'ADRESSE': {0: '150 AVENUE ROULEAU'}, 'VILLE': {0: 'RIMOUSKI'}, 'PROV': {0: 'QC'}, 'OBJET NO': {0: 67}, "EMPLACEMENT DE L'APPAREIL": {0: 'CHAUFFERIE'}, 'RBQ 2018': {0: nan}, "DESCRIPTION DE L'APPAREIL": {0: 'CHAUDIERE AQUA. A VAPEUR'}, 'MANUFACTURIER': {0: 'MIURA'}, 'DIMENSIONS': {0: nan}, 'MAWP': {0: 170}, 'SVP': {0: 150}, 'DERNIERE INSP. EXT.': {0: '2019-05-29'}, 'FREQ. EXT.': {0: 12}, 'DERNIERE INSP. INT.': {0: '2020-06-03'}, 'FREQ. INT.': {0: 12}, 'D_EXT_1': {0: '2020-05-29'}, 'D_INT_1': {0: '2021-06-03'}, 'D_EXT_2': {0: '2021-05-29'}, 'D_INT_2': {0: '2022-06-03'}, 'D_EXT_3': {0: '2022-05-29'}, 'D_INT_3': {0: '2023-06-03'}, 'D_EXT_4': {0: '2023-05-29'}, 'D_INT_4': {0: '2024-06-03'}, 'D_EXT_5': {0: '2024-05-29'}, 'D_INT_5': {0: '2025-06-03'}, 'D_EXT_6': {0: '2025-05-29'}, 'D_INT_6': {0: '2026-06-03'}}


CodePudding user response:

IIUC, try:

columns = ['D_EXT_1', 'D_EXT_2', 'D_EXT_3', 'D_EXT_4', 'D_EXT_5', 'D_EXT_6', 'D_INT_1', 'D_INT_2', 'D_INT_3', 'D_INT_4', 'D_INT_5', 'D_INT_6']
data[columns] = data[columns].apply(pd.to_datetime)
output = data[((data[columns]>="2022-03-01")&(data[columns]<="2024-12-31")).any(axis=1)]

This will return all the rows where any date in the columns list is between 2022-03-01 and 2024-12-31

CodePudding user response:

It seems that you want to get only rows where at least one of the dates is in the range ["2022-03-01", "2024-12-31"], correct?

First, convert all the date columns to datetime, using DataFrame.apply pandas.to_datetime.

import pandas as pd

date_cols = ['D_EXT_1', 'D_EXT_2', 'D_EXT_3', 'D_EXT_4', 'D_EXT_5', 'D_EXT_6', 'D_INT_1', 'D_INT_2', 'D_INT_3', 'D_INT_4', 'D_INT_5', 'D_INT_6']
data[date_cols] = data[date_cols].apply(pd.to_datetime)

Then create a 2D boolean mask of all the dates that are in the desired range

is_between_dates = (data[date_cols] > "2022-03-01") & (data[datecols] <= "2024-12-31")
# print(is_between_dates) to clearly understand what it represents

Finally, select the rows that contain at least one True value, meaning that there is at least one date in that row that belongs to the date range. This can be achieved using DataFrame.any with axis=1 on the 2D boolean mask, is_between_dates.

# again, print(is_between_dates.any(axis=1)) to see
data = data[is_between_dates.any(axis=1)]

CodePudding user response:

Use melt to reformat your dataframe to be easily searchable:

df = pd.read_csv('Book1.csv').melt(['NAME', 'ADRESS']) \
       .astype({'value': 'datetime64'}) \
       .query("'2022-03-01' <= value & value <= '2024-12-31'")

At this point your dataframe looks like:

>>> df
   NAME  ADRESS variable      value
1  ALEX  h4n1p8  D_EXT_1 2024-01-01
2  ALEX  h4n1p8  D_INT_2 2023-02-02

Now it's easy to get a NAME for a date:

>>> df.loc[df['value'] == '2024-01-01', 'NAME']
1    ALEX
Name: NAME, dtype: object

# OR

>>> df.loc[df['value'] == '2024-01-01', 'NAME'].tolist()
['ALEX']

CodePudding user response:

Start with

import pandas as pd
cols = [prefix   str(i) for prefix in ['D_EXT_','D_INT_'] for i in range(1,7)]

data = pd.read_csv("Book1.csv")
for col in cols:
  data.loc[:,col] = pd.to_datetime(data.loc[:,col])

Then use

ext = data[
    (
      data.loc[:,cols].ge(pd.to_datetime("2022-03-01"))\
      & data.loc[:,cols].le(pd.to_datetime("2024-12-13"))\
    ).any(axis=1)
  ]
  • Related