I have a code that check in different columns for all the dates that are >= "2022-03-01" and i <= "2024-12-31
then it append it to a list ext=[]
.
What I would like is to be able to extract the more information about located on the same row.
My code:
from pandas import *
data = read_csv("Book1.csv")
# converting column data to list
D_EXT_1 = data['D_EXT_1'].tolist()
D_INT_1 = data['D_INT_1'].tolist()
D_EXT_2 = data['D_EXT_2'].tolist()
D_INT_2 = data['D_INT_2'].tolist()
D_EXT_3 = data['D_EXT_3'].tolist()
D_INT_3 = data['D_INT_3'].tolist()
D_EXT_4 = data['D_EXT_4'].tolist()
D_INT_4 = data['D_INT_4'].tolist()
D_EXT_5 = data['D_EXT_5'].tolist()
D_INT_5 = data['D_INT_5'].toList()
D_EXT_6 = data['D_EXT_6'].toList()
D_INT_6 = data['D_INT_6'].toList()
ext = []
ext = [i for i in D_INT_1 D_INT_2 D_INT_3 D_INT_4 D_INT_5 D_INT_6 if i >= "2022-03-01" and i <= "2024-12-31"]
print(*ext, sep="\n")
Example of data:
NAME,ADRESS,D_INT_1,D_EXT_1,D_INT_2,D_EXT_2
ALEX,h4n1p8,2020-01-01,2024-01-01,2023-02-02,2020-01-01
What my code will print with that data:
2024-01-01
DESIRED OUTPUT:
Alex, 2024-01-01
As requested by not_speshal -> data.head().to_dict()
{'EMPL. NO': {0: 5}, "NOM A L'EMPLACEMENT": {0: 'C010 - HOPITAL REGIONAL DE RIMOUSKI/CENTRE SERVEUR OPTILAB'}, 'ADRESSE': {0: '150 AVENUE ROULEAU'}, 'VILLE': {0: 'RIMOUSKI'}, 'PROV': {0: 'QC'}, 'OBJET NO': {0: 67}, "EMPLACEMENT DE L'APPAREIL": {0: 'CHAUFFERIE'}, 'RBQ 2018': {0: nan}, "DESCRIPTION DE L'APPAREIL": {0: 'CHAUDIERE AQUA. A VAPEUR'}, 'MANUFACTURIER': {0: 'MIURA'}, 'DIMENSIONS': {0: nan}, 'MAWP': {0: 170}, 'SVP': {0: 150}, 'DERNIERE INSP. EXT.': {0: '2019-05-29'}, 'FREQ. EXT.': {0: 12}, 'DERNIERE INSP. INT.': {0: '2020-06-03'}, 'FREQ. INT.': {0: 12}, 'D_EXT_1': {0: '2020-05-29'}, 'D_INT_1': {0: '2021-06-03'}, 'D_EXT_2': {0: '2021-05-29'}, 'D_INT_2': {0: '2022-06-03'}, 'D_EXT_3': {0: '2022-05-29'}, 'D_INT_3': {0: '2023-06-03'}, 'D_EXT_4': {0: '2023-05-29'}, 'D_INT_4': {0: '2024-06-03'}, 'D_EXT_5': {0: '2024-05-29'}, 'D_INT_5': {0: '2025-06-03'}, 'D_EXT_6': {0: '2025-05-29'}, 'D_INT_6': {0: '2026-06-03'}}
CodePudding user response:
IIUC, try:
columns = ['D_EXT_1', 'D_EXT_2', 'D_EXT_3', 'D_EXT_4', 'D_EXT_5', 'D_EXT_6', 'D_INT_1', 'D_INT_2', 'D_INT_3', 'D_INT_4', 'D_INT_5', 'D_INT_6']
data[columns] = data[columns].apply(pd.to_datetime)
output = data[((data[columns]>="2022-03-01")&(data[columns]<="2024-12-31")).any(axis=1)]
This will return all the rows where any date in the columns list is between 2022-03-01
and 2024-12-31
CodePudding user response:
It seems that you want to get only rows where at least one of the dates is in the range ["2022-03-01", "2024-12-31"], correct?
First, convert all the date columns to datetime, using DataFrame.apply
pandas.to_datetime
.
import pandas as pd
date_cols = ['D_EXT_1', 'D_EXT_2', 'D_EXT_3', 'D_EXT_4', 'D_EXT_5', 'D_EXT_6', 'D_INT_1', 'D_INT_2', 'D_INT_3', 'D_INT_4', 'D_INT_5', 'D_INT_6']
data[date_cols] = data[date_cols].apply(pd.to_datetime)
Then create a 2D boolean mask of all the dates that are in the desired range
is_between_dates = (data[date_cols] > "2022-03-01") & (data[datecols] <= "2024-12-31")
# print(is_between_dates) to clearly understand what it represents
Finally, select the rows that contain at least one True value, meaning that there is at least one date in that row that belongs to the date range. This can be achieved using DataFrame.any
with axis=1
on the 2D boolean mask, is_between_dates
.
# again, print(is_between_dates.any(axis=1)) to see
data = data[is_between_dates.any(axis=1)]
CodePudding user response:
Use melt
to reformat your dataframe to be easily searchable:
df = pd.read_csv('Book1.csv').melt(['NAME', 'ADRESS']) \
.astype({'value': 'datetime64'}) \
.query("'2022-03-01' <= value & value <= '2024-12-31'")
At this point your dataframe looks like:
>>> df
NAME ADRESS variable value
1 ALEX h4n1p8 D_EXT_1 2024-01-01
2 ALEX h4n1p8 D_INT_2 2023-02-02
Now it's easy to get a NAME
for a date:
>>> df.loc[df['value'] == '2024-01-01', 'NAME']
1 ALEX
Name: NAME, dtype: object
# OR
>>> df.loc[df['value'] == '2024-01-01', 'NAME'].tolist()
['ALEX']
CodePudding user response:
Start with
import pandas as pd
cols = [prefix str(i) for prefix in ['D_EXT_','D_INT_'] for i in range(1,7)]
data = pd.read_csv("Book1.csv")
for col in cols:
data.loc[:,col] = pd.to_datetime(data.loc[:,col])
Then use
ext = data[
(
data.loc[:,cols].ge(pd.to_datetime("2022-03-01"))\
& data.loc[:,cols].le(pd.to_datetime("2024-12-13"))\
).any(axis=1)
]