I have a df ,you can have by this code:
import numpy as np
import pandas as pd
from io import StringIO
dfs = """
contract RB RateCompany gs IssueDate ValIssueDate ToDate1 FromDate1
84 GA16 77 T G 19940701 19480101.0 197702.0 190001.0
85 GA16 77 T G 19940701 19980101.0 999999.0 197703.0
"""
df = pd.read_csv(StringIO(dfs.strip()), sep='\s ',
dtype={"RB": int}
)
df
Out put:
contract RB RateCompany gs IssueDate ValIssueDate ToDate1 FromDate1
84 GA16 77 T G 19940701 19480101.0 197702.0 190001.0
85 GA16 77 T G 19940701 19980101.0 999999.0 197703.0
For this df, a set of contract with RB ,is unique,it means only 1 row should be left,and the condition is:
FromDate1<=df.IssueDate<=ToDate1
So I tired:
df = df[((df.duplicated(subset=["contract", "RB"], keep=False)) &
(df['IssueDate'] <= df['ToDate1']) &
(df['IssueDate'] >= df['FromDate1']))]
But the output is blank:
contract RB RateCompany gs IssueDate ValIssueDate ToDate1 FromDate1
The expected output should be:
contract RB RateCompany gs IssueDate ValIssueDate ToDate1 FromDate1
85 GA16 77 T G 19940701 19980101.0 999999.0 197703.0
any friend can help?
CodePudding user response:
Divide your IssueDate
by 100, probably a problem of unit from time conversion:
>>> df.loc[df['IssueDate'].div(100).between(df['FromDate1'], df['ToDate1'])]
contract RB RateCompany gs IssueDate ValIssueDate ToDate1 FromDate1
85 GA16 77 T G 19940701 19980101.0 999999.0 197703.0
Check it:
>>> df.loc[85, ['IssueDate', 'FromDate1', 'ToDate1']].astype(int)
IssueDate 19940701
FromDate1 197703
ToDate1 999999
Name: 85, dtype: int64
CodePudding user response:
Lets discuss your answer first
df = df[((df.duplicated(subset=["contract", "RB"], keep=False)) &
(df['IssueDate'] <= df['ToDate1']) &
(df['IssueDate'] >= df['FromDate1']))]
Sure this code will give you blank result. It might be because of different data types of the columns.
First make sure those columns (that you want to compare have same datatype) by df.dtypes
if the data you want to compare is date, compare with same format. Example if you want to compare "YYYYMM"
you should compare it with date with format "YYYYMM"