I have the table below:
import pandas as pd
raw_data = {
'vendor_id': [1, 2, 3, 4, 5, 6],
'name': ['vendor_schmendor', 'parts_r_us', 'vendor_king', 'vendor_diagram', 'venny', 'vendtriloquist'],
'contract_sign_date': ['2018-09-01', '2018-09-03', '2018-10-11', '2018-08-21', '2018-08-13', '2018-10-29'],
'total_spend' :[34324, 23455, 77654, 23334, 94843, 23444]}
df = pd.DataFrame(raw_data, columns = ['vendor_id', 'name', 'contract_sign_date', 'total_spend'])
I was given a task to drop all the rows where contract_sign_date
is between "2018-09-01" and "2018-10-13". This is my attempt (although it doesn't work):
alter = df.drop((df['contract_sign_date'] == "2018-09-01") & (df['contract_sign_date'] == "2018-10-13"))
This raises: KeyError: '[False, False, False, False, False, False] not found in axis'
Can anyone provide code that produces the result I want?
CodePudding user response:
Your condition checks for simultaneous equality with two different values, (a == b) and (a == c), which is impossible.
Use between and the boolean NOT operator ~:
alter = df[~df['contract_sign_date'].between("2018-09-01", "2018-10-13")]
output:
   vendor_id            name contract_sign_date  total_spend
3          4  vendor_diagram         2018-08-21        23334
4          5           venny         2018-08-13        94843
5          6  vendtriloquist         2018-10-29        23444
NB. plain strings work here because the YYYY-MM-DD format sorts in chronological order, which enables direct comparison; with a different format you would need to convert the column to a datetime type first.
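To illustrate the note about other date formats, here is a minimal sketch, assuming the dates arrive as DD/MM/YYYY strings (that format, the trimmed two-column frame, and the names dates and in_range are my own assumptions, not part of the question). Comparing such strings directly would order them by day first, so the column is parsed with pd.to_datetime before calling between:

```python
import pandas as pd

# Hypothetical variant of the question's data: same dates, written as DD/MM/YYYY.
df = pd.DataFrame({
    'vendor_id': [1, 2, 3, 4, 5, 6],
    'contract_sign_date': ['01/09/2018', '03/09/2018', '11/10/2018',
                           '21/08/2018', '13/08/2018', '29/10/2018'],
})

# Parse the strings into real datetimes so comparisons are chronological.
dates = pd.to_datetime(df['contract_sign_date'], format='%d/%m/%Y')

# between() includes both endpoints by default.
in_range = dates.between(pd.Timestamp('2018-09-01'), pd.Timestamp('2018-10-13'))

# Keep only the rows outside the range.
alter = df[~in_range]
print(alter)
```

The original column is left untouched here; only the parsed copy is used to build the mask.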
CodePudding user response:
If you want to use drop, you can try
m = (df['contract_sign_date'] >= "2018-09-01") & (df['contract_sign_date'] <= "2018-10-13")
# or
m = df['contract_sign_date'].between("2018-09-01", "2018-10-13")
# m is True for the rows inside the range; drop those rows by index label
out = df.drop(m[m].index)
print(out)
   vendor_id            name contract_sign_date  total_spend
3          4  vendor_diagram         2018-08-21        23334
4          5           venny         2018-08-13        94843
5          6  vendtriloquist         2018-10-29        23444
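For anyone wondering why the attempt in the question matched nothing, and why drop needs labels rather than a mask, here is a small sketch on a trimmed copy of the question's frame (the names impossible, to_drop and labels are mine, chosen for illustration). It first evaluates the question's condition, then turns a range mask into the index labels that drop expects:

```python
import pandas as pd

# Trimmed copy of the question's data: only the columns needed here.
df = pd.DataFrame({
    'vendor_id': [1, 2, 3, 4, 5, 6],
    'contract_sign_date': ['2018-09-01', '2018-09-03', '2018-10-11',
                           '2018-08-21', '2018-08-13', '2018-10-29'],
})

# The question's condition: no value can equal both dates at once,
# so the combined mask is False on every row.
impossible = (df['contract_sign_date'] == "2018-09-01") & (df['contract_sign_date'] == "2018-10-13")
print(impossible.any())

# A mask marking the rows to remove (both endpoints included by default).
to_drop = df['contract_sign_date'].between("2018-09-01", "2018-10-13")

# drop() looks up index labels, so select the labels where the mask is True.
labels = to_drop[to_drop].index
out = df.drop(index=labels)
print(out)
```

Boolean indexing with df[~to_drop] gives the same rows in one step; going through drop is only worthwhile if you specifically need that method.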