import pandas as pd
colnames = ['Date', 'Items', 'Quantity', 'Price']
df1 = pd.read_csv('data_assignment_1.txt',sep=" ",names=colnames, header=None)
print(df1)
Output:
Date Items Quantity Price
0 2020-09-23 Item_A 1 1.9
1 2020-09-23 Item_B 1 1.2
2 2020-09-23 Item_A 2 1.9
3 2020-09-23 Item_B 1 1.2
4 2020-09-24 Item_A 1 1.9
5 2020-09-24 Item_B 1 1.2
6 2020-09-24 Item_C 1 1.3
7 2020-09-25 Item_D 1 2.7
Now I groupby date and items to find the total Quantity:
groupby_date_item = df1.groupby(['Date', 'Items'])['Quantity'].sum()
print(groupby_date_item['2020-09-23','Item_A'])
Output result:
3
Now the problem is if i put Item D with the date 2020-09-23 I will get an error:
print(groupby_date_item['2020-09-23','Item_D'])
Output result:
raise KeyError(key) from err
KeyError: ('2020-09-23', 'Item_d')
How do I handle the error if item does not exist on that date or input wrong date and item?
CodePudding user response:
As you have discovered, you will get an Error if you try to retrieve a value that does not exist. One way to resolve this issue is to use a try-except
clause.
See example below - where I use a custom function to handle the data retrieval using a try-except clause.
If the function generates an error, it will return np.nan
.
Code:
import numpy as np
import pandas as pd
def get_Item(my_Date, my_Item, gb_date_item):
try:
result = gb_date_item[my_Date, my_Item]
except:
result = np.nan
return result
df1 = pd.DataFrame({ 'Date': ['2020-09-23', '2020-09-23', '2020-09-23', '2020-09-23', '2020-09-24', '2020-09-24', '2020-09-24', '2020-09-25'],
'Items': ['Item_A', 'Item_B', 'Item_A', 'Item_B', 'Item_A', 'Item_B', 'Item_C', 'Item_D'],
'Quantity': [1, 1, 2, 1, 1, 1, 1, 1],
'Price': [1.9, 1.2, 1.9, 1.2, 1.9, 1.2, 1.3, 2.7]})
groupby_date_item = df1.groupby(['Date', 'Items'])['Quantity'].sum()
print(get_Item('2020-09-23', 'Item_A', groupby_date_item))
print(get_Item('2020-09-23', 'Item_D', groupby_date_item))
Output:
3
nan
CodePudding user response:
Here is an alternative where you check for the key. In simple terms direct to your question, you would just use if ('2020-09-23', 'Item_D') in groupby_date_item: ...
This is a more complete example:
import pandas as pd
df1 = pd.DataFrame({
"Date":['2020-09-23','2020-09-23','2020-09-25'],
"Items":['Item_A','Item_A','Item_D'],
"Quantity":[1,1,1]
})
g = df1.groupby(['Date','Items'])['Quantity'].sum()
for tup in [('2020-09-23', 'Item_A'), ('2020-09-23', 'Item_D')]:
if tup in g.index:
print(f'{str(tup)}: {g[tup]}')
else:
print(f'{str(tup)}: not found')