I have df with Weeks, Months and Years.
week = ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022']
month = ["January", "January", "January", "January"]
year = [2022, 2022, 2022, 2022]
myDict = {}
myDict["Week"] = week
myDict["Month"] = month
myDict["Year"] = year
dates_df = pd.DataFrame(data=myDict)
dates_df
Week Month Year
01/03/2022 - 01/09/2022 January 2022
01/10/2022 - 01/16/2022 January 2022
01/17/2022 - 01/23/2022 January 2022
01/24/2022 - 01/30/2022 January 2022
Data looks like that (Previously Grouped):
test_data = {'CLient Id': [1,1,1,1,2,2,2,3,3],
'Client Name': ['Tom Holland', 'Tom Holland', 'Tom Holland', 'Tom Holland', 'Brad Pitt', 'Brad Pitt', 'Brad Pitt', 'Anna Delvey', 'Anna Delvey'],
'City': ['New York', 'New York', 'New York', 'New York', 'Los Angeles', 'Los Angeles', 'Los Angeles', 'New York', 'New York'],
'Week': ['01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/17/2022 - 01/23/2022', '01/24/2022 - 01/30/2022',
'01/03/2022 - 01/09/2022', '01/10/2022 - 01/16/2022', '01/24/2022 - 01/30/2022', '01/03/2022 - 01/09/2022',
'01/24/2022 - 01/30/2022'],
'Month': ['January', 'January', 'January', 'January', 'January', 'January', 'January', 'January', 'January'],
'Year': [2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022, 2022],
'Spent': [1000, 750, 1200, 850, 777, 1500, 1200, 1500, 1200]}
test_df = pd.DataFrame(data=test_data)
test_df
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
I need to create additional rows of Client Name, City, Week, Month and Year for each Client (Id) which missing Week, from the date df Spent column in this case should be 0.
Output I need:
Client Id Client Name City Week Month Year Spent
1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000
1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750
1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200
1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850
2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777
2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500
2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0
2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200
3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500
3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0
3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0
3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200
I was trying to code it, but got no luck.
CodePudding user response:
You could pivot
reindex
fillna
(to get the missing data) stack
(to get back to the previous shape):
columns = ['Week','Month','Year']
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, ['Spent'])
.reindex(pd.MultiIndex.from_arrays(dates_df.assign(Spent='Spent').to_numpy()[:, [-1,0,1,2]].T,
names=[None] columns), axis=1)
.fillna(0).stack(level=columns).reset_index())
Output:
CLient Id Client Name City Week Month Year Spent
0 1 Tom Holland New York 01/03/2022 - 01/09/2022 January 2022 1000.0
1 1 Tom Holland New York 01/10/2022 - 01/16/2022 January 2022 750.0
2 1 Tom Holland New York 01/17/2022 - 01/23/2022 January 2022 1200.0
3 1 Tom Holland New York 01/24/2022 - 01/30/2022 January 2022 850.0
4 2 Brad Pitt Los Angeles 01/03/2022 - 01/09/2022 January 2022 777.0
5 2 Brad Pitt Los Angeles 01/10/2022 - 01/16/2022 January 2022 1500.0
6 2 Brad Pitt Los Angeles 01/17/2022 - 01/23/2022 January 2022 0.0
7 2 Brad Pitt Los Angeles 01/24/2022 - 01/30/2022 January 2022 1200.0
8 3 Anna Delvey New York 01/03/2022 - 01/09/2022 January 2022 1500.0
9 3 Anna Delvey New York 01/10/2022 - 01/16/2022 January 2022 0.0
10 3 Anna Delvey New York 01/17/2022 - 01/23/2022 January 2022 0.0
11 3 Anna Delvey New York 01/24/2022 - 01/30/2022 January 2022 1200.0
If you have more than one columns to fill with 0, you could use:
columns = ['Week','Month','Year']
value_columns = ['Spent', ...]
new_df = (pd.DataFrame({'key':[1]*len(value_columns),'New':value_columns})
.merge(dates_df.assign(key=1)).drop(columns='key'))
out = (test_df.pivot(['CLient Id', 'Client Name', 'City'], columns, value_columns)
.reindex(pd.MultiIndex.from_arrays(zip(*new_df.to_numpy()),
names=[None] columns), axis=1)
.fillna(0).stack(level=columns).reset_index())