I have 2 dataframes:
df1 = pd.DataFrame.from_dict({('category', ''): {0: 'A',
1: 'B',
2: 'C',
3: 'D',
4: 'E',
5: 'F',
6: 'G'},
(pd.Timestamp('2021-06-28 00:00:00'),
'metric_1'): {0: 4120.549999999999, 1: 11226.016666666665, 2: 25049.443333333333, 3: 18261.083333333332, 4: 2553.1208333333334, 5: 2843.01, 6: 73203.51333333334},
(pd.Timestamp('2021-06-28 00:00:00'), 'metric_2'): {0: 9907.79,
1: 7614.650000000001,
2: 13775.259999999998,
3: 13158.250000000004,
4: 1457.85,
5: 1089.5600000000002,
6: 38864.9},
(pd.Timestamp('2021-07-05 00:00:00'),
'metric_1'): {0: 5817.319999999998, 1: 10799.45, 2: 23521.51, 3: 22062.350833333334, 4: 1249.5974999999999, 5: 3229.77, 6: 52796.06083333332},
(pd.Timestamp('2021-07-05 00:00:00'), 'metric_2'): {0: 6321.21,
1: 5606.01,
2: 10239.689999999999,
3: 17476.600000000002,
4: 943.7199999999999,
5: 1410.33,
6: 29645.45}}).set_index('category')
df2 = pd.DataFrame.from_dict({'category': {0: 'A',
1: 'B',
2: 'C',
3: 'D',
4: 'E',
5: 'F',
6: 'G'},
1: {0: 36234.035577957984,
1: 69078.07089184562,
2: 128879.5397517309,
3: 178376.63536908248,
4: 9293.956915067887,
5: 8184.780211399392,
6: 177480.74540313095},
2: {0: 37887.581678419825,
1: 72243.67956241772,
2: 134803.02342121338,
3: 186603.8963173654,
4: 9716.385738295368,
5: 8555.606693927,
6: 185658.87577993725}}).set_index('category')
First I change the column names of df2
to be the same as df
date_mappings = {
1 : '2021-06-28',
2 : '2021-07-05'}
df2 = df2.rename(columns=date_mappings)
Then I try to merge it
f = lambda x: pd.to_datetime(x)
df = (df2.merge(df1.unstack(), left_index=True, right_index=True).sort_index(axis=1))
But I get an error:
ValueError: Cannot merge a Series without a name
What is my mistake?
My goal is to add columns from df2
to df1
in each week like so that df1
would have 3 columns instead of 2.
After using
c = [df2.columns.map(date_mappings.get), df2.columns]
df1.join(df2.set_axis(c, axis=1)).sort_index(axis=1)
I get the values appended to the end of the dataframe rather than to the same columns with the same week naming:
Maybe this could be an issue that df2
holds dates from 2021-06-28 to 2022-06-27
while df1
holds dates from 2020 to today
.
Unwanted adding to the end of the df
CodePudding user response:
Idea is create MultiIndex in both DataFrames:
date_mappings = {
1 : '2021-06-28',
2 : '2021-07-05'}
#create MultiIndex in df2 with datetimes in first level
df2.columns = pd.MultiIndex.from_product([pd.to_datetime(df2.columns.map(date_mappings)),
['metric_3']])
#removed unused levels, here category, so possible convert first leve to datetimes
df1.columns = df1.columns.remove_unused_levels()
df1.columns = df1.columns.set_levels(pd.to_datetime(df1.columns.levels[0]), level=0)
#join together and sorting MultiIndex
df = df1.join(df2).sort_index(axis=1)
print (df)
2021-06-28 2021-07-05 \
metric_1 metric_2 metric_3 metric_1 metric_2
category
A 4120.550000 9907.79 36234.035578 5817.320000 6321.21
B 11226.016667 7614.65 69078.070892 10799.450000 5606.01
C 25049.443333 13775.26 128879.539752 23521.510000 10239.69
D 18261.083333 13158.25 178376.635369 22062.350833 17476.60
E 2553.120833 1457.85 9293.956915 1249.597500 943.72
F 2843.010000 1089.56 8184.780211 3229.770000 1410.33
G 73203.513333 38864.90 177480.745403 52796.060833 29645.45
metric_3
category
A 37887.581678
B 72243.679562
C 134803.023421
D 186603.896317
E 9716.385738
F 8555.606694
G 185658.875780
If need remove greater datetimes like maximal df1
datetimes use:
#change mapping for test
date_mappings = {
1 : '2021-06-28',
2 : '2022-07-05'}
df2.columns = pd.MultiIndex.from_product([pd.to_datetime(df2.columns.map(date_mappings)),
['metric_3']])
df1.columns = df1.columns.remove_unused_levels()
df1.columns = df1.columns.set_levels(pd.to_datetime(df1.columns.levels[0]), level=0)
df2 = df2.loc[:, df2.columns.get_level_values(0) <= df1.columns.get_level_values(0).max()]
print (df2)
2021-06-28
metric_3
category
A 36234.035578
B 69078.070892
C 128879.539752
D 178376.635369
E 9293.956915
F 8184.780211
G 177480.745403
#join together and sorting MultiIndex
df = df1.join(df2).sort_index(axis=1)
print (df)
2021-06-28 2021-07-05
metric_1 metric_2 metric_3 metric_1 metric_2
category
A 4120.550000 9907.79 36234.035578 5817.320000 6321.21
B 11226.016667 7614.65 69078.070892 10799.450000 5606.01
C 25049.443333 13775.26 128879.539752 23521.510000 10239.69
D 18261.083333 13158.25 178376.635369 22062.350833 17476.60
E 2553.120833 1457.85 9293.956915 1249.597500 943.72
F 2843.010000 1089.56 8184.780211 3229.770000 1410.33
G 73203.513333 38864.90 177480.745403 52796.060833 29645.45
CodePudding user response:
Use pd.DataFrame.reindex
pd.DataFrame.join
reindex has a convenient level parameter that allows you to expand on the index levels not present.
df1.join(df2.reindex(df1.index, level=0))
CodePudding user response:
I am not sure if this is what you want but you might need to_frame
:
f = lambda x: pd.to_datetime(x)
df = (df2.merge(df1.unstack().to_frame(), left_index=True, right_index=True).sort_index(level=0))
print(df)