I have a dataframe that looks something like this
df = pd.DataFrame({'year':[23,23,23,23,23,23], 'month':[1,1,1,2,3,3], 'utility':['A','A','B','A','A','B'], 'state':['NY','NJ','CA','NJ','NY','CA']})
year month utility state
0 23 1 A NY
1 23 1 A NJ
2 23 1 B CA
3 23 2 A NJ
4 23 3 A NY
5 23 3 B CA
And I would like to create new rows for utilities-state combinations with missing months. So the new dataframe would look something like this
year month utility state
0 23 1 A NY
1 23 1 A NJ
2 23 1 B CA
3 23 2 A NY
4 23 2 A NJ
5 23 2 B CA
6 23 3 A NY
7 23 3 A NJ
8 23 3 B CA
I know that I can use a MultiIndex and then reindex, but using the from_product() method results in utility-state combinations not present in the original df (I do not want a utility A - CA combination, for example).
I thought about concatenating the utility and state columns and then getting the cartesian product from that, but I think there must be a simpler method.
CodePudding user response:
A possible solution:
cols = ['utility', 'state']
d1 = df.drop_duplicates(cols)
d2 = df.drop_duplicates(['year', 'month'])
d2.assign(**{x: [d1[x].to_list()] * len(d2) for x in cols}).explode(cols)
Output:
year month utility state
0 23 1 A NY
0 23 1 A NJ
0 23 1 B CA
3 23 2 A NY
3 23 2 A NJ
3 23 2 B CA
4 23 3 A NY
4 23 3 A NJ
4 23 3 B CA
I was wondering whether a solution using numpy broadcasting
would be possible, and it is:
cols1, cols2 = ['year', 'month'], ['utility', 'state']
(pd.DataFrame(
np.vstack(np.concatenate(
np.broadcast_arrays(
df[cols1].drop_duplicates(cols1).values[:,None],
df[cols2].drop_duplicates(cols2).values), axis=2)),
columns=df.columns))
CodePudding user response:
One option is with DataFrame.complete
from pyjanitor. For your data, you are basically doing a combination of (year, month) and (utility, state):
# pip install pyjanitor
import janitor
df.complete(('year', 'month'), ('utility', 'state'))
year month utility state
0 23 1 A NY
1 23 1 A NJ
2 23 1 B CA
3 23 2 A NY
4 23 2 A NJ
5 23 2 B CA
6 23 3 A NY
7 23 3 A NJ
8 23 3 B CA
@Timeless, undelete your code and I'll delete mine. You had a good start, and I edited your code to make it simpler.