I have a Pandas dataframe with multiple columns (simplified to the following format for this example):
Item Interval Specs
Chocolate 0 0.001
Chocolate 3 0.002
Chocolate 5 0.007
Milk 0 0.004
Milk 2 0.008
....
What I would like to do is duplicate, for each Item, the row whose Interval value is 0, once for every other Interval value that the Item has, changing only the Interval in each copy. For instance, the Item Chocolate in the original dataframe has 3 rows, with Interval 0, 3 and 5. I would like to duplicate the Interval 0 row, keeping its Specs value the same and modifying only the Interval value, to create 2 copies with Interval values 3 and 5. Similarly, for the Item Milk I want to create 1 duplicate row with Interval changed to 2. (The number of duplicate rows to create thus depends directly on the number of unique Interval values for each Item.)
The expected dataframe is:
Item Interval Specs
Chocolate 0 0.001
Chocolate 3 0.002
Chocolate 5 0.007
Chocolate 3 0.001 --Duplicate row added and Interval value changed
Chocolate 5 0.001 --Duplicate row added and Interval value changed
Milk 0 0.004
Milk 2 0.008
Milk 2 0.004 --Duplicate row added and Interval value changed
A similar question (Duplicate rows based on value with condition) on SO focuses on creating duplicate rows based on condition, but not on changing the values in the dataframe corresponding to the other rows. Any suggestions on how to accomplish this would be much appreciated.
CodePudding user response:
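One way to solve this is to create a second dataframe containing all rows that do not have Interval == 0 (taking a copy, so that the later assignment does not trigger a SettingWithCopyWarning):
df2 = df[df.Interval != 0].copy()
Then map the Specs values from the rows with Interval == 0 onto the Specs column of the new dataframe, matching on Item:
df2.loc[:, 'Specs'] = df2['Item'].map(df[df.Interval == 0].set_index('Item')['Specs'])
Finally, concatenate the two dataframes:
df = pd.concat([df, df2], ignore_index=True)
This gives the desired rows, with the duplicates appended after the originals; sort by Item if you need them grouped as in the expected output. Note that the mapping step assumes each Item has exactly one row with Interval == 0.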
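For reference, the steps above can be put together into a self-contained sketch. The function name add_baseline_duplicates and the variable names baseline and dupes are made up for illustration, and the sample data is the one from the question. It assumes every Item has exactly one Interval == 0 row; the stable sort at the end is an addition, used only to group the rows as in the expected output.

```python
import pandas as pd


def add_baseline_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Append, for every non-zero Interval row, a copy carrying the Item's Interval-0 Specs."""
    # Specs of the Interval == 0 row, indexed by Item
    # (assumption: exactly one such row per Item, otherwise map() cannot look values up).
    baseline = df.loc[df["Interval"] == 0].set_index("Item")["Specs"]

    # Copy the non-zero rows so the assignment below does not touch df.
    dupes = df.loc[df["Interval"] != 0].copy()
    dupes["Specs"] = dupes["Item"].map(baseline)

    # A stable sort keeps the original rows ahead of the duplicates within each Item.
    return (
        pd.concat([df, dupes], ignore_index=True)
        .sort_values("Item", kind="stable")
        .reset_index(drop=True)
    )


df = pd.DataFrame({
    "Item": ["Chocolate", "Chocolate", "Chocolate", "Milk", "Milk"],
    "Interval": [0, 3, 5, 0, 2],
    "Specs": [0.001, 0.002, 0.007, 0.004, 0.008],
})
result = add_baseline_duplicates(df)
print(result)
```

If an Item has no Interval == 0 row, map() finds no value for it and the duplicated rows end up with NaN in Specs, so you may want to drop those rows before concatenating.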
CodePudding user response:
We can do this by first splitting the dataframe in two:
- a dataframe with the rows whose Interval is 0
- a dataframe with the rows whose Interval is not 0
Then we build a dictionary from the Interval-0 dataframe that maps each Item to its Specs value. Using this dictionary, we overwrite the Specs values in the other dataframe. Finally, the updated dataframe is concatenated with the original dataframe. The code is given below:
import pandas as pd

rows = [['Chocolate', 0, 0.001],
        ['Chocolate', 3, 0.002],
        ['Chocolate', 5, 0.007],
        ['Milk', 0, 0.004],
        ['Milk', 2, 0.008]]
df = pd.DataFrame(rows, columns=['Item', 'Interval', 'Specs'])  # original dataframe
df1 = df[df['Interval'].ne(0)].copy()  # rows with Interval other than 0
df2 = df[df['Interval'].eq(0)].copy()  # rows with Interval 0
# map each Item to the Specs value of its Interval-0 row
di = pd.Series(df2['Specs'].values, index=df2['Item']).to_dict()
for key, val in di.items():
    df1.loc[df1.Item == key, 'Specs'] = val  # overwrite Specs with the Interval-0 value
# a stable sort keeps the original rows ahead of the duplicates within each Item
res = pd.concat([df, df1], ignore_index=True).sort_values('Item', kind='stable')
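As a possible alternative to the dictionary and loop (not part of the answer above), the same lookup could be written as a merge on Item. This is only a sketch: the names baseline and dupes are illustrative, and the sample data is again the one from the question.

```python
import pandas as pd

df = pd.DataFrame({
    "Item": ["Chocolate", "Chocolate", "Chocolate", "Milk", "Milk"],
    "Interval": [0, 3, 5, 0, 2],
    "Specs": [0.001, 0.002, 0.007, 0.004, 0.008],
})

# Item and Specs of the Interval == 0 rows.
baseline = df.loc[df["Interval"] == 0, ["Item", "Specs"]]

# Item and Interval of the remaining rows; the inner merge attaches
# the Interval-0 Specs of the matching Item to each of them.
dupes = df.loc[df["Interval"] != 0, ["Item", "Interval"]].merge(
    baseline, on="Item", how="inner"
)

# A stable sort keeps the original rows ahead of the duplicates within each Item.
res = (
    pd.concat([df, dupes], ignore_index=True)
    .sort_values("Item", kind="stable")
    .reset_index(drop=True)
)
print(res)
```

The behaviour differs from the loop for Items that have no Interval == 0 row: the inner merge creates no duplicates for them, whereas the loop leaves their Specs unchanged and so appends exact copies of the original rows.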