Home > Software design >  Pandas - replicate rows with new column value from a list for each replication
Pandas - replicate rows with new column value from a list for each replication

Time:04-20

So I have a data frame that has two columns, State and Cost, and a separate list of new "what-if" costs

State   Cost
A       2
B       9
C       8
D       4

New_Cost_List = [1, 5, 10]

I'd like to replicate all the rows in my data set for each value of New_Cost, adding a new column for each New_Cost for each state.

State   Cost   New_Cost
A       2      1
B       9      1
C       8      1
D       4      1
A       2      5
B       9      5
C       8      5
D       4      5
A       2      10
B       9      10
C       8      10
D       4      10

I thought a for loop might be appropriate to iterate through, replicating my dataset for the length of the list and adding the values of the list as a new column:

for v in New_Cost_List:
    df_new = pd.DataFrame(np.repeat(df.values, len(New_Cost_List), axis=0))
    df_new.columns = df.columns
    df_new['New_Cost'] = v

The output of this gives me the correct replication of State and Cost but the New_Cost value is 10 for each row. Clearly I'm not connecting how to get it to run through the list for each replicated set, so any suggestions? Or is there a better way to approach this?

EDIT 1

Reducing the number of values in the New_Cost_List from 4 to 3 so there's a difference in row count and length of the list.

CodePudding user response:

Here is a way using the keys paramater of pd.concat():

(pd.concat([df]*len(New_Cost_List),
           keys = New_Cost_List,
           names = ['New_Cost',None])
 .reset_index(level=0))

Output:

   New_Cost State  Cost
0         1     A     2
1         1     B     9
2         1     C     8
3         1     D     4
0         5     A     2
1         5     B     9
2         5     C     8
3         5     D     4
0        10     A     2
1        10     B     9
2        10     C     8
3        10     D     4

CodePudding user response:

If i understand your question correctly, this should solve your problem.

df['New Cost'] = new_cost_list
df = pd.concat([df]*len(new_cost_list), ignore_index=True)

Output:

    State   Cost    New Cost
0   A   2   1
1   B   9   5
2   C   8   10
3   D   4   15
4   A   2   1
5   B   9   5
6   C   8   10
7   D   4   15
8   A   2   1
9   B   9   5
10  C   8   10
11  D   4   15
12  A   2   1
13  B   9   5
14  C   8   10
15  D   4   15

CodePudding user response:

You can use index.repeat and numpy.tile:

df2 = (df
 .loc[df.index.repeat(len(New_Cost_List))]
 .assign(**{'New_Cost': np.repeat(New_Cost_List, len(df))})
)

or, simply, with a cross merge:

df2 = df.merge(pd.Series(New_Cost_List, name='New_Cost'), how='cross')

output:

  State  Cost  New_Cost
0     A     2         1
0     A     2         5
0     A     2        10
1     B     9         1
1     B     9         5
1     B     9        10
2     C     8         1
2     C     8         5
2     C     8        10
3     D     4         1
3     D     4         5
3     D     4        10

For the provided order:

(df
 .merge(pd.Series(New_Cost_List, name='New_Cost'), how='cross')
 .sort_values(by='New_Cost', kind='stable')
 .reset_index(drop=True)
)

output:

   State  Cost  New_Cost
0      A     2         1
1      B     9         1
2      C     8         1
3      D     4         1
4      A     2         5
5      B     9         5
6      C     8         5
7      D     4         5
8      A     2        10
9      B     9        10
10     C     8        10
11     D     4        10
  • Related