Home > Software design >  Pandas DataFrame: Duplicate rows and multiply against array
Pandas DataFrame: Duplicate rows and multiply against array

Time:07-07

I am have a dataframe 'df' with a value column. I also have an array with a list of percentages. How can I multiply the percentages against the df, whilst duplicating each row depending on how many are in the array?

I have found a way to do it using the repeat method, but this static and clunky. Is there a better way to this?

Below is an example

array = [0.2, 0.5, 0.7]

df

name value
name1 20
name2 10

Expected Output

name value
name1 4
name1 10
name1 14
name2 2
name2 5
name2 7

CodePudding user response:

You can use pandas.Index.repeat to repeat the rows as many times as you have values is array, then numpy.tile to repeat the array as many times as you have rows and multiply:

out = (df
 .loc[df.index.repeat(len(array))]
 .assign(value=lambda d: d['value'].mul(np.tile(array, len(df))))
 #.reset_index(drop=True) # uncomment if a normal index is needed
)

Or using numpy:

out = (df
 .loc[df.index.repeat(len(array))]
 .assign(value=(df['value'].to_numpy()[:,None]*array).ravel())
)

output:

    name  value
0  name1    4.0
0  name1   10.0
0  name1   14.0
1  name2    2.0
1  name2    5.0
1  name2    7.0

alternative using merge

This is less elegant in my opinion as you create an intermediate column

out = (df
 .merge(pd.Series(array, name='factor'), how='cross')
 .eval('value = value*factor')
 .drop(columns='factor')
)

CodePudding user response:

You can use numpy.repeat on dataframe and numpy.tile on array then use numpy.column_stack for creating new_df and use for computing.

df = pd.DataFrame({'name':['name1','name2'],'value': [20,10]})
a = np.array([0.2,0.5,0.7])
res = pd.DataFrame(np.column_stack((np.repeat(df.values, len(a), axis=0), 
                                    np.tile(a, len(df)))), columns=['name','value', 'mult'])
res['value'] = res['value']*res['mult']
res= res.drop('mult', axis=1)
print(res)

Output:

    name value
0  name1   4.0
1  name1  10.0
2  name1  14.0
3  name2   2.0
4  name2   5.0
5  name2   7.0
  • Related