I am have a dataframe 'df' with a value column. I also have an array with a list of percentages. How can I multiply the percentages against the df, whilst duplicating each row depending on how many are in the array?
I have found a way to do it using the repeat method, but this static and clunky. Is there a better way to this?
Below is an example
array = [0.2, 0.5, 0.7]
df
name value
name1 20
name2 10
Expected Output
name value
name1 4
name1 10
name1 14
name2 2
name2 5
name2 7
CodePudding user response:
You can use pandas.Index.repeat
to repeat the rows as many times as you have values is array, then numpy.tile
to repeat the array as many times as you have rows and multiply:
out = (df
.loc[df.index.repeat(len(array))]
.assign(value=lambda d: d['value'].mul(np.tile(array, len(df))))
#.reset_index(drop=True) # uncomment if a normal index is needed
)
Or using numpy:
out = (df
.loc[df.index.repeat(len(array))]
.assign(value=(df['value'].to_numpy()[:,None]*array).ravel())
)
output:
name value
0 name1 4.0
0 name1 10.0
0 name1 14.0
1 name2 2.0
1 name2 5.0
1 name2 7.0
alternative using merge
This is less elegant in my opinion as you create an intermediate column
out = (df
.merge(pd.Series(array, name='factor'), how='cross')
.eval('value = value*factor')
.drop(columns='factor')
)
CodePudding user response:
You can use numpy.repeat
on dataframe
and numpy.tile
on array
then use numpy.column_stack
for creating new_df
and use for computing.
df = pd.DataFrame({'name':['name1','name2'],'value': [20,10]})
a = np.array([0.2,0.5,0.7])
res = pd.DataFrame(np.column_stack((np.repeat(df.values, len(a), axis=0),
np.tile(a, len(df)))), columns=['name','value', 'mult'])
res['value'] = res['value']*res['mult']
res= res.drop('mult', axis=1)
print(res)
Output:
name value
0 name1 4.0
1 name1 10.0
2 name1 14.0
3 name2 2.0
4 name2 5.0
5 name2 7.0