I have a Python Data Frame with different product classes counted and accumulated (cumsum) I would like to create a new column on my dataframe, that shows the Column Name of the first element of each row (left to right) that gets to an accumulated figure of 10, following the below example:
df2 = pd.DataFrame(np.array([['NY',2, 4, 8,10,15,20,36], ['LA',4, 5, 6,7,11,25,29,], ['BC',7, 8, 9,10,19,29,45]]), columns=['CITY','a', 'b', 'c','d','e','f','g'])
df2
expected_result = pd.DataFrame(np.array([['NY','d'], ['LA','e'], ['BC','d']]),
columns=['CITY','min'])
expected_result
Thank you so much for helping out on this!!!
Francisco
CodePudding user response:
For first match greater or equal by 10
compare values by DataFrame.ge
and use DataFrame.idxmax
:
df = df2.set_index('CITY').astype(int).ge(10).idxmax(axis=1).reset_index(name='min')
print (df)
CITY min
0 NY d
1 LA e
2 BC d
Solution above failed if no value greater or equal, then need test if exist at least match:
df2 = pd.DataFrame(np.array([['NY',2, 4, 8,10,15,20,36],
['LA',4, 5, 6,7,11,25,29,],
['BC',1, 2, 3,4,5,6,9]]),
columns=['CITY','a', 'b', 'c','d','e','f','g'])
m = df2.set_index('CITY').astype(int).ge(10)
df = m.idxmax(axis=1).where(m.any(axis=1), 'no match').reset_index(name='min')
print (df)
CITY min
0 NY d
1 LA e
2 BC no match