I have the following two tables:
>>> df1 = pd.DataFrame(data={'1': ['john', '10', 'john'],
...                          '2': ['mike', '30', 'ana'],
...                          '3': ['ana', '20', 'mike'],
...                          '4': ['eve', 'eve', 'eve'],
...                          '5': ['10', np.nan, '10'],
...                          '6': [np.nan, np.nan, '20']},
...                    index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
>>> df1
          1     2     3    4    5    6
index
ind1   john  mike   ana  eve   10  NaN
ind2     10    30    20  eve  NaN  NaN
ind3   john   ana  mike  eve   10   20
>>> df2 = pd.DataFrame(data={'first_n': [4, 4, 3]},
...                    index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
>>> df2
       first_n
index
ind1         4
ind2         4
ind3         3
I also have the following function that reverses a list and gets the first n
non-NA elements:
def get_rev_first_n(row, top_n):
    # Reverse the row and drop NaN (NaN is the only value where x != x)
    rev_row = [x for x in row[::-1] if x == x]
    return rev_row[:top_n]
>>> get_rev_first_n(['john', 'mike', 'ana', 'eve', '10', np.nan], 4)
['10', 'eve', 'ana', 'mike']
How would I apply this function so that it takes in both df1 and df2 (each row
of df1 together with the matching first_n from df2) and outputs either a list
or columns?
CodePudding user response:
You can use apply with a lambda on each row of the DataFrame. I concatenated
the two DataFrames with concat and applied your function to each row of the
resulting DataFrame, passing that row's first_n as top_n.
Full Code:
import pandas as pd
import numpy as np
def get_rev_first_n(row, top_n):
    # Reverse the row, drop NaN (NaN != NaN), and keep the first top_n values
    rev_row = [x for x in row[::-1] if x == x]
    return rev_row[:top_n]

df1 = pd.DataFrame(data={'1': ['john', '10', 'john'],
                         '2': ['mike', '30', 'ana'],
                         '3': ['ana', '20', 'mike'],
                         '4': ['eve', 'eve', 'eve'],
                         '5': ['10', np.nan, '10'],
                         '6': [np.nan, np.nan, '20']},
                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
df2 = pd.DataFrame(data={'first_n': [4, 4, 3]},
                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
# Align df2 to df1's index and put first_n next to the values of each row
df3 = pd.concat([df1, df2.reindex(df1.index)], axis=1)
# Exclude first_n from the row values and pass it as top_n instead
df = df3.apply(lambda row: get_rev_first_n(row.drop('first_n'), row['first_n']), axis=1)
print(df)
Output:
index
ind1    [10, eve, ana, mike]
ind2       [eve, 20, 30, 10]
ind3           [20, 10, eve]
dtype: object
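Since the question also asks for columns, the list result can be expanded into one column per position. The following is a minimal sketch, not a definitive implementation. It assumes the first_n column is dropped from the row values before the function is called, and the rev_1, rev_2, ... column names are hypothetical, chosen only for readability.

```python
import numpy as np
import pandas as pd


def get_rev_first_n(row, top_n):
    # Reverse the row, drop NaN (NaN != NaN), and keep the first top_n values
    rev_row = [x for x in row[::-1] if x == x]
    return rev_row[:top_n]


df1 = pd.DataFrame(data={'1': ['john', '10', 'john'],
                         '2': ['mike', '30', 'ana'],
                         '3': ['ana', '20', 'mike'],
                         '4': ['eve', 'eve', 'eve'],
                         '5': ['10', np.nan, '10'],
                         '6': [np.nan, np.nan, '20']},
                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
df2 = pd.DataFrame(data={'first_n': [4, 4, 3]},
                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))

df3 = pd.concat([df1, df2.reindex(df1.index)], axis=1)
as_lists = df3.apply(
    lambda row: get_rev_first_n(row.drop('first_n'), row['first_n']), axis=1)

# One column per list position; rows with shorter lists get missing values
as_columns = pd.DataFrame(as_lists.tolist(), index=as_lists.index)
# Hypothetical column names, used here only for illustration
as_columns.columns = [f'rev_{i + 1}' for i in range(as_columns.shape[1])]
print(as_columns)
```

Rows whose first_n is smaller than the largest first_n end up with missing values in the trailing columns, so the list form may be easier to work with when the lengths differ a lot.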
CodePudding user response:
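df = pd.concat([df1, df2], axis=1)
# args would pass one fixed top_n to every call, so use a lambda to read each row's own first_n
df.apply(lambda row: get_rev_first_n(row.drop('first_n'), row['first_n']), axis=1)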
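If you would rather not concatenate the two tables, one option is to pair each row of df1 with its first_n directly. This is only a sketch under the assumption that every index label in df1 also exists in df2; otherwise reindex introduces NaN and the int conversion fails. The names first_n and result are illustrative.

```python
import numpy as np
import pandas as pd


def get_rev_first_n(row, top_n):
    # Reverse the row, drop NaN (NaN != NaN), and keep the first top_n values
    rev_row = [x for x in row[::-1] if x == x]
    return rev_row[:top_n]


df1 = pd.DataFrame(data={'1': ['john', '10', 'john'],
                         '2': ['mike', '30', 'ana'],
                         '3': ['ana', '20', 'mike'],
                         '4': ['eve', 'eve', 'eve'],
                         '5': ['10', np.nan, '10'],
                         '6': [np.nan, np.nan, '20']},
                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))
df2 = pd.DataFrame(data={'first_n': [4, 4, 3]},
                   index=pd.Series(['ind1', 'ind2', 'ind3'], name='index'))

# Align df2 to df1's index so each row is paired with its own first_n
first_n = df2['first_n'].reindex(df1.index)

# Walk the rows of df1 as plain arrays, together with the matching first_n
result = pd.Series(
    [get_rev_first_n(row, int(n)) for row, n in zip(df1.to_numpy(), first_n)],
    index=df1.index)
print(result)
```

This avoids mixing the first_n column into the row values, so nothing has to be dropped before the function is called.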