Iteration with for loop using round returns "TypeError: 'int' object is not callable&-CodePudding

I have a huge df that looks like this:

date	stock1	stock2	stock3	stock4	stock5	stock6	stock7	stock8	stock9	stock10
10/20	0.1	0.2	0.3	0.4	0.5	0.6	0.7	0.8	0.9	0.9
11/20	0.8	0.9	0.3	0.4	0.3	0.5	0.3	0.2	0.4	0.1
12/20	0.3	0.6	0.9	0.5	0.6	0.7	0.8	0.7	0.9	0.1

I want to find, for each row, the 20% higher values of stocks and the 20% lower. The output should be:

date	higher	lower
10/20	stock9, stock 10	stock1, stock 2
11/20	stock1, stock 2	stock8, stock 10
12/20	stock3, stock 9	stock1, stock 10

My code looks now like this:

l= df.count(1)
rank = pd.DataFrame()
for i in len(l):
      rank[i, "winners"] = df.apply(lambda x: x.nlargest(round(l[i-1]*0.2).index.tolist(), axis=1)
      rank[i, "losers"] = df.apply(lambda x: x.nsmallest(round(l[i-1]*0.2).index.tolist(), axis=1)

and I get the error "TypeError: 'int' object is not callable". I have checked the dtype of l: Freq: M, Length: 344, dtype: float64. What can I be doing wrong?

CodePudding user response：

Use:

df = pd.DataFrame({'stock1':range(10), 'stock2':range(10,20), 'stock3':range(10,20), 'stock4':range(10,20), 'stock5':range(50,60)})

colnum = int(.2*len(df.columns))
df.apply(lambda x: x.nlargest(colnum).index.tolist(), axis=1)

input df:

output:

Based on your comment:

df = pd.DataFrame({f'stock{i}':np.random.randint(1,10,10) for i in range(2000)})
colnum = int(.2*len(df.columns))
df.apply(lambda x: x.nlargest(colnum).index.tolist(), axis=1)

input df with 2000 columns:

output:

Based on the other comment, with possible nans:

df = pd.DataFrame({f'stock{i}':np.random.randint(1,10,10) for i in 

range(10)})
df2 = pd.DataFrame({f'stock{i}':np.random.randint(1,10,1) for i in range(7)})
df3 = df.append(df2)
df3 = df3.astype(np.float64)
df3.apply(lambda x: x.nlargest(int(.2*np.sum([1 for y in x if not np.isnan(y)]))).index.tolist(), axis=1)

input df:

output:

a little bit better:

df3.apply(lambda x: x.nlargest(int(.2*x.notna().sum())).index.tolist(), axis=1)

CodePudding user response：

Use DataFrame.assign for new columns:

df1 = df.set_index('date')
i= int(round(len(df1.columns)*0.2))

df1 = df1.assign(winners = df1.apply(lambda x: x.nlargest(i).index.tolist(), axis=1),
                 losers = df1.apply(lambda x: x.nsmallest(i).index.tolist(), axis=1))
     
rank = df1[['winners','losers']].reset_index()                             
print (rank)
    date            winners             losers
0  10/20  [stock9, stock10]   [stock1, stock2]
1  11/20   [stock2, stock1]  [stock10, stock8]
2  12/20   [stock3, stock9]  [stock10, stock1]