Home > database >  Iteration with for loop using round returns "TypeError: 'int' object is not callable&
Iteration with for loop using round returns "TypeError: 'int' object is not callable&

Time:02-21

I have a huge df that looks like this:

date stock1 stock2 stock3 stock4 stock5 stock6 stock7 stock8 stock9 stock10
10/20 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 0.9
11/20 0.8 0.9 0.3 0.4 0.3 0.5 0.3 0.2 0.4 0.1
12/20 0.3 0.6 0.9 0.5 0.6 0.7 0.8 0.7 0.9 0.1

I want to find, for each row, the 20% higher values of stocks and the 20% lower. The output should be:

date higher lower
10/20 stock9, stock 10 stock1, stock 2
11/20 stock1, stock 2 stock8, stock 10
12/20 stock3, stock 9 stock1, stock 10

My code looks now like this:

l= df.count(1)
rank = pd.DataFrame()
for i in len(l):
      rank[i, "winners"] = df.apply(lambda x: x.nlargest(round(l[i-1]*0.2).index.tolist(), axis=1)
      rank[i, "losers"] = df.apply(lambda x: x.nsmallest(round(l[i-1]*0.2).index.tolist(), axis=1)

and I get the error "TypeError: 'int' object is not callable". I have checked the dtype of l: Freq: M, Length: 344, dtype: float64. What can I be doing wrong?

CodePudding user response:

Use:

df = pd.DataFrame({'stock1':range(10), 'stock2':range(10,20), 'stock3':range(10,20), 'stock4':range(10,20), 'stock5':range(50,60)})

colnum = int(.2*len(df.columns))
df.apply(lambda x: x.nlargest(colnum).index.tolist(), axis=1)

input df:

enter image description here

output:

enter image description here

Based on your comment:

df = pd.DataFrame({f'stock{i}':np.random.randint(1,10,10) for i in range(2000)})
colnum = int(.2*len(df.columns))
df.apply(lambda x: x.nlargest(colnum).index.tolist(), axis=1)

input df with 2000 columns:

enter image description here

output:

enter image description here

Based on the other comment, with possible nans:

df = pd.DataFrame({f'stock{i}':np.random.randint(1,10,10) for i in 

range(10)})
df2 = pd.DataFrame({f'stock{i}':np.random.randint(1,10,1) for i in range(7)})
df3 = df.append(df2)
df3 = df3.astype(np.float64)
df3.apply(lambda x: x.nlargest(int(.2*np.sum([1 for y in x if not np.isnan(y)]))).index.tolist(), axis=1)

input df:

enter image description here

output:

enter image description here

a little bit better:

df3.apply(lambda x: x.nlargest(int(.2*x.notna().sum())).index.tolist(), axis=1)

CodePudding user response:

Use DataFrame.assign for new columns:

df1 = df.set_index('date')
i= int(round(len(df1.columns)*0.2))

df1 = df1.assign(winners = df1.apply(lambda x: x.nlargest(i).index.tolist(), axis=1),
                 losers = df1.apply(lambda x: x.nsmallest(i).index.tolist(), axis=1))
     
rank = df1[['winners','losers']].reset_index()                             
print (rank)
    date            winners             losers
0  10/20  [stock9, stock10]   [stock1, stock2]
1  11/20   [stock2, stock1]  [stock10, stock8]
2  12/20   [stock3, stock9]  [stock10, stock1]
  • Related