How to use a loop for columns in a dataframe in Python

I have the following code, but it is a bit repetitive. Can anyone please tell me how to put these statements inside a loop? The data frame consists of columns T1 to T12 and P1 to P12. Here, I have shown only 3 statements for each condition.

first condition

df['P1'] = df.apply(lambda row: 1 if row['P1'] >= 5 * row['T1'] else 0, axis=1)
df['P2'] = df.apply(lambda row: 1 if row['P2'] >= 5 * row['T2'] else 0, axis=1)
df['P3'] = df.apply(lambda row: 1 if row['P3'] >= 5 * row['T3'] else 0, axis=1)

second condition

df['T1'] = df.apply(lambda row: 1 if row['T1'] >= 7 else 0, axis=1)
df['T2'] = df.apply(lambda row: 1 if row['T2'] >= 7 else 0, axis=1)
df['T3'] = df.apply(lambda row: 1 if row['T3'] >= 7 else 0, axis=1)

third condition

df['G1'] = np.where((df['T1']==1) & (df['P1'] ==1), 1, 0)
df['G2'] = np.where((df['T2']==1) & (df['P2'] ==1), 1, 0)
df['G3'] = np.where((df['T3']==1) & (df['P3'] ==1), 1, 0)

CodePudding user response：

When trying to roll up code into loops, look for what changes, and how it does it. Here's your first three statements:

df['P1'] = df.apply(lambda row: 1 if row['P1'] >= 5 * row['T1'] else 0, axis=1)
df['P2'] = df.apply(lambda row: 1 if row['P2'] >= 5 * row['T2'] else 0, axis=1)
df['P3'] = df.apply(lambda row: 1 if row['P3'] >= 5 * row['T3'] else 0, axis=1)

Let's simplify this for a moment:

df['P1'] = func("P1", "T1")
df["P2"] = func("P2", "T2")

I've removed everything that doesn't change. What does change are the string parameters, whose number goes up by 1 each time. So we can roll it up like this:

for i in range(1, 4):  # integers from 1 to 3; 4 is excluded
    p = f"P{i}"
    t = f"T{i}"
    print("p", p, "t", t)

Do actually run this code! Do you see how it makes the parameters we need? So now we're home and dry:

for i in range(1, 4):
    p = f"P{i}"
    t = f"T{i}"
    df[p] = df.apply(lambda row: 1 if row[p] >= 5 * row[t] else 0, axis=1)

Your other conditions can all be rolled up similarly.
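
As a rough sketch of what that might look like for all three conditions and all twelve column pairs: the sample frame below is made up for illustration (only the column names T1 to T12 and P1 to P12 come from the question), and `original` is a hypothetical copy kept just so the result can be checked.

```python
import numpy as np
import pandas as pd

# Made-up sample data; only the column names follow the question.
rng = np.random.default_rng(0)
data = {f"P{i}": rng.integers(0, 50, 6) for i in range(1, 13)}
data.update({f"T{i}": rng.integers(0, 10, 6) for i in range(1, 13)})
df = pd.DataFrame(data)
original = df.copy()  # untouched copy, used only for checking afterwards

for i in range(1, 13):  # 1 to 12 inclusive
    p, t, g = f"P{i}", f"T{i}", f"G{i}"
    # first condition: has to run before T is overwritten below
    df[p] = df.apply(lambda row: 1 if row[p] >= 5 * row[t] else 0, axis=1)
    # second condition
    df[t] = df.apply(lambda row: 1 if row[t] >= 7 else 0, axis=1)
    # third condition: uses the 0/1 flags computed just above
    df[g] = np.where((df[t] == 1) & (df[p] == 1), 1, 0)

print(df.filter(regex="^G"))
```

The order inside the loop matters: the first condition reads the raw T values, so it has to come before the second condition replaces them with flags.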

In this case, as the other answer notes, it's possible to write this loop implicitly in pandas. (I.e. to vectorise it: whether this actually runs a loop or is properly vectorised I don't know, because I've not looked at how pandas is implemented.) But it's still worth understanding how to do it explicitly.
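
For what it's worth, the explicit loop over column names and whole-column operations are not mutually exclusive. Here is a minimal sketch, assuming a small made-up frame, that keeps the `for` loop but swaps the row-wise `apply` for a comparison on the whole column, and then checks that both give the same values. The names `slow` and `fast` are just labels for this illustration.

```python
import pandas as pd

# Toy data, invented for this example
df = pd.DataFrame({
    "P1": [40, 3, 0], "T1": [8, 1, 0],
    "P2": [10, 35, 5], "T2": [2, 7, 9],
})

slow = df.copy()
fast = df.copy()
for i in range(1, 3):  # only two column pairs in this toy frame
    p, t = f"P{i}", f"T{i}"
    # row by row, as in the question
    slow[p] = slow.apply(lambda row: 1 if row[p] >= 5 * row[t] else 0, axis=1)
    # whole column at once: the comparison gives booleans, astype(int) turns them into 1/0
    fast[p] = (fast[p] >= 5 * fast[t]).astype(int)

# True if every cell matches between the two approaches
print((fast == slow).all().all())
```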

CodePudding user response：

Don't loop with apply here, because faster vectorized alternatives exist: compare all the selected columns at once and cast True/False to 1/0 by converting the mask to integers:

import numpy as np
import pandas as pd

np.random.seed(2021)
df = pd.DataFrame(np.random.randint(10, size=(5, 9)))
df.columns = ['P1','P2','P3','T1','T2','T3','G1','G2','G3']
print(df)
   P1  P2  P3  T1  T2  T3  G1  G2  G3
0   4   5   9   0   6   5   8   6   6
1   6   6   1   5   7   1   1   5   2
2   0   3   1   0   2   6   4   8   5
3   1   6   7   5   6   9   5   6   9
4   2   4   3   9   2   8   5   3   1

# select all columns whose names start with P, T or G
p = df.filter(regex='^P').columns
t = df.filter(regex='^T').columns
g = df.filter(regex='^G').columns

# or list the columns explicitly
p = ['P1','P2','P3']
t = ['T1','T2','T3']
g = ['G1','G2','G3']

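# first condition; .to_numpy() drops the T labels so the comparison is positional
df[p] = (df[p] >= 5 * df[t].to_numpy()).astype(int)

# second condition
df[t] = (df[t] >= 7).astype(int)

# third condition, using the 0/1 flags computed above
df[g] = ((df[t] == 1) & (df[p].to_numpy() == 1)).astype(int)
print(df)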
   P1  P2  P3  T1  T2  T3  G1  G2  G3
0   1   0   0   0   0   0   0   0   0
1   0   0   0   0   1   0   0   0   0
2   1   0   0   0   0   0   0   0   0
3   0   0   0   0   0   1   0   0   0
4   0   0   0   1   0   1   0   0   0
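
The `.to_numpy()` calls matter because pandas aligns DataFrames on their column labels before operating on them, and `df[p]` and `df[t]` share no labels. A small sketch of the difference on a made-up two-row frame follows. As far as I know, comparing differently-labelled DataFrames raises a `ValueError`, which the `try` block catches; treat that part as an assumption about your pandas version.

```python
import pandas as pd

# Toy data, invented for this example
df = pd.DataFrame({"P1": [10, 2], "P2": [0, 30], "T1": [2, 1], "T2": [1, 6]})
p = ["P1", "P2"]
t = ["T1", "T2"]

# Arithmetic aligns on column names: no column is shared, so every value is NaN
print(df[p] + df[t])

# Comparison does not align differently-labelled frames
try:
    df[p] >= 5 * df[t]
except ValueError as err:
    print("ValueError:", err)

# Dropping the labels on one side makes the comparison positional
mask = df[p] >= 5 * df[t].to_numpy()
result = mask.astype(int)
print(result)
```

The result keeps the labels of the left-hand side (`P1`, `P2`), which is why it can be assigned straight back to `df[p]`.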

CodePudding user response：

I think you can try this:

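indexP = ['P1','P2','P3']
indexT = ['T1','T2','T3']
indexG = ['G1','G2','G3']

# zip pairs each P column with its matching T (and G) column
for x, y in zip(indexP, indexT):
    df[x] = (df[x] >= 5 * df[y]).astype(int)
for y in indexT:
    df[y] = (df[y] >= 7).astype(int)
for x, y, z in zip(indexP, indexT, indexG):
    df[z] = ((df[y] == 1) & (df[x] == 1)).astype(int)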