How to use loop for columns in a dataframe in Python

Time:10-29

I have the following code, but it is a bit repetitive. Can anyone tell me how to put these statements inside a loop? The data frame consists of columns T1 to T12 and P1 to P12. Here I have shown only three statements for each condition.

first condition

df['P1'] = df.apply(lambda row: 1 if row['P1'] >= 5 * row['T1'] else 0, axis=1)
df['P2'] = df.apply(lambda row: 1 if row['P2'] >= 5 * row['T2'] else 0, axis=1)
df['P3'] = df.apply(lambda row: 1 if row['P3'] >= 5 * row['T3'] else 0, axis=1)

second condition

df['T1'] = df.apply(lambda row: 1 if row['T1'] >= 7 else 0, axis=1)
df['T2'] = df.apply(lambda row: 1 if row['T2'] >= 7 else 0, axis=1)
df['T3'] = df.apply(lambda row: 1 if row['T3'] >= 7 else 0, axis=1)

third condition

df['G1'] = np.where((df['T1']==1) & (df['P1'] ==1), 1, 0)
df['G2'] = np.where((df['T2']==1) & (df['P2'] ==1), 1, 0)
df['G3'] = np.where((df['T3']==1) & (df['P3'] ==1), 1, 0)

Answer:

When trying to roll up code into loops, look for what changes and how it changes. Here are your first three statements:

df['P1'] = df.apply(lambda row: 1 if row['P1'] >= 5 * row['T1'] else 0, axis=1)
df['P2'] = df.apply(lambda row: 1 if row['P2'] >= 5 * row['T2'] else 0, axis=1)
df['P3'] = df.apply(lambda row: 1 if row['P3'] >= 5 * row['T3'] else 0, axis=1)

Let's simplify this for a moment:

df["P1"] = func("P1", "T1")
df["P2"] = func("P2", "T2")

I've removed everything that doesn't change. What does change are these string parameters, whose number goes up by 1 each time. So we can roll it up like this:

for i in range(1, 4):  # i takes the values 1, 2 and 3, not 4!
    p = f"P{i}"  # column names are upper case: P1, P2, P3
    t = f"T{i}"
    print("p", p, "t", t)

Do actually run this code! Do you see how it makes the parameters we need? So now we're home and dry:

for i in range(1, 4):  # use range(1, 13) for all twelve column pairs
    p = f"P{i}"
    t = f"T{i}"
    df[p] = df.apply(lambda row: 1 if row[p] >= 5 * row[t] else 0, axis=1)

Your other conditions can all be rolled up similarly.
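A minimal sketch of all three conditions rolled up this way, assuming the frame really has columns P1..P12 and T1..T12 as the question says. The random data and the variable n are made up for illustration. Each index i only touches its own P, T and G columns, so the three conditions can run together per i, as long as the first one runs before T is overwritten.

```python
import numpy as np
import pandas as pd

# hypothetical example data: 12 P columns and 12 T columns of small integers
rng = np.random.default_rng(0)
n = 12
cols = [f"P{i}" for i in range(1, n + 1)] + [f"T{i}" for i in range(1, n + 1)]
df = pd.DataFrame(rng.integers(0, 10, size=(5, 2 * n)), columns=cols)

for i in range(1, n + 1):  # 1..12 inclusive
    p, t, g = f"P{i}", f"T{i}", f"G{i}"
    # first condition: needs the raw T values, so it runs before T is overwritten
    df[p] = df.apply(lambda row: 1 if row[p] >= 5 * row[t] else 0, axis=1)
    # second condition: turn T into a 0/1 flag
    df[t] = df.apply(lambda row: 1 if row[t] >= 7 else 0, axis=1)
    # third condition: G is 1 only where both flags are 1
    df[g] = np.where((df[t] == 1) & (df[p] == 1), 1, 0)

print(df.filter(regex="^G"))
```

Merging the three loops into one is only safe because no condition reads another index's columns.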


In this case, as the other answer notes, it's possible to write this loop implicitly in pandas, i.e. to vectorise it. (Whether that actually runs a loop internally or is properly vectorised I don't know, because I haven't looked at how pandas is implemented.) But it's still worth understanding how to do it explicitly.
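A small check, on made-up data, that the implicit (vectorised) form computes the same thing as the explicit row-by-row apply for the first condition. Only the column names come from the question; the values are random.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
df = pd.DataFrame(rng.integers(0, 10, size=(6, 2)), columns=["P1", "T1"])

# explicit: the lambda is called once per row
looped = df.apply(lambda row: 1 if row["P1"] >= 5 * row["T1"] else 0, axis=1)

# implicit: one comparison over the whole column, then True/False -> 1/0
vectorised = (df["P1"] >= 5 * df["T1"]).astype(int)

# compare values rather than using .equals(), which also requires identical dtypes
print((looped == vectorised).all())
```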

Answer:

Don't use loops or apply here, because faster vectorized alternatives exist: compare all the selected columns at once, then cast the resulting boolean mask (True/False) to integers (1/0):

import numpy as np
import pandas as pd

np.random.seed(2021)
df = pd.DataFrame(np.random.randint(10, size=(5, 9)))
df.columns = ['P1','P2','P3','T1','T2','T3','G1','G2','G3']
print(df)
   P1  P2  P3  T1  T2  T3  G1  G2  G3
0   4   5   9   0   6   5   8   6   6
1   6   6   1   5   7   1   1   5   2
2   0   3   1   0   2   6   4   8   5
3   1   6   7   5   6   9   5   6   9
4   2   4   3   9   2   8   5   3   1

# either select all columns whose names start with P, T or G
p = df.filter(regex='^P').columns
t = df.filter(regex='^T').columns
g = df.filter(regex='^G').columns

# or list the columns explicitly
p = ['P1','P2','P3']
t = ['T1','T2','T3']
g = ['G1','G2','G3']

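# first condition: .to_numpy() drops the T labels, so P1 is compared with T1, P2 with T2, ... by position
df[p] = (df[p] >= 5 * df[t].to_numpy()).astype(int)

# second condition: T is overwritten only after the line above has used its raw values
df[t] = (df[t] >= 7).astype(int)

# third condition: 1 only where both flags are 1
df[g] = ((df[t] == 1) & (df[p].to_numpy() == 1)).astype(int)
print(df)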
   P1  P2  P3  T1  T2  T3  G1  G2  G3
0   1   0   0   0   0   0   0   0   0
1   0   0   0   0   1   0   0   0   0
2   1   0   0   0   0   0   0   0   0
3   0   0   0   0   0   1   0   0   0
4   0   0   0   1   0   1   0   0   0
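With all twelve pairs the column lists don't have to be typed out. A sketch assuming the inputs are P1..P12 and T1..T12 (the data below is made up) and that G1..G12 are new output columns. Here the G columns are attached with join, which works whether or not they already exist.

```python
import numpy as np
import pandas as pd

n = 12  # assumed number of P/T pairs, as described in the question
p = [f"P{i}" for i in range(1, n + 1)]
t = [f"T{i}" for i in range(1, n + 1)]
g = [f"G{i}" for i in range(1, n + 1)]

# made-up data with the 24 input columns
rng = np.random.default_rng(2021)
df = pd.DataFrame(rng.integers(0, 10, size=(5, 2 * n)), columns=p + t)

# first condition: uses the raw T values, paired with P by position
df[p] = (df[p] >= 5 * df[t].to_numpy()).astype(int)

# second condition: T is overwritten only after the line above has used it
df[t] = (df[t] >= 7).astype(int)

# third condition: both flags set, added as new columns G1..G12
both = (df[t].to_numpy() == 1) & (df[p].to_numpy() == 1)
df = df.join(pd.DataFrame(both.astype(int), columns=g, index=df.index))

print(df[g])
```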

Answer:

I think you can try looping over lists of column names:

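indexP = ['P1','P2','P3']
indexT = ['T1','T2','T3']
indexG = ['G1','G2','G3']

# first condition: compare each P column with its matching T column
for x, y in zip(indexP, indexT):
    df[x] = (df[x] >= 5 * df[y]).astype(int)
# second condition
for y in indexT:
    df[y] = (df[y] >= 7).astype(int)
# third condition: G is 1 only where the matching T and P flags are both 1
for z, x, y in zip(indexG, indexP, indexT):
    df[z] = ((df[y] == 1) & (df[x] == 1)).astype(int)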