I have the following code, but it is a bit repetitive. Can anyone tell me how to put these statements inside a loop? The data frame has columns T1 to T12 and P1 to P12. I have shown only 3 statements for each condition here.
first condition
df['P1'] = df.apply(lambda row: 1 if row['P1'] >= 5 * row['T1'] else 0, axis=1)
df['P2'] = df.apply(lambda row: 1 if row['P2'] >= 5 * row['T2'] else 0, axis=1)
df['P3'] = df.apply(lambda row: 1 if row['P3'] >= 5 * row['T3'] else 0, axis=1)
second condition
df['T1'] = df.apply(lambda row: 1 if row['T1'] >= 7 else 0, axis=1)
df['T2'] = df.apply(lambda row: 1 if row['T2'] >= 7 else 0, axis=1)
df['T3'] = df.apply(lambda row: 1 if row['T3'] >= 7 else 0, axis=1)
third condition
df['G1'] = np.where((df['T1']==1) & (df['P1'] ==1), 1, 0)
df['G2'] = np.where((df['T2']==1) & (df['P2'] ==1), 1, 0)
df['G3'] = np.where((df['T3']==1) & (df['P3'] ==1), 1, 0)
CodePudding user response:
When trying to roll up code into loops, look for what changes and how it changes. Here are your first three statements:
df['P1'] = df.apply(lambda row: 1 if row['P1'] >= 5 * row['T1'] else 0, axis=1)
df['P2'] = df.apply(lambda row: 1 if row['P2'] >= 5 * row['T2'] else 0, axis=1)
df['P3'] = df.apply(lambda row: 1 if row['P3'] >= 5 * row['T3'] else 0, axis=1)
Let's simplify this for a moment:
df['P1'] = func("P1", "T1")
df["P2"] = func("P2", "T2")
I've removed everything which doesn't change. What does change is the pair of string parameters, whose numeric suffix goes up by 1 each time. So we can roll it up like this:
for i in range(1, 4):  # i takes the values 1, 2, 3; the stop value 4 is excluded
    p = f"P{i}"  # upper case, to match the column names
    t = f"T{i}"
    print("p", p, "t", t)
Do actually run this code! Do you see how it makes the parameters we need? So now we're home and dry:
for i in range(1, 4):  # use range(1, 13) for all twelve column pairs
    p = f"P{i}"
    t = f"T{i}"
    df[p] = df.apply(lambda row: 1 if row[p] >= 5 * row[t] else 0, axis=1)
Your other conditions can all be rolled up similarly.
In this case, as the other answer notes, it's possible to write this loop implicitly in pandas. (I.e. to vectorise it: whether this actually runs a loop or is properly vectorised I don't know, because I've not looked at how pandas is implemented.) But it's still worth understanding how to do it explicitly.
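To make "rolled up similarly" concrete, here is a minimal sketch with all three conditions in one explicit loop. The sample values and the variable `n` are made up for illustration; with the frame from the question, `n` would be 12. Within each iteration the first condition has to run before the second, because the second overwrites the T column that the first one reads.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: 3 column pairs here, the real frame has 12.
n = 3
df = pd.DataFrame({
    "P1": [40, 10, 35], "T1": [8, 7, 7],
    "P2": [5, 50, 0],   "T2": [1, 9, 0],
    "P3": [36, 30, 2],  "T3": [7, 6, 3],
})

for i in range(1, n + 1):  # i = 1, 2, ..., n
    p, t, g = f"P{i}", f"T{i}", f"G{i}"
    # First condition: reads the original T values, so it runs before T is overwritten.
    df[p] = df.apply(lambda row: 1 if row[p] >= 5 * row[t] else 0, axis=1)
    # Second condition: replace T with a 0/1 flag.
    df[t] = df.apply(lambda row: 1 if row[t] >= 7 else 0, axis=1)
    # Third condition: G is 1 only where both flags are 1.
    df[g] = np.where((df[t] == 1) & (df[p] == 1), 1, 0)

print(df)
```

Each f-string builds one column name per pass, so the same three statements serve every pair.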
CodePudding user response:
Don't use loops in apply here, because faster vectorized alternatives exist: compare all the selected columns at once, then cast True, False to 1, 0 by converting the mask to integers:
import numpy as np
import pandas as pd

np.random.seed(2021)
df = pd.DataFrame(np.random.randint(10, size=(5, 9)))
df.columns = ['P1','P2','P3','T1','T2','T3','G1','G2','G3']
print (df)
   P1  P2  P3  T1  T2  T3  G1  G2  G3
0   4   5   9   0   6   5   8   6   6
1   6   6   1   5   7   1   1   5   2
2   0   3   1   0   2   6   4   8   5
3   1   6   7   5   6   9   5   6   9
4   2   4   3   9   2   8   5   3   1
# Option 1: select all columns whose names start with P, T or G
p = df.filter(regex='^P').columns
t = df.filter(regex='^T').columns
g = df.filter(regex='^G').columns
# Option 2: list the column names explicitly (this replaces the selection above)
p = ['P1','P2','P3']
t = ['T1','T2','T3']
g = ['G1','G2','G3']
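# .to_numpy() drops the column labels, so P1 is compared with T1, P2 with T2, and so on by position
df[p] = (df[p] >= 5 * df[t].to_numpy()).astype(int)
# must run after the line above, which needs the original T values
df[t] = (df[t] >= 7).astype(int)
df[g] = ((df[t] == 1) & (df[p].to_numpy() == 1)).astype(int)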
print (df)
   P1  P2  P3  T1  T2  T3  G1  G2  G3
0   1   0   0   0   0   0   0   0   0
1   0   0   0   0   1   0   0   0   0
2   1   0   0   0   0   0   0   0   0
3   0   0   0   0   0   1   0   0   0
4   0   0   0   1   0   1   0   0   0
CodePudding user response:
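I think you can try this:
indexP = ['P1','P2','P3']
indexT = ['T1','T2','T3']
indexG = ['G1','G2','G3']
# pair each P column with its T column; astype(int) turns True/False into 1/0
for x, y in zip(indexP, indexT):
    df[x] = (df[x] >= 5 * df[y]).astype(int)
for y in indexT:
    df[y] = (df[y] >= 7).astype(int)
for z, y, x in zip(indexG, indexT, indexP):
    df[z] = ((df[y] == 1) & (df[x] == 1)).astype(int)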