Generating a new variable based on the values of other variables

Time:08-15

I have the following data set

import pandas as pd
df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
 "TP1": [1,2,3,4,5,9,8,7,6,5],
 "TP2": [11,22,32,43,53,94,85,76,66,58],
 "TP10": [114,222,324,443,535,94,385,76,266,548],
 "count": [1,2,3,4,10,1,2,3,4,10]})
print (df)

I want to add a "final" column to df whose value is based on the ID, the TP columns and the count column.

The final result should look like the following:

import pandas as pd
import numpy as np
df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2], "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58], "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10],
                   "final" : [1,22,np.nan,np.nan,535,9,85,np.nan,np.nan,548]})
print (df)

For example, the loop should do the following:

  1. Look at the ID.
  2. For the first ID, look at the value of count. If count is 1,
  3. then look at the variable TP1 and place its value for that row in the "final" variable.

The loop then looks at count 2 for ID 1, and the value of TP2 should go into "final", and so on. When there is no TP column matching the count (TP3 and TP4 in this example), "final" should be NaN.
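The steps above translate almost literally into a row-by-row lookup. This is a minimal sketch, assuming (from the expected output) that only count decides which TP column is picked, so ID needs no special handling; pick_tp is a hypothetical helper name. It is easy to read but slow on large frames because it calls Python once per row.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
                   "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58],
                   "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10]})

def pick_tp(row):
    # Build the column name from this row's count, e.g. count 2 -> "TP2".
    # Series.get returns the default (NaN) when that column does not exist,
    # which covers TP3 and TP4 in this example.
    return row.get(f"TP{int(row['count'])}", np.nan)

# axis=1 passes each row to pick_tp as a Series indexed by column name
df["final"] = df.apply(pick_tp, axis=1)
print(df)
```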

I hope my question is clear. I am looking for a loop because there are 1000 TP variables in the original dataset.

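I tried to write something like the following. My first version did not run: the if line was missing its colon, count was never defined inside the loop, and it assigned the count instead of the TP value. With those fixed, the idea looks like this:

df["final"] = float("nan")  # start with every row missing
for col in df.columns:
    if col.startswith('TP'):
        mask = df["count"] == int(col[2:])  # rows whose count matches this TP number
        df.loc[mask, "final"] = df.loc[mask, col]  # copy that TP value into "final"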

Thanks

Answer:

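If I understand correctly: if count is 1, pick TP1; if count is 2, pick TP2; and so on.

This can be done with numpy.select(), which takes a list of conditions and a list of matching choices and, for each row, returns the choice belonging to the first condition that is True (or the default, here NaN, when none is). Note that I have added the condition if f"TP{x}" in df.columns because the example dataframe only has TP1, TP2 and TP10, not every column from TP1 to TP10. If all of them exist in your actual dataframe, this check is not required. With 1000 TP columns, range(1,11) also has to be widened to cover them.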

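import numpy as np

# One boolean condition per existing TP column: rows whose count equals that column's number
conds = [df["count"] == x for x in range(1,11) if f"TP{x}" in df.columns]
# The values to take from that column when its condition holds
output = [df[f"TP{x}"] for x in range(1,11) if f"TP{x}" in df.columns]
# Rows that match no condition (count 3 or 4 here) get the default, NaN
df["final"] = np.select(conds, output, np.nan)

print(df)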
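Because the question mentions 1000 TP columns, hard-coding range(1,11) is fragile. A minimal sketch of the same np.select approach that instead discovers the TP numbers from the column names, assuming every relevant column is named "TP" followed only by digits:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
                   "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58],
                   "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10]})

# Every column named "TP<number>" and the number it encodes, so the code
# works unchanged whether there are 3 or 1000 TP columns.
tp_cols = [c for c in df.columns if c.startswith("TP") and c[2:].isdigit()]
tp_nums = [int(c[2:]) for c in tp_cols]

conds = [df["count"] == n for n in tp_nums]  # a row uses TP<n> when its count == n
output = [df[c] for c in tp_cols]
df["final"] = np.select(conds, output, default=np.nan)
print(df)
```

This still builds one boolean mask per TP column, which is fine for a few thousand columns but does scale with the column count.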

Output:

   ID  TP1  TP2  TP10  count  final
0   1    1   11   114      1    1.0
1   1    2   22   222      2   22.0
2   1    3   32   324      3    NaN
3   1    4   43   443      4    NaN
4   1    5   53   535     10  535.0
5   2    9   94    94      1    9.0
6   2    8   85   385      2   85.0
7   2    7   76    76      3    NaN
8   2    6   66   266      4    NaN
9   2    5   58   548     10  548.0
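As an alternative to np.select (a different technique from the answer above, not a replacement for it), each row can look up its value directly with NumPy integer-array indexing: map count to a column position once, then pick one cell per row. This is a sketch assuming the same "TP<number>" naming; pos, col_idx and has_col are illustrative names.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"ID": [1,1,1,1,1,2,2,2,2,2],
                   "TP1": [1,2,3,4,5,9,8,7,6,5],
                   "TP2": [11,22,32,43,53,94,85,76,66,58],
                   "TP10": [114,222,324,443,535,94,385,76,266,548],
                   "count": [1,2,3,4,10,1,2,3,4,10]})

tp_cols = [c for c in df.columns if c.startswith("TP") and c[2:].isdigit()]
# Map each TP number to its position within tp_cols, e.g. {1: 0, 2: 1, 10: 2}
pos = {int(c[2:]): i for i, c in enumerate(tp_cols)}

values = df[tp_cols].to_numpy(dtype=float)  # shape (rows, number of TP columns)
col_idx = df["count"].map(pos)              # NaN where no TP<count> column exists
has_col = col_idx.notna().to_numpy()

final = np.full(len(df), np.nan)            # rows without a matching column stay NaN
rows = np.flatnonzero(has_col)
# Pick values[row, column] for every row that has a matching TP column
final[rows] = values[rows, col_idx[has_col].astype(int).to_numpy()]
df["final"] = final
print(df)
```

The lookup does a single pass over the rows regardless of how many TP columns exist, at the cost of copying the TP block into one float array.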