I have a pandas dataframe and I want to create a new column BB based on the below condition.
- Create a new column BB, if the values in column TGR1 is 0, assign 0 to BB else,
- The value in TGR1 is not 0, look up the columns ('1','2','3') that corresponds with the value in TGR1 assign the value in that column(either '1','2','3') to the new column BB.
I was able to achieve the first step using
df.loc[df['TGR1'] == 0, 'BB'] = 0
I also tried to use np.where to come up with but I can figure out the right way to go about this.
df['BB'] = np.where(df.TGR1 == 0,0, df.columns == test.TGR1.value )
Dist Track EVENT_ID Date 1 2 3 TGR1 TGR2
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 1 0
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 2 1
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 0 2
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 3 1
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 2 2
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 1 2
Expected Output:
Dist Track EVENT_ID Date 1 2 3 TGR1 TGR2 BB
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 1 0 34.00
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 2 1 5.18
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 0 2 0
311m Cran 174331755 2020-10-19 34.00 5.18 19.10 3 1 19.10
CodePudding user response:
One way is to use numpy advanced indexing:
import numpy as np
# extract columns 1,2,3 into a numpy array with a zeros column stacked on the left
vals = np.column_stack((np.zeros(len(df)), df[list('123')]))
vals
array([[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ],
[ 0. , 34. , 5.18, 19.1 ]])
# use TGR1 values as the column index to extract corresponding values
df['BB'] = vals[np.arange(len(df)), df.TGR1.values]
df
Dist Track EVENT_ID Date 1 2 3 TGR1 TGR2 BB
0 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 1 0 34.00
1 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 2 1 5.18
2 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 0 2 0.00
3 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 3 1 19.10
4 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 2 2 5.18
5 311m Cran 174331755 2020-10-19 34.0 5.18 19.1 1 2 34.00
CodePudding user response:
Here you can try to play some numpy trick as in this answer.
We first define a matrix with values from columns 1,2 and 3 and add a first column with zeros.
import pandas as pd
import numpy as np
# we first define a matrix
# with len(df) rows and 4 columns
mat = np.zeros((len(df), 4))
# Then we fill the last 3 columns
# with values from df
mat[:,1:] = df[["1", "2", "3"]].values
# Then a vector with values from df["TGR1"]
v = df["TGR1"].values
# Finally we take the given index
# from each row on matrix
df["BB"] = np.take_along_axis(mat, v[:,None], axis=1)
CodePudding user response:
You can create the column using a list comprehension with your if-else logic
# Sample data
df = pd.DataFrame({'TGR1':[random.randint(0,3) for i in range(10)],
'1':[random.randint(0,100) for i in range(10)],
'2':[random.randint(101,200) for i in range(10)],
'3':[random.randint(201,300) for i in range(10)]})
# creating the column
df['BB'] = [0 if tgr1_val == 0 else df.loc[ind,str(tgr1_val)]
for ind,tgr1_val in enumerate(df['TGR1'].values)]
df
# TGR1 1 2 3 BB
# 0 0 54 107 217 0
# 1 2 71 128 277 128
# 2 1 25 103 269 25
# 3 0 80 112 279 0
# 4 2 98 167 228 167
# 5 3 26 192 285 285
# 6 0 27 107 228 0
# 7 2 13 103 298 103
# 8 3 28 196 289 289
# 9 2 72 186 251 186
CodePudding user response:
this is done easy with the use of boolean mask as you did it in your step one:
df['BB'][df['TGR1'] == 0] = 0
for the other values greaters than 0:
df['BB'][df['TGR1'] == 1] = df['1'][df['TGR1'] == 1]
df['BB'][df['TGR1'] == 2] = df['2'][df['TGR1'] == 2]
df['BB'][df['TGR1'] == 3] = df['3'][df['TGR1'] == 3]
output:
1 2 3 TGR1 BB
0 34.0 5.18 19.1 1 34.00
1 34.0 5.18 19.1 2 5.18
2 34.0 5.18 19.1 0 0.00
3 34.0 5.18 19.1 3 19.10
4 34.0 5.18 19.1 2 5.18
probably it is pretty much readable.