Add a column based on the value of another column in a dataframe-CodePudding

I am very new to python. I encountered this task and was on it for a while without a clue. Any suggestions can help! Thanks a lot.

I have a data frame like this:

import pandas as pd
data = {'A': ['Emo/3', 'Emo/4', 'Emo/1','Emo/3', '','Emo/3', 'Emo/4', 'Emo/1','Emo/3', '', 'Neu/5', 'Neu/2','Neu/5', 'Neu/2'],
        'Pos': ["repeat3", "repeat3", "repeat3", "repeat3", '',"repeat1", "repeat1", "repeat1", "repeat1", '', "repeat2", "repeat2","repeat2", "repeat2"],
        }
df = pd.DataFrame(data)
df

    A       Pos
0   Emo/3   repeat3
1   Emo/4   repeat3
2   Emo/1   repeat3
3   Emo/3   repeat3
4       
5   Emo/3   repeat1
6   Emo/4   repeat1
7   Emo/1   repeat1
8   Emo/3   repeat1
9       
10  Neu/5   repeat2
11  Neu/2   repeat2
12  Neu/5   repeat2
13  Neu/2   repeat2

I want a output like this:

    A       Pos     B
0   Emo/3   repeat3 0
1   Emo/4   repeat3 0
2   Emo/1   repeat3 0
3   Emo/3   repeat3 0
4           
5   Emo/3   repeat1 1
6   Emo/4   repeat1 2
7   Emo/1   repeat1 3
8   Emo/3   repeat1 4
9           
10  Neu/5   repeat2 4
11  Neu/2   repeat2 2
12  Neu/5   repeat2 3
13  Neu/2   repeat2 1

The first four position of the column"B" is always 0. Then the other positions in the column "B" are based on the value in column"pos". if the row in column"pos" equal "repeat 1" then the column "B" at that four positions will be: 1, 2, 3, 4. If the row in column"pos" equal "repeat 2", then the column "B" at the four positions will be: 4, 3, 2, 1.

The values in Pos are always arranged as every four rows with the same value and the fifth row will be empty.

Thanks a lot!

CodePudding user response：

Solution

I'm sure there's a better way, but here's one approach:

df["B"] = ""
repeat_mapping = {"repeat3": [0]*4,
                  "repeat2": [*range(4, 0, -1)],
                  "repeat1": [*range(1, 5)]}

repeats = df[::5]["Pos"].map(repeat_mapping).explode()
repeats.index  = pd.Series([*range(4)]*len(df[::5]))

df["B"][repeats.index] = repeats

Output:

        A      Pos  B
0   Emo/3  repeat3  0
1   Emo/4  repeat3  0
2   Emo/1  repeat3  0
3   Emo/3  repeat3  0
4
5   Emo/3  repeat1  1
6   Emo/4  repeat1  2
7   Emo/1  repeat1  3
8   Emo/3  repeat1  4
9
10  Neu/5  repeat2  4
11  Neu/2  repeat2  3
12  Neu/5  repeat2  2
13  Neu/2  repeat2  1

Steps

Prepare the new column:

In [1]: df["B"] = ""

In [2]: df
Out[2]:
        A      Pos B
0   Emo/3  repeat3
1   Emo/4  repeat3
2   Emo/1  repeat3
3   Emo/3  repeat3
4
5   Emo/3  repeat1
6   Emo/4  repeat1
7   Emo/1  repeat1
8   Emo/3  repeat1
9
10  Neu/5  repeat2
11  Neu/2  repeat2
12  Neu/5  repeat2
13  Neu/2  repeat2

Grab each 5th row:

In [3]: df[::5]["Pos"]
Out[3]:
0     repeat3
5     repeat1
10    repeat2
Name: Pos, dtype: object

Use repeat_mapping:

In [4]: df[::5]["Pos"].map(repeat_mapping)
Out[4]:
0     [0, 0, 0, 0]
5     [1, 2, 3, 4]
10    [4, 3, 2, 1]
Name: Pos, dtype: object

Explode lists:

In [5]: repeats = df[::5]["Pos"].map(repeat_mapping).explode()

In [6]: repeats
Out[6]:
0     0
0     0
0     0
0     0
5     1
5     2
5     3
5     4
10    4
10    3
10    2
10    1
Name: Pos, dtype: object

Notice each index in repeats is repeated 4 times. We'll fix this by incrementing each index by 0, 1, 2, 3:

In [7]: pd.Series([*range(4)]*len(df[::5])).values
Out[7]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int64)

In [8]: repeats.index  = pd.Series([*range(4)]*len(df[::5]))

In [9]: repeats
Out[9]:
0     0
1     0
2     0
3     0
5     1
6     2
7     3
8     4
10    4
11    3
12    2
13    1
Name: Pos, dtype: object

Finally, df["B"][repeats.index] only selects the rows whose index matches the indices of repeats, to which we then assign the values of repeats.

CodePudding user response：

Generilized solution using Pandas tools

Ok it took me some time to figure it out but I wanted to find a slick answer and I kind of like this one:

import pandas as pd

data = {'A': ['Emo/3', 'Emo/4', 'Emo/1','Emo/3', '','Emo/3', 'Emo/4', 'Emo/1','Emo/3', '', 'Neu/5', 'Neu/2','Neu/5', 'Neu/2'],
        'Pos': ["repeat3", "repeat3", "repeat3", "repeat3", '',"repeat1", "repeat1", "repeat1", "repeat1", '', "repeat2", "repeat2","repeat2", "repeat2"],
        }
df = pd.DataFrame(data)

#First we create column B and set first 4 value that are marked as repeat3 in 'Pos' column to zero
df['B']=df['Pos'].apply(lambda x: 0 if x == "repeat3" else x)

#Then we create a boolean mask for the rows where 'Pos' is equal to repeat1
mask1=df['B'].apply(lambda x: 1 if x == "repeat1"  else 0).astype('bool')
#We do another mask the same for the rows where 'Pos' is equal to repeat2
mask2=df['B'].apply(lambda x: 1 if x == "repeat2"  else 0).astype('bool')

#We define the number sequence that you want to replace in each case
#For rows matchin repeat1
repl1= [1,2,3,4]
#For rows matching repeat2
repl2= [4,3,2,1]

#Finally we simply replace the matched patterns
df.loc[mask1,'B'] = repl1
df.loc[mask2,'B'] = repl2


print(df)

Results:

        A      Pos  B
0   Emo/3  repeat3  0
1   Emo/4  repeat3  0
2   Emo/1  repeat3  0
3   Emo/3  repeat3  0
4                    
5   Emo/3  repeat1  1
6   Emo/4  repeat1  2
7   Emo/1  repeat1  3
8   Emo/3  repeat1  4
9                    
10  Neu/5  repeat2  4
11  Neu/2  repeat2  3
12  Neu/5  repeat2  2
13  Neu/2  repeat2  1