I am very new to python. I encountered this task and was on it for a while without a clue. Any suggestions can help! Thanks a lot.
I have a data frame like this:
import pandas as pd
data = {'A': ['Emo/3', 'Emo/4', 'Emo/1','Emo/3', '','Emo/3', 'Emo/4', 'Emo/1','Emo/3', '', 'Neu/5', 'Neu/2','Neu/5', 'Neu/2'],
'Pos': ["repeat3", "repeat3", "repeat3", "repeat3", '',"repeat1", "repeat1", "repeat1", "repeat1", '', "repeat2", "repeat2","repeat2", "repeat2"],
}
df = pd.DataFrame(data)
df
A Pos
0 Emo/3 repeat3
1 Emo/4 repeat3
2 Emo/1 repeat3
3 Emo/3 repeat3
4
5 Emo/3 repeat1
6 Emo/4 repeat1
7 Emo/1 repeat1
8 Emo/3 repeat1
9
10 Neu/5 repeat2
11 Neu/2 repeat2
12 Neu/5 repeat2
13 Neu/2 repeat2
I want a output like this:
A Pos B
0 Emo/3 repeat3 0
1 Emo/4 repeat3 0
2 Emo/1 repeat3 0
3 Emo/3 repeat3 0
4
5 Emo/3 repeat1 1
6 Emo/4 repeat1 2
7 Emo/1 repeat1 3
8 Emo/3 repeat1 4
9
10 Neu/5 repeat2 4
11 Neu/2 repeat2 2
12 Neu/5 repeat2 3
13 Neu/2 repeat2 1
The first four position of the column"B" is always 0. Then the other positions in the column "B" are based on the value in column"pos". if the row in column"pos" equal "repeat 1" then the column "B" at that four positions will be: 1, 2, 3, 4. If the row in column"pos" equal "repeat 2", then the column "B" at the four positions will be: 4, 3, 2, 1.
The values in Pos are always arranged as every four rows with the same value and the fifth row will be empty.
Thanks a lot!
CodePudding user response:
Solution
I'm sure there's a better way, but here's one approach:
df["B"] = ""
repeat_mapping = {"repeat3": [0]*4,
"repeat2": [*range(4, 0, -1)],
"repeat1": [*range(1, 5)]}
repeats = df[::5]["Pos"].map(repeat_mapping).explode()
repeats.index = pd.Series([*range(4)]*len(df[::5]))
df["B"][repeats.index] = repeats
Output:
A Pos B
0 Emo/3 repeat3 0
1 Emo/4 repeat3 0
2 Emo/1 repeat3 0
3 Emo/3 repeat3 0
4
5 Emo/3 repeat1 1
6 Emo/4 repeat1 2
7 Emo/1 repeat1 3
8 Emo/3 repeat1 4
9
10 Neu/5 repeat2 4
11 Neu/2 repeat2 3
12 Neu/5 repeat2 2
13 Neu/2 repeat2 1
Steps
Prepare the new column:
In [1]: df["B"] = ""
In [2]: df
Out[2]:
A Pos B
0 Emo/3 repeat3
1 Emo/4 repeat3
2 Emo/1 repeat3
3 Emo/3 repeat3
4
5 Emo/3 repeat1
6 Emo/4 repeat1
7 Emo/1 repeat1
8 Emo/3 repeat1
9
10 Neu/5 repeat2
11 Neu/2 repeat2
12 Neu/5 repeat2
13 Neu/2 repeat2
Grab each 5th row:
In [3]: df[::5]["Pos"]
Out[3]:
0 repeat3
5 repeat1
10 repeat2
Name: Pos, dtype: object
Use repeat_mapping
:
In [4]: df[::5]["Pos"].map(repeat_mapping)
Out[4]:
0 [0, 0, 0, 0]
5 [1, 2, 3, 4]
10 [4, 3, 2, 1]
Name: Pos, dtype: object
Explode lists:
In [5]: repeats = df[::5]["Pos"].map(repeat_mapping).explode()
In [6]: repeats
Out[6]:
0 0
0 0
0 0
0 0
5 1
5 2
5 3
5 4
10 4
10 3
10 2
10 1
Name: Pos, dtype: object
Notice each index in repeats
is repeated 4 times. We'll fix this by incrementing each index by 0, 1, 2, 3
:
In [7]: pd.Series([*range(4)]*len(df[::5])).values
Out[7]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3], dtype=int64)
In [8]: repeats.index = pd.Series([*range(4)]*len(df[::5]))
In [9]: repeats
Out[9]:
0 0
1 0
2 0
3 0
5 1
6 2
7 3
8 4
10 4
11 3
12 2
13 1
Name: Pos, dtype: object
Finally, df["B"][repeats.index]
only selects the rows whose index matches the indices of repeats
, to which we then assign the values of repeats
.
CodePudding user response:
Generilized solution using Pandas tools
Ok it took me some time to figure it out but I wanted to find a slick answer and I kind of like this one:
import pandas as pd
data = {'A': ['Emo/3', 'Emo/4', 'Emo/1','Emo/3', '','Emo/3', 'Emo/4', 'Emo/1','Emo/3', '', 'Neu/5', 'Neu/2','Neu/5', 'Neu/2'],
'Pos': ["repeat3", "repeat3", "repeat3", "repeat3", '',"repeat1", "repeat1", "repeat1", "repeat1", '', "repeat2", "repeat2","repeat2", "repeat2"],
}
df = pd.DataFrame(data)
#First we create column B and set first 4 value that are marked as repeat3 in 'Pos' column to zero
df['B']=df['Pos'].apply(lambda x: 0 if x == "repeat3" else x)
#Then we create a boolean mask for the rows where 'Pos' is equal to repeat1
mask1=df['B'].apply(lambda x: 1 if x == "repeat1" else 0).astype('bool')
#We do another mask the same for the rows where 'Pos' is equal to repeat2
mask2=df['B'].apply(lambda x: 1 if x == "repeat2" else 0).astype('bool')
#We define the number sequence that you want to replace in each case
#For rows matchin repeat1
repl1= [1,2,3,4]
#For rows matching repeat2
repl2= [4,3,2,1]
#Finally we simply replace the matched patterns
df.loc[mask1,'B'] = repl1
df.loc[mask2,'B'] = repl2
print(df)
Results:
A Pos B
0 Emo/3 repeat3 0
1 Emo/4 repeat3 0
2 Emo/1 repeat3 0
3 Emo/3 repeat3 0
4
5 Emo/3 repeat1 1
6 Emo/4 repeat1 2
7 Emo/1 repeat1 3
8 Emo/3 repeat1 4
9
10 Neu/5 repeat2 4
11 Neu/2 repeat2 3
12 Neu/5 repeat2 2
13 Neu/2 repeat2 1