I have the following dataframe:
import pandas as pd
score = [2,5,1,7,8,3,8,9,10,1]
group = ["A", "A", "B", "C", "C", "B", "A", "D", "D", "C"]
df = pd.DataFrame([group, score]).T
df.columns = ['Group', 'Score']
Which gives:
Group Score
0 A 2
1 A 5
2 B 1
3 C 7
4 C 8
5 B 3
6 A 8
7 D 9
8 D 10
9 C 1
I want to construct a new column that gives the previous entry for the given group. It also removes rows for which there is no previous entry. So my expected output would be something like this:
score = [5,8,3,8,10,1]
previous_score = [2, 7, 1, 5, 9, 8]
group = ["A", "C", "B", "A", "D", "C"]
df1 = pd.DataFrame([group, score, previous_score]).T
df1.columns = ['Group', 'Score', 'Previous Score']
Which gives:
Group Score Previous Score
0 A 5 2
1 C 8 7
2 B 3 1
3 A 8 5
4 D 10 9
5 C 1 8
I could break the dataframe into seperate dataframes based on groups and use shift
and then stick them back together. However, the order of this data is important. I could also just loop through the dataframe and find the previous entry, but this would be inefficient. Again, I think this is going to involve some groupby
work but I can't figure out how to do it without looping.
CodePudding user response:
You can use groupby.shift
to get the previous row's Scores in each row for each Group. Then dropna
drops the NaNs:
df = df.assign(Previous_Score=df.groupby('Group')['Score'].shift()).dropna().reset_index(drop=True)
Output:
Group Score Previous_Score
0 A 5 2
1 C 8 7
2 B 3 1
3 A 8 5
4 D 10 9
5 C 1 8