I have the next dataframe with a lot of elements
code | status | Month |
---|---|---|
a | Active | 1 |
b | Inactive | 2 |
c | Active | 3 |
b | Active | 4 |
a | Inactive | 7 |
c | Active | 8 |
Is there anyway to create a dataframe with the next result
code | status |
---|---|
a | change to inactive on June |
b | change to active on april |
c | no changes |
Or to have something like a report, It will be so helpful
CodePudding user response:
Try:
sort_values
by the "code" and "Month" columns- Use
np.where
to assign actions when the status changes. drop_duplicates
to keep only final row for each "code"
df = df.sort_values(["code", "Month"])
df["status"] = (np.where((df["code"].eq(df["code"].shift())) &
(df["status"].ne(df["status"].shift())),
"change to " df["status"].str.lower() " in " pd.to_datetime(df["Month"],format="%m").dt.strftime("%B"),
"no changes")
)
output = df.drop_duplicates("code", keep="last").drop("Month", axis=1).reset_index(drop=True)
>>> output
code status
0 a change to inactive in July
1 b change to active in April
2 c no changes
CodePudding user response:
You can use a groupby
and join
on 'status' and 'month' and return a new dataframe. Then you can split and combine parts of those two columns (status and month), and lastly you can change the codes that had 'Active' in both occasions with loc
and the help of your grouped object:
# Convert Month and sort
df['Month'] = pd.to_datetime(df['Month'], format='%m').dt.month_name()
df = df.sort_values(["code", "Month"])
>>> print(df)
code status Month
0 a Active January
1 b Inactive February
2 c Active March
3 b Active April
4 a Inactive July
5 c Active August
# Groupby and join
g = df.groupby(['code']).agg(lambda x: ' '.join(x))
>>> print(g)
status Month
code
a Active Inactive January July
b Inactive Active February April
c Active Active March August
# Combine columns and replace to 'no change'
res = pd.DataFrame('change to ' g.status.str.split(' ',1).str[1] ' on ' g.Month.str.split(' ',1).str[1],columns=['status'])
res.loc[res.index.isin(g[g.status.eq('Active Active')].index),'status'] = 'no change'
>>> print(res)
status
code
a change to Inactive on July
b change to Active on April
c no change
Sample DF:
{'code': {0: 'a', 1: 'b', 2: 'c', 3: 'b', 4: 'a', 5: 'c'},
'status': {0: 'Active',
1: 'Inactive',
2: 'Active',
3: 'Active',
4: 'Inactive',
5: 'Active'},
'Month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 7, 5: 8}}