Home > Software design >  Put a state on a column if an element change the state in a dataframe with Python
Put a state on a column if an element change the state in a dataframe with Python

Time:11-30

I have the next dataframe with a lot of elements

code status Month
a Active 1
b Inactive 2
c Active 3
b Active 4
a Inactive 7
c Active 8

Is there anyway to create a dataframe with the next result

code status
a change to inactive on June
b change to active on april
c no changes

Or to have something like a report, It will be so helpful

CodePudding user response:

Try:

  1. sort_values by the "code" and "Month" columns
  2. Use np.where to assign actions when the status changes.
  3. drop_duplicates to keep only final row for each "code"
df = df.sort_values(["code", "Month"])
df["status"] = (np.where((df["code"].eq(df["code"].shift())) & 
                         (df["status"].ne(df["status"].shift())), 
                         "change to " df["status"].str.lower() " in " pd.to_datetime(df["Month"],format="%m").dt.strftime("%B"), 
                         "no changes")
                )

output = df.drop_duplicates("code", keep="last").drop("Month", axis=1).reset_index(drop=True)

>>> output
  code                      status
0    a  change to inactive in July
1    b   change to active in April
2    c                  no changes

CodePudding user response:

You can use a groupby and join on 'status' and 'month' and return a new dataframe. Then you can split and combine parts of those two columns (status and month), and lastly you can change the codes that had 'Active' in both occasions with loc and the help of your grouped object:

# Convert Month and sort
df['Month'] = pd.to_datetime(df['Month'], format='%m').dt.month_name()
df = df.sort_values(["code", "Month"])

>>> print(df)

  code    status     Month
0    a    Active   January
1    b  Inactive  February
2    c    Active     March
3    b    Active     April
4    a  Inactive      July
5    c    Active    August

# Groupby and join
g = df.groupby(['code']).agg(lambda x: ' '.join(x))

>>> print(g)
              status           Month
code                                 
a     Active Inactive    January July
b     Inactive Active  February April
c       Active Active    March August

# Combine columns and replace to 'no change'
res = pd.DataFrame('change to '   g.status.str.split(' ',1).str[1]   ' on '   g.Month.str.split(' ',1).str[1],columns=['status'])
res.loc[res.index.isin(g[g.status.eq('Active Active')].index),'status'] = 'no change'

>>> print(res)

                          status
code                            
a     change to Inactive on July
b      change to Active on April
c                      no change

Sample DF:

{'code': {0: 'a', 1: 'b', 2: 'c', 3: 'b', 4: 'a', 5: 'c'},
 'status': {0: 'Active',
  1: 'Inactive',
  2: 'Active',
  3: 'Active',
  4: 'Inactive',
  5: 'Active'},
 'Month': {0: 1, 1: 2, 2: 3, 3: 4, 4: 7, 5: 8}}
  • Related