Pandas: Inserting rows based on conditions?-CodePudding

Imagine a below dataset. INPUT DATASET

id      status   type  location     bb_count      vo_count   tv_count
123     open      r     hongkong       1             0          4      
456     open      r     hongkong       1             7          2
456     closed    p     India          0             6          1

OUTPUT DATASET I need to insert a row with product type if any(bb_count, tv_count, vo_count) is greater than 0.

id      status   type  location        product      
123     open      r     hongkong       bb            
123     open      r     hongkong       tv          
456     open      r     hongkong       bb             
456     open      r     hongkong       vo          
456     open      r     hongkong       tv             
456     closed    p     India          vo            
456     closed    p     India          rv

what I tried:

def insert_row(df):
    if df["bb_count"] > 0:
        print("inserting bb row")
    if df["tv_count"] > 0:
        print("inserting tv row")
    if df["vo_count"] > 0:
        print("inserting vo row")

df.apply(insert_row, axis=1)

But I'm not getting the exact output.

CodePudding user response：

It looks like your issue is you are checking the value of the entire column. Try something like:

def insert_row(df):
   for i in range(len(df)):
      if df["bb_count"][i] > 0:
         '''inserting bb row'''
      if df["tv_count"][i] > 0:
         '''inserting tv row'''
      if df["vo_count"][i] > 0:
         '''inserting vo row'''

      # continue with rest of function

CodePudding user response：

You aren't changing your dataframe in the function at all. You are simply printing some statements. You don't really need a custom function for what you want to do.

Try:

melt the dataframe to create the required structure.
Filter to keep rows where the value is greater than 0.
Re-format the "product" column as required (removing the "_count").

melted = df.melt(["id", "status", "type", "location"],["bb_count","vo_count","tv_count"],var_name="product")
output = melted[melted["value"].gt(0)].drop("value",axis=1)
output["product"] = output["product"].str.replace("_count","")

>>> output
    id  status type  location product
0  123    open    r  hongkong      bb
1  456    open    r  hongkong      bb
4  456    open    r  hongkong      vo
5  456  closed    p     India      vo
6  123    open    r  hongkong      tv
7  456    open    r  hongkong      tv
8  456  closed    p     India      tv

CodePudding user response：

Another way without melt to keep rows ordered:

out = (
    df.set_index(['id', 'status', 'type', 'location'])
      .rename(columns=lambda x: x.split('_')[0])
      .rename_axis(columns='product')
      .stack().loc[lambda x: x > 0]
      .index.to_frame(index=False)
)

Output:

>>> out
    id  status type  location product
0  123    open    r  hongkong      bb
1  123    open    r  hongkong      tv
2  456    open    r  hongkong      bb
3  456    open    r  hongkong      vo
4  456    open    r  hongkong      tv
5  456  closed    p     India      vo
6  456  closed    p     India      tv