Imagine a below dataset. INPUT DATASET
id status type location bb_count vo_count tv_count
123 open r hongkong 1 0 4
456 open r hongkong 1 7 2
456 closed p India 0 6 1
OUTPUT DATASET I need to insert a row with product type if any(bb_count, tv_count, vo_count) is greater than 0.
id status type location product
123 open r hongkong bb
123 open r hongkong tv
456 open r hongkong bb
456 open r hongkong vo
456 open r hongkong tv
456 closed p India vo
456 closed p India rv
what I tried:
def insert_row(df):
if df["bb_count"] > 0:
print("inserting bb row")
if df["tv_count"] > 0:
print("inserting tv row")
if df["vo_count"] > 0:
print("inserting vo row")
df.apply(insert_row, axis=1)
But I'm not getting the exact output.
CodePudding user response:
It looks like your issue is you are checking the value of the entire column. Try something like:
def insert_row(df):
for i in range(len(df)):
if df["bb_count"][i] > 0:
'''inserting bb row'''
if df["tv_count"][i] > 0:
'''inserting tv row'''
if df["vo_count"][i] > 0:
'''inserting vo row'''
# continue with rest of function
CodePudding user response:
You aren't changing your dataframe in the function at all. You are simply printing some statements. You don't really need a custom function for what you want to do.
Try:
melt
the dataframe to create the required structure.- Filter to keep rows where the value is greater than 0.
- Re-format the "product" column as required (removing the "_count").
melted = df.melt(["id", "status", "type", "location"],["bb_count","vo_count","tv_count"],var_name="product")
output = melted[melted["value"].gt(0)].drop("value",axis=1)
output["product"] = output["product"].str.replace("_count","")
>>> output
id status type location product
0 123 open r hongkong bb
1 456 open r hongkong bb
4 456 open r hongkong vo
5 456 closed p India vo
6 123 open r hongkong tv
7 456 open r hongkong tv
8 456 closed p India tv
CodePudding user response:
Another way without melt to keep rows ordered:
out = (
df.set_index(['id', 'status', 'type', 'location'])
.rename(columns=lambda x: x.split('_')[0])
.rename_axis(columns='product')
.stack().loc[lambda x: x > 0]
.index.to_frame(index=False)
)
Output:
>>> out
id status type location product
0 123 open r hongkong bb
1 123 open r hongkong tv
2 456 open r hongkong bb
3 456 open r hongkong vo
4 456 open r hongkong tv
5 456 closed p India vo
6 456 closed p India tv