Pandas- Add a new row based on a specific condition-CodePudding

I have a specific dataframe which looks like this:

owner	name	col_name	test_col1	test_col2
svc	dmn_dmn	A	1	"String1"
svc	dmn_dmn	B	2	"String12"
svc	dmn_dmn	C	remain_constant_3	"String13"
svc	dmn_dmn	D	remain_constant_4	"String14"
svc	time1	E	5	"String1123"
svc	time1	F	6	"String123223"
svc	sap	J	1	"String11"
svc	sap	K	2	"String12"
svc	sap	D	4	"String14"

If the values "C" and "D" are not present in the column col_name then add "C" and "D" to its col_name. The final dataframe should look like this:

owner	name	col_name	test_col1	test_col2
svc	dmn_dmn	A	1	"String1"
svc	dmn_dmn	B	2	"String12"
svc	dmn_dmn	C	remain_constant_3	"String13"
svc	dmn_dmn	D	remain_constant_4	"String14"
svc	time1	E	5	"String1123"
svc	time1	F	6	"String123223"
svc	time1	C	remain_constant_3	"String13"
svc	time1	D	remain_constant_4	"String14"
svc	sap	J	1	"String11"
svc	sap	K	2	"String12"
svc	sap	C	remain_constant_3	"String13"
svc	sap	D	remain_constant_4	"String14"

Edited: Please also note that there could be more columns in this dataframe. I didnt add the other columns as i thought it wouldnt matter with the code but then i saw there was some confusion

CodePudding user response：

You could use groupby to check if 'C' and 'D' are in the 'col_name' column and add them if not.

df = pd.DataFrame([{'owner':'svc','name':'dmn_dmn','col_name':'A','test_col1':1,'test_col2':'String1'},{'owner':'svc','name':'dmn_dmn','col_name':'B','test_col1':2,'test_col2':'String12'},{'owner':'svc','name':'dmn_dmn','col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},{'owner':'svc','name':'dmn_dmn','col_name':'D','test_col1':'remain_constant_3','test_col2':'String14'},{'owner':'svc','name':'time1','col_name':'E','test_col1':5,'test_col2':'String1123'}])

for g,g_hold in df.groupby('name'):
    if 'C' not in g_hold['col_name'].tolist():
        df = df.append({'owner':'svc','name':g,'col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},ignore_index=True)
    if 'D' not in g_hold['col_name'].tolist():
        df = df.append({'owner':'svc','name':g,'col_name':'D','test_col1':'remain_constant_3','test_col2':'String14'},ignore_index=True)

print(df.sort_values(['name','col_name']))

The code would end up looking something like this.

CodePudding user response：

A better way is to use

import pandas as pd
df = pd.DataFrame({"owner": ["owner"] * 9,
               "name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
               "col_name": ["A", "B", "C", "D", "A", "B", "A", "B", "D"]})
index = pd.MultiIndex.from_product([df.owner.unique(), df.name.unique(), df.col_name.unique()])
result = df.set_index(['owner', 'name', "col_name"]).reindex(index).reset_index()
print(result)

CodePudding user response：

You could use the columns with which you want to perform the combination as index and craft a custom index to reindex. Then groupby and ffill/bfill.

df2 = df.set_index(['owner', 'name', 'col_name'])

idx = pd.MultiIndex.from_product([df['owner'].unique(),
                                  df['name'].unique(),
                                  ['C', 'D'],
                                 ], names=['owner', 'name', 'col_name'])

(df2.reindex(df2.index.union(idx))
    .groupby(level='col_name').ffill()
    .groupby(level='col_name').bfill()
    .reset_index()
)

output:

   owner     name col_name          test_col1       test_col2
0    svc  dmn_dmn        A                  1       "String1"
1    svc  dmn_dmn        B                  2      "String12"
2    svc  dmn_dmn        C  remain_constant_3      "String13"
3    svc  dmn_dmn        D  remain_constant_4      "String14"
4    svc      sap        C  remain_constant_3      "String13"
5    svc      sap        D                  4      "String14"
6    svc      sap        J                  1      "String11"
7    svc      sap        K                  2      "String12"
8    svc    time1        C  remain_constant_3      "String13"
9    svc    time1        D                  4      "String14"
10   svc    time1        E                  5    "String1123"
11   svc    time1        F                  6  "String123223"

CodePudding user response：

here are a straight forward approach

import pandas as pd
import numpy as np

df = pd.DataFrame({"owner": ["svc"] * 9,
                   "name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
                   "col_name": ["A", "B", "C", "D", "E", "F", "J", "K", "D"],
                   "test_col1": ["1", "2", "remain_constant_3", "remain_constant_4", 5, 6, 1, 2, 4],
                   "test_col2": ["String1", "String12", "String13", "String14", "String1123", "String123223",
                                 "String11", "String12", "String14"]})
list_of_element = ["C", "D"]
for owner in df.owner.unique():
    for name in df.name.unique():
        filtred = df[(df.owner == owner) & (df.name == name)]
        differance = np.setdiff1d(list_of_element, filtred.col_name)
        for diff in differance:
            if diff == 'C':
                df2 = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_3 ',
                       'test_col2': "String13"}
            if diff == 'D':
                df2 = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_4',
                       'test_col2': "String14"}
            df = df.append(df2, ignore_index=True)
print(df)