I have a specific dataframe which looks like this:
owner | name | col_name | test_col1 | test_col2 |
---|---|---|---|---|
svc | dmn_dmn | A | 1 | "String1" |
svc | dmn_dmn | B | 2 | "String12" |
svc | dmn_dmn | C | remain_constant_3 | "String13" |
svc | dmn_dmn | D | remain_constant_4 | "String14" |
svc | time1 | E | 5 | "String1123" |
svc | time1 | F | 6 | "String123223" |
svc | sap | J | 1 | "String11" |
svc | sap | K | 2 | "String12" |
svc | sap | D | 4 | "String14" |
If the values "C" and "D" are not present in the column col_name then add "C" and "D" to its col_name. The final dataframe should look like this:
owner | name | col_name | test_col1 | test_col2 |
---|---|---|---|---|
svc | dmn_dmn | A | 1 | "String1" |
svc | dmn_dmn | B | 2 | "String12" |
svc | dmn_dmn | C | remain_constant_3 | "String13" |
svc | dmn_dmn | D | remain_constant_4 | "String14" |
svc | time1 | E | 5 | "String1123" |
svc | time1 | F | 6 | "String123223" |
svc | time1 | C | remain_constant_3 | "String13" |
svc | time1 | D | remain_constant_4 | "String14" |
svc | sap | J | 1 | "String11" |
svc | sap | K | 2 | "String12" |
svc | sap | C | remain_constant_3 | "String13" |
svc | sap | D | remain_constant_4 | "String14" |
Edited: Please also note that there could be more columns in this dataframe. I didnt add the other columns as i thought it wouldnt matter with the code but then i saw there was some confusion
CodePudding user response:
You could use groupby to check if 'C' and 'D' are in the 'col_name' column and add them if not.
df = pd.DataFrame([{'owner':'svc','name':'dmn_dmn','col_name':'A','test_col1':1,'test_col2':'String1'},{'owner':'svc','name':'dmn_dmn','col_name':'B','test_col1':2,'test_col2':'String12'},{'owner':'svc','name':'dmn_dmn','col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},{'owner':'svc','name':'dmn_dmn','col_name':'D','test_col1':'remain_constant_3','test_col2':'String14'},{'owner':'svc','name':'time1','col_name':'E','test_col1':5,'test_col2':'String1123'}])
for g,g_hold in df.groupby('name'):
if 'C' not in g_hold['col_name'].tolist():
df = df.append({'owner':'svc','name':g,'col_name':'C','test_col1':'remain_constant_3','test_col2':'String13'},ignore_index=True)
if 'D' not in g_hold['col_name'].tolist():
df = df.append({'owner':'svc','name':g,'col_name':'D','test_col1':'remain_constant_3','test_col2':'String14'},ignore_index=True)
print(df.sort_values(['name','col_name']))
The code would end up looking something like this.
CodePudding user response:
A better way is to use
import pandas as pd
df = pd.DataFrame({"owner": ["owner"] * 9,
"name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
"col_name": ["A", "B", "C", "D", "A", "B", "A", "B", "D"]})
index = pd.MultiIndex.from_product([df.owner.unique(), df.name.unique(), df.col_name.unique()])
result = df.set_index(['owner', 'name', "col_name"]).reindex(index).reset_index()
print(result)
CodePudding user response:
You could use the columns with which you want to perform the combination as index and craft a custom index to reindex. Then groupby
and ffill
/bfill
.
df2 = df.set_index(['owner', 'name', 'col_name'])
idx = pd.MultiIndex.from_product([df['owner'].unique(),
df['name'].unique(),
['C', 'D'],
], names=['owner', 'name', 'col_name'])
(df2.reindex(df2.index.union(idx))
.groupby(level='col_name').ffill()
.groupby(level='col_name').bfill()
.reset_index()
)
output:
owner name col_name test_col1 test_col2
0 svc dmn_dmn A 1 "String1"
1 svc dmn_dmn B 2 "String12"
2 svc dmn_dmn C remain_constant_3 "String13"
3 svc dmn_dmn D remain_constant_4 "String14"
4 svc sap C remain_constant_3 "String13"
5 svc sap D 4 "String14"
6 svc sap J 1 "String11"
7 svc sap K 2 "String12"
8 svc time1 C remain_constant_3 "String13"
9 svc time1 D 4 "String14"
10 svc time1 E 5 "String1123"
11 svc time1 F 6 "String123223"
CodePudding user response:
here are a straight forward approach
import pandas as pd
import numpy as np
df = pd.DataFrame({"owner": ["svc"] * 9,
"name": ["dmn_dmn", "dmn_dmn", "dmn_dmn", "dmn_dmn", "time1", "time1", "sap", "sap", "sap"],
"col_name": ["A", "B", "C", "D", "E", "F", "J", "K", "D"],
"test_col1": ["1", "2", "remain_constant_3", "remain_constant_4", 5, 6, 1, 2, 4],
"test_col2": ["String1", "String12", "String13", "String14", "String1123", "String123223",
"String11", "String12", "String14"]})
list_of_element = ["C", "D"]
for owner in df.owner.unique():
for name in df.name.unique():
filtred = df[(df.owner == owner) & (df.name == name)]
differance = np.setdiff1d(list_of_element, filtred.col_name)
for diff in differance:
if diff == 'C':
df2 = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_3 ',
'test_col2': "String13"}
if diff == 'D':
df2 = {'owner': owner, 'name': name, 'col_name': diff, 'test_col1': 'remain_constant_4',
'test_col2': "String14"}
df = df.append(df2, ignore_index=True)
print(df)