I have a DataFrame like this:
import numpy as np
import pandas as pd
df = pd.DataFrame([["A1", "B1", "C1", "P"],
["A2", "B2", "C2", "P"],
["A3", "B3", "C3", "P"]], columns=["col_a", "col_b", "col_c", "col_d"])
col_a col_b col_c col_d
A1 B1 C1 P
A2 B2 C2 P
A3 B3 C3 P
...
The result I need repeats every unique row three times, so that col_d takes the values P, Q and R for each of them:
col_a col_b col_c col_d
A1 B1 C1 P
A1 B1 C1 Q
A1 B1 C1 R
A2 B2 C2 P
A2 B2 C2 Q
A2 B2 C2 R
A3 B3 C3 P
A3 B3 C3 Q
A3 B3 C3 R
...
All I have so far is:
new_df = pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
This duplicates the rows, but col_d is left unchanged.
EDIT:
I have now run into another requirement: for every unique combination of col_a and col_b, I need to add a row with "S" in col_d.
For instance:
col_a col_b col_c col_d
A1 B1 C1 P
A1 B1 C1 Q
A1 B1 C1 R
A1 B1 T S
A2 B2 C2 P
A2 B2 C2 Q
A2 B2 C2 R
A2 B2 T S
Thank you very much for help!
CodePudding user response:
Add the values to column col_d with DataFrame.assign and numpy.tile:
L = ['P','Q','R']
new_df = (pd.DataFrame(np.repeat(df.values, 3, axis=0), columns=df.columns)
.assign(col_d = np.tile(L, len(df))))
print (new_df)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
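To see why the two calls line up, here is a minimal self-contained sketch. The names labels, row_positions and tiled_labels are mine, not from the question, and it repeats row positions rather than df.values, so treat it as an illustration of the idea rather than the exact code above.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A2", "B2", "C2", "P"],
     ["A3", "B3", "C3", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)
labels = ["P", "Q", "R"]

# np.repeat duplicates each element in place: 0 0 0 1 1 1 2 2 2
row_positions = np.repeat(np.arange(len(df)), len(labels))

# np.tile repeats the whole sequence: P Q R P Q R P Q R
tiled_labels = np.tile(labels, len(df))

# Every block of three identical rows therefore receives P, Q, R in order.
new_df = (df.iloc[row_positions]
            .reset_index(drop=True)
            .assign(col_d=tiled_labels))
print(new_df)
```

Both arrays have length len(df) * len(labels), which is why the assignment never runs into a length mismatch.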
A similar idea is to repeat the index and select the duplicated rows with DataFrame.loc:
L = ['P','Q','R']
new_df = (df.loc[df.index.repeat(3)]
.assign(col_d = np.tile(L, len(df)))
.reset_index(drop=True))
print (new_df)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
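One practical difference between the two variants shows up once the frame has a numeric column. df.values on mixed columns is an object array, so the np.repeat route loses the numeric dtype, while the index.repeat route keeps it. A small sketch, assuming a made-up integer column n that is not in the question, and a unique index:

```python
import numpy as np
import pandas as pd

# "n" is a hypothetical numeric column added only for this illustration.
df = pd.DataFrame({
    "col_a": ["A1", "A2", "A3"],
    "col_d": ["P", "P", "P"],
    "n": [1, 2, 3],
})
labels = ["P", "Q", "R"]

# Route 1: repeat the underlying values (mixed columns become object).
via_values = (pd.DataFrame(np.repeat(df.values, len(labels), axis=0),
                           columns=df.columns)
                .assign(col_d=np.tile(labels, len(df))))

# Route 2: repeat the index labels and select rows (dtypes are preserved).
via_loc = (df.loc[df.index.repeat(len(labels))]
             .assign(col_d=np.tile(labels, len(df)))
             .reset_index(drop=True))

print(via_values.dtypes)
print(via_loc.dtypes)
```

If the index is not unique, df.loc with repeated labels returns more rows than intended, so resetting the index first (or using iloc with positions) would be the safer choice.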
EDIT:
L = ['P','Q','R','S']
new_df = (pd.DataFrame(np.repeat(df.values, len(L), axis=0), columns=df.columns)
.assign(col_d = np.tile(L, len(df)),
col_c = lambda x: x['col_c'].mask(x['col_d'].eq('S'), 'T')))
print (new_df)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A1 B1 T S
4 A2 B2 C2 P
5 A2 B2 C2 Q
6 A2 B2 C2 R
7 A2 B2 T S
8 A3 B3 C3 P
9 A3 B3 C3 Q
10 A3 B3 C3 R
11 A3 B3 T S
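Note that this works because each (col_a, col_b) pair appears in exactly one row of the sample. If a pair can occur with several col_c values and you still want a single S row per pair, one possible sketch is to build the P/Q/R rows and the S rows separately and concatenate them. The extra C9 row and the helper column _pos are my own additions for illustration, and how="cross" assumes pandas 1.2 or newer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A1", "B1", "C9", "P"],   # invented row: same pair, different col_c
     ["A2", "B2", "C2", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)

# One P/Q/R triple per original row.
pqr = (df.drop(columns="col_d")
         .merge(pd.Series(["P", "Q", "R"], name="col_d"), how="cross"))

# Exactly one S row per unique (col_a, col_b) pair, with T in col_c.
s_rows = (df[["col_a", "col_b"]]
            .drop_duplicates()
            .assign(col_c="T", col_d="S"))

combined = pd.concat([pqr, s_rows], ignore_index=True)

# _pos records the concat order, so each S row sorts after its pair's P/Q/R rows.
new_df = (combined.assign(_pos=np.arange(len(combined)))
                  .sort_values(["col_a", "col_b", "_pos"])
                  .drop(columns="_pos")
                  .reset_index(drop=True))
print(new_df)
```

With the original three-row sample this gives the same twelve rows as the code above.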
CodePudding user response:
If you already have the first DataFrame, you can use assign and explode:
l= ['P','Q','R']
new_df = df.assign(col_d=[l]*len(df)).explode('col_d')
or a cross merge:
new_df = df.drop(columns='col_d').merge(pd.Series(l, name='col_d'), how='cross')
output:
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
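The two variants give the same rows but not the same index: explode keeps the original row label for every generated row, whereas the cross merge produces a fresh 0..n-1 index like the one printed above. A short sketch to check this, with variable names (labels, exploded, crossed) chosen by me and how="cross" assuming pandas 1.2 or newer:

```python
import pandas as pd

df = pd.DataFrame(
    [["A1", "B1", "C1", "P"],
     ["A2", "B2", "C2", "P"],
     ["A3", "B3", "C3", "P"]],
    columns=["col_a", "col_b", "col_c", "col_d"],
)
labels = ["P", "Q", "R"]

# explode: each row first holds the whole list, then gets one row per element.
exploded = df.assign(col_d=[labels] * len(df)).explode("col_d")

# cross merge: Cartesian product of the remaining columns with the labels.
crossed = (df.drop(columns="col_d")
             .merge(pd.Series(labels, name="col_d"), how="cross"))

print(exploded.index.tolist())  # repeated original labels
print(crossed.index.tolist())   # fresh consecutive index

# After resetting the index, both hold the same values row by row.
same_rows = (exploded.reset_index(drop=True).values.tolist()
             == crossed.values.tolist())
print(same_rows)
```

Adding .reset_index(drop=True) after explode is enough if you need the consecutive index in both cases.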
CodePudding user response:
You can easily generate the combinations with complete from pyjanitor:
# pip install pyjanitor
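import numpy as np  # used in the second example below
import pandas as pd
import janitor  # importing janitor registers .complete on DataFrame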
df.complete(['col_a', 'col_b', 'col_c'], {'col_d': ['P','Q','R']})
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A2 B2 C2 P
4 A2 B2 C2 Q
5 A2 B2 C2 R
6 A3 B3 C3 P
7 A3 B3 C3 Q
8 A3 B3 C3 R
Basically, you are combining ['col_a', 'col_b', 'col_c'] with {'col_d': ['P','Q','R']}; using a dictionary allows you to introduce new values into the data.
For the scenario where you need to introduce S, you can break it down into steps:
(df
.complete(['col_a', 'col_b'], {'col_d': ['P','Q','R', 'S']})
.assign(col_c = lambda df: np.where(df.col_d.eq('S'), 'T', df.col_c))
.ffill()
)
col_a col_b col_c col_d
0 A1 B1 C1 P
1 A1 B1 C1 Q
2 A1 B1 C1 R
3 A1 B1 T S
4 A2 B2 C2 P
5 A2 B2 C2 Q
6 A2 B2 C2 R
7 A2 B2 T S
8 A3 B3 C3 P
9 A3 B3 C3 Q
10 A3 B3 C3 R
11 A3 B3 T S