I have a dataframe as follows:
Code_1 | Code_2 |
---|---|
A | C1 |
B | C2 |
C | C3 |
D | C4 |
E | C5 |
NaN | C6 |
NaN | C7 |
NaN | C8 |
NaN | C9 |
NaN | C10 |
Then, I modified my dataframe because I wanted the same Code_1 for the whole column Code_2, please check it out the code: Firstly, I did split the columns:
dfa = pd.DataFrame()
dfb = pd.DataFrame()
dfa['Code_1'] = df['Code_1']
dfb['Code_2'] = df['Code_2']
dfa = dfa.dropna()
dfa['times'] = len(dfa)
dfa = dfa.loc[dfa.index.repeat(dfa.times)].reset_index(drop=True)
** df --> Original dataframe
And then, the output is something like this (I am ignoring "times" column):
Code_1 | Code_2 |
---|---|
A | C1 |
A | C2 |
A | C3 |
A | C4 |
A | C5 |
A | C6 |
A | C7 |
A | C8 |
A | C9 |
A | C10 |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
B | NaN |
(and so forth)
But I'd like to have C1 to C10 repeated for each interval of Code_1, like this:
Code_1 | Code_2 |
---|---|
A | C1 |
A | C2 |
A | C3 |
A | C4 |
A | C5 |
A | C6 |
A | C7 |
A | C8 |
A | C9 |
A | C10 |
B | C1 |
B | C2 |
B | C3 |
B | C4 |
B | C5 |
B | C6 |
B | C7 |
B | C8 |
B | C9 |
B | C10 |
(and so forth)
But I don't know how to repeat Code_2 sequence over Code_1 repetition. Can you help me?
Also, if there is an easier way to do the first part of this code, please let me know.
Thank you in advance!
CodePudding user response:
If you have a dataframe like this:
code_1 = pd.DataFrame({'Code_1':['A','B','C','D','E']})
code_2 = pd.DataFrame({'Code_2':['C1','C2','C3','C4','C5','C6','C7','C8','C9','C10']})
df = pd.concat([code_1,code_2], ignore_index=False, axis=1)
Code_1 Code_2
0 A C1
1 B C2
2 C C3
3 D C4
4 E C5
5 NaN C6
6 NaN C7
7 NaN C8
8 NaN C9
9 NaN C10
You can do this to get the pattern you want:
code_mapping = [(code_1,code_2) for code_1 in df['Code_1'] for code_2 in df['Code_2'] if not isinstance(code_1,float)]
new_df = pd.DataFrame(code_mapping, columns =['Code_1', 'Code_2'])
new_df.tail(50)
The list comprehension code_mapping
will use a nested for loop to create a tuple of codes that are not float (python considers NaN
as float type so doing if not isinstance(code_1,float)
will avoid doing the mapping using NaN
as well). Then, from this list that contains the tuples, create a new dataframe.
The output is this:
Code_1 Code_2
0 A C1
1 A C2
2 A C3
3 A C4
4 A C5
5 A C6
6 A C7
7 A C8
8 A C9
9 A C10
10 B C1
11 B C2
12 B C3
13 B C4
14 B C5
15 B C6
16 B C7
17 B C8
18 B C9
19 B C10
20 C C1
21 C C2
22 C C3
23 C C4
24 C C5
26 C C7
27 C C8
28 C C9
29 C C10
30 D C1
31 D C2
32 D C3
33 D C4
34 D C5
35 D C6
36 D C7
37 D C8
38 D C9
39 D C10
40 E C1
41 E C2
42 E C3
43 E C4
44 E C5
45 E C6
46 E C7
47 E C8
48 E C9
49 E C10
This would be a more "pythonic" way of doing things, I'm sure you have other packages that can do this and you could even try some bfill/ffill
for this
CodePudding user response:
You can use a bulit-in Python tool called product
from itertools
to help you with the work.
from itertools import product
new_df = pd.DataFrame(product(df.Code_1[~df.Code_1.isna()],df.Code_2),columns=['Code_1','Code_2'])
Code_1 Code_2
0 A C1
1 A C2
2 A C3
3 A C4
4 A C5
5 A C6
6 A C7
7 A C8
8 A C9
9 A C10
10 B C1
11 B C2
12 B C3
13 B C4
14 B C5
15 B C6
16 B C7
17 B C8
18 B C9
19 B C10
. . .
. . .
. . .