Home > Software engineering >  Repeat a list of cells in a column over an interval in python
Repeat a list of cells in a column over an interval in python

Time:10-12

I have a dataframe as follows:

Code_1 Code_2
A C1
B C2
C C3
D C4
E C5
NaN C6
NaN C7
NaN C8
NaN C9
NaN C10

Then, I modified my dataframe because I wanted the same Code_1 for the whole column Code_2, please check it out the code: Firstly, I did split the columns:

dfa = pd.DataFrame()
dfb = pd.DataFrame()
dfa['Code_1'] = df['Code_1']
dfb['Code_2'] = df['Code_2']
dfa = dfa.dropna()
dfa['times'] = len(dfa)

dfa = dfa.loc[dfa.index.repeat(dfa.times)].reset_index(drop=True)

** df --> Original dataframe

And then, the output is something like this (I am ignoring "times" column):

Code_1 Code_2
A C1
A C2
A C3
A C4
A C5
A C6
A C7
A C8
A C9
A C10
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN
B NaN

(and so forth)

But I'd like to have C1 to C10 repeated for each interval of Code_1, like this:

Code_1 Code_2
A C1
A C2
A C3
A C4
A C5
A C6
A C7
A C8
A C9
A C10
B C1
B C2
B C3
B C4
B C5
B C6
B C7
B C8
B C9
B C10

(and so forth)

But I don't know how to repeat Code_2 sequence over Code_1 repetition. Can you help me?

Also, if there is an easier way to do the first part of this code, please let me know.

Thank you in advance!

CodePudding user response:

If you have a dataframe like this:

code_1 =  pd.DataFrame({'Code_1':['A','B','C','D','E']})
code_2 = pd.DataFrame({'Code_2':['C1','C2','C3','C4','C5','C6','C7','C8','C9','C10']})
df = pd.concat([code_1,code_2], ignore_index=False, axis=1)
  Code_1 Code_2
0      A     C1
1      B     C2
2      C     C3
3      D     C4
4      E     C5
5    NaN     C6
6    NaN     C7
7    NaN     C8
8    NaN     C9
9    NaN    C10

You can do this to get the pattern you want:

code_mapping = [(code_1,code_2) for code_1 in df['Code_1'] for code_2 in df['Code_2'] if not isinstance(code_1,float)]

new_df = pd.DataFrame(code_mapping, columns =['Code_1', 'Code_2'])
new_df.tail(50)

The list comprehension code_mapping will use a nested for loop to create a tuple of codes that are not float (python considers NaN as float type so doing if not isinstance(code_1,float) will avoid doing the mapping using NaN as well). Then, from this list that contains the tuples, create a new dataframe.

The output is this:

   Code_1 Code_2
0       A     C1
1       A     C2
2       A     C3
3       A     C4
4       A     C5
5       A     C6
6       A     C7
7       A     C8
8       A     C9
9       A    C10
10      B     C1
11      B     C2
12      B     C3
13      B     C4
14      B     C5
15      B     C6
16      B     C7
17      B     C8
18      B     C9
19      B    C10
20      C     C1
21      C     C2
22      C     C3
23      C     C4
24      C     C5
26      C     C7
27      C     C8
28      C     C9
29      C    C10
30      D     C1
31      D     C2
32      D     C3
33      D     C4
34      D     C5
35      D     C6
36      D     C7
37      D     C8
38      D     C9
39      D    C10
40      E     C1
41      E     C2
42      E     C3
43      E     C4
44      E     C5
45      E     C6
46      E     C7
47      E     C8
48      E     C9
49      E    C10

This would be a more "pythonic" way of doing things, I'm sure you have other packages that can do this and you could even try some bfill/ffill for this

CodePudding user response:

You can use a bulit-in Python tool called product from itertools to help you with the work.

from itertools import product

new_df = pd.DataFrame(product(df.Code_1[~df.Code_1.isna()],df.Code_2),columns=['Code_1','Code_2'])

   Code_1 Code_2
0       A     C1
1       A     C2
2       A     C3
3       A     C4
4       A     C5
5       A     C6
6       A     C7
7       A     C8
8       A     C9
9       A    C10
10      B     C1
11      B     C2
12      B     C3
13      B     C4
14      B     C5
15      B     C6
16      B     C7
17      B     C8
18      B     C9
19      B    C10
.       .     .
.       .     .
.       .     .

  • Related