How should I generate the data2 from data1? The columns (start & end) of data2 should be the corss terms of data1. The corresponding values will be assigned to each cross term. Thanks in advance!
data = pd.DataFrame([['EU', ('a b'), ('d e'), ('1 2 3 4')], ['NA', ('g h i'), ('j k l'), ('11 12 13 14 15 16 17 18 19')]], columns=['region', 'start', 'end', 'values']) region start end values 0 EU a b d e 1 2 3 4 1 NA g h i j k l 11 12 13 14 15 16 17 18 19
data2 = pd.DataFrame([['EU', 'a', 'd', 1], ['EU', 'a', 'e', 2], ['EU', 'b', 'd', 3], ['EU', 'b', 'e', 4], ['NA', 'g', 'j', 11], ['NA', 'g', 'k', 12], ['NA', 'g', 'l', 13], ['NA', 'h', 'j', 14], ['NA', 'h', 'k', 15], ['NA', 'h', 'l', 16], ['NA', 'i', 'j', 17], ['NA', 'i', 'k', 18], ['NA', 'i', 'l', 19]], columns=['region', 'start', 'end', 'values']) region start end values 0 EU a d 1 1 EU a e 2 2 EU b d 3 3 EU b e 4 4 NA g j 11 5 NA g k 12 6 NA g l 13 7 NA h j 14 8 NA h k 15 9 NA h l 16 10 NA i j 17 11 NA i k 18 12 NA i l 19
CodePudding user response:
You can split the strings, explode
start and end successively, then use groupby.transform
to explode the values per group in place:
out = (data
.assign(start=data['start'].str.split(),
end=data['end'].str.split(),
values=data['values'].str.split(),
)
.explode('start').explode('end')
.assign(values=lambda d: d.groupby(level=0)['values']
.transform(lambda g: g.iloc[0]))
.reset_index(drop=True)
)
NB. this requires that the number of values after splitting is equal to the product of the number of start and end values.
output:
region start end values
0 EU a d 1
1 EU a e 2
2 EU b d 3
3 EU b e 4
4 NA g j 11
5 NA g k 12
6 NA g l 13
7 NA h j 14
8 NA h k 15
9 NA h l 16
10 NA i j 17
11 NA i k 18
12 NA i l 19