I want to concatenate two pandas columns that hold lists, element-wise.
Input
Year Month
['2021','2020',''] ['11','12','']
['2019','2020',''] ['11','12','']
Output
Output
['202111','202012','']
['201911','202012','']
CodePudding user response:
Use a list comprehension if the lists can have different lengths from row to row:
df['Output'] = [[c + d for c, d in zip(a, b)] for a, b in zip(df['Year'], df['Month'])]
print (df)
Year Month Output
0 [2021, 2020, ] [11, 12, ] [202111, 202012, ]
1 [2019, 2020, ] [11, 12, ] [201911, 202012, ]
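For anyone who wants to run this end to end, here is a minimal self-contained sketch of the list-comprehension approach. The sample frame is an assumption: it reuses the question's data plus one shorter row to show that rows of different lengths are handled. Note that zip stops at the shorter list if Year and Month differ in length within the same row.

```python
import pandas as pd

# Sample data (assumed): two rows from the question plus a shorter third row.
df = pd.DataFrame({
    "Year": [["2021", "2020", ""], ["2019", "2020", ""], ["2018"]],
    "Month": [["11", "12", ""], ["11", "12", ""], ["07"]],
})

# Outer zip pairs the two lists of each row, inner zip pairs their elements.
df["Output"] = [
    [year + month for year, month in zip(years, months)]
    for years, months in zip(df["Year"], df["Month"])
]

output = df["Output"].tolist()
print(output)
```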
If every list has the same length in both columns and in all rows (here 3), use:
df1 = pd.DataFrame(df['Year'].tolist()) + pd.DataFrame(df['Month'].tolist())
print (df1)
0 1 2
0 202111 202012
1 201911 202012
df['Output'] = df1.to_numpy().tolist()
print (df)
Year Month Output
0 [2021, 2020, ] [11, 12, ] [202111, 202012, ]
1 [2019, 2020, ] [11, 12, ] [201911, 202012, ]
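The same-length variant can be sketched as a standalone script like this. It assumes every list has exactly the same length; the intermediate names years, months and combined are only illustrative.

```python
import pandas as pd

# Sample data (assumed): every list has length 3.
df = pd.DataFrame({
    "Year": [["2021", "2020", ""], ["2019", "2020", ""]],
    "Month": [["11", "12", ""], ["11", "12", ""]],
})

# Expand each list column into a regular frame with one column per position.
years = pd.DataFrame(df["Year"].tolist())
months = pd.DataFrame(df["Month"].tolist())

# Adding two frames of strings concatenates them cell by cell.
combined = years + months

# Convert back to one list per row.
df["Output"] = combined.to_numpy().tolist()

result = df["Output"].tolist()
print(result)
```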
CodePudding user response:
You can try explode (exploding several columns at once requires pandas 1.3 or later):
df['Output'] = np.sum(df.explode(['Year', 'Month']), axis=1) \
.groupby(level=0).apply(list)
For about 5,000,000 rows (see the setup below), the above operation took 1min 2s.
Setup:
data = {'Year': [['2021', '2020', ''], ['2019', '2020', ''], ['2018']],
'Month': [['11', '12', ''], ['11', '12', ''], ['07']]}
df = pd.DataFrame(data)
df1 = df.reindex(df.index.repeat(1666666)).reset_index(drop=True)
In [721]: %timeit -n 1 np.sum(df1.explode(['Year', 'Month']), axis=1).groupby(level=0).apply(list)
1min 2s ± 998 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
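A small runnable variant of the explode idea, for reference. This is a sketch, not the exact code timed above: it swaps np.sum for a plain + between the two exploded columns, and it assumes Year and Month have matching lengths within each row (multi-column explode raises an error otherwise). No timing is claimed for this variant.

```python
import pandas as pd

# Sample data (assumed): same shape as the setup above, without the repeat.
df = pd.DataFrame({
    "Year": [["2021", "2020", ""], ["2019", "2020", ""], ["2018"]],
    "Month": [["11", "12", ""], ["11", "12", ""], ["07"]],
})

# One row per list element; the original row label is repeated in the index.
exploded = df.explode(["Year", "Month"])

# String concatenation element by element.
joined = exploded["Year"] + exploded["Month"]

# Collect the pieces back into one list per original row label.
df["Output"] = joined.groupby(level=0).agg(list)

output = df["Output"].tolist()
print(output)
```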