I'm looking for a way to duplicate every column in a DataFrame, with each duplicate named after the original column plus a '_2' suffix.
Example:
import pandas as pd

d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)
d2 = {'col1': [1, 2], 'col1_2': [1, 2], 'col2': [3, 4], 'col2_2': [3, 4]}
end_df = pd.DataFrame(data=d2)
Thanks.
CodePudding user response:
Try this:
d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data = d)
for column in start_df.columns:
    start_df[column + '_2'] = start_df[column]
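Note that the loop above appends each copy after the existing columns, so the result is ordered col1, col2, col1_2, col2_2. Here is a minimal sketch of one way to get the interleaved order from the question, assuming the original column names are unique. The names `original` and `ordered` are only for illustration.

```python
import pandas as pd

start_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})

# Remember the original names before adding anything.
original = list(start_df.columns)
for column in original:
    start_df[column + '_2'] = start_df[column]

# Build the interleaved order: each original name followed by its copy.
ordered = [name for column in original for name in (column, column + '_2')]
end_df = start_df[ordered]
print(end_df)
```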
CodePudding user response:
NB. This answer also demonstrates a generalization of the process.
Without any loop for building the DataFrame, you can simply use the repeat
method of the columns index.
You can then set the column names programmatically with a list comprehension.
For 2 repeats:
end_df = start_df[start_df.columns.repeat(2)]
end_df.columns = [f'{a}{b}' for a in start_df for b in ('', '_2')]
output:
   col1  col1_2  col2  col2_2
0     1       1     3       3
1     2       2     4       4
Generalization:
n = 5
end_df = start_df[start_df.columns.repeat(n)]
end_df.columns = [f'{a}{b}' for a in start_df
                  for b in [''] + [f'_{x + 1}' for x in range(1, n)]]
Example n=5:
   col1  col1_2  col1_3  col1_4  col1_5  col2  col2_2  col2_3  col2_4  col2_5
0     1       1       1       1       1     3       3       3       3       3
1     2       2       2       2       2     4       4       4       4       4
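If this is needed in more than one place, the generalization could be wrapped in a small helper. The sketch below is only one possible shape: the function name `repeat_columns` is hypothetical, and it assumes the original column names are unique and that n is at least 1.

```python
import pandas as pd

def repeat_columns(df, n):
    """Return a new DataFrame with every column of df repeated n times.

    The first copy keeps its name; copies 2..n get the suffixes '_2' .. '_n'.
    """
    suffixes = [''] + [f'_{k}' for k in range(2, n + 1)]
    # Selecting with a repeated Index yields each column n times, side by side.
    out = df[df.columns.repeat(n)]
    out.columns = [f'{col}{s}' for col in df.columns for s in suffixes]
    return out

start_df = pd.DataFrame({'col1': [1, 2], 'col2': [3, 4]})
end_df = repeat_columns(start_df, 3)
print(list(end_df.columns))
# ['col1', 'col1_2', 'col1_3', 'col2', 'col2_2', 'col2_3']
```

Because the helper returns a new DataFrame, start_df itself keeps its two original columns.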
CodePudding user response:
Use the .insert() method:
import pandas as pd
d = {'col1': [1, 2], 'col2': [3, 4]}
start_df = pd.DataFrame(data=d)
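for i, col in enumerate(start_df.columns):
    # i copies have already been inserted ahead of the i-th original column,
    # so it now sits at position 2 * i and its copy belongs at 2 * i + 1.
    start_df.insert(2 * i + 1, col + '_2', start_df[col])
start_df
output:
   col1  col1_2  col2  col2_2
0     1       1     3       3
1     2       2     4       4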