I have a file with several columns in which there are numbers. I would like these numbers to be in a single column. However, the title of the column and a line must also be present for each digit. In Excel it would be better, and otherwise with pandas in Python.
Here is an example:
original file
what I would like:
CodePudding user response:
With pandas you can try this:
import pandas as pd
vars = {
'title_1': ['word_1', 'word_5'],
'title_2': ['word_2', 'word_6'],
'title_3': ['word_3', 'word_7'],
'title_4': ['word_4', 'word_8'],
'title_5': ['1', ''],
'title_6': ['', '2'],
'title_7': ['3', ''],
'title_8': ['', '4'],
}
inital_df = pd.DataFrame(vars)
words_df = inital_df[['title_1', 'title_2', 'title_3', 'title_4']]
numbers_df = inital_df[['title_5', 'title_6', 'title_7', 'title_8']]
new_df = pd.DataFrame(columns = ['title_1', 'title_2', 'title_3', 'title_4', 'new_column_1', 'new_column_2'])
for row in range(numbers_df.shape[0]):
for column in range(numbers_df.shape[1]):
if numbers_df.iloc[row, column] != '':
values = words_df.iloc[row].to_list()
values.append(numbers_df.columns[column])
values.append(numbers_df.iloc[row, column])
new_df = new_df.append({new_df.columns[idx]:values[idx] for idx, value in enumerate(values)}, ignore_index=True)
print(new_df)
Maybe this is not the best solution considering computational cost, but it solves your problem.