Now I have table1 as below:
A | B | C |
---|---|---|
First | Second | Second |
Second | Third | Forth |
I now want to output a list(The first column has already known):
First | A | ||
---|---|---|---|
Second | A | B | C |
Thrid | B | ||
Forth | C |
How can I output this using Excel(or maybe python, if much easier)?
CodePudding user response:
This can be done with Python and the openpyxl package. The code below reads an existing spreadsheet table1.xlsx
, and creates a new spreadsheet table1_columns.xlsx
listing the unique values and the columns they were found in. With your input, the code produces your output.
import os
from collections import defaultdict
import openpyxl
input_filename = r"table1.xlsx"
output_filename = r"table1_columns.xlsx"
# Read the input spreadsheet file
wb = openpyxl.load_workbook(input_filename)
# Use the first spreadsheet in the workbook
ws = wb.worksheets[0]
# Iterate over every cell in each column, and read the values into a dict
# which has a default value of an empty set.
# The keys are the cell values, e.g. "First"
# The values are sets of column letters the cell value was found in, e.g. {"A"}
values = defaultdict(set)
for cells in ws.iter_cols():
for cell in cells:
values[cell.value].add(cell.column_letter)
# Create a new spreadsheet
wb = openpyxl.Workbook()
ws = wb.active
# Iterate over the dict, writing each value and its column letters to a new row
for row_num, (value, column_letters) in enumerate(values.items(), start=1):
# The first column is always the value, e.g. "First"
ws.cell(row=row_num, column=1).value = value
# As sets are an unordered data type, it first needs to be sorted. If not,
# the second row could be "Second B A C" instead of "Second A B C"
column_letters = sorted(column_letters)
for column_num, column_letter in enumerate(column_letters, start=2):
ws.cell(row=row_num, column=column_num).value = column_letter
# Save the spreadsheet to disk
wb.save(output_filename)
# Launch the new spreadsheet
os.startfile(output_filename)
CodePudding user response:
While I prefer GordonAitchJay's answer as it uses an existing library, most (if not all) spreadsheet editors can export sheets in .csv format (comma-separated values). From there you could read the first line containing the column names, and split on commas.
Off the top of my head it should look something like this:
columnNames = None
with open(filePath, "r", encoding="utf-8") as contents:
columnNames = contents.getlines()[0].split(",") # Or maybe "readlines()"
# Do your magic.