How to extract all the column name from the required table by Excel(or maybe python)-CodePudding

Now I have table1 as below:

A	B	C
First	Second	Second
Second	Third	Forth

I now want to output a list(The first column has already known):

First	A
Second	A	B	C
Thrid	B
Forth	C

How can I output this using Excel(or maybe python, if much easier)?

CodePudding user response：

This can be done with Python and the openpyxl package. The code below reads an existing spreadsheet table1.xlsx, and creates a new spreadsheet table1_columns.xlsx listing the unique values and the columns they were found in. With your input, the code produces your output.

import os
from collections import defaultdict
import openpyxl

input_filename = r"table1.xlsx"
output_filename = r"table1_columns.xlsx"

# Read the input spreadsheet file
wb = openpyxl.load_workbook(input_filename)

# Use the first spreadsheet in the workbook
ws = wb.worksheets[0]

# Iterate over every cell in each column, and read the values into a dict 
# which has a default value of an empty set.
# The keys are the cell values, e.g. "First"
# The values are sets of column letters the cell value was found in, e.g. {"A"}
values = defaultdict(set)
for cells in ws.iter_cols():
    for cell in cells:
        values[cell.value].add(cell.column_letter)

# Create a new spreadsheet
wb = openpyxl.Workbook()
ws = wb.active

# Iterate over the dict, writing each value and its column letters to a new row
for row_num, (value, column_letters) in enumerate(values.items(), start=1):
    # The first column is always the value, e.g. "First"
    ws.cell(row=row_num, column=1).value = value
    
    # As sets are an unordered data type, it first needs to be sorted. If not, 
    # the second row could be "Second B A C" instead of "Second A B C"
    column_letters = sorted(column_letters)
    for column_num, column_letter in enumerate(column_letters, start=2):
        ws.cell(row=row_num, column=column_num).value = column_letter

# Save the spreadsheet to disk
wb.save(output_filename)

# Launch the new spreadsheet
os.startfile(output_filename)

CodePudding user response：

While I prefer GordonAitchJay's answer as it uses an existing library, most (if not all) spreadsheet editors can export sheets in .csv format (comma-separated values). From there you could read the first line containing the column names, and split on commas.

Off the top of my head it should look something like this:

columnNames = None
with open(filePath, "r", encoding="utf-8") as contents:
    columnNames = contents.getlines()[0].split(",")    # Or maybe "readlines()"

# Do your magic.