Home > Software engineering >  Excel: How to analyze data in a table that contains multivalue cells
Excel: How to analyze data in a table that contains multivalue cells

Time:01-10

I am working in a science project right now about insects, and I have been logging information about the insects I have been finding along. Right now, I realize that it was a bad decision to register the name of all the insects that I been finding per each observation. I am not allowed to provide to much information because it is confidential, but I am going to add a similar example of my case in the following table:

# of sample insect (family)
1 Dermestidae, Histeridae
2 Histeridae, Dichotumius
3 Histeriade
4 Dermestidae, Histeridae
5 Cleridae, Dichotumius
485 Histeriade
486 Dermestidae, Histeridae
487 Dermestidae, Cleridae
488 Histeriade

Something like the above table. In my actual table, I have cells with 5 or 6 diferent insects. The thing is:

  • How can I search for all the different values? I mean, I want to create a table that contains all the different values and how many of them are... Something like the following table:
Insect (family) Count
Cleridae 54
Histeridae 154
Dermestidae 34

(There are at least 100 different insects and some of them just appear once, so it is impossible for me to search all the different names manually.

Furthermore, I was thinking about converting my table to a long structure. Something like the following;

Instead of this:

# of sample insect (family)
1 Dermestidae, Histeridae
2 Histeridae, Dichotumius
3 Histeriade
4 Dermestidae, Histeridae
5 Cleridae, Dichotumius

I want this:

# of sample insect (family)
1 Dermestidae
1 Histeridae
2 Histeridae
2 Dichotumius
3 Histeriade
4 Dermestidae
4 Histeridae
5 Cleridae
5 Dichotumius

I was thinking that this arrangement should be better than the one that I have now. I hope someone can help me with this issue. Thanks so much.

I tried the above, but I did´t got it. That's the reasons I asking for help.

CodePudding user response:

This can be accomplished using Power Query, available in Windows Excel 2010 and Excel 365 (Windows or Mac)

I am uncertain if you want just the unpivoted table, the Counts of each family, or something else, but I have shown the results at each of the last two steps in the query. You can use what you need.

To use Power Query

  • Select some cell in your Data Table
  • Data => Get&Transform => from Table/Range
  • When the PQ Editor opens: Home => Advanced Editor
  • Make note of the Table Name in Line 2
  • Paste the M Code below in place of what you see
  • Change the Table name in line 2 back to what was generated originally.
  • Read the comments and explore the Applied Steps to understand the algorithm
let

//Change next line to reflect your actual data source
    Source = Excel.CurrentWorkbook(){[Name="Insects"]}[Content],

//set the data types
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"# of sample", Int64.Type}, {"insect (family)", type text}}),

//Split Insect Family column by the comma, into rows
    #"Split Column by Delimiter" = 
        Table.ExpandListColumn(
            Table.TransformColumns(#"Changed Type", {{"insect (family)", 
                Splitter.SplitTextByDelimiter(
                    ",", QuoteStyle.Csv), let itemType = (type nullable text) meta [Serialized.Text = true] in type {itemType}}}), 
                "insect (family)"),

//Remove any leading and trailing spaces => your unpivoted table
    #"Trim Spaces" = Table.TransformColumns(#"Split Column by Delimiter", {"insect (family)", each Text.Trim(_), type text}),

//To create your table with counts, merely Group by the insect (family) column and aggregate with Count
    #"Grouped Rows" = Table.Group(#"Trim Spaces", {"insect (family)"}, {{"Count", each Table.RowCount(_), Int64.Type}})
in
    #"Grouped Rows"

Source Data
enter image description here

Next to last step showing unpivoted table
enter image description here

Last Step
enter image description here

CodePudding user response:

What you are trying to accomplish is called unpivoting data. Power query is best for this case. If you want it to do by formula then can try the following formula-

=DROP(REDUCE(0,REDUCE(0,B2:B6,LAMBDA(a,x,VSTACK(a,CONCAT(CHOOSEROWS(A2:A6,ROW(x)-1)&"|")&TEXTSPLIT(x,,",")))),LAMBDA(p,q,VSTACK(p,TEXTSPLIT(q,"|")))),2)

enter image description here

CodePudding user response:

To answer 'How can I search for all the different values?', the below formula will create a unique list of the insect families (where the insect families are in range B2:B100)

=UNIQUE(TEXTSPLIT(TEXTJOIN(", ",TRUE,B2:B100),"|",", ",TRUE))

You will then be able to use a COUNTIF() formula to find how many tests contain each family.

  • Related