Home > OS >  openrefine - Sorting column contents within a record?
openrefine - Sorting column contents within a record?

Time:01-07

Scoured the internet as best as I could but couldn't find an answer -- I was wondering, is there some way to sort the contents of a row by record? E.g. take the following table:

Key Row to sort Other row
a bca A
cab
cab
abc f
b zyx
yxz u
c def h
fed h

and turn it into:

Key Row to sort Other row
a abc A
bca
cab
cab f
b yxz
zyx u
c def h
fed h

The ultimate goal is to sort all of the columns for each record alphabetically, and then blank up so that each record is a single row.

I've tried doing a sort on the column to sort within the record itself, but that orders records by whichever record has an entry that comes in alphabetical order (regardless of whether it's the 1st entry for the record or not, interestingly).

CodePudding user response:

I would do it as follows:

  • For all columns except the key column, use the Edit cells > Join multi-valued cells operation, with a separator that is not present in the cell values
  • Transform all columns except the key column with: value.split(',').sort().join(',')
  • Split back your columns with Edit cells > Split multi-valued cells

Then you can blank down / fill down as you wish.

Here is the JSON representation of the workflow for your example:

[
  {
    "op": "core/multivalued-cell-join",
    "columnName": "Row to sort",
    "keyColumnName": "Key",
    "separator": ",",
    "description": "Join multi-valued cells in column Row to sort"
  },
  {
    "op": "core/multivalued-cell-join",
    "columnName": "Other row",
    "keyColumnName": "Key",
    "separator": ",",
    "description": "Join multi-valued cells in column Other row"
  },
  {
    "op": "core/text-transform",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Row to sort",
    "expression": "grel:value.split(',').sort().join(',')",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on cells in column Row to sort using expression grel:value.split(',').sort().join(',')"
  },
  {
    "op": "core/text-transform",
    "engineConfig": {
      "facets": [],
      "mode": "record-based"
    },
    "columnName": "Other row",
    "expression": "grel:value.split(',').sort().join(',')",
    "onError": "keep-original",
    "repeat": false,
    "repeatCount": 10,
    "description": "Text transform on cells in column Other row using expression grel:value.split(',').sort().join(',')"
  },
  {
    "op": "core/multivalued-cell-split",
    "columnName": "Row to sort",
    "keyColumnName": "Key",
    "mode": "separator",
    "separator": ",",
    "regex": false,
    "description": "Split multi-valued cells in column Row to sort"
  },
  {
    "op": "core/multivalued-cell-split",
    "columnName": "Other row",
    "keyColumnName": "Key",
    "mode": "separator",
    "separator": ",",
    "regex": false,
    "description": "Split multi-valued cells in column Other row"
  }
]

CodePudding user response:

Here is a solution using sort

Prerequisite: assuming that the values in the "Key" column are unique.

  1. Switch to rows mode
  2. Fill down the "Key" column via Key=> Edit cells => Fill down.
  3. Sort the "Key" column via Key=> Sort...
  4. Sort the "Row to sort" column via Row to sort => Sort... as additional sort
  5. Make the sorting permanent by selecting Reorder rows permanently in the sort menu.
  6. Blank down the "Key" and "Row to sort" column.

Here is a solution using GREL

As deduplicating and sorting records is quite a common task I have a GREL expression reducing this task to two steps:

  1. Transform the "Row to sort" column with the following GREL expression:
if(
   row.index - row.record.fromRowIndex == 0,
   row.record.cells[columnName].value.uniques().sort().join(","),
   null
)
  1. Split the multi-valued cells in the "Row to sort" column on the separator ,.

The GREL expression will take all the record cells of the current column, extract their values into an array, make the values in the array unique, sort the remaining value in the array and join it into a string using , as separator.

The joining into a string is necessary as OpenRefine currently has no support for displaying arrays in the GUI.

  • Related