Home > OS >  Excel extract text from cells containing large amounts of text
Excel extract text from cells containing large amounts of text

Time:09-24

I have a spreadsheet containing ticket data from an IT ticketing system. The field I need to extract data from is a non-indexed text field, so I am not able to search this field in the ticket system itself. I have exported the data into Excel, where the entire Work Notes field is in one cell for each ticket. The output looks something like this:

2021-09-21 17:15:41 John Smith [Work Notes] - Update inside ticket 2021-09-21 16:12:39 Jane Doe [Work Notes] - Another update inside ticket.

What I am trying to do is capture the individual work notes and separate them into their own column so that I can calculate the time between updates and count the number of updates each individual makes. Counting the updates is easy, I used a simple CountIF formula to get the total number of times each individual has made an update in the tickets I have exported. The difficult part is how to split each update into their own cell.

I have cleaned up the data so that the only time brackets( [ ] ) are used is for the [Work Notes] or [Additional Comments] (a customer visible work note entry). I then tried text to comments with the right bracket as the delimiter, but this works for splitting the first entry into it's own cell, but the next cell will contain the previous entry's comment and then the next entry. So it would look like this:

2021-09-21 17:15:41 John Smith [Work Notes
*next cell*
- Update inside ticket 2021-09-21 16:12:39 Jane Doe [Work Notes

and so forth. This does not really work since I still cannot perform the time calculation and starts to mix up the different updates within a ticket.

Does anyone have an suggestions on a better way to do this? I am open to using VBA if needed. Thanks for the help!

CodePudding user response:

This can be done with a find/replace and then text-to-columns.

2021-09-21 17:15:41 John Smith [Work Notes] - Update inside ticket 2021-09-21 16:12:39 Jane Doe [Work Notes] - Another update inside ticket.

Since the date is the beginning of the entry, you only need to add a character before the date. This can be anything you know will not appear in the rest of the messages.

find and replace

Once you do this, you can use the text-to-columns to select the special character as your delimiter.

CodePudding user response:

Using a regex. Put strings in Column A on Sheet1.

Option Explicit

Sub extract()
    
    Dim wb As Workbook, ws As Worksheet, cell As Range
    Dim lastRow As Long, i As Integer, j As Integer
    Dim c As Long, s As String

   ' build regex pattern
    Dim regex As Object, m As Object
    Set regex = CreateObject("vbscript.regexp")
    With regex
       .Global = True
       .MultiLine = False
       .IgnoreCase = True
       .Pattern = "((20\d\d-\d\d-\d\d \d\d:\d\d:\d\d)(.*?)(\[[^]]*\]))" ' pattern
    End With

    ' data
    Set wb = ThisWorkbook
    Set ws = wb.Sheets("Sheet1")
    lastRow = ws.Cells(Rows.Count, "A").End(xlUp).Row

    For Each cell In ws.Range("A1").Resize(lastRow)
         s = cell.Value
         c = 1
         If regex.test(s) Then
            Set m = regex.Execute(s) '
            For i = 1 To m.Count
                For j = 2 To m(i - 1).submatches.Count
                     cell.Offset(0, c) = m.Item(i - 1).submatches(j - 1) ' matched term
                     c = c   1
                Next
            Next
        End If
    Next

    MsgBox lastRow & " rows parsed", vbInformation

End Sub

CodePudding user response:

With Microsoft365 formulae:

enter image description here

Formula in A3:

=FILTERXML("<t><s>"&CONCAT(IF(ISNUMBER(SEARCH(" ????-??-?? ??:??:??",MID(A1,SEQUENCE(LEN(A1)),20))),"</s><s>",MID(A1,SEQUENCE(LEN(A1)),1)))&"</s></t>","//s")

Obviously nest this inside TRANSPOSE() to spill this over to columns but for the sake of visibility in the screenshot I opted not to do so.


EDIT:

I can only guess your follow-up question but if you want to analyse the data a bit better e.g DateTime-Stamp;Person;Ticket you could try the following:

enter image description here

Formula in B3:

=CHOOSE({1,2,3},--LEFT(A3#,19),MID(A3#,21,FIND(" [",A3#,20)-21),MID(A3#,FIND("] - ",A3#) 4,LEN(A3#)))

As you see it spilled all the different data in seperate columns.

Now if one has access to LET() all this can be condensed into a single formula.

Hope it helps.

  • Related