How to sort data that is placed in multiple columns into one column?-CodePudding

I have a dataset in excel with multiple columns that contain one specific set of data that is randomly distributed within these columns. The data looks like this:

Col1  Col2  Col3  Col4  Col5  Col6
Data  34    NA    NA    NA    NA
NA    NA    Data  32    NA    NA
NA    NA    NA    NA    Data  12
NA    Data  89    NA    NA    NA

I want to get all the Data fields into one column named data, so the data set looks like this (the NA columns are not important for now, I have this issue with several fields, so I will need to replicate the solution for other fields as well):

As I am currently working in excel, an easy solution there would be great, however, I will move to R at some point, so a solution in R would also be welcomed!

Thank you very much and sorry for the cryptic description.

Edit: Here is a picture of my real data - in this example I want a column named "Total supply", one named "Ticker" and one named "Accepted Currencies", with the respective data (following the named cell) in the correct column.

The output should look like this:

CodePudding user response：

If your data is in a matrix called mat, the following code extracts the desired values into a vector:

apply(mat, 1L, \(row) row[which(row == 'Data')   1L])
# [1] "34" "32" "12" "89"

This assumes that the data is correctly formatted. Any divergence will yield unexpected results.

It’s also possible to do this without apply using vectorised functions but the resulting code is a bit longer:

ind = which(mat == 'Data', arr.ind = TRUE)
ord = order(ind[, 1L])
mat[cbind(seq_along(ord), ind[ord, 2L]   1L)]

CodePudding user response：

Try:

Formula used in A6:

=TOCOL(IF(A1:E4="Data",B1:F4,NA()),3)

Or, a bit more dynamic:

=LET(a,A1:F4,TOCOL(IF(DROP(a,,-1)="Data",DROP(a,,1),NA()),3))

EDIT: Updated answer based on new information;

Formula in F2:

=LET(a,TOCOL(A1:D13,3),IFERROR(DROP(REDUCE(EXPAND(0,,3),SEQUENCE(ROWS(a-1)),LAMBDA(x,y,LET(b,INDEX(a,y 1),c,FILTER(x,NOT(ISERROR(TAKE(x,,1)))),SWITCH(INDEX(a,y),"Ticker",VSTACK(c,b),"Total Supply",VSTACK(DROP(c,-1),HSTACK(TOROW(TAKE(c,-1),3),b)),"Accepted Currencies",VSTACK(DROP(c,-1),HSTACK(TAKE(c,-1,2),b)),c)))),1),""))

CodePudding user response：

With the example df from below, you could do:

df[] <- lapply(df, as.character)
df <- data.frame(Data = c(t(df)), stringsAsFactors = FALSE)

subset(df, ave(Data, cumsum(is.na(Data)), FUN = length) > 1L & Data != 'Data')

Output:

This approach would capture multiple consecutive occurrences of values. You haven't specified whether (and how) potential NA values occurring after Data should be captured - that would require a modification (e.g. you could capture all the values occurring after each appearance of Data in case they contain at least one non-NA, or it could be you just wanted to capture single NA in case the whole vector afterwards is empty but we've nevertheless Data, etc.).

Data:

df <- read.table(
  text = 'Col1  Col2  Col3  Col4  Col5  Col6
Data  34    NA    NA    NA    NA
NA    NA    Data  32    NA    NA
NA    NA    NA    NA    Data  12
NA    Data  89    NA    NA    NA', header = TRUE
)