Checking if columns in dataframe are "paired"

I have a very long data frame (~10,000 rows), in which two of the columns look something like this.

Just scrolling through the data, it seems that the two columns are "paired" together, but is there any way of checking this explicitly?

CodePudding user response：

If you run this, you will see how many unique values of B there are for each value of A:

tapply(dat$B, dat$A, function(x) length(unique(x)))

So if the max of this vector is 1 then there are no values of A that have more than one corresponding value of B.
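
As a rough sketch of that check in full: the snippet below builds a small made-up data frame (the values are purely illustrative; only the names `dat`, `A`, and `B` come from the answer above), tests whether the maximum is 1, and then repeats the test with the columns swapped. The reverse check is an addition that assumes "paired" is meant as one-to-one; if you only care that A determines B, the first half is enough.

```r
# Illustrative data only; replace with your own data frame.
dat <- data.frame(
  A = c(1, 1, 2, 9, 9, 2, 9),
  B = c(5.5, 5.5, 201, 18, 18, 201, 18)
)

# Number of distinct B values for each value of A
b_per_a <- tapply(dat$B, dat$A, function(x) length(unique(x)))

# TRUE if every value of A maps to exactly one value of B
a_determines_b <- max(b_per_a) == 1

# Same check in the other direction: distinct A values per value of B
a_per_b <- tapply(dat$A, dat$B, function(x) length(unique(x)))
b_determines_a <- max(a_per_b) == 1

# The columns are paired one-to-one only if both directions hold
a_determines_b && b_determines_a
```

For this toy data both checks pass, so the last line evaluates to TRUE.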

CodePudding user response：

You want to know if value x in column A always means value y in column B? Let's group by A and count the distinct values in B:

library(dplyr)  # provides %>%, group_by(), distinct(), summarize(), n()

df <- data.frame(
  A = c(1, 1, 2, 9, 9, 2, 9),
  B = c(5.5, 5.5, 201, 18, 18, 201, 18)
)

df %>%
  group_by(A) %>%
  distinct(B) %>%
  summarize(n_unique = n())

# A tibble: 3 x 2
      A n_unique
  <dbl>    <int>
1     1        1
2     2        1
3     9        1

Now let's alter df so that this no longer holds:

df <- data.frame(
  A = c(1, 1, 2, 9, 9, 2, 9),
  B = c(5.5, 5.4, 201, 18, 18, 201, 18)
)

df %>%
  group_by(A) %>%
  distinct(B) %>%
  summarize(n_unique = n())

# A tibble: 3 x 2
      A n_unique
  <dbl>    <int>
1     1        2
2     2        1
3     9        1

Observe the increased count for group 1. Since your data frame has around 10,000 rows, you won't want to read through the result by eye. What remains is to check whether any value of A has n_unique > 1, for instance with filter(n_unique > 1).
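
A minimal sketch of that last step, reusing the altered df from above. It swaps the distinct() plus n() combination for dplyr's n_distinct(), which should give the same counts in one step. The name `violations` is just a placeholder chosen for this example.

```r
library(dplyr)

df <- data.frame(
  A = c(1, 1, 2, 9, 9, 2, 9),
  B = c(5.5, 5.4, 201, 18, 18, 201, 18)
)

# Keep only the values of A that map to more than one distinct B
violations <- df %>%
  group_by(A) %>%
  summarize(n_unique = n_distinct(B)) %>%
  filter(n_unique > 1)

violations

# TRUE only if no value of A has more than one B
nrow(violations) == 0
```

Here `violations` holds a single row (A = 1 with n_unique = 2), so the final line returns FALSE. On the original, unaltered df it would be empty and the check would return TRUE.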