Finding if there is a relationship between numbers

Time:06-24

I have a challenge. This may be a little tricky or even impossible, but I wanted to check whether anyone has any thoughts on it.

PS: This question is general and not specific to R. It is arguably general mathematics.

I have some data:

df
ColA    ColB    ColC
 6       9       27
 1       4       32
 4       8       40

If you observe closely, there is some relationship between these columns.

For example, (ColC/ColB) + ColA gives the number 9 for every row.

df
ColA    ColB    ColC   ColD
 6       9       27     9
 1       4       32     9
 4       8       40     9

However, I constructed this data so that a relationship exists. In general, given arbitrary numbers, is there a way to find out whether any relationship exists between them? It need not be (ColC/ColB) + ColA; it could be anything.

Say we have 5 columns of numeric data. I need to find a mathematical operation on them that yields a common number for every row.

This is more about mathematics (algebra). Can anyone tell me whether this is even possible?

CodePudding user response:

For some types of relationships this is doable. But when such a method fails to find a relationship, it typically just means there could be a relationship of a kind not covered by your approach.

One common tool for finding relationships is linear algebra, and linear dependencies in particular. Write your data in a matrix like you did. Consider a linear equation

a*ColA + b*ColB + c*ColC = 0

Use standard techniques such as Gaussian elimination to find coefficients a, b, c which satisfy this equation but are not all zero themselves. You probably can find a library to compute the kernel of a matrix which you can use for that. Now you know whether one of the columns can be expressed as a linear combination of the other two.
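As a rough illustration of the kernel computation, here is a minimal Python sketch using only the standard library. The function name `nullspace` is my own choice, and exact fractions are used purely to avoid rounding issues; a numerical library would normally be the more practical route. For the three sample rows it happens to report one dependency, 12*ColA - 11*ColB + ColC = 0, although with only three rows that says little about the process that generated the data.

```python
from fractions import Fraction

def nullspace(rows):
    """Basis of the kernel of a matrix, via Gauss-Jordan elimination (hypothetical helper)."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0])
    pivots = []  # pivots[k] is the pivot column of reduced row k
    r = 0
    for c in range(n_cols):
        # look for a row at or below r with a nonzero entry in column c
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot here: column c is a free column
        m[r], m[p] = m[p], m[r]
        pv = m[r][c]
        m[r] = [x / pv for x in m[r]]  # scale pivot to 1
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                f = m[i][c]
                m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
        if r == len(m):
            break
    basis = []
    for free in (c for c in range(n_cols) if c not in pivots):
        # set one free variable to 1 and solve for the pivot variables
        v = [Fraction(0)] * n_cols
        v[free] = Fraction(1)
        for row_idx, pc in enumerate(pivots):
            v[pc] = -m[row_idx][free]
        basis.append(v)
    return basis

# rows are (ColA, ColB, ColC) from the question
data = [[6, 9, 27], [1, 4, 32], [4, 8, 40]]
kernel = nullspace(data)
for v in kernel:
    print([str(x) for x in v])  # ['12', '-11', '1']
```

Each vector in the returned basis is one set of coefficients (a, b, c). An empty list would mean the columns are linearly independent.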

This is a very limited class of relationships, and doesn't cover your example yet. But you can improve it by including more columns. Include a column with ones everywhere to allow for a constant term in your formula. Include all pairwise products, and the squares of the columns as well.

x + a*ColA + b*ColB + c*ColC + ab*ColA*ColB + ac*ColA*ColC + bc*ColB*ColC + aa*ColA^2 + bb*ColB^2 + cc*ColC^2 = 0

Now for your data this could tell you that there is a solution of the form

b=-9 c=1 ab=1 x=a=ac=bc=aa=bb=cc=0
-9*ColB + ColC + ColA*ColB = 0

which is equivalent to the relationship you described in your question.
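To check that claim directly, a small Python sketch can plug the coefficients into the extended equation for each row. The helper name `extended_row` and the column order are my own assumptions for illustration.

```python
# rows are (ColA, ColB, ColC) from the question
data = [(6, 9, 27), (1, 4, 32), (4, 8, 40)]

def extended_row(a, b, c):
    # assumed column order: 1, A, B, C, A*B, A*C, B*C, A^2, B^2, C^2
    return [1, a, b, c, a * b, a * c, b * c, a * a, b * b, c * c]

# coefficients in the same order: x, a, b, c, ab, ac, bc, aa, bb, cc
coeffs = [0, 0, -9, 1, 1, 0, 0, 0, 0, 0]

# left-hand side of the extended equation, evaluated per row
residuals = [
    sum(k * v for k, v in zip(coeffs, extended_row(*row)))
    for row in data
]
print(residuals)  # [0, 0, 0]
```

Dividing -9*ColB + ColC + ColA*ColB = 0 by ColB (nonzero in all rows here) gives ColC/ColB + ColA = 9 again.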

But also observe that you are now using 3 data points to determine 10 variables. So this relationship is far from the only one.

In general you want at least as many data points as you have variables in your equation. In other words, you want at least as many rows as you have columns in your extended matrix. Only then can you say that a relationship between them is indeed a property of the underlying data and not merely an artifact of having too much flexibility and too little data.
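The effect of too few rows can be made concrete by counting how many independent solutions the extended system has, which is the number of columns minus the rank. The sketch below is only illustrative: the `synthetic` rows are made up by me (hypothetical ColA and ColB values, with ColC computed as (9 - ColA) * ColB so the intended relationship holds), and `rank` and `extended_row` are my own helper names.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Rank of a matrix via exact Gaussian elimination (hypothetical helper)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for c in range(len(m[0])):
        p = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if p is None:
            continue  # no pivot in this column
        m[r], m[p] = m[p], m[r]
        for i in range(r + 1, len(m)):
            f = m[i][c] / m[r][c]
            m[i] = [a - f * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def extended_row(a, b, c):
    # assumed column order: 1, A, B, C, A*B, A*C, B*C, A^2, B^2, C^2
    return [1, a, b, c, a * b, a * c, b * c, a * a, b * b, c * c]

original = [(6, 9, 27), (1, 4, 32), (4, 8, 40)]
# made-up data: 12 rows that satisfy ColC/ColB + ColA = 9 by construction
synthetic = [(a, b, (9 - a) * b) for a, b in product([1, 2, 4, 6], [4, 8, 9])]

free_dims = {}
for name, rows in [("original", original), ("synthetic", synthetic)]:
    ext = [extended_row(*row) for row in rows]
    # number of independent coefficient vectors solving the equation
    free_dims[name] = len(ext[0]) - rank(ext)
    print(name, len(rows), "rows ->", free_dims[name], "independent relationships")
# original 3 rows -> 7 independent relationships
# synthetic 12 rows -> 1 independent relationships
```

With 3 rows there is a 7-dimensional family of "relationships", most of them artifacts. With 12 rows only one direction survives, and it is the one the data was built from.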

In R you might want to look into using linear models for determining coefficients in the presence of imprecise data. You can also use powers of formulas to include all interactions between columns, i.e. those higher degree terms which I included above as well.
