Home > Software engineering >  Converting between two sets scaled between 0 and 100 using overlap
Converting between two sets scaled between 0 and 100 using overlap

Time:05-22

I am attempting to scale google trends data that is received by the minute every 10 minutes. If you are not familiar with google trends, every response is scaled between 0 and 100 based on the minimum and maximum in the current response. So two different requests for different, but overlapping, time intervals can have different values for the same time (ie a request from 4:30-5:30 and a request for 5-6 may have different values for 5).

What I am attempting to do is scale all values relative to the first 4 hour interval for which I collect trend data. Every 10 minutes, a new 4h chunk will be collected, meaning most of the time will overlap with the previous chunk. Is it possible to exploit this overlap to scale all new values relative to the first interval?

Note: it's ok for new values to be gt 100

CodePudding user response:

Let's say that your initial four-hour window and any data that has gone through a scaling process is 'good.'

Let's say our good data ends at time T, and we have a new 4-hour window of data that ends at time T 10.

The only difference between the data in our new window and good data is the scaling factor. Every minute that the new window has in common with good data can generate a vote for scaling factor we need to make the new data 'good': scaling factor = (good value) / (new value).

Normally I'd use the median of votes for something like this, but because the data is so coarse, you're at risk for there being 'cliffs' in the data, and in particular the median might be next to a significantly larger or smaller number. For that reason, I suggest generating a scaling factor off the votes by eliminating the k outliers in both directions, then taking the average of the remaining votes.

If you want even more votes, you can get them off non-adjacent 4-hour blocks (though obviously with limited returns).

--- Example ---

Say in the initial window, the peak searches is 1000. That means the scaling factor for that window is 0.10, which will result in the peak searches Google displays to us being 100.

In the next window we have a new peak of 2000. Now, these peaks are invisible to us, but what we do see is that each of the points that exist in both windows have half the value in the new window vs the old window. Since votes (described above) are (good value) / (new value), we have a bunch of votes close to 2.0 (close not exact due to coarseness and rounding).

So we multiply each of our 10 new values by 2.0 to convert them to the good scale. A value of zero is unchanged since no searches is no searches whatever the scale.

  • Related