Q：寻找最近的集测量某些测量算法

I have a collection of measurments, example:
measurment #1: { 200, 350, 712, 1023, 1430, 1555, 1800, 2036, 2569 }
measurment #2: { 165, 400, 974, 1124, 1600, 1893, 1919, 2032, 2654, 2932 }
...
measurment #N: { 234, 454, 879, 1432, 1877, 2000, 2543, 2876 }

The order of the elements in each measurment is important.
Each element will have higher value than the previous.
The number of elements in each measurment may vary,
but they should not vary to much.

Now i am getting as an input a new measurment
(lets say: { 212, 354, 978, 1222, 1454, 1922, 2013, 2432, 2987})
and should find the closest measurment from the collection of measurment i already possess.

My question is what algorithm should i use for this task ?

More:
1. It is also possible to extend the task in such meatter that instead input of one measurment i will be given a small collection of measurments.
2. Each element in a measurment represent time passed in second from the begining.
The measuring is stoped when reached 3600 seconds (1 hour), therfore the minimal posible value will be 0 and the maximal will be 3599.
The events creating each element in the measurment to be created is affected by a human behaviour.

I have a collection of measurments, example:
measurment #1: { 200, 350, 712, 1023, 1430, 1555, 1800, 2036, 2569 }
measurment #2: { 165, 400, 974, 1124, 1600, 1893, 1919, 2032, 2654, 2932 }
...
measurment #N: { 234, 454, 879, 1432, 1877, 2000, 2543, 2876 }

The order of the elements in each measurment is important.
Each element will have higher value than the previous.
The number of elements in each measurment may vary,
but they should not vary to much.

Now i am getting as an input a new measurment
(lets say: { 212, 354, 978, 1222, 1454, 1922, 2013, 2432, 2987})
and should find the closest measurment from the collection of measurment i already possess.

More:
1. It is also possible to extend the task in such meatter that instead input of one measurment i will be given a small collection of measurments.
2. Each element in a measurment represent time passed in second from the begining.
The measuring is stoped when reached 3600 seconds (1 hour), therfore the minimal posible value will be 0 and the maximal will be 3599.
The events creating each element in the measurment to be created is affected by a human behaviour.

Assuming that your data is "fuzzy", one class of algorithms you may want to look into is dynamic programming. By fuzzy I mean that two sets are almost align, but one set may have extra elements inserted, removed compared to the other and the matching elements "almost" matches.

In these types of algorithms you typically define a distance score by defining a penalty for inserting/removing an element in the alignment and a penalty score for two elements not quite matching. In your case you may define an insert / delete penalty of "100" seconds for inserting an extra timing event, and a two-element distance score as the absolute distance in seconds. Given that definition you can easily find and modify a needleman-wunsch algorithm implementation or something similar. This will give you the distance between two small sets of measurements in an acceptable amount of time. However, if your number of elements in your measurements is huge or your number of sets is huge, and you need the answer in say milliseconds, then it is a rather difficult problem unless you can find a lot of good constraints for your problem.

The above is just an example, it all boils down to the context. Is your data noisy? How "noisy", with extra elements in the middle, start or end or just slightly off in position? plus a ton of other questions.

Choosing and implementing fuzzy algorithms can range between pretty easy to near impossible all depending on the context and what you are going to use the result for. Does it need to be exact or "just good enough". Does it need to be fast, etc.

In these types of algorithms you typically define a distance score by defining a penalty for inserting/removing an element in the alignment and a penalty score for two elements not quite matching. In your case you may define an insert / delete penalty of "100" seconds for inserting an extra timing event, and a two-element distance score as the absolute distance in seconds. Given that definition you can easily find and modify a needleman-wunsch algorithm implementation or something similar. This will give you the distance between two small sets of measurements in an acceptable amount of time. However, if your number of elements in your measurements is huge or your number of sets is huge, and you need the answer in say milliseconds, then it is a rather difficult problem unless you can find a lot of good constraints for your problem.

Find the squared sum of errors of your new measure with each measurement in your collection. Then return the one from your collection with the smallest error.

``````var measures = [
[1, 2, 3, 4],
[10, 20, 30, 40],
[66, 77, 88, 99],
[101, 202, 303, 404]
];

// ignores measurements that aren't the same length as the data
// uses the squared sum of differences (errors)
function findClosest(data) {
var minError = 0x7FFFFFFF; // max 32bit signed int
var result = null;
for(var i=0; i < measures.length; i++) {
if(data.length !== measures[i].length) { continue; }
var error = 0;
for(var j=0; j < data.length; j++) {
error += Math.pow(measures[i][j] - data[j], 2);
}
if(error < minError) {
minError = error;
result = measures[i];
}
}
return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the squared sum of differences (error)
function findClosestV2(data) {
var minError = 0x7FFFFFFF; // max 32bit signed int
var result = null;
for(var i=0; i < measures.length; i++) {
var measure = measures[i];
var error = 0;
var minLocalError = 0x7FFFFFFF;
for(var j=0; j < data.length; j++) {
for(var k=0; k < measure.length; k++) {
var localError = Math.pow(measure[k] - data[j], 2);
if(localError < minLocalError) {
minLocalError = localError;
}
}
error += minLocalError;
}
if(error < minError) {
minError = error;
result = measures[i];
}
}
return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the average of the absolute error % using the previous measurement as the ideal value
function findClosestV3(data) {
var minError = 0x7FFFFFFF; // max 32bit signed int
var result = null;
for(var i=0; i < measures.length; i++) {
var measure = measures[i];
var error = 0;
var minLocalError = 0x7FFFFFFF;
for(var j=0; j < data.length; j++) {
for(var k=0; k < measure.length; k++) {
var localError = Math.abs( (measure[k] - data[j]) / measure[k] );
if(localError < minLocalError) {
minLocalError = localError;
}
}
error += minLocalError;
}
// average of sum of error percentages
error /= data.length;
if(error < minError) {
minError = error;
result = measures[i];
}
}
return result;
}

console.log(findClosest([2,3,4,5])); // [1,2,3,4]
console.log(findClosest([70,80,90,100])); // [66,77,88,99]
console.log(findClosest([9,19,304,405])); // [101,202,303,404]

console.log(findClosestV2()); // [101,202,303,404]
console.log(findClosestV2([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV2([9,19,304,405])); // [10,20,30,40]

console.log(findClosestV3()); // [101,202,303,404]
console.log(findClosestV3([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV3([9,19,304,405])); // [10,20,30,40]
``````

``````var measures = [
[1, 2, 3, 4],
[10, 20, 30, 40],
[66, 77, 88, 99],
[101, 202, 303, 404]
];

// ignores measurements that aren't the same length as the data
// uses the squared sum of differences (errors)
function findClosest(data) {
var minError = 0x7FFFFFFF; // max 32bit signed int
var result = null;
for(var i=0; i < measures.length; i++) {
if(data.length !== measures[i].length) { continue; }
var error = 0;
for(var j=0; j < data.length; j++) {
error += Math.pow(measures[i][j] - data[j], 2);
}
if(error < minError) {
minError = error;
result = measures[i];
}
}
return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the squared sum of differences (error)
function findClosestV2(data) {
var minError = 0x7FFFFFFF; // max 32bit signed int
var result = null;
for(var i=0; i < measures.length; i++) {
var measure = measures[i];
var error = 0;
var minLocalError = 0x7FFFFFFF;
for(var j=0; j < data.length; j++) {
for(var k=0; k < measure.length; k++) {
var localError = Math.pow(measure[k] - data[j], 2);
if(localError < minLocalError) {
minLocalError = localError;
}
}
error += minLocalError;
}
if(error < minError) {
minError = error;
result = measures[i];
}
}
return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the average of the absolute error % using the previous measurement as the ideal value
function findClosestV3(data) {
var minError = 0x7FFFFFFF; // max 32bit signed int
var result = null;
for(var i=0; i < measures.length; i++) {
var measure = measures[i];
var error = 0;
var minLocalError = 0x7FFFFFFF;
for(var j=0; j < data.length; j++) {
for(var k=0; k < measure.length; k++) {
var localError = Math.abs( (measure[k] - data[j]) / measure[k] );
if(localError < minLocalError) {
minLocalError = localError;
}
}
error += minLocalError;
}
// average of sum of error percentages
error /= data.length;
if(error < minError) {
minError = error;
result = measures[i];
}
}
return result;
}

console.log(findClosest([2,3,4,5])); // [1,2,3,4]
console.log(findClosest([70,80,90,100])); // [66,77,88,99]
console.log(findClosest([9,19,304,405])); // [101,202,303,404]

console.log(findClosestV2()); // [101,202,303,404]
console.log(findClosestV2([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV2([9,19,304,405])); // [10,20,30,40]

console.log(findClosestV3()); // [101,202,303,404]
console.log(findClosestV3([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV3([9,19,304,405])); // [10,20,30,40]
``````
algorithm  dynamic-programming