找到你要的答案

Q:Algorithm for finding the closest set of measurment to certain measurment

Q:寻找最近的集测量某些测量算法


I have a collection of measurments, example:
measurment #1: { 200, 350, 712, 1023, 1430, 1555, 1800, 2036, 2569 }
measurment #2: { 165, 400, 974, 1124, 1600, 1893, 1919, 2032, 2654, 2932 }
...
measurment #N: { 234, 454, 879, 1432, 1877, 2000, 2543, 2876 }

The order of the elements in each measurment is important.
Each element will have higher value than the previous.
The number of elements in each measurment may vary,
but they should not vary to much.

Now i am getting as an input a new measurment
(lets say: { 212, 354, 978, 1222, 1454, 1922, 2013, 2432, 2987})
and should find the closest measurment from the collection of measurment i already possess.

My question is what algorithm should i use for this task ?

More:
1. It is also possible to extend the task in such meatter that instead input of one measurment i will be given a small collection of measurments.
2. Each element in a measurment represent time passed in second from the begining.
The measuring is stoped when reached 3600 seconds (1 hour), therfore the minimal posible value will be 0 and the maximal will be 3599.
The events creating each element in the measurment to be created is affected by a human behaviour.

Thanks for your help :)


I have a collection of measurments, example:
measurment #1: { 200, 350, 712, 1023, 1430, 1555, 1800, 2036, 2569 }
measurment #2: { 165, 400, 974, 1124, 1600, 1893, 1919, 2032, 2654, 2932 }
...
measurment #N: { 234, 454, 879, 1432, 1877, 2000, 2543, 2876 }

The order of the elements in each measurment is important.
Each element will have higher value than the previous.
The number of elements in each measurment may vary,
but they should not vary to much.

Now i am getting as an input a new measurment
(lets say: { 212, 354, 978, 1222, 1454, 1922, 2013, 2432, 2987})
and should find the closest measurment from the collection of measurment i already possess.

我的问题是我应该用什么算法来完成这个任务?

More:
1. It is also possible to extend the task in such meatter that instead input of one measurment i will be given a small collection of measurments.
2. Each element in a measurment represent time passed in second from the begining.
The measuring is stoped when reached 3600 seconds (1 hour), therfore the minimal posible value will be 0 and the maximal will be 3599.
The events creating each element in the measurment to be created is affected by a human behaviour.

谢谢你的帮助:

answer1: 回答1:

Assuming that your data is "fuzzy", one class of algorithms you may want to look into is dynamic programming. By fuzzy I mean that two sets are almost align, but one set may have extra elements inserted, removed compared to the other and the matching elements "almost" matches.

In these types of algorithms you typically define a distance score by defining a penalty for inserting/removing an element in the alignment and a penalty score for two elements not quite matching. In your case you may define an insert / delete penalty of "100" seconds for inserting an extra timing event, and a two-element distance score as the absolute distance in seconds. Given that definition you can easily find and modify a needleman-wunsch algorithm implementation or something similar. This will give you the distance between two small sets of measurements in an acceptable amount of time. However, if your number of elements in your measurements is huge or your number of sets is huge, and you need the answer in say milliseconds, then it is a rather difficult problem unless you can find a lot of good constraints for your problem.

The above is just an example, it all boils down to the context. Is your data noisy? How "noisy", with extra elements in the middle, start or end or just slightly off in position? plus a ton of other questions.

Choosing and implementing fuzzy algorithms can range between pretty easy to near impossible all depending on the context and what you are going to use the result for. Does it need to be exact or "just good enough". Does it need to be fast, etc.

假设你的数据是“模糊的”,你可能要研究的一类算法是动态规划。我的意思是,两组几乎对齐,但一组可能有额外的元素插入,删除相比,其他和匹配元素“几乎”匹配。

In these types of algorithms you typically define a distance score by defining a penalty for inserting/removing an element in the alignment and a penalty score for two elements not quite matching. In your case you may define an insert / delete penalty of "100" seconds for inserting an extra timing event, and a two-element distance score as the absolute distance in seconds. Given that definition you can easily find and modify a needleman-wunsch algorithm implementation or something similar. This will give you the distance between two small sets of measurements in an acceptable amount of time. However, if your number of elements in your measurements is huge or your number of sets is huge, and you need the answer in say milliseconds, then it is a rather difficult problem unless you can find a lot of good constraints for your problem.

以上只是一个例子,都归结为上下文。你的数据吵吗?如何“嘈杂”,与额外的元素在中间,开始或结束或只是稍微关闭的位置?加上一大堆其他问题。

选择和实现模糊算法可以介于非常容易接近不可能所有取决于上下文和你要使用的结果。是否需要精确或“足够好”。是否需要快速等

answer2: 回答2:

Find the squared sum of errors of your new measure with each measurement in your collection. Then return the one from your collection with the smallest error.

var measures = [
 [1, 2, 3, 4],
 [10, 20, 30, 40],
 [66, 77, 88, 99],
 [101, 202, 303, 404]
];

// ignores measurements that aren't the same length as the data
// uses the squared sum of differences (errors)
function findClosest(data) {
  var minError = 0x7FFFFFFF; // max 32bit signed int
  var result = null;
  for(var i=0; i < measures.length; i++) {
    if(data.length !== measures[i].length) { continue; }
    var error = 0;
    for(var j=0; j < data.length; j++) {
      error += Math.pow(measures[i][j] - data[j], 2);
    }
    if(error < minError) {
      minError = error;
      result = measures[i];
    }
  }
  return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the squared sum of differences (error)
function findClosestV2(data) {
  var minError = 0x7FFFFFFF; // max 32bit signed int
  var result = null;
  for(var i=0; i < measures.length; i++) {
    var measure = measures[i];
    var error = 0;
    var minLocalError = 0x7FFFFFFF;
    for(var j=0; j < data.length; j++) {
      for(var k=0; k < measure.length; k++) {
        var localError = Math.pow(measure[k] - data[j], 2);
        if(localError < minLocalError) {
          minLocalError = localError;
        }
      }
      error += minLocalError;
    }
    if(error < minError) {
      minError = error;
      result = measures[i];
    }
  }
  return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the average of the absolute error % using the previous measurement as the ideal value
function findClosestV3(data) {
  var minError = 0x7FFFFFFF; // max 32bit signed int
  var result = null;
  for(var i=0; i < measures.length; i++) {
    var measure = measures[i];
    var error = 0;
    var minLocalError = 0x7FFFFFFF;
    for(var j=0; j < data.length; j++) {
      for(var k=0; k < measure.length; k++) {
        var localError = Math.abs( (measure[k] - data[j]) / measure[k] );
        if(localError < minLocalError) {
          minLocalError = localError;
        }
      }
      error += minLocalError;
    }
    // average of sum of error percentages
    error /= data.length;
    if(error < minError) {
      minError = error;
      result = measures[i];
    }
  }
  return result;
}

console.log(findClosest([2,3,4,5])); // [1,2,3,4]
console.log(findClosest([70,80,90,100])); // [66,77,88,99]
console.log(findClosest([9,19,304,405])); // [101,202,303,404]

console.log(findClosestV2([404])); // [101,202,303,404]
console.log(findClosestV2([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV2([9,19,304,405])); // [10,20,30,40]

console.log(findClosestV3([404])); // [101,202,303,404]
console.log(findClosestV3([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV3([9,19,304,405])); // [10,20,30,40]

在你的集合中找到新度量的误差平方和。然后返回一个从您的集合与最小的错误。

var measures = [
 [1, 2, 3, 4],
 [10, 20, 30, 40],
 [66, 77, 88, 99],
 [101, 202, 303, 404]
];

// ignores measurements that aren't the same length as the data
// uses the squared sum of differences (errors)
function findClosest(data) {
  var minError = 0x7FFFFFFF; // max 32bit signed int
  var result = null;
  for(var i=0; i < measures.length; i++) {
    if(data.length !== measures[i].length) { continue; }
    var error = 0;
    for(var j=0; j < data.length; j++) {
      error += Math.pow(measures[i][j] - data[j], 2);
    }
    if(error < minError) {
      minError = error;
      result = measures[i];
    }
  }
  return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the squared sum of differences (error)
function findClosestV2(data) {
  var minError = 0x7FFFFFFF; // max 32bit signed int
  var result = null;
  for(var i=0; i < measures.length; i++) {
    var measure = measures[i];
    var error = 0;
    var minLocalError = 0x7FFFFFFF;
    for(var j=0; j < data.length; j++) {
      for(var k=0; k < measure.length; k++) {
        var localError = Math.pow(measure[k] - data[j], 2);
        if(localError < minLocalError) {
          minLocalError = localError;
        }
      }
      error += minLocalError;
    }
    if(error < minError) {
      minError = error;
      result = measures[i];
    }
  }
  return result;
}

// allows data that is different length than measurements by trying to best fit each element of data to an element of the tested measurement
// uses the average of the absolute error % using the previous measurement as the ideal value
function findClosestV3(data) {
  var minError = 0x7FFFFFFF; // max 32bit signed int
  var result = null;
  for(var i=0; i < measures.length; i++) {
    var measure = measures[i];
    var error = 0;
    var minLocalError = 0x7FFFFFFF;
    for(var j=0; j < data.length; j++) {
      for(var k=0; k < measure.length; k++) {
        var localError = Math.abs( (measure[k] - data[j]) / measure[k] );
        if(localError < minLocalError) {
          minLocalError = localError;
        }
      }
      error += minLocalError;
    }
    // average of sum of error percentages
    error /= data.length;
    if(error < minError) {
      minError = error;
      result = measures[i];
    }
  }
  return result;
}

console.log(findClosest([2,3,4,5])); // [1,2,3,4]
console.log(findClosest([70,80,90,100])); // [66,77,88,99]
console.log(findClosest([9,19,304,405])); // [101,202,303,404]

console.log(findClosestV2([404])); // [101,202,303,404]
console.log(findClosestV2([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV2([9,19,304,405])); // [10,20,30,40]

console.log(findClosestV3([404])); // [101,202,303,404]
console.log(findClosestV3([66,67,68,69])); // [66,77,88,99]
console.log(findClosestV3([9,19,304,405])); // [10,20,30,40]
algorithm  dynamic-programming