Q: Using bin counts as weights for random number selection

I have a set of data that I wish to approximate via random sampling in a non-parametric manner, e.g.:

eventl=
4
5
6
8
10
11
12
24
32

To do this, I first bin the data up to a fixed upper limit:

binsize = 5;
nbins = 20;
[bincounts,ind] = histc(eventl,1:binsize:binsize*nbins);
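
As a check on the binning step, here is a minimal sketch that prints the left edge and the count of the first seven bins for the sample data. It assumes eventl is a column vector, and the variable edges is introduced here only for readability. Note that histc treats bin k as edges(k) <= x < edges(k+1), and that its last output element counts only values equal to the final edge (96 here).

```matlab
% Sketch: list the left edge and count of the first seven bins.
eventl  = [4 5 6 8 10 11 12 24 32]';      % sample data as a column vector (assumption)
binsize = 5;
nbins   = 20;
edges   = 1:binsize:binsize*nbins;        % 1, 6, 11, ..., 96 (20 edges)
[bincounts, ind] = histc(eventl, edges);  % bin k covers edges(k) <= x < edges(k+1)
disp([edges(1:7)' bincounts(1:7)]);       % column 1: left edge, column 2: count
```

For the data above, the first seven counts are 2, 3, 2, 0, 1, 0 and 1, which agrees with the count of 2 quoted for the bin 1-5.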

I then build a column vector of every integer covered by the bins, from which the approximation can choose:

sizes = transpose(1:binsize*nbins);

To use the bin counts as selection weights, I replicate each count across the width of its bin. For example, the count for the bin 1-5 is 2, so 1, 2, 3, 4 and 5 each get a weight of 2, whereas the count for the bin 16-20 is 0, so 16, 17, 18, 19 and 20 can never be chosen:

w = repelem(bincounts,binsize);

To then perform weighted number selection, I use:

[~,R] = histc(rand(1,1),cumsum([0;w(:)./sum(w)]));
R = sizes(R);
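
One way to test the expectation that the re-binned samples reproduce the original bin proportions is to draw many values in a single call and compare the two histograms side by side. The sketch below is an illustration rather than a diagnosis: N, cdf, idx, edges and rebinned are names introduced here, eventl is assumed to be a column vector, and the cumulative sum is normalised after summing (not before, as in the line above) so that the final edge is exactly 1 for integer weights.

```matlab
% Sketch: draw N weighted samples at once and compare bin proportions.
eventl    = [4 5 6 8 10 11 12 24 32]';     % sample data as a column vector (assumption)
binsize   = 5;
nbins     = 20;
edges     = 1:binsize:binsize*nbins;
bincounts = histc(eventl, edges);
sizes     = transpose(1:binsize*nbins);
w         = repelem(bincounts, binsize);

N   = 1e5;                                 % hypothetical number of draws
cdf = [0; cumsum(w(:))] ./ sum(w);         % cumulative weights, from 0 up to exactly 1
[~, idx] = histc(rand(N, 1), cdf);         % idx(k): interval containing the k-th draw
R   = sizes(idx);                          % map interval indices back to values

rebinned = histc(R, edges);                % bin the samples with the original edges
disp([bincounts ./ sum(bincounts), rebinned ./ N]);  % target vs. sampled proportions
```

If the two displayed columns agree to within sampling noise and no sampled value carries a zero weight, the selection step itself behaves as intended, and any discrepancy is more likely to come from how the samples are collected or re-binned elsewhere.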

For some reason this approach fails to approximate the data. My understanding was that, with sufficient sampling depth, the binned version of R would match the binned version of eventl. Instead there is significant variation, and samples often land in bins whose weights were 0.

Could anybody suggest a better method to do this or point out the error?

Answer 1:

For a better method, I suggest randsample:

values = [1 2 3 4 5 6 7 8];        % values from which you want to pick
numberOfElements = 1000;           % how many values you want to pick
weights = [2 2 2 2 2 1 1 1];       % weights given to the values (1-5 are twice as likely as 6-8)

sample = randsample(values, numberOfElements, true, weights);

Note that even with 1000 samples, the distribution does not exactly correspond to the weights, so if you only pick 20 samples, the histogram may look rather different.
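
To get a feel for how much the histogram can deviate from the weights, the following sketch repeats the draw for a few sample sizes and reports the largest gap between observed and expected relative frequencies. The sample sizes and the names target and observed are illustrative choices, and randsample needs the Statistics and Machine Learning Toolbox.

```matlab
% Sketch: sampling noise of a weighted draw at different sample sizes.
values  = 1:8;
weights = [2 2 2 2 2 1 1 1];
target  = weights ./ sum(weights);              % expected relative frequencies
for n = [20 1000 100000]                        % hypothetical sample sizes
    sample   = randsample(values, n, true, weights);
    observed = histc(sample(:)', values) ./ n;  % relative frequency of each value
    fprintf('n = %6d, largest deviation = %.4f\n', n, max(abs(observed - target)));
end
```

The deviation should shrink as n grows, but it will not reach zero for any finite sample.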

matlab