# Q: Using bin counts as weights for random number selection

I have a set of data that I wish to approximate via random sampling in a non-parametric manner, e.g.:

``````
eventl =
4
5
6
8
10
11
12
24
32
``````

To do this, I first bin the data up to a certain value:

``````matlab
binsize = 5;
nbins = 20;
[bincounts,ind] = histc(eventl,1:binsize:binsize*nbins);
``````
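For reference, with `edges = 1:binsize:binsize*nbins` the edges are 1, 6, 11, ..., 96, and `histc` counts a value `x` in bin `k` when `edges(k) <= x < edges(k+1)`; the last bin counts only values exactly equal to `edges(end)`. So for integer data bin 1 covers 1-5, bin 2 covers 6-10, and so on. A small check on the example data (entered here as a column vector, with `edges` as an illustrative helper variable):

```matlab
% Example data from the question, entered as a column vector
eventl = [4; 5; 6; 8; 10; 11; 12; 24; 32];

binsize = 5;
nbins = 20;
edges = 1:binsize:binsize*nbins;    % 1, 6, 11, ..., 96 (nbins edges)

% bincounts(k): number of values with edges(k) <= x < edges(k+1)
% ind(i):       index of the bin that eventl(i) fell into
[bincounts, ind] = histc(eventl, edges);

% First seven bins: lower edge and count
disp([edges(1:7).', bincounts(1:7)])
```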

Then I build a column vector of every value covered by the bins, i.e. all the values the approximation can choose from:

``````matlab
sizes = transpose(1:binsize*nbins);
``````

To use the bin counts as selection weights (e.g. the count for bin 1-5 is 2, so each of 1, 2, 3, 4 and 5 gets weight 2, whereas the count for bin 16-20 is 0, so 16, 17, 18, 19 and 20 can never be chosen), I replicate each bin count across the bin size:

``````matlab
w = repelem(bincounts,binsize);
``````
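A quick sanity check of the replicated weights on the example data: `w(k)` is the weight of `sizes(k)`, so listing `sizes(w > 0)` shows exactly which values can ever be drawn. The name `candidates` is just an illustrative helper.

```matlab
eventl = [4; 5; 6; 8; 10; 11; 12; 24; 32];
binsize = 5;
nbins = 20;
bincounts = histc(eventl, 1:binsize:binsize*nbins);

sizes = (1:binsize*nbins).';          % candidate values 1..100
w = repelem(bincounts, binsize);       % each bin count repeated binsize times

% Values with nonzero weight, i.e. the only values that can be drawn
candidates = sizes(w > 0).'
```

For this data the candidates are 1-15, 21-25 and 31-35; 16-20 has weight 0 as intended.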

To then perform weighted number selection, I use:

``````matlab
[~,R] = histc(rand(1,1),cumsum([0;w(:)./sum(w)]));
R = sizes(R);
``````
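Note that `rand(1,1)` gives one draw per call, so approximating the distribution needs many calls. A minimal sketch of the same inverse-CDF lookup, vectorized to draw `n` values at once (`n` and `edgesP` are hypothetical names). It also pins the last cumulative edge to exactly 1, on the assumption that rounding in `cumsum` can leave it a hair below 1; a draw above the last edge would make `histc` return index 0 and `sizes(0)` would fail.

```matlab
eventl = [4; 5; 6; 8; 10; 11; 12; 24; 32];
binsize = 5;
nbins = 20;
bincounts = histc(eventl, 1:binsize:binsize*nbins);
sizes = (1:binsize*nbins).';
w = repelem(bincounts, binsize);

n = 10000;                              % hypothetical number of draws

% Cumulative probabilities; zero-weight values give zero-width intervals,
% so no uniform draw can land in them
edgesP = cumsum([0; w(:) ./ sum(w)]);
edgesP(end) = 1;                        % guard against rounding leaving it just below 1

% One uniform draw per sample; R(i) is the interval each draw falls into
[~, R] = histc(rand(n, 1), edgesP);
R = sizes(R);                           % map interval index to the sampled value
```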

For some reason this approach does not reproduce the data. My understanding was that, with sufficient sampling depth, the binned version of `R` would be identical to the binned version of `eventl`; however, there is significant variation, and samples often turn up in bins whose weights were 0.
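One way to test the sampler is to bin many draws with exactly the same edges used for `eventl` and compare proportions. One possible cause of draws showing up in zero-weight bins (an assumption, since the comparison code isn't shown) is binning `R` with different edges, e.g. `0:5:95` instead of `1:5:96`: the boundaries shift by one, so a bin such as 15-19 collects draws of 15 that came from the nonzero 11-15 bin. A sketch using the question's variables plus hypothetical `n`, `pData`, `pSample`, `shifted` and `pShifted`:

```matlab
eventl = [4; 5; 6; 8; 10; 11; 12; 24; 32];
binsize = 5;
nbins = 20;
edges = 1:binsize:binsize*nbins;
bincounts = histc(eventl, edges);
sizes = (1:binsize*nbins).';
w = repelem(bincounts, binsize);

n = 100000;                                  % hypothetical number of draws
c = cumsum([0; w ./ sum(w)]);
c(end) = 1;                                  % guard against rounding
[~, idx] = histc(rand(n, 1), c);
R = sizes(idx);

% Same edges for data and draws: proportions agree up to sampling noise
pData = bincounts ./ sum(bincounts);
pSample = histc(R, edges) ./ n;
disp(max(abs(pSample - pData)))              % small for large n
disp(sum(pSample(bincounts == 0)))           % 0: no draws in zero-weight bins

% Edges shifted by one break the comparison
shifted = 0:binsize:binsize*(nbins-1);       % 0, 5, 10, ..., 95
pShifted = histc(R, shifted) ./ n;
disp(pShifted(4))                            % bin [15,20) now receives draws of 15
```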

Could anybody suggest a better method to do this or point out the error?

# A

For a better method, I suggest `randsample`:

``````matlab
values = [1 2 3 4 5 6 7 8];   % values from which you want to pick
numberOfElements = 1000;       % how many values you want to pick
weights = [2 2 2 2 2 1 1 1];   % weights given to the values (1-5 are twice as likely as 6-8)

sample = randsample(values, numberOfElements, true, weights);
``````
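Applied to the question's variables, `sizes` plays the role of `values` and `w` the role of `weights`. `randsample` belongs to the Statistics and Machine Learning Toolbox, so this sketch assumes it may be missing and falls back to a core-only inverse-CDF lookup in that case; the fallback is a stand-in, not part of `randsample`.

```matlab
eventl = [4; 5; 6; 8; 10; 11; 12; 24; 32];
binsize = 5;
nbins = 20;
bincounts = histc(eventl, 1:binsize:binsize*nbins);
sizes = (1:binsize*nbins).';
w = repelem(bincounts, binsize);

numberOfElements = 1000;

if exist('randsample', 'file')
    % Toolbox function: sample with replacement, weighted by w
    sample = randsample(sizes, numberOfElements, true, w);
else
    % Core-only fallback (inverse-CDF lookup) when randsample is unavailable
    c = cumsum([0; w ./ sum(w)]);
    c(end) = 1;                       % guard against rounding
    [~, idx] = histc(rand(numberOfElements, 1), c);
    sample = sizes(idx);
end
```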

Note that even with 1000 samples, the distribution does not exactly correspond to the weights, so if you only pick 20 samples, the histogram may look rather different.
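To see the sample-size effect concretely, the sketch below draws from the answer's `values`/`weights` example at several sample sizes and reports the largest gap between observed and expected proportions; that gap shrinks roughly like 1/sqrt(n). It uses a core-only inverse-CDF draw as a stand-in for `randsample` (an assumption made so it runs without the toolbox), relies on implicit expansion (MATLAB R2016b or later, or Octave broadcasting), and `ns`/`maxDev` are hypothetical names.

```matlab
values = 1:8;                             % values from the answer's example
weights = [2 2 2 2 2 1 1 1];              % weights from the answer's example
target = weights ./ sum(weights);         % exact selection probabilities
cumP = cumsum(target);

ns = [20, 1000, 100000];                  % sample sizes to compare
maxDev = zeros(size(ns));

for k = 1:numel(ns)
    n = ns(k);
    u = rand(n, 1);
    % Index of the drawn value = 1 + number of cumulative bounds at or below u
    idx = 1 + sum(u >= cumP(1:end-1), 2);
    observed = accumarray(idx, 1, [numel(values) 1]).' ./ n;
    maxDev(k) = max(abs(observed - target));
    fprintf('n = %6d: largest deviation from target proportion = %.4f\n', n, maxDev(k));
end
```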
