Q: Cell array output is slow in MEX

I have observed the following weird problem with cell arrays. I am currently using MATLAB R2013a. Consider the following code.

#include <iostream>
#include <vector>
#include <chrono>

#include <mex.h>
#undef printf       // to undo redefinition of printf by mex.h
#include <matrix.h>

void mexFunction(int nlhs, mxArray **plhs, int nrhs, mxArray **prhs) {

    if (nrhs != 0) {
        mexErrMsgIdAndTxt("CellArray:InvalidInput", "Too many input arguments; required number of input arguments = 0");
    }
    if (nlhs != 1) {
        mexErrMsgIdAndTxt("CellArray:InvalidOutput", "Incorrect number of output arguments; required number of output arguments = 1");
    }

    int CellArrSize = 50820;   // CellArrSize + 1 = 50821 is the modulus used below (intended to be prime)
    int CellContentSizeCurr = 1;
    int CurrentCell = 732;

    mwSize CellArrayDims[] = { CellArrSize, 1 };
    std::vector<int *> IntArrayofArrays(CellArrSize, nullptr);
    std::vector<int> CellContentSize(CellArrSize);

    plhs[0] = mxCreateCellArray(2, CellArrayDims);

    // Filling the Array of Arrays with the content to output
    // into the cell array
    for (int i = 0; i < CellArrSize; ++i) {
        CellContentSizeCurr = (CellContentSizeCurr * 398) % 431; // 431 is prime
        #ifdef IS_CONTIG
            CurrentCell = i;
        #else
            CurrentCell = ((CurrentCell + 1) * 732) % 50821 - 1;
        #endif
        IntArrayofArrays[CurrentCell] = reinterpret_cast<int *>(mxCalloc(CellContentSizeCurr, sizeof(int)));
        for (int j = 0; j < CellContentSizeCurr; ++j) {
            IntArrayofArrays[CurrentCell][j] = i * CellContentSizeCurr + j;
        }
        CellContentSize[CurrentCell] = CellContentSizeCurr;
    }

    // Performing output of the Array of Array into Cell array
    // (along with profiling code)
    int TimeTakeninus = 0;
    for (int i = 0; i < CellArrSize; ++i) {
        mxArray * tempmxArray;
        mwSize mxArrayDims[] = { 0,0 };
        tempmxArray = mxCreateNumericArray(2, mxArrayDims, mxINT32_CLASS, mxREAL);
        mxSetM(tempmxArray, CellContentSize[i]);
        mxSetN(tempmxArray, 1);

        auto TimeBeg = std::chrono::system_clock::now();
        mxSetData(tempmxArray, IntArrayofArrays[i]);
        auto TimeEnd = std::chrono::system_clock::now();
        TimeTakeninus += std::chrono::duration_cast<std::chrono::microseconds>(TimeEnd - TimeBeg).count();
        mxSetCell(plhs[0], i, tempmxArray);
    }

    mexPrintf("The time taken to perform output = %d ms\n", TimeTakeninus / 1000);
    mexEvalString("drawnow");
}

There is a compile-time option (the IS_CONTIG define) which decides whether the memory for successive arrays in IntArrayofArrays is allocated contiguously or not. The numbers above are chosen so that, even in the non-contiguous case, every cell index from 0 to CellArrSize - 1 is filled (and allocated) exactly once.
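The "exactly once" claim can be checked outside MATLAB by replaying the index recurrence from the fill loop. The following is a minimal sketch, not part of the original code: visitsEveryIndexOnce is a hypothetical helper name, and it assumes (as in the code above) that the modulus is always the array size plus one.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not in the original code): replays
//   CurrentCell = ((CurrentCell + 1) * multiplier) % (size + 1) - 1
// for `size` steps and reports whether every index in [0, size)
// is produced exactly once.
bool visitsEveryIndexOnce(int size, int multiplier, int start) {
    assert(size > 0 && start >= 0 && start < size);
    const long long modulus = static_cast<long long>(size) + 1;
    std::vector<bool> seen(static_cast<std::size_t>(size), false);
    long long current = start;
    for (int i = 0; i < size; ++i) {
        current = ((current + 1) * multiplier) % modulus - 1;
        // -1 means the product hit a multiple of the modulus, which
        // would index out of range in the original fill loop.
        if (current < 0 || seen[static_cast<std::size_t>(current)]) {
            return false;
        }
        seen[static_cast<std::size_t>(current)] = true;
    }
    // `size` distinct in-range indices means each one was hit exactly once.
    return true;
}
```

Calling visitsEveryIndexOnce(50820, 732, 732) replays the constants used above; a false result would mean some entries of IntArrayofArrays are left as nullptr or allocated more than once.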

The MATLAB side of this is simple:

FinalCellArray = MexCellArrayTest();

The issue observed is as follows:

  • If we define IS_CONTIG and thereby allow contiguous allocation of memory, execution is lightning fast (output < 1 ms).
  • However, in the other case, with non-contiguous allocation, the output portion (as measured by the profiling code) takes nearly 6 seconds.
  • Note that the output is produced using the mxSetData function, which (in principle) should only copy the address of the memory location passed to it via IntArrayofArrays[i].

    Why is there such a big difference?
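For reference, the measurement above accumulates many very short intervals taken with std::chrono::system_clock, which is a wall clock and can be adjusted while the program runs. A possible variant, sketched here under the assumption that only elapsed time matters, uses the monotonic std::chrono::steady_clock and a 64-bit accumulator. AccumulatingTimer is a hypothetical name and is not part of the code above.

```cpp
#include <chrono>
#include <cstdint>

// Hypothetical helper (not in the original code): sums the elapsed time
// of many short calls using a monotonic clock.
class AccumulatingTimer {
public:
    // Runs `call` once and adds its duration to the running total.
    template <typename Callable>
    void measure(Callable&& call) {
        const auto begin = std::chrono::steady_clock::now();
        call();
        const auto end = std::chrono::steady_clock::now();
        total_ += end - begin;
        ++calls_;
    }

    // Total measured time, truncated to whole microseconds.
    std::int64_t totalMicroseconds() const {
        return static_cast<std::int64_t>(
            std::chrono::duration_cast<std::chrono::microseconds>(total_).count());
    }

    std::int64_t calls() const { return calls_; }

private:
    std::chrono::steady_clock::duration total_ =
        std::chrono::steady_clock::duration::zero();
    std::int64_t calls_ = 0;
};
```

In the output loop this would wrap the call under test, for example timer.measure([&] { mxSetData(tempmxArray, IntArrayofArrays[i]); });, with the total printed once after the loop.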

c++  performance  matlab  mex