Labfans is a technical community for university students, engineers, and researchers.
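I am trying to process a very large dataset. I have k = ~4200 matrices (of varying size) that must be compared combinatorially, skipping non-unique and self comparisons. Each of the k(k-1)/2 comparisons produces a matrix that must be indexed against its parents (i.e., it must be possible to find out where it came from). The convenient approach is to fill a triangular k x k cell array with the result of each comparison. The results are ~100 x ~100 matrices on average. In single precision, this adds up to as much as 400 GB in total.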
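As a rough check of the 400 GB figure, here is the arithmetic, assuming every result is exactly 100 x 100 single-precision values (the question only gives ~100 x ~100 as an average):

```matlab
% Back-of-the-envelope storage estimate (assumes every result is exactly 100 x 100).
k = 4200;                          % number of input matrices (approximate)
nComparisons = k*(k-1)/2;          % unique pairs, no self comparisons
bytesPerResult = 100*100*4;        % 100 x 100 values, 4 bytes per single
totalBytes = nComparisons * bytesPerResult;
fprintf('%d comparisons, about %.0f GB\n', nComparisons, totalBytes/1e9);
```

This prints `8817900 comparisons, about 353 GB`, which is consistent with "up to 400 GB" once the variation in matrix sizes is taken into account.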
I need to 1) generate the cell array, or parts of it, without trying to fit the whole thing into memory, and 2) access its elements (and their elements) in a similar way. My attempts so far have been inefficient because they rely on MATLAB's `eval()` and on calling `save` and `clear` inside the loop:

```matlab
for i=1:k
    [~,m] = size(data{i});
    cur_var = ['H' int2str(i)];
    %# if i == 1; save('FileName'); end; %# If using a single MAT file and need to create it.
    eval([cur_var ' = cell(1,k);']);
    for j=i+1:k
        [~,n] = size(data{j});
        eval([cur_var '{i,j} = zeros(m,n,''single'');']);
        eval([cur_var '{i,j} = compare(data{i},data{j});']);
    end
    save(cur_var,cur_var); %# Add '-append' when using a single MAT file.
    clear(cur_var);
end
```

Another thing I tried was splitting the output whenever `mod((i+j-1)/2,max(factor(k*(k-1)/2))) == 0`. This splits the results into the largest number of equal-sized pieces, which seemed logical. The indexing is a bit more complicated, but not bad, because linear indexing can be used.

Does anyone know of, or see, a better way?

Answer:

Here is a version that combines fast execution with minimal memory use. I use `fwrite`/`fread` so that you can still use `parfor` (and this time I made sure it works :)).

```matlab
%# assume data (a cell array of matrices) is loaded and k is known

%# find the index pairs for comparisons. This could be done more elegantly, I guess.
%# I'm constructing a lower triangular array, ie an array that has ones wherever
%# we want to compare i (row) and j (col). Then I use find to get i and j
[iIdx,jIdx] = find(tril(ones(k,k),-1));

%# create a directory to store the comparisons
mkdir('H_matrix_elements')
savePath = fullfile(pwd,'H_matrix_elements');

%# loop through all comparisons in parallel. This way there may be a bit more overhead from
%# the individual function calls. However, parfor is most efficient if there are
%# a lot of relatively similarly fast iterations.
parfor ct = 1:length(iIdx)
    %# make the comparison - do double b/c there shouldn't be a memory issue
    currentComparison = compare(data{iIdx(ct)},data{jIdx(ct)});
    %# create save-name as H_i_j, eg H_104_23
    saveName = fullfile(savePath,sprintf('H_%i_%i',iIdx(ct),jIdx(ct)));
    %# save. Since 'save' is not allowed, use fwrite to write the data to disk
    fid = fopen(saveName,'w');
    %# for simplicity: save data as vector, add two elements to the beginning
    %# to store the size of the array. The precision must be given explicitly:
    %# fwrite's default is 'uint8', which would truncate the values.
    %# 'single' on disk matches the storage estimate in the question.
    fwrite(fid,[size(currentComparison)';currentComparison(:)],'single');
    %# close file
    fclose(fid);
end

%# to read eg comparison H_104_23 (use the same precision as for writing)
fid = fopen(fullfile(savePath,'H_104_23'),'r');
tmp = fread(fid,inf,'single');
fclose(fid);

%# reshape into 2D array (a new name, so the input cell array 'data' is not overwritten)
H_104_23 = reshape(tmp(3:end),tmp(1),tmp(2));
```
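`fwrite` writes `'uint8'` by default and `fread` reads `'uint8'` by default, so the precision has to be stated on both sides for the `[rows; cols; values]` file layout to survive a round trip. The sketch below checks that layout in isolation, assuming single precision on disk; the 7 x 5 test matrix and the temporary file name are made-up stand-ins for a real `compare()` result and an `H_i_j` file.

```matlab
% Round-trip check of the file layout [rows; cols; values(:)], stored as single.
% A is a hypothetical stand-in for the output of compare().
A = single(magic(7));
A = A(:, 1:5);                        % 7 x 5, non-square on purpose
fname = [tempname '_H_2_1'];          % temporary file instead of H_matrix_elements/H_2_1

fid = fopen(fname, 'w');
assert(fid >= 0, 'could not open file for writing');
fwrite(fid, [size(A)'; A(:)], 'single');   % explicit precision, not the uint8 default
fclose(fid);

fid = fopen(fname, 'r');
tmp = fread(fid, inf, '*single');     % '*single' keeps the values in single precision
fclose(fid);

% The first two elements are the size; the rest is the matrix in column-major order.
B = reshape(tmp(3:end), double(tmp(1)), double(tmp(2)));
delete(fname);
```

Storing the size as single is safe here because integers up to 2^24 are exact in single precision, far above the ~100 x ~100 sizes in the question.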
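The answer remarks that building the index pairs with `find(tril(ones(k,k),-1))` could be done more elegantly; that call allocates a full k x k matrix. One alternative, which is not part of the original answer, is a closed-form mapping between the linear comparison index `ct` and the pair (i, j), i > j, in the same column-major order that `find` returns. Column j holds rows j+1..k, so the columns before it hold (j-1)k - (j-1)j/2 pairs. A minimal sketch, where `pairToIdx`, `idxToJ` and `idxToI` are hypothetical helper names:

```matlab
% Hypothetical helpers: map between ct and (i, j), i > j, in the order produced by
% [iIdx, jIdx] = find(tril(ones(k,k),-1)), without building the k x k matrix.
k = 4200;                                   % number of input matrices

% (i, j) -> ct: pairs in earlier columns plus the offset of row i inside column j.
pairToIdx = @(i, j) (j - 1) .* k - (j - 1) .* j ./ 2 + (i - j);

% ct -> j: smallest j whose column ends at or after ct (root of a quadratic).
idxToJ = @(ct) ceil(((2*k - 1) - sqrt((2*k - 1).^2 - 8 .* ct)) ./ 2);

% ct -> i: offset of ct inside column j, shifted past the diagonal.
idxToI = @(ct, j) ct - ((j - 1) .* k - (j - 1) .* j ./ 2) + j;

ct = pairToIdx(104, 23);
j = idxToJ(ct);
i = idxToI(ct, j);
fprintf('H_104_23 is comparison %d of %d; it maps back to H_%d_%d\n', ...
        ct, k*(k-1)/2, i, j);
```

With helpers like these, each `parfor` iteration could loop over `ct = 1:k*(k-1)/2` and compute its own i and j, and any `H_i_j` file name can be converted back to its position among the comparisons. Since `sqrt` is exact when its argument is a perfect square (the column boundaries), the rounding is safe for this range of k, but checking the helpers against `find` once for the actual k is a cheap safeguard.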