MATLAB爱好者论坛 (MATLAB Enthusiasts Forum) - LabFans.com

poster · 2019-12-10, 20:48

I would like to read a (fairly large) log file into a MATLAB cell array of strings in one step. I have used the usual approach:

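    s = {};
    fid = fopen('test.txt');
    tline = fgetl(fid);
    while ischar(tline)        % fgetl returns -1 (not char) at end of file
        s = [s; tline];        % s grows by one cell per line
        tline = fgetl(fid);
    end
    fclose(fid);

but this is slow. I found that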

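    fid = fopen('test.txt');
    x = fread(fid, '*char');
    fclose(fid);

is a much faster approach, but it gives me an n-by-1 char matrix x. I could try to convert x into a cell array of strings, but then I end up in char-encoding hell. The line delimiter seems to be \n\r, or 10 and 56 in ASCII (I looked at the end of the first line), but these two characters often do not follow each other and sometimes even show up on their own.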

Is there a simple, fast way to read an ASCII file into a cell array of strings in one step, or to convert x into a cell array of strings?

Reading via fgetl (profiler output):

    Code                          Calls     Total Time   % Time
    tline = lower(fgetl(fid));    903113    14.907 s     61.2%

Reading via fread:

    >> tic; for i=1:length(files), fid = fopen(files(i).name); x = fread(fid,'*char*1'); fclose(fid); end; toc
    Elapsed time is 0.208614 seconds.

I have tested preallocation, but it did not help :(

    files = dir('.');
    tic
    for i = 1:length(files)
        % skip directories and anything that is not a .log file
        if files(i).isdir || isempty(strfind(files(i).name, '.log'))
            continue;
        end
        % preassign s to some large cell array
        sizS = 50000;
        s = cell(sizS, 1);
        lineCt = 1;
        fid = fopen(files(i).name);
        tline = fgetl(fid);
        while ischar(tline)
            s{lineCt} = tline;
            lineCt = lineCt + 1;
            % grow s if necessary (doubles the capacity)
            if lineCt > sizS
                s = [s; cell(sizS, 1)];
                sizS = sizS + sizS;
            end
            tline = fgetl(fid);
        end
        fclose(fid);
        % remove empty entries in s
        s(lineCt:end) = [];
    end
    toc

    Elapsed time is 12.741492 seconds.

This is about 10 times faster than the original approach:

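    s = textscan(fid, '%s', 'Delimiter', '\n', 'whitespace', '', 'bufsize', files(i).bytes);

I had to set 'whitespace' to '' in order to keep the leading spaces (which I need for parsing), and 'bufsize' to the size of the file (the default of 4000 threw a buffer overflow error).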
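A minimal sketch of the 'whitespace' behaviour described above, reading from a char vector instead of a file so that it runs on its own. The sample text and the variable names txt, a and b are made up for illustration, and 'bufsize' is left out because the input is tiny. With the default whitespace set, textscan should drop the leading spaces of each line; with an empty set it should keep them.

```matlab
% Hypothetical two-line input with leading spaces on each line.
txt = sprintf('  indented line\n    deeper line');

% Default whitespace set: leading spaces are skipped.
a = textscan(txt, '%s', 'Delimiter', '\n');
a = a{1};

% Empty whitespace set: leading spaces are kept.
b = textscan(txt, '%s', 'Delimiter', '\n', 'Whitespace', '');
b = b{1};

% Show both results side by side, with brackets to make the spaces visible.
for k = 1:numel(b)
    fprintf('[%s]  vs  [%s]\n', a{k}, b{k});
end
```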

Answer:

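The main reason your first example is slow is that s grows in every iteration. Each time, MATLAB has to create a new array, copy the old lines over, and then add the new line, which adds unnecessary overhead.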

To speed things up, you can preallocate s:

    % preassign s to some large cell array
    s = cell(10000, 1);
    sizS = 10000;
    lineCt = 1;
    fid = fopen('test.txt');
    tline = fgetl(fid);
    while ischar(tline)
        s{lineCt} = tline;
        lineCt = lineCt + 1;
        % grow s if necessary
        if lineCt > sizS
            s = [s; cell(10000, 1)];
            sizS = sizS + 10000;
        end
        tline = fgetl(fid);
    end
    fclose(fid);
    % remove empty entries in s
    s(lineCt:end) = [];

Here is a small example of what preallocation can do for you:

    >> tic, for i=1:100000, c{i}=i; end, toc
    Elapsed time is 10.513190 seconds.
    >> d = cell(100000,1);
    >> tic, for i=1:100000, d{i}=i; end, toc
    Elapsed time is 0.046177 seconds.

EDIT

As an alternative to fgetl, you can use TEXTSCAN:

    fid = fopen('test.txt');
    s = textscan(fid, '%s', 'Delimiter', '\n');
    fclose(fid);
    s = s{1};

This reads the lines of test.txt as strings into the cell array s in one go.
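For the other half of the question (turning the char data from fread into a cell array of lines), one possible approach is to split on the line terminators with regexp. This is only a sketch and not part of the answer above: it assumes the file fits in memory and contains plain ASCII, the sample file and the names fname and lines are made up, and the pattern treats CRLF, a lone LF and a lone CR each as one line break.

```matlab
% Write a small sample file with mixed line endings (hypothetical data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, 'first line\r\n  second line\nthird line\r');
fclose(fid);

% Read the whole file at once; fread returns an n-by-1 column,
% so transpose it into a row vector for regexp.
fid = fopen(fname, 'r');
x = fread(fid, '*char')';
fclose(fid);

% Split on CRLF, lone LF, or lone CR. CRLF is listed first so that
% a CR/LF pair counts as one line break rather than two.
lines = regexp(x, '\r\n|\n|\r', 'split')';

% A trailing line terminator leaves one empty element at the end; drop it.
if ~isempty(lines) && isempty(lines{end})
    lines(end) = [];
end

delete(fname);   % clean up the sample file
```

Leading spaces survive because nothing is trimmed, and empty lines in the middle of the file are kept as empty strings.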


2019-12-10, 20:48	#1
poster (Senior Member, joined 2019-11-21, posts: 3,008, reputation: 66)	Thread: Reading an entire text file into a MATLAB variable at once