How do I read a file with textscan? - MATLAB爱好者论坛-LabFans.com

poster · 2019-12-10, 20:48

I have a large tab-delimited file (10000 rows, 15000 columns) and want to import it into Matlab.

I tried to import it with the textscan function as follows:

    function [C_text, C_data] = ReadDataFile(filename, header, attributesCount, delimiter, attributeFormats, attributeFormatCount)
    AttributeTypes = SetAttributeTypeMatrix(attributeFormats, attributeFormatCount);
    fid = fopen(filename);
    if(header == 1)
        %read column headers
        C_text = textscan(fid, '%s', attributesCount, 'delimiter', delimiter);
        C_data = textscan(fid, AttributeTypes{1, 1}, 'headerlines', 1);
    else
        C_text = '';
        C_data = textscan(fid, AttributeTypes{1, 1});
    end
    fclose(fid);

AttributeTypes{1, 1} is a string that describes the variable type of each column (in this case there are 14740 float and 260 string variables, so the value of AttributeTypes{1, 1} is '%f%f......%f%s%s...%s', where %f is repeated 14740 times and %s is repeated 260 times).
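For readers who want to reproduce this, a format string of that shape could be built roughly as follows. SetAttributeTypeMatrix is not shown in the post, so this is only a sketch of one possible way to get an equivalent string; the variable names nFloat, nString and formatSpec are made up for illustration.

```matlab
% Column counts taken from the question: 14740 numeric and 260 string columns.
nFloat  = 14740;   % hypothetical variable name
nString = 260;     % hypothetical variable name

% repmat repeats each conversion specifier. No spaces are inserted,
% so every column occupies exactly two characters ('%f' or '%s').
formatSpec = [repmat('%f', 1, nFloat), repmat('%s', 1, nString)];

% Total length is two characters per column.
nChars = numel(formatSpec);
```

Because every specifier is exactly two characters long, the type letter of column k sits at position 2*k in the string.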

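When I try to execute

    >> [header, data] = ReadDataFile('data/orange_large_train.data.chunk1', 1, 15000, '\t', types, size);

the header array seems to be correct (the column names were read correctly).

data is a 1 x 15000 array (it looks as if only the first row was imported instead of all 10000), and I don't know what causes this.

I guess the problem is in this line:

    C_data = textscan(fid, AttributeTypes{1, 1});

but I don't know what could be wrong, since a similar example is described in the help reference.

If anyone can suggest how to solve this, that is, how to read all 10000 rows, I would greatly appreciate it.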

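Answer:

I believe all of your data is there. textscan returns a 1 x N cell array with one cell per column, not one per row. If you look inside data, each cell should contain an entire column (10000x1). You can extract the i-th cell as an array with data{i}.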
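A small check of this behaviour, using a made-up three-row, three-column tab-delimited text instead of the real file (textscan also accepts a character vector as input, which keeps the sketch self-contained):

```matlab
% Toy tab-delimited data: two numeric columns and one string column.
txt = sprintf('1.5\t2.5\tfoo\n3.5\t4.5\tbar\n5.5\t6.5\tbaz');

% One conversion specifier per column; '\t' is the field delimiter.
C = textscan(txt, '%f%f%s', 'Delimiter', '\t');

outerSize = size(C);      % 1-by-3: one cell per column
colSize   = size(C{1});   % 3-by-1: each cell holds a whole column
firstCol  = C{1};         % numeric column as a double vector
thirdCol  = C{3};         % string column as a cell array of char vectors
```

So a 1 x 15000 result for a 15000-column file is expected; the row count shows up inside each cell, not in the size of the outer cell array.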

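You will probably want to separate the double and string data. I don't know what attributeFormats is; you could probably use that array. However, you can also use AttributeTypes{1, 1}, since every second character of the format string is the type letter of a column (this assumes the string contains no spaces between specifiers):

    isdouble = strfind(AttributeTypes{1, 1}(2:2:end),'f');
    data_double = cell2mat(data(isdouble));

To combine the string data into a single cell array of strings, you can do the following:

    isstring = strfind(AttributeTypes{1, 1}(2:2:end),'s');
    data_string = horzcat(data{isstring});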
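The same split can be tried on a tiny made-up example to see what shapes come out. The text and the four-column format string here are invented for illustration and stand in for the real file and AttributeTypes{1, 1}.

```matlab
% Toy stand-ins for the real format string and file contents.
formatSpec = '%f%f%s%s';
txt = sprintf('1\t2\ta\tb\n3\t4\tc\td');
data = textscan(txt, formatSpec, 'Delimiter', '\t');

% Every second character is the type letter of a column: 'ffss'.
typeLetters = formatSpec(2:2:end);
isdouble = strfind(typeLetters, 'f');   % indices of numeric columns
isstring = strfind(typeLetters, 's');   % indices of string columns

% Numeric columns side by side as a rows-by-columns double matrix.
data_double = cell2mat(data(isdouble));

% String columns side by side as a rows-by-columns cell array.
data_string = horzcat(data{isstring});
```

With the real file, data_double would presumably be 10000 x 14740 and data_string 10000 x 260, provided every column was read with the same number of rows.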
