Labfans is a technical community for university students, engineers, and researchers.
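I have a text file in the following format:

         gene            complement(22995..24539)
                         /gene="ppp"
                         /locus_tag="MRA_0020"
         CDS             complement(22995..24539)
                         /gene="ppp"
                         /locus_tag="MRA_0020"
                         /codon_start=1
                         /transl_table=11
                         /product="putative serine/threonine phosphatase Ppp"
                         /protein_id="ABQ71738.1"
                         /db_xref="GI:148503929"
         gene            complement(24628..25095)
                         /locus_tag="MRA_0021"
         CDS             complement(24628..25095)
                         /locus_tag="MRA_0021"
                         /codon_start=1
                         /transl_table=11
                         /product="hypothetical protein"
                         /protein_id="ABQ71739.1"
                         /db_xref="GI:148503930"
         gene            complement(25219..26802)
                         /locus_tag="MRA_0022"
         CDS             complement(25219..26802)
                         /locus_tag="MRA_0022"
                         /codon_start=1
                         /transl_table=11
                         /product="hypothetical protein"
                         /protein_id="ABQ71740.1"
                         /db_xref="GI:148503931"

I want to read the text file into Matlab and build a list in which each item starts at a gene line. For this example, the list would contain 3 items. I have tried a few things but cannot get it to work. Does anyone have an idea of what I can do?

Answer:

Here is a quick suggestion for an algorithm:

1. Open the file with fopen.
2. Read lines with fgetl until you find one that starts with CDS.
3. Keep reading (and remembering) lines until you reach the next gene line or the end of the file.
4. For each remembered line, take the field name after "/" and the value after "=", converting the value to a number where possible.
5. Store the fields in one element of a struct array per gene, then repeat from step 2.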
Here is some code. I saved your example as "test.txt".

    % open file
    fid = fopen('test.txt');

    % parse the file
    eof = false;
    geneCt = 1;
    clear output % you cannot reassign output if it exists with different fieldnames already
    output(1:1000) = struct; % you may want to initialize fields here
    while ~eof
        % read lines till we find one with CDS
        foundCDS = false;
        while ~foundCDS && ~eof
            currentLine = fgetl(fid);
            % check for eof, then CDS. Allow whitespace at the beginning
            if ~ischar(currentLine)
                % end of file (fgetl returns -1); the loop condition stops us here
                eof = true;
            elseif ~isempty(regexp(currentLine,'^\s+CDS','match','once'))
                foundCDS = true;
            end
        end % looking for CDS

        if ~eof
            % read (and remember) lines till we find 'gene'
            collectedLines = cell(1,20); % assume no more than 20 lines per gene. Row vector for looping below
            foundGene = false;
            lineCt = 1;
            while ~foundGene
                currentLine = fgetl(fid);
                % check for eof, then gene. Allow whitespace at the beginning
                if ~ischar(currentLine)
                    % end of file - consider all data has been read
                    eof = true;
                    foundGene = true;
                elseif ~isempty(regexp(currentLine,'^\s+gene','match','once'))
                    foundGene = true;
                else
                    collectedLines{lineCt} = currentLine;
                    lineCt = lineCt + 1;
                end
            end

            % loop through collectedLines and assign. Do not loop through the
            % gene line
            for entry = collectedLines(1:lineCt-1)
                fieldname = regexp(entry{1},'/(\w+)=','tokens','once');
                value = regexp(entry{1},'="?([^"]+)"?$','tokens','once');
                if isempty(fieldname) || isempty(value)
                    continue % skip lines that are not of the form /name=value
                end
                % try converting value to number
                numValue = str2double(value);
                if isfinite(numValue)
                    value = numValue;
                else
                    value = value{1};
                end
                output(geneCt).(fieldname{1}) = value;
            end
            geneCt = geneCt + 1;
        end
    end % while eof

    % cleanup
    fclose(fid);
    output(geneCt:end) = [];
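As a more compact alternative to reading line by line, the whole text can be split at each gene line with regexp. This is only a sketch and not part of the original answer: the variable names (txt, blocks, genes) are illustrative, the sample is an abbreviated copy of the data in the question so that the snippet runs on its own, and it assumes each qualifier has the form /name=value. For the real file, txt = fileread('test.txt') would replace the sprintf call.

```matlab
% Abbreviated copy of the sample from the question (assumption: the real
% data would come from txt = fileread('test.txt') instead).
txt = sprintf('%s\n', ...
    '     gene            complement(22995..24539)', ...
    '                     /gene="ppp"', ...
    '                     /locus_tag="MRA_0020"', ...
    '     CDS             complement(22995..24539)', ...
    '                     /gene="ppp"', ...
    '                     /locus_tag="MRA_0020"', ...
    '                     /codon_start=1', ...
    '                     /product="putative serine/threonine phosphatase Ppp"', ...
    '     gene            complement(24628..25095)', ...
    '                     /locus_tag="MRA_0021"', ...
    '     CDS             complement(24628..25095)', ...
    '                     /locus_tag="MRA_0021"', ...
    '                     /codon_start=1', ...
    '                     /product="hypothetical protein"');

% Split into one block per record at every line whose first word is "gene".
% Lines such as /gene="ppp" do not match because they start with "/".
blocks = regexp(txt, '^\s*gene\s', 'split', 'lineanchors');

genes = {};   % cell array, because records may carry different fields
for k = 1:numel(blocks)
    % Keep only the text after the CDS keyword, mirroring the loop-based
    % version, which ignores everything before the CDS line.
    idx = regexp(blocks{k}, '^\s*CDS\s', 'end', 'once', 'lineanchors');
    if isempty(idx)
        continue   % text before the first gene, or a gene without CDS
    end
    cdsPart = blocks{k}(idx+1:end);

    % Each token is {name, rawValue} for one /name=value qualifier.
    tokens = regexp(cdsPart, '/(\w+)=("[^"]*"|\S+)', 'tokens');
    rec = struct();
    for t = 1:numel(tokens)
        name = tokens{t}{1};
        value = strrep(tokens{t}{2}, '"', '');   % drop surrounding quotes
        num = str2double(value);
        if ~isnan(num)
            value = num;   % numeric qualifiers such as codon_start
        end
        rec.(name) = value;
    end
    genes{end+1} = rec; %#ok<AGROW>
end
```

After this runs, genes{1}.gene and genes{2}.locus_tag hold the qualifier values of the first and second record. A cell array is used instead of a struct array because not every record carries the same qualifiers (the second one has no /gene entry).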