MATLAB Enthusiasts Forum - LabFans.com

poster · 2019-12-10, 20:48

I need to take two sentences at a time and determine whether they are similar. By similar, I mean both syntactically and semantically.

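INPUT1: Obama signs the law. Obama signed a new law.

INPUT2: A bus is stopped here. A vehicle is parked here.

INPUT3: Fire in New York. New York is burnt down.

INPUT4: Fire in New York. 50 died in the New York fire.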

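I don't want to use an ontology tree as the solution. I wrote some code that computes the Levenshtein distance (LD) between the two sentences and then decides whether the second sentence:

can be ignored (INPUT1 and INPUT2),
should replace the first sentence (INPUT3), or
should be stored along with the first sentence (INPUT4).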
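
The three outcomes listed above can be sketched as a simple threshold rule on the distance D. This is only an illustration: the function name `decide_event` and the two thresholds `ignoreMax` and `replaceMax` are hypothetical, and the assumption that the middle band means "replace" is a reading of the list above, not something the post states explicitly.

```matlab
function action = decide_event(D, ignoreMax, replaceMax)
%DECIDE_EVENT Map a Levenshtein distance D to one of three actions.
%   ignoreMax and replaceMax are hypothetical thresholds chosen by the
%   caller; they are not taken from the original post.
    if D <= ignoreMax
        action = 'ignore';    % small distance: second sentence adds nothing
    elseif D <= replaceMax
        action = 'replace';   % medium distance: assumed to mean "replace"
    else
        action = 'store';     % large distance: treat as a fresh event
    end
end
```

For example, `decide_event(10, 4, 15)` returns `'replace'` under these arbitrarily chosen thresholds. A rule like this only reflects character-level difference, which is exactly the limitation raised in the question.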

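I'm not happy with the code, because LD only captures the syntactic level (what other methods are there?). How can semantics be incorporated (for example, that a bus is a kind of vehicle)?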
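
To make the question concrete, here is a minimal sketch of one word-level alternative: Jaccard overlap of the word sets, with an optional lookup that replaces a word by a broader term before comparing. Everything here is an assumption for illustration: the function names `jaccard_words` and `normalize_words` are made up, and the `hypernyms` table is a hand-made toy map (bus to vehicle), not a real lexicon or ontology.

```matlab
function s = jaccard_words(str1, str2, hypernyms)
%JACCARD_WORDS Word-overlap (Jaccard) similarity between two sentences.
%   hypernyms is an optional containers.Map from a word to a broader
%   term, e.g. 'bus' -> 'vehicle'. It is a toy table for illustration.
    if nargin < 3
        hypernyms = containers.Map('KeyType', 'char', 'ValueType', 'char');
    end
    w1 = normalize_words(str1, hypernyms);
    w2 = normalize_words(str2, hypernyms);
    u = union(w1, w2);
    if isempty(u)
        s = 1;   % two empty sentences are treated as identical
    else
        s = numel(intersect(w1, w2)) / numel(u);
    end
end

function words = normalize_words(str, hypernyms)
    % Lowercase, strip punctuation, split on whitespace.
    cleaned = regexprep(lower(str), '[^a-z0-9\s]', '');
    words = strsplit(strtrim(cleaned));
    words = words(~cellfun(@isempty, words));
    % Replace each word found in the toy table by its broader term.
    for k = 1:numel(words)
        if isKey(hypernyms, words{k})
            words{k} = hypernyms(words{k});
        end
    end
    words = unique(words);
end
```

With `h = containers.Map({'bus'}, {'vehicle'})`, the INPUT2 pair scores 3/7 without the table and 4/6 with it, because "bus" and "vehicle" then count as the same word. This still ignores word order and meaning beyond the table, so it is a starting point rather than an answer to the semantic part of the question.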

The code is here:

    %# As the difference is computed, a decision is made on the new event
    %# (string 2) to be ignored, to replace existing event (string 1) or to be
    %# stored separately. The higher the LD metric, the higher the difference
    %# between two strings. Of course, a lower difference indicates either
    %# identical or similar events. However, a higher difference indicates the
    %# new event as a fresh event.
    %#.........................................................................
    %# Calculating the LD between two strings of events.
    %#.........................................................................
    L1=length(str1)+1;
    L2=length(str2)+1;
    L=zeros(L1,L2);   %# Initializing the new length.

    g=+1;   %# just constant
    m=+0;   %# match is cheaper, we seek to minimize
    d=+1;   %# not-a-match is more costly.

    % do BC's
    L(:,1)=([0:L1-1]*g)';
    L(1,:)=[0:L2-1]*g;

    m4=0;   %# loop invariant

    %# Calculating required edits.
    for idx=2:L1;
        for idy=2:L2
            if(str1(idx-1)==str2(idy-1))
                score=m;
            else
                score=d;
            end
            m1=L(idx-1,idy-1) + score;
            m2=L(idx-1,idy) + g;
            m3=L(idx,idy-1) + g;
            L(idx,idy)=min(m1,min(m2,m3)); % only minimum edits allowed.
        end
    end

    %# The LD between two strings.
    D=L(L1,L2);

    %#....................................................................
    %# Making decision on what to do with the new event (string 2).
    %#...................................................................
    if (D=5 && D
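
The distance computation above can also be wrapped as a function that returns a length-normalized value alongside the raw distance. Raw LD grows with sentence length, so a fixed cutoff behaves differently for short and long sentences; dividing by the longer length gives a value between 0 and 1. This is a sketch, not the poster's code: the name `ld_normalized` and the choice of normalizer are assumptions.

```matlab
function [D, Dnorm] = ld_normalized(str1, str2)
%LD_NORMALIZED Levenshtein distance and a length-normalized variant.
%   D     : minimum number of single-character insertions, deletions
%           and substitutions turning str1 into str2.
%   Dnorm : D divided by the longer string length (0 = identical).
    n1 = length(str1);
    n2 = length(str2);
    L = zeros(n1 + 1, n2 + 1);
    L(:, 1) = (0:n1)';   % deleting every character of str1
    L(1, :) = 0:n2;      % inserting every character of str2
    for i = 2:n1 + 1
        for j = 2:n2 + 1
            cost = double(str1(i - 1) ~= str2(j - 1));   % 0 on match, 1 otherwise
            L(i, j) = min([L(i - 1, j - 1) + cost, ...   % substitute or match
                           L(i - 1, j) + 1, ...          % delete
                           L(i, j - 1) + 1]);            % insert
        end
    end
    D = L(n1 + 1, n2 + 1);
    Dnorm = D / max([n1, n2, 1]);   % the 1 guards against two empty strings
end
```

For instance, `[D, Dnorm] = ld_normalized('kitten', 'sitting')` gives `D = 3` and `Dnorm = 3/7`.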

#1 · poster · Senior Member · Joined: 2019-11-21 · Posts: 3,025 · Reputation: 67
Thread: How to compute the similarity between two sentences (syntactic and semantic)