Hi Vanda,
Not sure exactly what you're looking for, but varchar/nvarchar columns can be used with fuzzy search as long as they have a fulltext index on them. That's in the documentation.
If I get this right, you're trying to compare the columns of each row against one another (one at a time, since contains does not allow a column value as its second parameter). Apart from the use of 'and' in your code (it seems you actually want to combine those predicates with 'or' instead), it looks like you got it going. Did I get this wrong?
It seems the scenario could be handled with like_regexpr as well... not sure if you have looked into that already.
I built up some sample data and used your procedure to get the desired results, slightly changing it to get there. But honestly I'm not sure this is what you're looking for. This is what I got so far:
drop table population cascade;

create column table population
(
  id     integer,
  maiden varchar(100),
  last1  varchar(100),
  last2  varchar(100)
);

create fulltext index mindex  on population(maiden);
create fulltext index l1index on population(last1);
create fulltext index l2index on population(last2);

insert into population values(1, 'Johnson',  'Smith',         'Johnson-Smith');
insert into population values(2, 'Johnson',  'Johnson-Smith', 'Smith');
insert into population values(3, 'Jones',    'Frank',         'Jones');
insert into population values(4, 'Anderson', 'Miller',        'Miller');
insert into population values(5, 'Jenny',    'Miller',        'Sanders');
insert into population values(6, 'Susan',    'Harris',        '');

drop table populationtemp;

create global temporary table populationtemp
(
  id     integer,
  maiden varchar(100),
  last1  varchar(100),
  last2  varchar(100),
  score  double
);

do begin
  declare cursor cur for select * from population;

  truncate table populationtemp;

  for i as cur do
    insert into populationtemp
    select *, score() as score
      from population m
     where m.id = i.id and                              -- compare only within the same id
           (m.last1 <> '' or m.last2 <> '') and         -- skip rows where both last1 and last2 are empty
           ( contains(m.last1,  i.last2, fuzzy(0.8)) or -- last2 contained in last1
             contains(m.last2,  i.last1, fuzzy(0.8)) or -- last1 contained in last2
             contains(m.maiden, i.last1, fuzzy(0.8)) or -- last1 contained in maiden
             contains(m.maiden, i.last2, fuzzy(0.8)) ); -- last2 contained in maiden
  end for;

  select * from populationtemp;
end;
For the test I got the result below:
ID | MAIDEN | LAST1 | LAST2 | SCORE |
1 | Johnson | Smith | Johnson-Smith | 0.32500001788139343 |
2 | Johnson | Johnson-Smith | Smith | 0.34285715222358704 |
3 | Jones | Frank | Jones | 0.5 |
4 | Anderson | Miller | Miller | 0.7071067690849304 |
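For comparison, when the search term is a literal rather than a column value, no cursor is needed and the contains predicates can be combined with 'or' directly. A minimal sketch against the same population table, assuming the fulltext indexes above exist; the search term 'Johnson' is only an example value and not taken from your scenario:

```sql
-- Fuzzy lookup of one literal term across two columns, combined with 'or'.
-- 'Johnson' is an assumed example search term.
select id, maiden, last1, last2, score() as score
  from population
 where contains(maiden, 'Johnson', fuzzy(0.8)) or
       contains(last1,  'Johnson', fuzzy(0.8))
 order by score desc;
```

The loop in the block above is only there because each row's own column values have to be passed in as the search term.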
I guess it would be better for you to elaborate on your scenario a bit more. Some sample data, the create statement of your population table and the desired output would help.
BRs,
Lucas de Oliveira