Channel: SCN: Message List

Re: Fuzzy search


Hi Vanda,

 

Not sure what you're looking for, but VARCHAR/NVARCHAR columns can be used with fuzzy search as long as they have a fulltext index on them. That's in the documentation.
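A minimal sketch of a single-column fuzzy search, assuming the population table and the fulltext index l1index on last1 from the script further down in this post. The search term 'Smyth' and the threshold 0.8 are illustrative values only. CONTAINS takes the indexed column, a search string and FUZZY(<threshold>), and SCORE() returns the match score of each hit:

```sql
-- Assumes population(last1) has a fulltext index (see the create statements below).
-- 'Smyth' is an illustrative, hypothetical search term; 0.8 is the fuzzy threshold.
select id,
       last1,
       score() as score              -- relevance of the fuzzy match
  from population
 where contains(last1, 'Smyth', fuzzy(0.8))
 order by score desc;
```

A lower threshold returns more (and looser) matches; a higher one only near-identical values.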

 

If I understand this correctly, you're trying to compare the columns of each row with each other, one row at a time, since CONTAINS does not accept a column value as its second (search-term) argument. Apart from the use of 'and' in your code (it seems you actually want to combine those conditions with 'or' instead), it looks like you've got it working. Did I get this wrong?

 

It seems the scenario could also be handled with LIKE_REGEXPR; not sure whether you have looked into that already.
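A rough sketch of the LIKE_REGEXPR idea against the same population table. The pattern is written as a literal here ('Johnson' is just an example); whether a column or variable can be used as the pattern is not covered in this thread and should be checked against the documentation for your HANA version:

```sql
-- Sketch only: LIKE_REGEXPR with a literal PCRE pattern (exact text, no fuzzy matching).
-- Finds rows where 'Johnson' is a whole part of a (possibly hyphenated) surname
-- in last1 or last2; FLAG 'i' makes the match case-insensitive.
select id, maiden, last1, last2
  from population
 where last1 like_regexpr '(^|-)Johnson(-|$)' flag 'i'
    or last2 like_regexpr '(^|-)Johnson(-|$)' flag 'i';
```

Unlike CONTAINS with FUZZY, a regular expression does not tolerate typos and does not return a score, but it handles the hyphenated double-surname case directly.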

 

I tried to build up some test data and use your procedure to get the desired results, changing it slightly along the way. Honestly, though, I'm not sure this is what you're looking for. This is what I have so far:

 

drop table population cascade;
create column table population (
  id     integer,
  maiden varchar(100),
  last1  varchar(100),
  last2  varchar(100)
);
create fulltext index mindex  on population(maiden);
create fulltext index l1index on population(last1);
create fulltext index l2index on population(last2);
insert into population values (1, 'Johnson',  'Smith',         'Johnson-Smith');
insert into population values (2, 'Johnson',  'Johnson-Smith', 'Smith');
insert into population values (3, 'Jones',    'Frank',         'Jones');
insert into population values (4, 'Anderson', 'Miller',        'Miller');
insert into population values (5, 'Jenny',    'Miller',        'Sanders');
insert into population values (6, 'Susan',    'Harris',        '');
drop table populationtemp;
create global temporary table populationtemp (
  id     integer,
  maiden varchar(100),
  last1  varchar(100),
  last2  varchar(100),
  score  double
);
do begin
  -- cursor over all rows; each row's values become scalars (i.last1, i.last2, ...)
  declare cursor cur for select * from population;
  truncate table populationtemp;
  for i as cur do
    insert into populationtemp
      select m.*, score() as score
        from population m
       where m.id = i.id                                -- compare only the same id (optional, depends on the scenario)
         and (m.last1 <> '' or m.last2 <> '')           -- skip rows where both last1 and last2 are empty
         and (   contains(m.last1,  i.last2, fuzzy(0.8))  -- last2 contained in last1
              or contains(m.last2,  i.last1, fuzzy(0.8))  -- last1 contained in last2
              or contains(m.maiden, i.last1, fuzzy(0.8))  -- last1 contained in maiden
              or contains(m.maiden, i.last2, fuzzy(0.8))  -- last2 contained in maiden
             );
  end for;
  select * from populationtemp;
end;

For the test I got the result below:

 

ID  MAIDEN    LAST1          LAST2          SCORE
1   Johnson   Smith          Johnson-Smith  0.32500001788139343
2   Johnson   Johnson-Smith  Smith          0.34285715222358704
3   Jones     Frank          Jones          0.5
4   Anderson  Miller         Miller         0.7071067690849304

 

It would help if you elaborated on your scenario a bit more: some sample data, the CREATE statement for your population table and the desired output would make it easier to help.

 

BRs,

Lucas de Oliveira

