percentage difference between strings


joe92

Hmm, not quite sure how to do this one...

If I have lots of strings saved in a MySQL table and one of them is...

"A man and a dog took a nice walk in the park"

If a user then wants to add another string to the table, but I want to check that the new string is at least 5% different from all the others in the table, how would I go about doing this?

Is it possible to do this in MySQL, or would I have to pull all the strings out into a PHP array and process them that way... somehow?

First, you will have to define "different".

a MAN AND A DOG TOOK A NICE WALK IN THE PARK

That string is 100% different from the one you supplied... or it is 100% the same. Or is it somewhere in between?
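
To make the point concrete, here is a small Python sketch (Python only for illustration; the thread itself is about PHP). It uses a deliberately naive measure, counting the positions where two equal-length strings hold different characters. The helper name `differing_positions` is hypothetical and this is not a measure anyone in the thread proposed; it just shows how much the answer depends on whether case counts.

```python
original = "A man and a dog took a nice walk in the park"
shouted = "a MAN AND A DOG TOOK A NICE WALK IN THE PARK"


def differing_positions(a: str, b: str) -> int:
    """Hypothetical helper: count positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("this naive count only works for equal-length strings")
    return sum(1 for x, y in zip(a, b) if x != y)


# Case-sensitive: every letter differs, only the 11 spaces match.
print(differing_positions(original, shouted))                  # 33 (of 44 characters)
# Case-insensitive: lowercase both sides first and nothing differs.
print(differing_positions(original.lower(), shouted.lower()))  # 0
```

So the same pair of strings is 75% different (33/44) or 0% different depending on a single normalization choice.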

Once you have defined "different", you have to decide how to quantify the difference. To calculate a percentage, you have to be able to count the "differences" and divide.
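
The "count, then divide" step might look like this minimal Python sketch. The function name is hypothetical, and dividing by the longer of the two strings is an assumption (the thread does not settle on a denominator); it keeps any edit-distance count between 0% and 100%, because no edit distance can exceed the longer string's length.

```python
def percent_different(diff_count: int, a: str, b: str) -> float:
    """Hypothetical helper: express a raw count of differences as a percentage.

    Assumption: the denominator is the longer string's length.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings: treat them as identical
    return 100.0 * diff_count / longest


# e.g. 33 differing characters out of 44
print(percent_different(33,
                        "A man and a dog took a nice walk in the park",
                        "a MAN AND A DOG TOOK A NICE WALK IN THE PARK"))  # 75.0
```

Whatever you count (characters, words, edit operations), the percentage is only meaningful if the count and the denominator use the same unit.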

Once you have defined and quantified "different", we can answer the question of whether the amount of difference can be calculated and tested in SQL.

There is an algorithm called Levenshtein distance which does exactly this... somewhat. PHP has it built in as levenshtein().

What it does is give you "the minimal number of characters you have to replace, insert or delete to transform str1 into str2". Once you have this number, you can compare it with the length of the string to get a percentage.
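
For anyone curious how that number is computed, here is a minimal Python sketch of the classic dynamic-programming approach (in PHP you would just call levenshtein()). It keeps only one previous row of the table, so memory is proportional to the second string's length.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn a into b."""
    # Row for the empty prefix of a: turning "" into b[:j] takes j insertions.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


print(levenshtein("I eat food", "I ate food"))  # 2
print(levenshtein("kitten", "sitting"))         # 3
```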

So, for instance, the Levenshtein distance between "I eat food" and "I ate food" is 2: delete the "e" at the start of "eat", then insert an "e" after "at". You would then divide that number by the length of "I eat food" (AKA str1), which is 10 characters.

2/10 = 20% different. At least that's how I'd do it; I'm sure there are better ways.
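
Putting it together for the original question, a rough Python sketch of the "pull the strings out into an array" route might look like this. The names `percent_different` and `is_different_enough` and the `stored` list are hypothetical stand-ins for whatever your code fetches from the table. One assumption differs from the post above: it divides by the longer string rather than str1, which gives the same answer for equal-length strings and never exceeds 100%.

```python
from typing import Iterable


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def percent_different(a: str, b: str) -> float:
    """Levenshtein distance as a percentage of the longer string (an assumption)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0
    return 100.0 * levenshtein(a, b) / longest


def is_different_enough(candidate: str, stored: Iterable[str],
                        min_percent: float = 5.0) -> bool:
    """Hypothetical check: True if candidate is at least min_percent
    different from every stored string."""
    return all(percent_different(candidate, s) >= min_percent for s in stored)


# Hypothetical stand-in for the rows fetched from the MySQL table.
stored = ["A man and a dog took a nice walk in the park"]

print(percent_different("I eat food", "I ate food"))  # 20.0
print(is_different_enough("A man and a dog took a nice walk in the park.", stored))  # False (about 2.2%)
print(is_different_enough("A cat slept on the sofa all afternoon", stored))  # True
```

This compares the new string against every stored row, so the cost grows with the number of rows times the product of the string lengths; that is fine for a modest table but worth keeping in mind. If case should not count (see the earlier reply), lowercase both strings before comparing.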
