Skip Navigation LinksHome - Raja Venkatesh | DamLevDis Description

Search for String containing spelling mistakes in SQL Server

Assumption

1.   Have MS SQL Server with data. Having use case to search across data but data could contain spelling mistakes.

2.   Have basic knowledge of writing SQL Select statements.

3.   Permission to add SQLCLR objects to SQL Server.  You may contact your DBA to seek access to import SQL CLR into your SQL Server.

4.   You have Visual Studio 2015 Community edition (Free) or better in order to compile provided source code to generate SQL CLR object.

Clarification

1.   The idea of this exercise is to create a tool inside SQL server to be used within your SQL query which can be called via SSMS or from another program via ADO.NET/JDBC.  It addresses only spelling mistakes and does not address abbreviations, word swaps etc.  Ex: “McDonald” against “MacDonald” can give a better percentage match.  But expecting a match of “Sgt John” against “Sergeant John” may not be right criteria.

Expected Usage

A query as below in SSMS may return “71.43” stating that the passed two strings match 71.43%

select dbo.DamLevDisCLR('raja venkatesh','raju vankudesh',1)

The third parameter may be 1 or 0.  1 is to Ignore Case.  0 is to Match Case.

Getting it work

You may download the DLL attached with this project and use it within SQL or copy the source code given here, compile using Visual Studio 2015 Community or better edition, create the DLL and use it.

Using the downloaded DLL.

Once you download the DLL, you may have to right click it in Windows explorer, go to properties screen and unblock the security warning.

In SSMS, Goto Database\Programmability\Assemblies

Right click Assemblies node, and click “New Assembly”.  You’ll be prompted with a window as below.

Use Browse button and choose the downloaded DLL, key in appropriate permissions, etc. and click OK.

If everything is fine, you should see newly added DLL as in below screen in SSMS.

At start (in SQL statement), it is expected to throw error as below.

Msg 6263, Level 16, State 1, Line 1

Execution of user code in the .NET Framework is disabled. Enable "clr enabled" configuration option.

 

Enabling CLR

In SSMS run below SQL commands to enable CLR.  You may have to get this done by DBA and she is not expected to enable CLR either.  You may have to show the source code and relevant documentation to get approvals from relevant authorities in your organization.

 

sp_configure 'show advanced options', 1;

Go

Reconfigure;

Go

sp_configure 'clr enabled', 1;

Go

Reconfigure;

Go

 

Configuration option 'show advanced options' changed from 0 to 1. Run the RECONFIGURE statement to install.

Configuration option 'clr enabled' changed from 0 to 1. Run the RECONFIGURE statement to install.

 

Use the function

In SSMS, run the below SQL queries.

 

select dbo.DamLevDisCLR('raja venkatesh','raju vankudesh',1) as MatchRel

select dbo.DamLevDisCLR('Cat shot bat', 'The Bat kills Cat',1) as MatchRel

 

Match Relevance: 71.43 %

Match Relevance: 35.29%

Source Code

1.   Fire new Visual Studio 2015 (Community or any edition).

2.   In Tools menu, goto Options.  In Options window goto Environment\Keyboard and choose “Visual Basic 6” as keyboard mapping scheme.  ß This step is not necessary though.

3.   Click File\New\Project.  In new Project window, choose “SQL Server” under Templates.  If you don’t have “SQL Server”, you may have to figure out how to get the templates.

4.   Rename the Solution and Project default names to “DamLevDisCLR” respectively.

5.   Double click the properties in Solution Explorer.  In Project Settings, make sure you have selected “Target Platform” as “SQL Server 2014” or above.

6.   In the same page, click “Database Settings” and in the appearing window, go to third tab “Miscellaneous” and ensure “Enable full text search” is selected.

7.   In Properties\SQLCLR page, change the language from “C#” to “Visual Basic”.

8.   In Solution Explorer, right click the project “DamLevDisCLR” and add new item via Add\New Item. In the appearing list of templates, under section “SQL CLR VB”, selected template “SQL CLR VB User Defined Function”.  Name the item as “DamLevDisCLR.vb”

9.   Write the code as below inside the new class.

Imports System

Imports System.Data.SqlTypes

Imports System.Math

Partial Public Class UserDefinedFunctions

    <Microsoft.SqlServer.Server.SqlFunction(IsDeterministic:=True, IsPrecise:=False)>

    Public Shared Function DamLevDisCLR(Leftstr As SqlString, RightStr As SqlString, IgnoreMatchCase As SqlBoolean) As SqlDouble

        Try

            If Leftstr.IsNull OrElse RightStr.IsNull Then Throw New ArgumentNullException

            Dim InLeftStr As String = Leftstr.Value.Trim.ToString

            Dim InRightStr As String = RightStr.Value.Trim.ToString

            Dim LeftLen As Integer = InLeftStr.Length

            Dim RightLen As Integer = InRightStr.Length

            Dim PrcMatrix As Integer(,) = New Integer(LeftLen, RightLen) {}

            Dim PrcCost As Integer = 0

            If LeftLen + RightLen = 0 Then

                Return 100

            ElseIf LeftLen = 0 Then

                Return 0

            ElseIf RightLen = 0 Then

                Return 0

            End If

            Dim LeftIdx As Integer = 0

            Dim RightIdx As Integer = 0

            Do While LeftIdx <= LeftLen

                PrcMatrix(LeftIdx, 0) = LeftIdx

                LeftIdx += 1

            Loop

            Do While RightIdx <= RightLen

                PrcMatrix(0, RightIdx) = RightIdx

                RightIdx += 1

            Loop

            LeftIdx = 1

            Do While LeftIdx <= LeftLen

                RightIdx = 1

                Do While RightIdx <= RightLen

                    If String.Compare(InLeftStr(LeftIdx - 1), InRightStr(RightIdx - 1), IgnoreMatchCase.Value) = 0 Then

                        PrcCost = 0

                    Else

                        PrcCost = 1

                    End If

                    PrcMatrix(LeftIdx, RightIdx) = Math.Min(Math.Min(PrcMatrix(LeftIdx - 1, RightIdx) + 1, PrcMatrix(LeftIdx, RightIdx - 1) + 1), PrcMatrix(LeftIdx - 1, RightIdx - 1) + PrcCost)

                    RightIdx += 1

                Loop

                LeftIdx += 1

            Loop

            Return Math.Round((1.0 - (Convert.ToDouble(PrcMatrix(LeftLen, RightLen)) / Convert.ToDouble(Max(LeftLen, RightLen)))) * 100, 2)

        Catch Ex As Exception

            Return -1

        End Try

    End Function

End Class

Note: All the C# code for the same algorithm available in internet would use FOR loop.  Transliterating the same code would also yield a FOR loop.  However, I’ve changed the FOR loop to “DO WHILE” as they are implemented differently in each language.  You can notice DO WHILE loop performs faster than FOR loop in VB.

10. Hope the code compiles fine.  Now you have to publish the Assembly to SQL Server in order to use it.  Right click the project and click “Publish” menu.  You should see a screen as below.

You may Create/Save this profile for future publishes.

11. Once published, the newly Assembly should be available within SSMS.  Go to section “Use the function” in this document to find how to use this function.

Additional queries

1.   Let us say we have a table [BioData] with one of the column called “LName” as String datatype (VARCHAR).  The table is Full Text Indexed and the column “LName” is part of the FTS.  You may want to search for a person whose spelling is partially known to you or partially correct in another table.  A simple FTS query in such case can be

 

select * from BioData where dbo.DamLevDisCLR(LName,'schwarznagger',1)>=80

 

In this case names matching with spelling tolerance of 20% (match with 80%) is listed.

 

What’s next?

In next topic, we shall do a “Unstructured” Data storing in SQL server, perform

1.   Full Text queries against the unstructured data.

2.   Store Non-Text data like Images, Audio etc. and perform Full text query against the same data.

3.   Combine the current “DamLevDisCLR” function with the same and perform a FTS query against unstructured data with spelling mistakes.