Making documents searchable (Full Text search & Indexing) on Sql Server – and the use of iFilters July 8, 2009
Posted by boopalanj in .NET, Sql Server.Tags: Full text Indexing, iFilter, Sql Server Full Text search, XML iFilter
add a comment
You would know that you can store documents in a Sql Server table [with the use of image data type column]. Sql server does not usually have any problem in Indexing and searching MS Office document types such as .doc, .xls, .docx, .ppt , etc [if they are stored in the table].
We can search the documents using CONTAINS keyword as shown in the following query:
select o.ID, o.Description, o.File_Content, o.File_Extension, o.[File_Name] from Files o
where contains(o.File_Content,’boopalan‘)
This would return all the (document) records of the table, where the word boopalan is found in the content column.
But before this you have something to do. A few steps. That is, you should enable the database for full text search and indexing so that indexes are created (by crawling) and the documents are made searchable. Next, you should create a full text catalog. Third, we should identify the just-created full text catalog with the table that contains the documents. Fourth, we should identify the full text column (content column) in the table that holds the documents.
Following set of queries will happily do it. [Make sure, your sql server Full Text service installed.] Comments are included below.
exec sp_fulltext_database ‘enable’ — enables the full text in the database
exec sp_fulltext_catalog ‘cFiles’, ‘create’ — creates a catalog called cFiles
exec sp_fulltext_table ‘Files’, ‘create’, ‘cFiles’,'PKFiles’ — Files table is identified with the cFiles catalog. PKFiles is the primary key created in the Files table
exec sp_fulltext_column ‘Files’, ‘File_Content’, ‘add’, 1033, ‘File_Extension’ – File_Content column is identified as the content column and File_Extension column is identified as the extension column which holds the file extensions.
After this, we need to run the below set of procedures for starting the crawling and indexing process.
exec sp_fulltext_table ‘Files’, ‘activate’ — activates the full text in the table
exec sp_fulltext_table ‘Files’, ’start_change_tracking’ — change tracking is enabled
exec sp_fulltext_table ‘Files’, ‘Start_background_updateindex’ — updating index is started
Fine. Now, after sometime you may start searching the documents with the first query (CONTAINS) .
Use of iFilters
Sometimes, we need to store documents other than MS Office formats. For ex., there was a scenario for us to store XML documents / HTML documents. [To be precise, they are WordXML documents, i.e., Word documents stored as XML]
Fine, now no doubt that you’ll want to search through these documents as well. In that case, with the normal indexing abilities, the sql server will return all the XML records, if you give the key word ‘XML‘ because all these documents contain / start with <XML tag. You are not interested in searching through the XML nodes’ names, rather you want to search through the content that the XML nodes contain. You get it?
As another example, you store the HTML documents (in HTML format). When you see a HTML in browser, it has some text and an Image. The text is about something on Healthcare. With the normal indexing, if I enter the keyword ‘img‘ and search, it would return most of the records including the above said file, because it contains an Image [which is included by a <img HTML tag], although you were not interested in seeing a text of Healthcare. The same kind of thing happens when you enter keywords like ‘link‘, ‘meta‘ etc.
So, you need to filter out the XML / HTML nodes / tags and search the actual content. And that is what iFilters do. IFilters when installed / registered, help the indexing process according to the extension of file. For ex., an XML iFilter will help indexing the actual content of XML files leaving out the XML tags.
An XML iFilter is available in the internet – XMLFilter.dll [Microsoft XML Content Filter for Sharepoint Portal Server 2001].
- We need to register it (using regsvr32) on the server machine where Sql Server is.
- We need to rebuild the catalog [created in the previous half of this post] using the following queries.
exec sp_fulltext_database ‘disable’ — disables the full text in the database
exec sp_fulltext_database ‘enable’ — re-enables the full text in the database
sp_fulltext_catalog ‘cFiles’, ‘rebuild’; — rebuilds the catalog
sp_fulltext_catalog ‘cFiles’, ’start_full’; — starts full crawl
sp_fulltext_table ‘Files’, ‘update_index’; — updates the indexes again
This enables you to search through XML files as well using the contains query mentioned above.
If you want to know what the full text catalogs are that were created already in a database, you may use the below query.
exec sp_help_fulltext_catalogs
See ya!
How we developed a Proxy site (kinda Anonymizer) Part II.. July 6, 2009
Posted by boopalanj in .NET, ASP.NET.Tags: Anonymizer in ASP, Proxy site for file download, WebClient class .NET
add a comment
Aarthi is yet another good resource from our team. [Aarthi is sitting besides me. Readers are requested to use their discretion while reading the previous sentence considering my situation while writing it ;-)].
You know what, she needed help in downloading a file which sized more than 5 MB [6.8 MB]. In our corporate, downloading is restricted to 5 MB according to the text it shows [although it does not allow even 2 MB download sometimes. Such a mad filter!]. And she wanted to check out if our so-called Model anonymizer [in the second post of this blog] can download that and give it to her.
Normally, we tech-kiddos, obey an organization’s policy more than anything. But what Aarthi said was – she got impressed with our blog. Impressed? by our technical Blog? For us, it is a rare event. Forget Organization’s policies, and forget the organization too, if someone is going to be impressed with what we are doing. We don’t want to refuse that this site is kinda aladdin’s genie that would bring the file for her.
So, I sat to mofify the existing code – to download the files and throw it to the response, if url is given. There, I came to know, that, there is one more great class – WebClient - which allows downloading of file to a local folder from a Url. All these WebRequest, Webresponse and WebClient classes are available under System.Net namespace. The class WebClient also has a method – DownloadFile(string url, string filepath). What else is needed!
I quickly dragged a Label, TextBox and Button and in the OnClick event, I wrote the following code.
public void btnGo_OnClick(object sender, EventArgs e)
{
string strURL = txtUrl.Text;
if (strURL.Equals(String.Empty) || (!strURL.StartsWith(“www.”) && !strURL.StartsWith(“http:”) && !strURL.StartsWith(“https:”)))
{
return;
}
WebClient wbc = new WebClient();
wbc.DownloadFile(strURL, Server.MapPath(“ImagesDyn/”+ strURL.Substring(strURL.LastIndexOf(“/”)+1)));
FileStream fs = new FileStream(Server.MapPath(“ImagesDyn/” + strURL.Substring(strURL.LastIndexOf(“/”) + 1)), FileMode.Open);
Byte[] btFile = new Byte[fs.Length];
fs.Read(btFile, 0, (int)fs.Length);
Response.AddHeader(“Content-Disposition”, “attachment;filename=” + strURL.Substring(strURL.LastIndexOf(“/”) + 1));
Response.Buffer = true;
Response.ContentType = “application/octet-stream”;
Response.BinaryWrite(btFile);
fs.Dispose();
wbc.Dispose();
File.Delete(Server.MapPath(“ImagesDyn/” + strURL.Substring(strURL.LastIndexOf(“/”) + 1)));
}
The file is downloaded to the local folder ‘ImageDyn‘ with the same file name. If I point to the Url of the Downloaded File using Response.Redirect(“”), it would get blocked through the download filter(although I did not check that. So, please don’t mind if it is wrong). So I used FileStream, read the bytes into the byte array, and pushed it to the response. [I also delete the file at the end of process, but the hosting server did not allow to do this, so commented the last line.]
The result – below: [It took hell lot of time before this dialog appeared. [Downloading to its local folder was slow]]
How we did Image Processing in ASP.NET – Part II.. July 5, 2009
Posted by boopalanj in .NET, ASP.NET, Image Processing.Tags: Bitmap Class in .NET, GDI+, Image Processing in ASP.NET
add a comment
How we did Image Processing in ASP.NET.. July 5, 2009
Posted by boopalanj in .NET, ASP.NET.Tags: Bitmap Class in .NET, GDI+, Image Processing in ASP.NET
1 comment so far
There is something often said among IT professionals. [Warning: If you are a girl from my team and if you are weak-hearted, Please skip reading the next line.] Pretty girls always work in your neighbour team / project [and not in your (my) team]. Similarly, it is always the project of someone else, which will be very interesting to hear and work. You always happen to be in a montonous, boring-to-discuss and die-rather-to-code project. Even if you adjust with it as you do with your wife, it keeps on showing the ugly yellow face [page] to you, whatever the degree of your adventure in coding.
Such a project was one I discussed with my friend Mr.Anand. Have you seen Zazzle.com? If not, please drop by once. It is a piece of good work. It is an online selling portal – which sells things such as t-shirts, cups, shoes, and what not, after allowing the users to customize it with desired image / text. Customization can be done in the site. The great thing is, if you upload an image and try to put into the t-shirt or any item, the image makes itself twisted / twirled / folded / rotated according to the background [a male / female model usually] body’s edges, i.e., folds or bends in the body. Anand had a similar requirement in a .NET project.
It is always the technology / challenge that motivates a good developer, and you see, we fall into the category of ‘very-good’ developer [ahem..!], so I started the development of it on my own interest being eager to know how to twist / resize images as per our wish programmatically. Here I give the code pertaining to the basic manipulation of an image using GDI+ [or whatever the hell..]. The end-result I achieved for the above requirement is something more than this ‘basic’ manipulations. It’ll be a good start for you to learn the processing of Image using ASP.NET.
There is this class in .NET – Bitmap, which allows us to create an image object in the memory, and load a physical image / create our own image. Available under System.Drawing Namespace. [you'll also need System.Drawing.Drawing2D, System.Drawing.Imaging namespaces for some features of processing]. Using Graphics class, you can avail several functionalities such as – Rotate, Resize (Scale), Clip, Draw shapes such as Ellipse, Rectangle, Path, Polygon, another Image, etc, Fill on an Image surface. These are things that come out-of-box, and you may learn with few tutorials.
There are two important methods under a Bitmap object. GetPixel and SetPixel. As you may understand, an Image is a large collection pixels ordered regularly / irregularly to produce a pattern. Each pixel has its own color. GetPixel allows you to get the color at point X,Y, whereas SetPixel allows you to set a color at Point X,Y. By manipulating these two methods, you may produce distorted effects in an image, such as bending, changing the shape into a parallelogram, or sine wave etc. Let’s see an example.
Load an physical image into Bitmap class.
Bitmap b = new Bitmap(Server.MapPath(“AJ49075_5.jpg”));
Create a new Bitmap class to hold the changes. It may be somewhat bigger in size than the above bitmap, since distortions (like twisting) may need more space. For ex, if you make a rectangle into a parallelogram, the end-to-end width of the image is more than the actual width of the rectangle.
Bitmap bNew = new Bitmap(100 + b.Width, 150 +b.Height, PixelFormat.Format64bppArgb);
And, let us assume that, you want to make the rectangular image into a parallelogram. We can achieve this, by having a slope in the X-axis of the image. To have a slope in an image, the X particles need a displacement by some distance horizontally when their Y particles stay in their actual position. Let us do that. Before doing it, let us make the new image’s background as transparent, otherwise the background will look black in the result. Add the below code.
for (int x = 0; x < bNew.Width; ++x)
{
for (int y = 0; y < bNew.Height; ++y)
{
bNew.SetPixel(x, y, Color.Transparent);
}
}
Now, Let us displace the particles through their X-axis. We have to get the pixel and its color from the old image (b), displace it, and paste it in the new image (bNew).
for(int y = 0; y < b.Height; ++y)
{
for (int x = 0; x < b.Width; ++x)
{
Color c1 = new Color();
c1 = b.GetPixel(x, y);
int xpix, ypix;
xpix = x/2 + y;
ypix = y;
bNew.SetPixel(xpix, ypix, c1);
}
}
Throw it to the Response, as shown below.
Response.ContentType = “image/jpeg“;
bNew.Save(Response.OutputStream, ImageFormat.Jpeg);
You can save it to a physical location, or even to a Stream object. ImageFormat can be changed to GIF, PNG, BMP, etc.
Now, the results:
Actual Image:

Result Image after displacement code above:
How we used the Query for Paging Technique to take the ‘N’th Row – Sql July 4, 2009
Posted by boopalanj in Sql Server.Tags: Over keyword, Paging in Sql query, Row_Number(), Sql Server Nth Row
2 comments
In one way, ‘tech kiddo jith’ is an explorer. He does not hesitate to immediately get into exploration if someone throws at him a challenge which is of greater interest to him, and in the process he does not hesitate to pester me to find out the answer.
One of such things was – how to find out ‘N’th row in a table [or in a set of rows resulted by a query]. He was also adamant not to use any techniques such as ‘taking top N rows – reverse it- take first row -some bla bla‘.
Fortunately, I was aware of this query used for Paging technique[Which I learnt from one of my other friends], i.e, to take records between a starting number and ending number. Ex., between 10 to 20, 20 to 30 , etc.
SELECT ArticleId, ArticleName, ArticleContent
FROM (
SELECT A.ArticleId, A.ArticleName, A.ArticleContent,
ROW_NUMBER() OVER(ORDER BY A.ArticleId) AS RowNumber
FROM Articles A) ArticlesRows
WHERE RowNumber > @StartRow
AND RowNumber < @EndRow
I modified this and showed it to him.
SELECT ArticleId, ArticleName, ArticleContent
FROM (
SELECT A.ArticleId, A.ArticleName, A.ArticleContent,
ROW_NUMBER() OVER(ORDER BY A.ArticleId) AS RowNumber
FROM Articles A) ArticlesRows
WHERE RowNumber = @Number
By using Row_Number() OVER(ORDER BY ColumnName) what we do here is, we number each row (like giving a serial number) after ordering the column ‘ColumnName‘ [either ascending / descending as desired]. Ordering is done so that the row number does not change for a row while paging. [If a new row is added to the table in the middle, numbers will change however, but will not affect paging.]
Using the same Row_Number() function, we take the ‘N’th Row.
How we developed a Model Proxy site (Anonymizer) with ASP.NET.. July 4, 2009
Posted by boopalanj in .NET.Tags: Anonymizer in ASP.NET, Download Images in code, Proxy site, Read web page in .NET
15 comments
Being part of a corporate network sucks big time often. Whether you flirt with a girl through an online messenger, or you intend to visit a job portal although you joined here only a week back, or you want to check out personal mails which have your ugly friends’ group photos, or you want to go through a news site to find out how the actor planned out the rape of his maid, or you need to download a ‘blocked’ image / document, etc., You really hate when the corporate puts you a handcuff using their filter software.
We too did. ‘Fellow tech-kiddo’ jith too did. But not for all those dirty reasons stated above. Unlike you, jith had a genuine reason. Jith does not always have a reason, and now that he has a ‘genuine’ reason – to read a post that explains how to use a ListView control and its features. Now you believe it? Even God does not.
In our corporate network, even some genuine technical stuff too get blocked often, just because the url contained a ‘blog’ word, or because the network admin did not understand it.
Jith wanted to read a post from ScottGu’s blog. It is not fully blocked, yet he could read the content of it, but it had nearly 20 explanatory Images (whose Urls were blocked) to demonstrate the steps without which (or even with which) our jith was not able to understand it properly. He came to me who was sitting almost jobless by that time. Jith wanted to have a proxy site [anonymizer] which will be useful for that moment [and will be very useful for the later moments of the things said in the first paragraph].
If I was sitting almost jobless at a time, it meant that all the anonymizers known to me were blocked by the network. Wonder why jith did not comprehend this. So having said explicitly that I did not have one then, I also suggested him that it was better to go with our own proxy site for the purpose. Even though I said for fun, he, like a curious & exploring kid, asked how to do that. You see, we two simply love wasting time.
I said,
“- There are few methods in .NET to read a web page when given the URL, and you can push it to the response.
- So build an aspx page which does that, and host it in a web hosting site (which is obviously outside of our network).
- Since your page is not blocked by the network, it’ll load and run the code to read the blocked-page and show you the content of the page.
- If Images (with their URLs) are blocked, we’ll have to download the images to the local folder of the site and map the local Url in the HTML source of the page.
- We also have to keep switching the DNS / IP one after another so that When our network blocks one of ours, we can move to the next.“
Jith the techie, cried “Great“. Flattery comes naturally for us. [If I am bored, I'll ping jith in messenger and say "Jith, You are a born genious!", and in reply he would say "Boo, you are the greatest!". After enough of it, we'll get back to work]. He was quite amused by sound of the idea and he wanted to check out the implementation immediately. I like it. I love the spirit of Jith, not for that he wanted to implement an interesting idea, but for that he is a pro in finding out reasons to defer the official tasks at hand.
Ok. Let me explain the code we did. There are two classes in .NET, WebRequest and WebResponse which help us to read a web page when given with a URL. You may give the URL of a web page, image, or even a file. It is going to return you a Stream.
The method WebRequest.Create(“http:// www.URL.com/Page.aspx“) will create a programmatic request to Page.aspx under URL.com site, and will return an object of WebResponse type.

While this code shown in the Image will perfectly read a web page and show it to us, our intention was not that.
When you host the aspx page which contains this code in a webhosting site [free / paid service], since this page under the hosting provider is not blocked by the corporate network, this code will run and it’ll push the content of the blocked site [HTML] to your browser window.
But, our very purpose of doing all these, was to see the Images which were blocked [with their URLs]. Now, We have to examine through the HTML source of the page programmatically, get the URLs of images wherever found, download them to Local folder in the hosted site, change the pointing image Url in the HTML source to Url of local folder Images.
See, It was not easy for me to type the above passage, and I was sure, it was not going to be easy to implement it in .NET. It is always easier said than coded.
Because, to download the images mean – that you scan through the HTML source of the page, and find out all the image URLs.
- Sometimes they are static Urls. Sometimes they get generated dynamically.
- Image HTML tags may be in any of these forms - <Img src=”file” or <Img src=file or <Img src=’file’
- CSS syntax is also similar to any of the above forms. [We omitted to download CSS files and apply]
- Some Images may be inside Style attribute, for ex: style=”background-image:url(../image.jpg);”
- Some Images may come inside CSS files, again wrapped by ‘url(..Image.jpg)‘ like above.
But, as you understand and our title suggests, we are building a ‘Model’ anonmyizer [!] and we decided to go with only the first case. Finding out all images through their Img HTML tag and downloading them. I had to scan through all the SOURCE HTML of the dynamically-read page, and spot out all the src=”http://site/image.jpg” attributes using string processing [IndexOf, Substring functions helped] .
After Reading the URLs, again it was the turn to download them one by one. We again used WebRequest and WebResponse, and with the use of Bitmap class we saved it to local folder. We again replaced the old site URLs of the images in the source to point to our Local folder Images so that they don’t get blocked.
Humphhh… It was really tough. Ok, finished, Ctrl + F5 the page to see the output.. Loading… Loading.. Waiting.. Loading.. Hello Mr.Browser, are you out of your mind? Come on! It is almost 28 Images of reasonable size, and it loaded finally after taking hell lot of time to download.
What can be done? Hmm. Yes, Threading.. Bhuhu Haa Haa Haa.. [Evil laughter..]. Downloading function was made to run in another (background) thread. That is all man!! Boo-Jith. It worked! You built a model anonymizer to read the page and images which were blocked inside the network.
But, after downloading the images, did Jith learn the topic of listview which he was in need of? Hmm.. Why do you even ask that?
Here is the sample aspx.cs file we did: [In word format, since wordpress does not allow zip / cs media type]
Model anonymizer site using ASP.NET
Note:
- A sample implementation can be found here [http://aspspider.info/surjith/DefaultBoo.aspx]. ‘Surjith’ is a free account under aspspider, and will expire after 90 days. So, don’t be searching for it, if you get to read this post after 3 montsh of its post-date.
- It is important that you deploy (host) this page outside of your network. Any free / paid web hosting service will do.
- And, There has to be a local folder ‘ImagesDyn’ in your project to store the images. You may change the name in the code as well, if needed.
- And Images download to relative Url – so, please take time to change the ’surjith’ in the following line
strResult = strResult.Replace(strItem,
“/surjith/ImagesDyn/” + iFileCount.ToString() + strExtension); //[Line No. 84]
See ya..!
Response.Write(“Hello world!”); July 3, 2009
Posted by boopalanj in Intro.add a comment
Title sounds weird?
Then we understand that you are not familiar with ASP.NET or even .NET or even the ” . ” (dot). We are afraid that this blog can hardly serve anything to you and we believe you can hardly serve anything to this (our blog) world [other than an extra hit to the blog]. So take a break and come after 3 or 4 years. If we have a skill-set change because of sunken economy and If wordpress still exists by then, we may be blogging in the technology of your interests
Title looks familiar?
Then you are for it. We see that you have at least an acquaintance with .NET. Don’t worry, we also have nothing more than that. Only difference is we write it in a blog, and you don’t. Probably this blog may fail to dish you out a .NET feast, but it may provide a side-dish, whether it suits the main course or not, for sure.
We suggest you go through it, take a read on the topics that interests you, and see if it helps you in any way. And, we don’t refuse if you choose to donate this blog for the work it did and it is going to do
Till we start to write something, We.Wish.GoodDay();











