Enabling Document Content Search
Who is this article for?IT Administrators looking to setup Document Content Search.
SQL Server access is required.
Document Content Search is a feature that allows you to extend the scope of the client search to find matches inside the content of a document. This article explains how to set up the functionality.
- Technical Specifications
- Search Capabilities
- Installing Filter pack for Office 2010 and PDF
- Enabling Document Content Search
- Registering Filters with SQL Server
- Performing Content Searches
- Troubleshooting
1. Technical Specifications
The Document Content Search feature makes use of SQL Server's Full-Text search capabilities. Due to this, you must have a version of SQL Server that supports Full-Text indexing and this feature must be installed and enabled for the instance of SQL Server. Most editions of SQL Server support Full-Text indexing. Fore example, SQL Server Express with Advanced Services and SQL Server Enterprise have full support.
2. Search Capabilities
By default, SQL Server supports Full-Text searching of the document file types shown further below. For example, it is not possible to perform content searches of PDF files and Word 2007/10 (.docx) files using SQL Server's default Full-Text functionality. As SQL Server is responsible for performing content searches, it is not possible to perform content searches of non-embedded file attachments. If you store your document files in network folders, you are not able to search within the content of those network files using content searches.
ASP, ASPX, CSS, HTM, HTML, HHT, HTW, HTX, ODC, STM, DOC, DOT, POT, PPS, PPT, XLB, XLC, XLS, XLT, TXT, ASM, BAT, C, CPP, CXX, CMD, DEF, DIC, H, HPP, XML, RTF, EML
Support for additional file types such as PDF and Word 2007/10 can be enabled by installing the required filter packs to support content searching of those formats. Filter packs are add-on software packages that allow SQL Server and other applications to interface with specific file types.
The procedure for installing filter packs varies depending on the version of Windows Server, SQL Server and the file type itself. The specific steps for your environment should be researched before attempting any installation. It's also important to note that filter packs are developed by 3rd parties and not all filter packs are supported in all environments. For example, if you are using SQL Server 2012 64-bit and Adobe hasn't developed a PDF filter compatible with this version of SQL Server, you will not be able to perform content searches for PDF files.
When performing content searches, Ideagen Quality Management uses the CONTAINS operator in SQL Server to search for matching document records. More information about the behaviour of this operator can be found in the the CONTAINS article linked to in the Further Reading section of this article.
3. Installing Filter pack for Office 2010 and PDF
- Download and install the Office 2010 filter pack from https://www.microsoft.com/en-my/download/details.aspx?id=17062
-
Download and install the PDF iFilter Pack from https://portaldownloads.s3.amazonaws.com/AdobePDF%20iFilter64%20Setup.zip
- Copy the path to the bin folder of the "Ifilter" installation
- Open Control Panel > System & Security > System > Advanced system settings > Advanced tab > click "Environment Variables…" button
-
In the System variables section select Path > Edit > New > paste the bin folder path > press OK to save on all open windows.
If you have installed PDF 11 iFilter Pack you must set your system PATH environment variable to the "bin" folder of the "Ifilter" installation.
4. Enabling Document Content Search
To enable Document Content Search, copy all of the SQL queries below into a New Query window in SQL Server and execute against the Ideagen Quality Management database you wish to enable this feature for.
Ensure you have clean and up-to-date database backups before attempting this. This is important as a number of changes are made to the database.
EXEC Sp_fulltext_database
'enable';
EXEC Sp_fulltext_catalog
'DocumentBodyCatalog',
'create';
EXEC Sp_fulltext_table
'RevisionAttachmentItem',
'create',
'DocumentBodyCatalog',
'PK_RevisionAttachmentItem';
EXEC Sp_fulltext_column
'RevisionAttachmentItem',
'FileData',
'add',
@type_colname = 'Extension';
EXEC Sp_fulltext_table
'RevisionAttachmentItem',
'activate';
EXEC Sp_fulltext_table
'RevisionAttachmentItem',
'start_full';
--This turns on change tracking
EXEC Sp_fulltext_table
'RevisionAttachmentItem',
'Start_change_tracking';
--This tells database to index any new/changed documents as soon as they are added/changed.
EXEC Sp_fulltext_table
'RevisionAttachmentItem',
'start_background_updateindex';
After executing the above SQL against the database, you should see the warning shown below in the Messages area of SQL Server. This warning is normal and indicates that Document Content Search has been enabled for the database. If you see a red error, please consult the Troubleshooting section of this article.
Table or indexed view 'RevisionAttachmentItem' has full-text indexed columns that are of type image, text, or ntext. Full-text change tracking cannot track WRITETEXT or UPDATETEXT operations performed on these columns.
5. Registering Filters with SQL Server
- Use sp_fulltext_service to load newly installed word breakers and filters in the server instance, as follows:
EXEC sp_fulltext_service @action='load_os_resources', @value=1;
- Use sp_fulltext_service to update the list of languages, as follows:
EXEC sp_fulltext_service 'update_languages';
After registering please restart the SQL service. Use the sp_help_fulltext_system_components system stored procedure to view a list of the filters that are currently registered to verify the installted filters have been successfully registered, as follows:
EXEC sp_help_fulltext_system_components 'filter';
6. Performing Content Searches
When the above steps have been performed to enable content searching, you can perform content searches in the Documents module. Other modules are not supported. To perform a content search, open the Documents module and click the small black triangle to the right of the Keywords field. When clicking this, you will notice a new Content search option in the list of searchable fields as shown below.
This Content search option is enabled (checked) by default and users will need to log back in to Ideagen Quality Management before they can see this option. If this option is checked, Keywords searches will search within file attachments and matching document records will be returned in the search results. If you don't wish to search within file attachments, uncheck the Content option before performing a Keywords search.
Using Ideagen Quality Management Web?
This is a legacy feature that was removed with Ideagen Quality Management Web v7. You can still perform the search in the Windows client.
7. Troubleshooting
Disable Document Content Search
If you encountered an error when enabling Document Content Search or if you're unsure if it enabled successfully, you can execute the SQL below against the Ideagen Quality Management database to safely roll-back any changes that were made. As before, ensure you have clean and up-to-date database backups before attempting this. It is normal to see one or more errors such as 'index does not exist' when executing the SQL queries below.
ALTER fulltext INDEX ON revisionattachmentitem disable;
ALTER fulltext INDEX ON revisionattachmentitem DROP (filedata);
DROP fulltext INDEX ON revisionattachmentitem;
DROP fulltext catalog documentbodycatalog;
EXEC Sp_fulltext_database
'disable';
Test Full-Text in SQL Server
You can execute SQL queries such as the one below to test if the Full-Text search is working in SQL Server. For example, in the query below, replace keyword with a search keyword that you know is contained within an embedded document attachment of a file type that's supported by a filter pack. The document should be returned in the query results.
SELECT D.number,
R.title,
R.revisionreference AS 'Revision',
RA.displayfilename AS 'Attachment',
R.attachmentname AS 'Description',
RA.modifieddate AS 'Modified Date'
FROM revisionattachmentitem RA
INNER JOIN revision R
ON RA.revisionid = R.id
INNER JOIN document D
ON D.id = R.documentid
WHERE CONTAINS (RA.filedata, 'keyword')
AND RA.isembedded = 1
ORDER BY number,
revision;
Full-Text Status
To determine the current status of Full-Text, filter packs and whether or not embedded PDF and DOCX files exist, you can execute the SQL shown further below against the database. The status result shown will be similar to that shown below.
SELECT @@VERSION AS 'SQL Server',
CASE
WHEN (
Fulltextserviceproperty('IsFullTextInstalled')) = 1 THEN 'Full-Text Search is installed.'
ELSE 'Full-Text Search is not installed or the configuration is invalid.'
END AS 'Full-Text',
CASE
WHEN EXISTS
(
SELECT NAME
FROM sys.fulltext_catalogs
WHERE NAME = 'DocumentBodyCatalog') THEN 'The ''DocumentBodyCatalog'' Full-Text catalog exists.'
ELSE 'The ''DocumentBodyCatalog'' Full-Text catalog has not be created.'
END AS 'Catalog',
CASE
WHEN
(
SELECT Object_name(object_id) from sys.fulltext_index_columns) = 'RevisionAttachmentItem' THEN 'The ''RevisionAttachmentItem'' Full-Text index exists.'
ELSE 'The ''RevisionAttachmentItem'' Full-Text index has not been created.'
END AS 'Index',
CASE
WHEN EXISTS
(
SELECT id
FROM revisionattachmentitem
WHERE isembedded = 1
AND displayfilename LIKE '%.docx') THEN 'YES'
ELSE 'NO'
END AS 'Embedded DOCX?',
CASE
WHEN EXISTS
(
SELECT id
FROM revisionattachmentitem
WHERE isembedded = 1
AND displayfilename LIKE '%.pdf') THEN 'YES'
ELSE 'NO'
END AS 'Embedded PDF?';
IF EXISTS (SELECT document_type
FROM sys.fulltext_document_types
WHERE document_type IN ( '.pdf', '.docx' ))
BEGIN;
SELECT CASE
WHEN document_type = '.docx' THEN 'Word 2007/10 (.docx)'
WHEN document_type = '.pdf' THEN 'PDF (.pdf)'
END AS 'File Format',
manufacturer AS 'Vendor',
version AS 'Version',
path AS 'Installation Path'
FROM sys.fulltext_document_types
WHERE document_type IN ( '.pdf', '.docx' )
ORDER BY [file format];
END;
ELSE
SELECT
'PDF and or Word iFilters are not installed or the iFilter configurations are invalid.'
AS 'iFilter Status';
Improve Content Search Performance
Over time, you may notice that document searches take longer to finish and in some cases this can result in timeout errors. Longer running searches over time is normal if you are regularly adding new documents with embedded file attachments. Regular adding of embedded attachments can reduce the performance of the Full-Text catalog stored in the Ideagen Quality Management database.
You may also experience similar search slowdowns if general performance on the database server is reduced. For example, if the server has inherited a number of other busy databases in SQL Server, the Ideagen Quality Management database will operate more slowly.
To improve the performance of Content searches, the database's Full-Text catalog can be re-organised. Re-organising the catalog optimises the catalog contents to provide optimal search performance.
The T-SQL database query below can be executed against a Ideagen Quality Management database within SQL Server to re-organise the catalog.
ALTER fulltext catalog documentbodycatalog reorganize;
This query may take some time to finish depending on the number of embedded file attachments you have and the general performance of the server where the Ideagen Quality Management database is stored. Due to this, only execute this query at a time when all users are logged out of Ideagen Quality Management, after ensuring you have up-to-date backups of all databases and after having reviewed the REORGANIZE argument guidance included in the ALTER FULLTEXT CATALOG article linked to in the Further Reading section of this article.