Why is file content not indexed in DataGravity Searches?

Hi all,

Someone was asking me why the contents of some files were not indexed.  When contents are not indexed, the file can be found if you search for it by name, but not if you search for it by contents.  The background is that they wanted to know which files were not indexed.  When files are copied to a DataGravity appliance - whether inside a VM or to an SMB share - they are cracked and indexed.  There are things that are not indexed, though - such as password-protected files.

At this point in time there is no easy way to search for all files that are not indexed.  You might assume that is something we are working on.

But we can sort of do it now.  Let me show you how.

Do a search - in my case I said to show me the Excel files, but you could in fact use any search term you like.

We can see a few things here that are interesting.  The first is the second file - LockedFile.xlsx - which has a padlock on the far right.  This means the file contents are not searchable - specifically because the file is encrypted or password protected.

If we access the file we see the following screen.

We see a lot of information, but if we expand File Details at the top right we can see a little more.

So we can see here, from the Searchable field showing No, that this file is NOT indexed.  But not why.  For that we need to look back at the first screenshot and notice there is an Export button.  When we export the search results we get something interesting.

If you click on this screenshot you will see a bunch of useful info.  You see fingerprints for each file, but you also see a contentstate column.  Notice how you see Full Content - meaning that file is indexed, or you see Password protected or Encrypted, and so now you know why that file is not indexed.

Here is a different search.

Here now we see something different.  These font files are not indexed.  So they show the triangle.  Use the Export button and you will see more info.

Notice the column contentstate again.  Now you see the other reason why files are not indexed.  No filter available.  This will not happen often as there is really a large number of filters available.  Be sure to let me know if you need a filter that we don’t have.

These search results were exported to CSV and opened in Excel.  Now that the info is in Excel, you can search or sort it any way you want.
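If you would rather not eyeball the spreadsheet, here is a rough Python sketch that reads the exported CSV and groups the files that are not indexed by their contentstate value.  I am assuming the export has a header row with a contentstate column, and that Full Content is the only value that means indexed.  The function name is my own invention, and the other column names in your export may differ, so treat this as a starting point and not an official tool.

```python
import csv
from collections import defaultdict

# Value the export shows for files whose contents were indexed.
INDEXED_STATE = "Full Content"


def group_unindexed(csv_path, state_column="contentstate"):
    """Return {contentstate: [rows]} for every row that is not fully indexed.

    Assumes the CSV has a header row containing `state_column`.
    Each row is kept as a dict so all exported columns stay available.
    """
    groups = defaultdict(list)
    # utf-8-sig tolerates a byte-order mark at the start of the file.
    with open(csv_path, newline="", encoding="utf-8-sig") as handle:
        for row in csv.DictReader(handle):
            state = (row.get(state_column) or "").strip()
            # Anything other than Full Content counts as not indexed.
            if state.lower() != INDEXED_STATE.lower():
                groups[state].append(row)
    return dict(groups)
```

To use it, call group_unindexed("export.csv") (the file name is just a placeholder for wherever you saved the export) and loop over the result to print each contentstate and how many files fall under it.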

So generally speaking there are these two reasons why files are not indexed.  Which, BTW, means you can find the file by name, but you cannot find it by searching for the contents.  We will likely make it much easier to find files that are not indexed.

I hope that this helps but if it doesn’t let me know.  Comments and questions are always welcome!

BTW, here is a link for all the technical DataGravity articles I do.

Michael

