Cleaning Up Unstructured Data

Many organizations are cleaning up, organizing, and managing the information in their shared drives, SharePoint libraries, and other unstructured data repositories as part of a broader program to gain control over their electronic records. This work is often carried out as part of an Information Governance or Privacy Compliance strategy.

A sound Information Governance strategy for shared drives, SharePoint, and other unstructured data focuses on:

  • Putting databases, install files, and apps on dedicated servers separate from electronic records
  • Putting the good content into structures that allow individuals to manage, classify, or purge it
  • Getting rid of useless, non-business content so managing relevant content becomes easier

Index and Auto-Classification Tools

Index and auto-classification tools are a fairly new generation of technologies that help with the cleanup and categorization of unstructured content, meaning any digital content that is not inside a database, from office documents and images to applications. They can support multiple taxonomy facets that can be applied to content across the enterprise. The actions they take range from deleting and moving files, to outputting migration scripts, to moving files and metadata into content repositories. Transforming shared drives, however, requires much more than installing these tools.
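To make the idea of facets and actions concrete, here is a minimal sketch of a rule-based classifier that proposes an action for each file without executing it. Everything in it is an assumption for illustration: the `RULES` table, the facet names, the action names, and the `propose` function are hypothetical and do not describe any particular product. Real tools typically index file contents and metadata rather than relying on the extension alone.

```python
from dataclasses import dataclass
from pathlib import PurePath

# Hypothetical rules for illustration only:
# file extension -> (content-type facet, proposed action).
RULES = {
    ".exe": ("application", "move_to_software_server"),
    ".msi": ("application", "move_to_software_server"),
    ".mdb": ("database", "move_to_database_server"),
    ".tmp": ("transient", "delete"),
    ".docx": ("office_document", "migrate_to_repository"),
    ".xlsx": ("office_document", "migrate_to_repository"),
}


@dataclass(frozen=True)
class Proposal:
    """A proposed action for one file; nothing is executed."""
    path: str
    content_type: str
    action: str


def propose(path: str) -> Proposal:
    """Classify one path by extension and return a proposed action."""
    suffix = PurePath(path).suffix.lower()
    # Anything without a matching rule is routed to a person for review.
    content_type, action = RULES.get(suffix, ("unclassified", "review_manually"))
    return Proposal(path, content_type, action)


if __name__ == "__main__":
    for sample in ["shared/finance/Budget.XLSX", "shared/it/setup.msi", "shared/misc/readme"]:
        print(propose(sample))
```

Keeping the output as a list of proposals rather than performing deletes or moves directly lets the stakeholders review the plan before anything changes on the drive.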

Getting Buy-In

The first step in cleaning up a shared drive is to make sure the right people are involved, because the work requires input from several constituencies. The groups listed below all benefit from an organized shared drive, so they should all be willing to help define requirements and policies for the project.

  • Information technology (IT)
  • Human resources (HR)
  • Litigation support team

Some Common Approaches

  • Basing root-level, first-tier, and second-tier folders on a functional model to help organize content
  • Eliminating odd characters in file names
  • Avoiding long file paths
  • Validating and removing duplicates
  • Establishing common acronyms and spellings for common entities (controlled vocabulary)
  • Structuring the use of dates so that it is consistent across the organization
  • Formalizing the use of “draft,” “ver.#,” “superseded,” and “final”
  • Using postfix tags on folders to indicate lifecycle management state and business function
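
Several of the approaches above can be checked automatically before any files are moved. The following is a minimal sketch, not a definitive implementation, that walks a folder tree and reports file names with odd characters, overly long paths, and byte-identical duplicates. The allowed character set, the 200-character threshold, and the function names are assumptions chosen for illustration; substitute the values from your own naming standard.

```python
import hashlib
import re
from collections import defaultdict
from pathlib import Path

# Assumed policy values for illustration only.
# 200 is an arbitrary threshold below the legacy 260-character Windows MAX_PATH.
MAX_PATH_LENGTH = 200
ALLOWED_NAME = re.compile(r"[A-Za-z0-9._ -]+")


def has_odd_characters(name: str) -> bool:
    """Return True if the name contains characters outside the allowed set."""
    return ALLOWED_NAME.fullmatch(name) is None


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def audit_tree(root: Path, max_path_length: int = MAX_PATH_LENGTH) -> dict:
    """Walk root and report naming problems and byte-identical duplicates."""
    odd_names, long_paths = [], []
    by_digest = defaultdict(list)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        if has_odd_characters(path.name):
            odd_names.append(path)
        if len(str(path)) > max_path_length:
            long_paths.append(path)
        # Files with the same digest have identical contents.
        by_digest[file_digest(path)].append(path)
    duplicates = [group for group in by_digest.values() if len(group) > 1]
    return {"odd_names": odd_names, "long_paths": long_paths, "duplicates": duplicates}
```

The function only reports findings. Identical content does not by itself show which copy is the record of authority, so duplicate groups should be validated by the content owners before anything is removed.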