Harnessing Analytical Insights and Illuminating the Physical Realm of Dark Data – An Interview with Markus Lindelow of Iron Mountain
Eighth in a series of in-depth interviews with innovators and leaders in the fields of Risk, Compliance and Information Governance across the globe.
Markus Lindelow leads the IG and Content Classification Practice Group at Iron Mountain, the world’s largest information management company, where he’s been pioneering breakthrough analytic techniques for over a decade. He holds a Master of Science degree in Computer Information Systems from St. Edward’s University and consults across a broad set of industries. I interviewed him in November to discuss his thoughts on the evolution of metadata, content classification, AI, and how organizations are using the new pillars of data science to break down their silos, help customers get lean and discover the hidden value in their big data sets.
Markus, you work with all kinds of companies to help them better understand and address the often incomplete metadata tied to some of their most valuable information assets in the form of historical paper records and materials retained over decades. In many cases, institutional memory has been completely lost and they’re struggling to figure out whether to dispose of these business records, balancing the costs of over-retention with the risks of untimely destruction. How does your team leverage diagnostic, predictive and prescriptive analytics to make sense of what little data they might have to make informed decisions?
Our content classification process focuses on making the best use of the available metadata. This means classifying records with meaningful metadata as well as analyzing the classified inventory in order to create classification rules for records with little or no metadata. We have identified a number of attributes within the data that tend to correlate with classification conclusions. We assess the classified records associated with an attribute to create a profile that may inform a rule to classify the unclassified records sharing that same attribute…
If, for example, there are 100 cartons associated with pickup order XYZ, 90 of those cartons have been classified, and furthermore all 90 are classified to ABC100, can we create a rule to classify to ABC100 the 10 unclassified cartons belonging to pickup order XYZ? Clients may need to weigh the risk when applying this type of classification rule and the process may include a random sampling of cartons for physical inspection in order to verify the classification.
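The pickup-order example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Iron Mountain's actual process: the field names (`pickup_order`, `record_code`), the function names and the two thresholds are all hypothetical. A rule is proposed for an attribute value only when enough of its cartons are already classified and those cartons agree on one record code.

```python
import random
from collections import Counter, defaultdict


def propose_rules(cartons, attribute, min_coverage=0.9, min_agreement=1.0):
    """Propose {attribute value: record code} rules from classified cartons.

    min_coverage and min_agreement are hypothetical risk thresholds that a
    client would set according to its own risk tolerance.
    """
    groups = defaultdict(list)
    for carton in cartons:
        groups[carton[attribute]].append(carton.get("record_code"))

    rules = {}
    for value, codes in groups.items():
        classified = [code for code in codes if code is not None]
        if not classified or len(classified) == len(codes):
            continue  # nothing to learn from, or nothing left to classify
        top_code, top_count = Counter(classified).most_common(1)[0]
        coverage = len(classified) / len(codes)
        agreement = top_count / len(classified)
        if coverage >= min_coverage and agreement >= min_agreement:
            rules[value] = top_code
    return rules


def apply_rules(cartons, attribute, rules):
    """Return copies of the cartons, flagging any code assigned by a rule."""
    updated = []
    for carton in cartons:
        carton = dict(carton)
        if carton.get("record_code") is None and carton[attribute] in rules:
            carton["record_code"] = rules[carton[attribute]]
            carton["inferred"] = True  # keep rule-based results auditable
        updated.append(carton)
    return updated


def sample_for_inspection(cartons, k, seed=None):
    """Pick up to k rule-classified cartons at random for physical inspection."""
    inferred = [carton for carton in cartons if carton.get("inferred")]
    return random.Random(seed).sample(inferred, min(k, len(inferred)))


# The scenario from the text: 100 cartons on pickup order XYZ, 90 classified.
cartons = [{"id": i, "pickup_order": "XYZ", "record_code": "ABC100"} for i in range(90)]
cartons += [{"id": i, "pickup_order": "XYZ", "record_code": None} for i in range(90, 100)]

rules = propose_rules(cartons, "pickup_order")
updated = apply_rules(cartons, "pickup_order", rules)
to_inspect = sample_for_inspection(updated, k=3, seed=0)
```

Marking the rule-assigned cartons as `inferred` keeps them distinguishable from cartons classified directly from their metadata, which is what makes the random inspection step possible.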
There’s usually a disconnect between the needs of information managers and the legislatures that set retention periods for records. We see this in regulations where the granularity of both fixed and event-based retention triggers complicates the practical management of records. Over the years, strategies like “big buckets” have attempted to lessen this challenge but even the best efforts are imperfect and carry their own risks. What can be done to better bridge the divide between the need for due diligence in retaining records and the business case for a more practical solution?
There are two pieces to the puzzle of records management: classification and retention. A records retention schedule needs to be straightforward enough to implement so that users can apply record codes to records. But the retention periods for the record classes need to be specific enough that some types of records are not over- or under-retained because they are grouped with other records…
It is a balancing act and big buckets can only be taken so far. If a record class contains record types that user or legal requirements suggest should have a different retention period, then a new record class should be created. The other area in which the business can affect retention is a more prudent implementation of the retention schedule in the records management system. For example, a record class may be defined as event-based in the retention schedule, but if it is known that all records for that record class are sent to storage only after the trigger event has occurred, then the storage date can be used as the base date to calculate retention. Classifying records is only part of the puzzle. This becomes all too familiar for clients with a lot of active records in offsite storage. These records need to be reviewed periodically to identify records whose event triggers have transpired.
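The base-date choice described above can be illustrated with a short Python sketch. Every field name here (`trigger`, `stored_after_event`, `retention_years`, `event_date`, `storage_date`, `created`) is a hypothetical stand-in, not a real records-management schema. Because the storage date in this scenario can be no earlier than the trigger event, calculating from it never ends retention sooner than calculating from the event itself would.

```python
from datetime import date


def add_years(d, years):
    """Add whole years to a date, mapping Feb 29 to Feb 28 when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)


def retention_base_date(record, record_class):
    """Pick the date from which the retention period is counted."""
    if record_class["trigger"] == "fixed":
        return record["created"]
    # Event-based class: prefer the recorded event date when one exists.
    if record.get("event_date") is not None:
        return record["event_date"]
    # Assumption from the text: records of this class reach storage only
    # after the trigger event, so the storage date is a safe stand-in.
    if record_class.get("stored_after_event"):
        return record["storage_date"]
    return None  # still active: flag for periodic review


def eligible_for_destruction_on(record, record_class):
    """Return the earliest destruction date, or None if it cannot be set yet."""
    base = retention_base_date(record, record_class)
    if base is None:
        return None
    return add_years(base, record_class["retention_years"])


closed_files = {"trigger": "event", "retention_years": 7, "stored_after_event": True}
active_files = {"trigger": "event", "retention_years": 7, "stored_after_event": False}
box = {"storage_date": date(2015, 3, 1)}

closed_result = eligible_for_destruction_on(box, closed_files)
active_result = eligible_for_destruction_on(box, active_files)
```

In this sketch the box in the `closed_files` class gets a destruction date seven years after it was sent to storage, while the same box in the `active_files` class returns `None`, which corresponds to the periodic review that active records in offsite storage still require.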
Recently, concerns about privacy risks and specifically exposure of personally identifiable information (PII) have driven some companies to bake protections into their software development lifecycle strategy. In light of this and the ongoing disruption to traditional ECM systems, is it even possible to future-proof any content management system to support these controls?
Companies are anxious about PII stored in their metadata as a result of GDPR and the right to be forgotten….
Business is increasingly global so it is not an issue isolated to European companies. It is important to track PII so that it can be redacted or destroyed upon request. - Markus Lindelow, Iron Mountain
Policies should be implemented related to metadata input. Whereas record descriptions are often entered as one long string, it is important to separate PII into distinct, trackable fields. For example, a box description field should contain only a general description of the record types contained therein.
Although Moore’s and Kryder’s laws on semiconductor power and storage capacity have held up over the years, some think they don’t tell the whole story, while others don’t really care because storage remains relatively cheap. So why not keep everything forever from an analytics standpoint? In the context of paper we can measure the cubic real estate and quantify the cost benefit, but what about electronic space? How does a business strike the right balance between the benefits of big data mining and culling its data sets, so that employees don’t lose productivity hunting for needles in digital haystacks?
The keep-everything culture has evolved as a byproduct of cumbersome retention periods and reduced storage costs. Rather than make the tough decisions about particular record sets, records are often tagged with an overly conservative retention period. This leads to an ever-increasing repository of data that becomes difficult to access and analyze, as well as an increased exposure to potential litigation. It is important to identify and remove ROT (redundant, obsolete, and trivial information) and retire/archive applications in a timely manner. Collection, classification, and retention processes should be assessed and optimized. Collect only the minimum data required for a task. Build a manageable records classification scheme so that it is straightforward for users to apply a record code. Implement the retention schedule in a prudent manner and review retention periods periodically.
How are big multinationals like Iron Mountain beginning to leverage artificial intelligence techniques and machine learning to help clients measure and monetize their information assets? Are we seeing any tangible benefits yet that might trickle down to mid-size and smaller entities?
Companies know that the vast repositories of data they generate and store are valuable, but extracting that value is difficult. Iron Mountain has entered into a partnership with Google Cloud to analyze unstructured data. The Iron Mountain InSight solution combines Iron Mountain’s experience in data analytics and Information Governance with Google’s Artificial Intelligence and Machine Learning capabilities. Iron Mountain is also working with MicroFocus and Active Navigation to offer content analytics. With classification using Machine Learning and retention applied using Iron Mountain’s privacy and policy expertise, these partnerships hope to shine some light on dark data.
You’re a big supporter of your local math pentathlon and youth academics. What guidance do you have for a young person just getting started in preparing for a career in big data, consulting and analytics, or even an individual re-entering the workforce and seeking a career in this developing field of technology and business?
Math and logic games are a fun way for children to learn critical thinking in a competitive but supportive environment. With the preponderance of screens in children’s lives, a great way to channel this interest is beginner programming classes. The Scratch programming language created by MIT is a visual way to learn by programming with code blocks. If a child is attached to her iPad, an app called Swift Playgrounds can help her learn Apple’s Swift programming language. Programming requires logic, patience, and focus. At its heart, programming is problem-solving, and running code to perform a function is very satisfying. Programming is becoming a second language and an important tool for young people entering the workforce. An understanding of database design and development is also helpful to understand how information is organized and accessed.