Keeping up with the data stream

After many years of immersion in technical work, I still marvel at how an organization can become mired in raw data. Smart people can easily succumb to the notion that data equals knowledge, especially in circumstances where data accumulates faster than it can be assimilated.

It is relatively easy to collect data in a chemical lab. You take a set of samples and prep them for testing, load the sample vials into the sample tray, and let the automated sampling widget move through its paces. In a few minutes or hours the software has accumulated files bulging with data points. It is even possible to construct graphs with all sorts of statistical manipulations of the data and still not morph the data into usable knowledge. I’ve been to meetings where graphs were presented but not backed up with interpretation. What was the presenter’s point in showing the graph?

Computerized chromatography stations will spew data all day long onto hard drives based on selections from a cafeteria-style menu. With hyphenated instrumentation, an innocent-looking 2-dimensional chromatogram is actually just part of a higher-dimensional data set with corresponding mass spectra or UV/Vis spectra.

The task for the technical manager is to get control of this stream of data and render some of it into higher level knowledge that will help people run the organization and get product or research out the door. This is the true work product of the experimental scientist: knowledge woven from a data cross-fire and supported by accepted theory.

I do not know what others do when confronted by a data tsunami. I can only speak for myself on this. When the data flow gets ahead of me, it usually means that I am spread too thin. It indicates that I am not taking enough time to properly devise experiments for maximum impact and am skimping on the analysis in favor of other duties.

Another issue relating to managing diverse data output is the matter of storing accumulated data and knowledge for easy retrieval. It is easy to throw things into folders and file them away. But in a few months, the taxonomy used for filing a given bundle of data becomes murky. Soon you are forced to rummage through many files to find data because you’ve forgotten how you organized the filing system.

There are ways around this problem. Laboratory Information Management Systems (LIMS) are offered by numerous vendors. A good LIMS package goes a long way toward managing data and distributing knowledge. We have a homebrew LIMS (built in MS Access) that seems to work rather well for analytical data. However, it was not constructed with process safety information in mind.

What I have constructed for my process safety work is an Access-based application that structures various kinds of information graphically into regions on a form. Within each region is a set of data fields that are subordinate to a given heading or context. The form is devised to prompt the user to consider many types of thermokinetic experiments and provides fields that are links to specific documents. The form provides both actual data and links to source documents. It can be used to enter data or to retrieve it.

This is what Access is designed to do, so I have described nothing conceptually new. Access allows me to aggregate related kinds of experimental results, reports (the knowledge part), and source documents in one field of view, giving the user’s visual processing capability a chance to browse more efficiently.

An example of “related kinds of experimental data” would be DSC, TGA, ARC, and RC1 reports. What connects these fields is the domain of thermal sensitivity of a compound or reaction mixture.
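The same aggregation could be sketched outside of Access with a small relational schema. The following Python/sqlite3 snippet is purely illustrative: the table and column names are my own, not the author’s actual schema, but it shows the idea of linking several report types (DSC, TGA, ARC, RC1) to one compound and pulling them back in a single query.

```python
import sqlite3

# Hypothetical schema mirroring the idea of grouping related
# thermal-sensitivity reports by compound or reaction mixture.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE compound (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE report (
    id          INTEGER PRIMARY KEY,
    compound_id INTEGER NOT NULL REFERENCES compound(id),
    report_type TEXT NOT NULL,   -- e.g. 'DSC', 'TGA', 'ARC', 'RC1'
    doc_path    TEXT NOT NULL    -- link to the source document
);
""")
conn.execute("INSERT INTO compound VALUES (1, 'reaction mixture A')")
conn.executemany(
    "INSERT INTO report (compound_id, report_type, doc_path) VALUES (?, ?, ?)",
    [(1, "DSC", "reports/dsc_001.pdf"),
     (1, "ARC", "reports/arc_014.pdf")],
)

# One query gathers every thermal-sensitivity report for a compound,
# analogous to the one-field-of-view aggregation on the Access form.
rows = conn.execute(
    "SELECT report_type, doc_path FROM report "
    "WHERE compound_id = ? ORDER BY report_type",
    (1,),
).fetchall()
print(rows)
```

The point is not the tooling but the structure: once each report row carries a type and a document link, the “related kinds” grouping falls out of a simple query rather than a memorized filing taxonomy.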

Another aggregation of fields covers the conditions related to an incident. I like to assign key descriptors to an incident so as to aid incident-type studies at a later date. It is useful to be able to sort incidents resulting from a blown rupture disk, a spill, a fire, a triangulated drum, etc.
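Descriptor-based sorting like this can be sketched in a few lines. The records and tags below are invented for illustration (not actual incident data); the mechanism is simply attaching a set of descriptors to each incident and filtering on them later.

```python
# Hypothetical incident records tagged with key descriptors so that
# incident-type studies can filter on them later.
incidents = [
    {"id": 101, "descriptors": {"rupture disk", "overpressure"}},
    {"id": 102, "descriptors": {"spill"}},
    {"id": 103, "descriptors": {"fire", "spill"}},
]

def incidents_with(descriptor, records=incidents):
    """Return the ids of incidents whose descriptor set contains the tag."""
    return [r["id"] for r in records if descriptor in r["descriptors"]]

print(incidents_with("spill"))
```

Choosing a controlled vocabulary of descriptors up front is what makes such queries meaningful months later, for the same reason a consistent filing taxonomy does.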

A database is rather like a garden. In order to be useful it must be planted and then cultivated. Ignore it and it will lose its comprehensiveness, casting into doubt its continued use.

Next up is the development of an in-house, Wikipedia-style browser application for aggregating product, process, and safety information. This offers the best opportunity yet for making diverse information and data available to employees. It can be written in narrative form so as to impart knowledge and history. Why was a particular vendor chosen, and how did we decide on that specification? What was the rationale for the process change in step 4.2? The ability to explain and link to in-house source documents from a single, familiar point of access is key to its potential success.

3 thoughts on “Keeping up with the data stream”

  1. John Spevacek

    What really drives me up the wall is data without interpretation. My graduate advisor let me do that once. Once. Never again even if I haven’t seen the guy in 20 years.

    O.k., I lied. Sometimes I have to do that because our client is being secretive: “I need the G’, G” curves between 0.1 and 10,000 Hz at 170 C”. In a case like that, all I can do is give them what they want, but I hate writing up those reports.

  2. RTW

    I recognized long ago that one needed other tools to manage Medicinal Chemistry project data and the knowledge distilled from it. I saw a lot of otherwise talented scientists wasting their time reproducing the same information in report after report, laboriously reorganizing data for weekly, monthly, and summary reports for management, because there was no suitable means other than meeting presentations to disseminate that knowledge. So I created an early web-based system that forced some structure on the users but otherwise allowed them to post their data, reports, and links very easily, without the need of a dedicated web master to build or initiate new project categories… It was quite a successful system built by a Medicinal Chemist (me) and used nearly universally by members of my department, until the Research IT department took it over and messed with a good thing. This later led me into Electronic Notebook systems, which is what I am currently engaged in.

    It sounds to me like you could use a generalized Electronic Notebook that would allow you to organize your knowledge along with your data. I built some aspects of the web system I originally created to manage med chem projects into the ELN systems I now customize and deploy for several clients.

    The problem with wikis and blogs, from my perspective, is that they require a great deal of management and are not very chemically intelligent or searchable.

    At any rate, you appear to have analyzed your situation well and just need to implement some of the software infrastructure to get started. Have you thought about hiring a consultant and purchasing a commercial solution, or having one customized and installed for you?

    Good luck and best regards,

    Tom Winters
    Director, Professional Services
    CambridgeSoft Corp

  3. Chris Frostt

    LIMS software has become the accepted regulatory norm for enterprise laboratories. The need to comply with industry regulation has led a new wave of web-based LIMS software to hit the market. STARLIMS is one of the leading companies developing laboratory information management systems that are both effective and incredibly cost efficient.
