InChI Tales

Th’ Gaussling has been dabbling in the strange land of cheminformatics lately. I’m trying to develop some productivity tools on various platforms to make chemical information more accessible to my fellow staff members.

One particularly useful tool is the InChI, or International Chemical Identifier. The InChI is a character string derived from a chemical structure. This string can be hashed (irreversibly) into a shorter, fixed-length string called the InChIKey: 27 characters of uppercase letters arranged in three hyphen-separated blocks. Using ChemSketch, one can draw a structure and generate both an InChI string and an InChIKey string. In doing so, you have bridged the gap from a chemical structure to a searchable character string. These InChIKeys can be planted into documents such as Excel spreadsheets, Word files, and Access databases. A search for the InChIKey can then locate every document in a folder that contains it, or the record in a database that contains it.
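For the curious, here is a minimal Python sketch of that folder search, using only the standard library. It is an illustration under assumptions, not a finished tool: the function names (`is_inchikey`, `file_contains`, `find_documents`) are hypothetical, and it only looks inside plain-text files and the newer Office Open XML formats (.docx, .xlsx, .pptx), which are ZIP archives of XML parts. Older binary .doc/.xls files and Access databases are not handled, and Word can split typed text across several XML runs, so a key that was edited in pieces might be missed.

```python
import re
import zipfile
from pathlib import Path

# Shape of a standard InChIKey: 14 letters, hyphen, 10 letters, hyphen, 1 letter.
INCHIKEY_RE = re.compile(r"[A-Z]{14}-[A-Z]{10}-[A-Z]")

# Office Open XML files are ZIP archives containing XML parts.
OFFICE_SUFFIXES = {".docx", ".xlsx", ".pptx"}


def is_inchikey(s: str) -> bool:
    """Return True if s has the layout of a standard InChIKey (format check only)."""
    return INCHIKEY_RE.fullmatch(s) is not None


def file_contains(path: Path, needle: str) -> bool:
    """Return True if needle appears in a text file or in any XML part of an Office file."""
    if path.suffix.lower() in OFFICE_SUFFIXES and zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as zf:
            for name in zf.namelist():
                if name.endswith(".xml"):
                    text = zf.read(name).decode("utf-8", errors="ignore")
                    if needle in text:
                        return True
        return False
    try:
        # Anything else is read as text; undecodable bytes are skipped.
        return needle in path.read_text(encoding="utf-8", errors="ignore")
    except OSError:
        return False


def find_documents(folder: str, inchikey: str) -> list[Path]:
    """Recursively list the files under folder that contain the given InChIKey."""
    if not is_inchikey(inchikey):
        raise ValueError(f"not an InChIKey: {inchikey!r}")
    return sorted(
        p for p in Path(folder).rglob("*")
        if p.is_file() and file_contains(p, inchikey)
    )
```

Usage would look something like `find_documents("C:/Projects", key)`, where `key` is the InChIKey copied out of ChemSketch. The format check up front is just a guard against pasting in an InChI string or a name by mistake; it says nothing about whether the key corresponds to a real structure.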

Granted, this can be done in other ways. A chemical name can be searched, as can a CASRN. But names are subject to syntactic variation, which can complicate the search. If you have generated a new structure that is not listed in CAS and its nomenclature is complex, an InChIKey can serve as an unambiguous term for subsequent searches.

If you hate using the Java-based drawing module in SciFinder, you can use an InChI string or SMILES string instead. Just open the structure drawing module and look in the upper left-hand corner of the window. There will be a screwy-looking button for pasting in an InChI or SMILES string, and the Java module will then draw the structure for you. It’s pretty handy.

2 thoughts on “InChI Tales”

  1. Joe Loughry

    That’s neat! The hash collision problem can be completely taken care of, from a programmer’s perspective, by keeping a standard list of known collisions (like the list of bad checks at a cash register); there will probably never be more than a handful of these, and they can be treated as special cases in software. (In fact, that list would be of great interest to crypto researchers.) From your perspective, the hash would appear never to collide. By the way, there’s no reason to put up with the hash being irreversible, either; “rainbow tables” exist to solve that problem, and could be made up if the need exists.
