The Wine Lactone: A Dive into Chemical Names

As I bumble and tumble through the chemical literature I frequently run into interesting chemicals and chemistry. Today’s moment of chemistry is with the “Wine Lactone”, so called because it is found in, well, wine. Interestingly it was first identified in koala urine. I saw that this was an opportunity also to dissect the chemical name of the Wine Lactone and perhaps answer questions that you didn’t know you had.

There are numerous forms of the wine lactone that have seemingly minor differences but have different odors. Some of the other “forms” are called stereoisomers and others positional isomers. The atomic composition is the same, but the atoms and their bonds are arranged in a slightly different way. It is not uncommon for these differences to result in a change to the odor or some other property.

The problem with chemical names (nomenclature) for people outside of chemistry is that they seem to be over-complicated polysyllabic tongue twisters with numbers and sometimes Greek letters that are impossible to pronounce or remember. Indeed, they are very often complex and seem to have a mysterious origin. This is where chemistry has strayed away from medieval naming “habits” and supplanted them with a systematic naming system that describes the exact atomic composition, how the atoms are connected and, if necessary, the particular shape in three dimensions.

For thoroughness I’ll point out the molecular formula style like CxHyNzOt where x, y, z and t are variable numbers. Other elements were left out here for convenience. Any organic molecule can be described by the numbers of carbon, hydrogen, nitrogen, oxygen and other atoms present. While the molecular formula is an accurate representation and is necessary for calculating molecular weight, as a unique identifier it is not very useful. Any given polyatomic molecule may have more than one structure that fits the molecular formula.
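As a rough sketch of how a molecular formula feeds a molecular weight calculation, the Python below parses a simple formula and sums rounded atomic weights. The formula C10H14O2 for the Wine Lactone is worked out from the structure discussed in this post rather than quoted from it, and the function names and the short table of atomic weights are illustrative assumptions.

```python
import re

# Rounded standard atomic weights; only a few elements are listed for illustration.
ATOMIC_WEIGHTS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999, "F": 18.998}

def parse_formula(formula):
    """Turn a simple formula such as 'C10H14O2' into {'C': 10, 'H': 14, 'O': 2}."""
    counts = {}
    for symbol, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number means one atom of that element.
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts

def molecular_weight(formula):
    """Sum atomic weights over every atom counted in the formula."""
    return sum(ATOMIC_WEIGHTS[el] * n for el, n in parse_formula(formula).items())

# Assumed formula for the Wine Lactone (derived from its structure, not from the post).
print(parse_formula("C10H14O2"))
print(round(molecular_weight("C10H14O2"), 2))
```

Note that any other structure with the same formula would give the same number, which is why the formula alone is a poor identifier.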

There are several groups that have been influential in chemical databases and nomenclature around the world. German chemists were on top of this early on with the German language Beilstein database and system of nomenclature (1881) for organic substances, now maintained by Elsevier Information Systems in Frankfurt. For inorganic and organometallic substances, there is the Gmelin database (1817) which is maintained by Elsevier MDL.

The systematic nomenclatures I will be referring to are IUPAC (International Union of Pure and Applied Chemistry) and CAS (Chemical Abstracts Service) supported by the American Chemical Society. I am unaware of the volume of usage of Beilstein and Gmelin databases today. They appear to be ongoing. Not being a German speaker, I’ll use first CAS then IUPAC in that order of priority. CAS and a few other databases use a numbering system for each unique substance in addition to the name. The CAS registry number, CASRN, is used around the world for authoritative identification of chemical substances. This includes academic R&D, industry, Safety Data Sheets, transportation and emergency response, and not just in the USA. CAS also manages the TSCA registry list for EPA.

3-D model by PubChem. Line structure by Gaussling.

Many chemicals have names that pre-date systematic modern naming conventions like toluol or methylbenzol (methylbenzene, toluene) or vinegar acid (acetic or ethanoic acid). These older, trivial names are deeply entrenched in common usage and the secret cabal of nomenclature mandarins lets it pass uncontested.

Above is a ball and stick 3-D model of the Wine Lactone and next to it is a diagram of the numbering system for the molecule. While any fool could number the atoms, it takes a special one to make it official. The heading of the graphic gives the IUPAC name of the lactone as done by a chemical graphics application called ChemSketch. For comparison, the CAS name is given as well. The CAS database entry for the structure gives a very slightly different version of the same thing.

R&S designations can be omitted if they are not known. Adding R&S to the structure gives a spatially accurate view. It is not uncommon for a structure to be disclosed and given a CASRN before any R or S features are known.

The starting point for assigning a name is to decide what the core structure is, noodle through its numbering and then begin identifying the fragments on it. Somebody in the murky depths of time determined that the core structure of the Wine Lactone is a variety of 5-membered ring called a “furanone” (FYUR an own). The C=O (carbonyl, CAR bun eel) part could be in two places so we’ll have to account for that. With non-carbon atoms in the ring, the non-carbon atom is usually given the place number of “1”.

Both CAS and IUPAC have publications on organic ring structures, however in my experience IUPAC does not show the numbering scheme as CAS would. CAS holds a list of all known ring systems.

Before we go on, we notice that a hexagonal 6-membered ring is attached at two adjacent places to the 5-membered ring. This is a “ring fusion” and fused 6-membered rings are often given the radical “benzo”. So, the core structure is a type of “benzofuranone”. Oh yes, here a radical is a word fragment added to a name to indicate the presence of something.

Starting with oxygen at position 1 we go around the edge of the fused ring skeleton clockwise and attach numbers to the carbon atoms that are not part of the ring fusion. In the graphic above you can see that there were ring atoms that received simple digits. The atoms that make up the fusion are named by taking the number of the atom that precedes it and adding the character “a” to it.
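The numbering rule just described can be sketched in a few lines of Python. The list of atoms is an assumed walk around the edge of the benzofuran skeleton starting at oxygen, with the two ring-fusion carbons flagged by hand; the function name is hypothetical and this is only an illustration of the rule, not how CAS or IUPAC software does it.

```python
def number_perimeter(atoms):
    """atoms: list of (element, is_fusion_atom) in walking order from position 1."""
    labels = []
    last_number = 0
    letter_index = 0
    for element, is_fusion in atoms:
        if is_fusion:
            # Fusion atoms reuse the preceding number plus a letter: 3a, 7a, ...
            label = f"{last_number}{chr(ord('a') + letter_index)}"
            letter_index += 1
        else:
            # Ordinary ring atoms simply take the next whole number.
            last_number += 1
            letter_index = 0
            label = str(last_number)
        labels.append((label, element))
    return labels

# Assumed walk around the benzofuran perimeter, oxygen first.
benzofuran_walk = [
    ("O", False), ("C", False), ("C", False), ("C", True),
    ("C", False), ("C", False), ("C", False), ("C", False), ("C", True),
]

for label, element in number_perimeter(benzofuran_walk):
    print(label, element)
```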

So, what do we know already? We have a benzofuranone with C=O (carbonyl) at position 2. The “one” radical of furanone indicates that the furan ring has a carbonyl group in it.

Next we must account for the way in which the molecule is arranged in 3-dimensions. Carbon atoms need to have 4 bonds (lines) connected to them. If all of the lines are single, the carbon has 4 atoms arranged around it in the shape of a tetrahedron with the attached atoms at the 4 vertices. A wedged line means that the atom at the end is jutting up and out of the plane of the page. Dashed lines indicate that the group on the end is jutting down below the plane of the page, but the artistic license here is that the dashes are omitted. Notice that there are 3 wedged lines at positions 3, 3a and 7a. The two hydrogen atoms (H) are projecting up out of the page as is the CH3 (methyl) group. This tells us that the two rings are jutting behind the page, so this molecule is not flat but bent. The name of the molecule has to indicate this.

Molecular handedness. While the two molecules have the same molecular formula and 2-dimensional connectivity, one cannot be superimposed on the other to give the identical shape in 3-D, like your hands or gloves.

The carbon atoms at 3, 3a, and 7a are called stereocenters because they have molecular handedness. Note that each is connected to four different groups in the molecule. It sounds like crazy talk but it is quite important. We won’t burrow into details here. Suffice it to say that these atoms will have an extra letter to designate what kind of “handedness” they have. R is for rectus meaning right-handed and S is for sinister meaning left-handed. There are rules for determining R vs S which we will not go into here.

Handedness in a molecule isn’t important except in how it interacts with other molecules with handedness. The two nonsuperimposable (chiral) mirror images are said to be “enantiomers” (eh NAN tee oh mers). This is an issue for crystal structure and for many biomolecules. Outside of this, it isn’t much of a concern.

We now have (3S, 3aS, 7aR) to be plopped into the name. This group is shown in parentheses.

Next, we tackle the “tetrahydro” radical- it indicates that 4 more hydrogen atoms are present than would otherwise be there. Nomenclature starts with rings that are unsaturated in hydrogen, meaning that the carbon skeleton is not connected to as many hydrogen atoms as it could be. The four positions where a single hydrogen has appeared are 3a, 4, 5 and 7a, on what would otherwise be double bonds. There is one more to account for. The namesake furan molecule would have a double bond at position 3. In this molecule there is a hydrogen atom in place of the double bond, so 3H is added with the CH3 group.

Graphic by Gaussling

So far we have (3S, 3aS, 7aR) and 3a, 4, 5, 7a-tetrahydro and 2-benzofuranone.

At positions 3 and 6 there are two CH3 or methyl groups. To account for position and the fact there are two of them leads to this part of the name- “3,6-dimethyl-“. Elsewhere in the name we denote the R or S configuration, if any. The CH3 at carbon 6 is flat so it lies in the plane of the page- it is neither R nor S. But the CH3 at carbon 3 juts out of the page at us rather than pointing downward. It has been given the S configuration.

Putting it all together in the CAS name, the configurations at relevant atoms are given first followed by a hyphen then the hydrogen locations followed by a hyphen then the word “tetrahydro”. After tetrahydro radical and a hyphen, the methyl positions 3,6 are added followed by a hyphen then radical “di” attached to the radical “methyl” followed by a hyphen then the core structure 2(3H)-Benzofuranone. The “2(3H)” feature indicates that the carbonyl is at position 2 and an H is at position 3, indicating that the furan ring is connected by single bonds.

I describe here the name of the Wine Lactone in its extended CAS form rather than the parsed form. If you want to sort numbered chemical names alphabetically, leading digits just complicate the sorting. So if you sort alphabetically by the core structure, you rearrange the name to lead with Benzofuranone followed by the details trailing off in the distance as in the first graphic.
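The assembly order described above can be sketched as plain string building. The function and parameter names are hypothetical, and the punctuation of the rearranged form (core first, then the details, then the configuration) is an assumption modeled on the CAS name for levofloxacin quoted later on this page, not a statement of the official CAS rules.

```python
def build_names(stereo, hydro_locants, hydro_prefix, substituents, core):
    """Return (extended name, rearranged name that leads with the core structure)."""
    stereo_part = "(" + ",".join(stereo) + ")"
    # Hydrogen locants, then the hydro prefix, then the substituent fragment.
    detail = ",".join(hydro_locants) + "-" + hydro_prefix + "-" + substituents + "-"
    extended = f"{stereo_part}-{detail}{core}"
    # Leading with the core keeps digits out of the way when sorting alphabetically.
    rearranged = f"{core}, {detail}, {stereo_part}-"
    return extended, rearranged

extended, rearranged = build_names(
    stereo=["3S", "3aS", "7aR"],
    hydro_locants=["3a", "4", "5", "7a"],
    hydro_prefix="tetrahydro",
    substituents="3,6-dimethyl",
    core="2(3H)-Benzofuranone",
)
print(extended)
print(rearranged)
```

Sorting a list of such rearranged names groups every benzofuranone together, which is the point of leading with the core.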

I’m sure that deep within the lower catacombs at Chemical Abstracts in Columbus, OH, there are grizzled old nomenclature wizards who may quibble with my explanations, but let them materialize before me in a puff of smoke and discuss the error of my ways.

Chemical Nomenclature, Enantiomers and Polarized Light

[Reissue under better title]

Due to a recent hospital stay with pneumonia, I found myself staggeringly bored. To stave off some of this I began to look into an antibiotic I was given that I had never heard of- Levofloxacin. The structure of this antibiotic was different from antibiotics I was previously familiar with. Natural I suppose, considering that I’ve been immersed in organo-transition metal chemistry for most of my industrial career. Metal-carbon bonds are quite useful in some sectors but not as drugs.

Levofloxacin is a good place to go deep diving into some of the murkier depths of chemical nomenclature. The complicated-looking chemical naming system exists to unambiguously represent the composition and shape of molecules. Certain features and properties of a molecule confer important attributes that need categorizing, thus requiring descriptive names rather than just a number. Every different chemical substance is, well, different and their chemical names must reveal a unique identity. Two or more substances with the same name leads to nothing but trouble.

Chemical substances can be grouped into categories to associate them with related aspects. We have noble gases, transition metals, hydrocarbons, pnictogens, polymers, acids, and bases etc. But the categories allow for variation when particular attributes are under discussion.

The names of chemical substances can be very off-putting to non-chemists and often do lead them to abandon their search for information. A few have even suggested that if you cannot pronounce the name it must be bad. Even worse than the polysyllabic and numbered character strings are the various synonyms. Consider simple toluene which is actually not so bad-

Directly from Chemical Abstracts’ SciFinder.

In chemical nomenclature there is just a bit of flexibility in how numbers, syllables and name fragments can be assembled as the toluene example above shows, if you don’t read the rules too closely. The plethora of names comes from historical trade names or long-time industrial use, or the names may just predate the systematic nomenclature now in use. There is also the German Beilstein and Gmelin organic and inorganic nomenclature, but these seem to be outdated.

As always, a proper chemical name describes the composition and 3-dimensional connectivity of the chemical structure of a molecule. These names are commonly listed in one of the two dominant styles of chemical nomenclature in the world- International Union of Pure and Applied Chemistry (IUPAC) and Chemical Abstracts Service (CAS). IUPAC tends to be taught in undergraduate chemistry because it always has been and is maybe a trifle easier.

The CAS databases contain more than 200 million organic and inorganic chemical substances and about 70 million protein and nucleic acid sequences. There are two search platforms available in CAS- SciFinder and STN. STN is much more cryptic and harder to learn than SciFinder. Some say there are weaknesses in patent searching in SciFinder alone. For IP work I use SciFinder, Google Patents and the USPTO in combination. All three offer different kinds of searching capability.

Levofloxacin is a biocidal antibiotic effective against both gram-positive and gram-negative bacteria. It is an inhibitor of both DNA gyrase and topoisomerase IV enzymes which are involved in shaping the geometry of bacterial plasmids, or rings of bacterial DNA. Plasmids have to fit inside the bacterial cell wall and those that are not made compact enough are too long to allow successful formation of daughter cells in reproduction resulting in cell death. Other kinds of antibiotics are bacteriostatic and often work better in one or the other of Gram-Negative or Gram-Positive bacteria. Gram stains are effective with certain types of bacterial cell walls and not with others. The ability of a dye to stain a colony of bacteria a particular way is used to help identify bacteria.

Consider the name of Levofloxacin from IUPAC: (-)-(S)-9-fluoro-2,3-dihydro-3-methyl-10-(4-methyl-1-piperazinyl)-7-oxo-7H-pyrido[1,2,3-de]-1,4-benzoxazine-6-carboxylic acid hemihydrate. The name is a string of characters with numbers indicating attachment points. The core of the structure is a 1,4-benzoxazine ring system which is festooned with a carboxylic acid and a few other groups. The core structure was identified and numbered previously by someone according to rules. The IUPAC name also specifies that it is a hemihydrate, meaning that there is one molecule of water associated with every two (hemi) molecules of Levofloxacin. For some reason the CAS name does not include the hemihydrate in the name, probably because it was not mentioned in the composition when registered with CAS. How it is in the IUPAC name is not known to me.

More pain. The IUPAC name above indicates “(-)-(S)-“. Molecules with “handedness” are said to be chiral and are not superimposable with their mirror images, similar to a right-hand being shape-incompatible with a left glove. These molecules can be prepared as individuals of single handedness or all of the way to a 50:50 mixture of left and right-handed. A 50:50 mixture of left and right-handed is called a “racemate” (RASS eh mate). Each handedness version is a type of isomer called an “enantiomer“. A substance consisting of a pure enantiomer is said to be “enantiomerically pure.”

Isolated enantiomers have the ability to rotate plane polarized light as measured by a polarimeter. Plane polarized light is a light beam where the electric field vectors of the electromagnetic radiation are all vibrating in a single plane. The magnetic vectors are polarized as well, but it is the electric field that is usually mentioned. The angle of the oscillating ray’s electric fields along the axis can be tilted one way or the other depending on the interaction with matter. Reflected light and skyglow are polarized as well. Molecules with handedness rotate the vibrational plane by an angle dependent on the light frequency and the amount of chiral mass traveled through. Light that is rotated counterclockwise, or levorotatory, has a (-) sign and is signified with an “l” and light that is rotated clockwise is dextrorotatory and has a (+) sign and is signified with a “d.” If a molecule rotates plane polarized light, the substance is said to be “optically active.” Because the amount of rotation depends on the light frequency, the sodium D line (actually a close doublet) is often used as the standard source. Mercury lines, e.g., 365 nm, can be used if the D line results in a low measured rotation. Racemic substances, which show no net rotation of plane polarized light, are often designated “dl.”
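Since the measured angle depends on how much chiral material the light passes through, polarimeter readings are usually normalized to a specific rotation. Here is a minimal sketch using the conventional definition (observed rotation divided by path length in decimeters times concentration in g/mL). The function names and the sample reading are made up for illustration and are not data for any real substance.

```python
def specific_rotation(observed_deg, path_length_dm, conc_g_per_ml):
    """Specific rotation = observed rotation / (path length in dm * concentration in g/mL)."""
    return observed_deg / (path_length_dm * conc_g_per_ml)

def rotation_label(observed_deg):
    """Translate the sign of a reading into the (+)/(-) vocabulary."""
    if observed_deg < 0:
        return "(-), levorotatory"
    if observed_deg > 0:
        return "(+), dextrorotatory"
    return "no net rotation"

# Hypothetical reading: -1.50 degrees in a 1 dm cell at 0.020 g/mL.
reading = -1.50
print(round(specific_rotation(reading, 1.0, 0.020), 1), rotation_label(reading))
```

A specific rotation is only comparable to another one taken at the same wavelength, solvent and temperature, so those conditions are normally reported alongside the number.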

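D-Glucose, or dextrose, solutions rotate plane polarized light in the clockwise, dextrorotatory direction, thus the name dextrose. Strictly speaking, the capital “D” prefix denotes the configuration relative to D-glyceraldehyde rather than the direction of rotation, which is shown by the lowercase “d” or the (+) sign.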

Commercial L-lactic acid derived from fermentation carries a capital “L”, which denotes its configuration and not the direction of rotation; it is in fact dextrorotatory, (+). This enantiomerically enriched lactic acid is used to make the lactide monomer for poly(lactic acid), PLA. Only the lactide dimer from L-lactic acid gives the desired PLA isomer. The racemic form of lactic acid is not useful for PLA due to undesirable physical properties in the polymer.

A ratio can be taken from an experimental sample that may range from 50:50 racemate to 100 % of a single enantiomer to give the optical purity of the chiral material, representing the proportion of pure enantiomer. Often the measure % ee, or percent enantiomeric excess, is used to describe enantiomeric purity. A 95:5 mixture of enantiomers would have a 90 % enantiomeric excess of one enantiomer. Chemical synthesis of 99 % ee can be quite difficult.
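The arithmetic behind % ee is simple enough to sketch. The function names below are hypothetical; the formula is the usual one, the difference between the two enantiomer amounts divided by their sum.

```python
def enantiomeric_excess(major, minor):
    """Percent ee from the amounts (or percentages) of the two enantiomers."""
    return 100.0 * (major - minor) / (major + minor)

def ratio_from_ee(ee_percent):
    """Convert a percent ee back into a (major %, minor %) pair."""
    major = (100.0 + ee_percent) / 2.0
    return major, 100.0 - major

print(enantiomeric_excess(95, 5))    # the 95:5 mixture from the text
print(enantiomeric_excess(50, 50))   # a racemate
print(ratio_from_ee(99.0))           # what a 99 % ee sample contains
```

Running the second function on 99 % ee shows why that target is demanding: only half a percent of the wrong enantiomer is allowed.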

A racemate does not have a net rotation of plane polarized light. The (-) sign represents the “levo” part of the levofloxacin, referring to counterclockwise rotation of plane polarized light. Prior to the appearance of reliable analytical methods for the determination of enantiomeric purity, polarimetry and optical rotation were the method of choice. Today, chiral Gas Chromatography (GC) and High Performance Liquid Chromatography (HPLC) columns can provide baseline separation of enantiomers, and chiral shift reagents for 1H-NMR can resolve their signals.

The (S) character in the name indicates the handedness of a molecule as determined by standard selection rules defined by an organization for assigning absolute configuration. “S” stands for the Latin word “sinister” meaning left-handed. There is no simple calculation to go between absolute configuration and sign. The (-) sign can indicate which particular enantiomer is under consideration with an easy measurement if it has been previously correlated. (-)-(R) and (+)-(S) enantiomers can and do occur. The “(S)” defines only the precise configuration of atoms about an asymmetrically situated atom in a molecule based on a few simple rules. The mirror image of (-)-(S)- would be (+)-(R)-, “R” for rectus meaning right-handed in Latin.

The first task in assigning a name to a molecule is to determine the “core” structure. This is the basis of the name. Your molecule will be a variety of “the core structure.” This is not so easy because IUPAC or CAS will have already done this and your choice may or may not match. Referring to the CAS name below, you can see that some structural fragments end in “-yl,” “-ic” or “-o”. These signal that the fragments are not the core structure, they are attachments. The core structure onto which everything else is attached is the “1,4-benzoxazine”. It is a standalone chemical name which may be modified. This is a very obscure fact that most won’t know, but the “-ine” suffix indicates that the core structure is an amine, full stop. Other nitrogen indicators like azo, aza, amino, ammonium, nitro, azido, etc, suggest a nitrogen group attachment to something else.

Does it help to have a college degree in chemistry to know this stuff? Sorry but yes. In the set of all worldly knowledge, this is pretty obscure.

The CAS name for levofloxacin is 7H-Pyrido[1,2,3-de]-1,4-benzoxazine-6-carboxylic acid, 9-fluoro-2,3-dihydro-3-methyl-10-(4-methyl-1-piperazinyl)-7-oxo-, (3S)-. The core structure seems to be the 1,4-benzoxazine. CAS has a ring-system handbook that defines and numbers all of the known ring systems. The significance of CAS is that it assigns and maintains the official CAS registry number, CASRN, which is depended upon world-wide for the exact composition and connectivity and geometry of substances. There is a very extensive rule book that rigidly defines a chemical name, with rooms of CAS experts sitting in a building in Columbus, OH, to assign these names. For levofloxacin the CASRN is 100986-85-4. Today, CASRNs are usually directly searchable on Google. The final digit “4” is a check digit for detecting entry errors.
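The check digit can be verified by hand or with a few lines of Python. The sketch below uses the published CAS scheme: drop the hyphens from everything before the check digit, multiply the digits by 1, 2, 3 and so on counting from the right, add them up and keep the last digit of the sum. The function names are hypothetical.

```python
def cas_check_digit(casrn):
    """Compute the expected check digit for a CASRN written like '100986-85-4'."""
    body, _check = casrn.rsplit("-", 1)
    digits = body.replace("-", "")
    # Weight the digits 1, 2, 3, ... counting from the rightmost digit of the body.
    total = sum(int(d) * weight for weight, d in enumerate(reversed(digits), start=1))
    return total % 10

def is_valid_casrn(casrn):
    """True when the final digit matches the computed check digit."""
    return cas_check_digit(casrn) == int(casrn.rsplit("-", 1)[1])

print(cas_check_digit("100986-85-4"))   # levofloxacin, from the text
print(is_valid_casrn("100986-85-4"))
print(is_valid_casrn("100986-58-4"))    # two digits swapped on entry
```

For levofloxacin the weighted sum is 5 + 16 + 18 + 32 + 45 + 0 + 0 + 8 = 124, whose last digit is the 4 at the end of the registry number.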

General comments about chemical features on Levofloxacin.

Yet more pain. The more formal official CAS name, however, does not indicate the direction of rotation of plane polarized light. I suppose this is considered experimental data not needed in the name. The CAS nomenclature only shows “(3S)-” in the name, indicating the absolute “S” configuration at position 3 of the molecule. The business of handedness or shape in 3-space of a molecule is called “stereochemistry” and arises in several ways. The rules for assigning the absolute configurations of R or S enantiomers may depend on the features of the molecule.

This business of molecular handedness is mostly an issue for biochemistry and pharmaceuticals. A great many- most?- biomolecules have handedness themselves and are therefore subject to interactions with other biomolecules or drugs that depend on the precise shapes for their interactions. This is very important for the interaction between molecules like an enzyme and substrate or ligand.

In the absence of other chiral molecules, two enantiomers will have the same chemical properties when individually pure. However, a racemate consists of a pair of enantiomers. The interactions between like enantiomers, R with R or S with S, will be different than the interactions between unlike enantiomers, R with S, in a racemic mixture. If there is more than one chiral feature in a molecule- say two- then the molecule could be R,R or R,S or S,S or S,R. This gives two pairs of enantiomers: R,R with S,S and R,S with S,R. A member of one pair is a “diastereomer” of a member of the other pair. For instance, one substance with R,S and another with R,R will be chemically and physically different substances called diastereomers. The presence of a diastereomer in an enantiomeric drug product would likely be deemed a contaminant and removed.
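The bookkeeping for several chiral features can be sketched by listing every R/S combination and comparing them. This is a simplified model with hypothetical function names: it assumes each stereocenter can be R or S independently, treats the mirror image as the combination with every letter flipped, and ignores symmetric (meso) cases where two combinations describe the same molecule.

```python
from itertools import product

def stereoisomers(n_centers):
    """All R/S combinations for n independent stereocenters, e.g. 'RS'."""
    return ["".join(p) for p in product("RS", repeat=n_centers)]

def mirror(config):
    """The mirror image flips every center: R becomes S and S becomes R."""
    return "".join("S" if c == "R" else "R" for c in config)

def relationship(a, b):
    """Classify two configurations of the same connectivity."""
    if a == b:
        return "identical"
    if mirror(a) == b:
        return "enantiomers"
    return "diastereomers"

print(stereoisomers(2))
print(relationship("RR", "SS"))
print(relationship("RS", "RR"))
print(len(stereoisomers(3)))   # three stereocenters, as in the Wine Lactone
```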

Source: David Darling

A druggable disease-state is one that can be positively influenced by a drug molecule. This commonly involves the drug molecule docking with an enzyme to activate it or deactivate it. These enzymes are very large diastereomers having many chiral atoms giving them complex shapes that can result in the formation of a pocket in the protein structure called the “active site.” This active site has very particular shape and charge features provided by the chiral amino acid chain of the protein. An active site will have a shape that is compatible with the close fitting of a drug or other molecule, similar to a hand in a glove. Many of these active sites bind the shape and charge of one enantiomer of a drug molecule more effectively than the other for a better fit. The drug, or substrate, may just sit there and block the action of an enzyme, shutting it down, or it may activate it continuously. Other active sites may bind a drug and change the shape of the enzyme, causing the enzyme to speed up or slow down for a throttling or accelerating effect elsewhere on the enzyme. This is called the allosteric effect.

So, you may be asking- big deal, what does it matter? In the world of pharmaceuticals, many drug substances can exist as single enantiomers, racemates or diastereomers. Racemates may be easiest to manufacture, but very often one of the enantiomers is more biologically active than the other. In fact, one enantiomer may be disastrously harmful. The classic example is Thalidomide. The S form is the one associated with birth defects, not the R form. The two forms interconvert in the body, however, so even pure R enantiomer would not have been safe from teratogenicity, and the racemic mixture of R and S that was sold certainly was not.

Conclusion. A superficial look at a chemical name opens up insights into the chemical nature of a substance. What makes each chemical substance unique is its distribution of charge in 3-dimensions. The distribution is affected by the types of atoms present, the geometric features of the 3-dimensional shape and the ability of the system to allow charge to accumulate in particular places of the molecule. These attributes also set up the type and vigor of reactivity the molecule will display.