<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE script:module PUBLIC "-//OpenOffice.org//DTD OfficeDocument 1.0//EN" "module.dtd">
-<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Main" script:language="StarBasic">' Matt McCutchen's SuperbChemistry for OpenOffice, version 2.1
+<script:module xmlns:script="http://openoffice.org/2000/script" script:name="Main" script:language="StarBasic">' SuperbChemistry version 2.2
+' http://mattmccutchen.net/schem/
+' Written and maintained by Matt McCutchen <matt@mattmccutchen.net>
'
-' Applies superscript and subscript formatting to chemical formulas in text.
+' Applies superscript and subscript formatting to chemical formulas in
+' OpenOffice.org Writer documents.
'
' Rules:
' - Quantities [0-9]+ and charges [0-9]*[-+−] are recognized after an element
-' symbol [A-Z][a-z]? or a closing delimiter [])}] . Hyphens are converted
+' symbol [A-Z][a-z]? or a closing delimiter [\])}] . Hyphens are converted
' into real minus signs.
' - A charge sign [-+−] is ignored if it is followed by a letter, digit,
' opening delimiter, or [<>] . (Charges should appear only at the end of a
' - When digits followed by a charge sign are recognized, the last digit
' becomes part of the charge and the remaining digits become the quantity.
' (Charges rarely have absolute value more than 9.)
-' - Exception: If a single digit follows O or a closing delimiter, that digit
-' is always the quantity. (Handle NO3- and Fe(OH)2+. I think oxygen is the
-' only element that frequently has a quantity as part of a +/-1 ion. A group
-' is rarely parenthesized unless it has a quantity.)
+' - In cases like X2-, we have to guess whether the digit is an atom/group
+' quantity or a charge amount. We guess an atom/group quantity if X is H
+' (NH4+), O (NO3-), a halogen (SbF6-, AlCl4-, etc.), or a closing parenthesis
+' (Fe(OH)2+; the group likely would not have been parenthesized unless it had
+' a quantity). Otherwise, we guess a charge amount (Fe3+). This heuristic
+' should be right most of the time.
'
' Examples:
' C12345 ==> C_{12345}
' Fe3+ ==> Fe^{3+}
' SO42- ==> SO_4^{2-}
' C1232+ ==> C_{123}^{2+}
-' N2- ==> N^{2-}
+' N3- ==> N^{3-}
+' N|_3^- not recognized (| represents a "No-width no break" character)
+' NH4+ ==> NH_4^+
' NO3- ==> NO_3^-
-' Fe(OH)2- ==> Fe(OH)_2^-
+' AlCl4- ==> AlCl_4^-
+' Fe(OH)2+ ==> Fe(OH)_2^+
' O12 ==> O_{12}
-' y4- ==> y4-
-' x2 ==> x2
-' Foo2 ==> Foo2
-' TI-89 ==> TI-89
+' y4- not recognized
+' x2 not recognized
+' Foo2 not recognized
+' TI-89 not recognized
+'
+' To format the current document, run the FormatDocument macro: go to Tools ->
+' Macros -> Run Macro... -> My Macros -> SuperbChemistry -> Main ->
+' FormatDocument -> Run. I realize that this is ugly. I tried to make the
+' package install a menu item to format the document, but the resulting package
+' caused OpenOffice.org to crash regularly (I didn't investigate why), so I
+' abandoned that idea. Note that you can add a menu item as a user
+' customization (Tools -> Customize), and I recommend it if you plan to use
+' SuperbChemistry frequently.
+'
+' FormatDocument uses a sequence of regular expression find-and-replace
+' operations since that was easy to implement and makes the rules easy to
+' change. The operations appear in the undo history, so you can undo a
+' formatting run by undoing the block of "Replace" entries at the top of the
+' history.
+'
+' I would like to support formatting a selection, but the OpenOffice.org API
+' does not appear to support replace-all within a selection. I could find
+' within the selection and implement the replacing myself, but that is more
+' work than I want to do.
+'
+' If SuperbChemistry makes a mistake (e.g., recognizes a "formula" that isn't
+' one or formats a formula incorrectly), you can correct the formatting
+' yourself and prevent future runs of the macro from recognizing the offending
+' text by inserting a "No-width no break" character in the middle of it. This
+' character is available in the "Insert -> Formatting Mark" menu when
+' "Tools -> Options -> Language Settings -> Languages -> Enhanced language
+' support -> Enabled for complex text layout (CTL)" is checked.
+
+' ==============================================================================
' Regular expression replace in the document,
' creating superscripts if superb > 0 or subscripts if superb < 0.
-' Used by SuperbChemistry.
+' Used by FormatDocument.
sub SuperbReplace(doc as object, searchStr as string, replaceStr as string, superb as integer)
dim rd as object
' Idiom: Match something and tag it on the left or right with @x@
' for further processing. If the replacement text could use
-' backreferences, this would be easier.
+' backreferences, this would be easier. (I think backreferences were added
+' since I originally wrote this code, but I see no need to rewrite it to take
+' advantage of them. - Matt 2008-10-26)
' Tag candidate charges following symbols or ), but not in compound words, etc.
' Acceptable next character. (Has to be before end of line to avoid matching @g@ tag itself.)
' End of line.
SuperbReplace(ThisComponent, "([A-Z][a-z]?|[\])}])[0-9]*[-+−]$", "&@g@", 0)
-' O and )]} grab a single digit as quantity.
-SuperbReplace(ThisComponent, "[\])}O][0-9]", "&@n@", 0)
+' Some groups grab a single following digit as a quantity rather than a charge amount.
+' See the detailed rationale in the header comment.
+SuperbReplace(ThisComponent, "(H|O|F|Cl|Br|I|\))[0-9]", "&@n@", 0)
' Real minus signs in charges.
SuperbReplace(ThisComponent, "-@g@", "−@g@", 0)