From: Matt McCutchen Date: Thu, 10 Sep 2020 17:39:47 +0000 (-0400) Subject: SuperbChemistry 3.0 X-Git-Url: https://mattmccutchen.net/superbchemistry/superbchemistry.git/commitdiff_plain/2eec932f44a3bdc4d2f331a5e48c15dc8632bb31 SuperbChemistry 3.0 - Update for compatibility with modern versions of OpenOffice and LibreOffice. One of the regular expressions contained an unescaped [ in a character class, which no longer works because it now denotes a named character class. Also, work around a bug in LibreOffice undo. - Put all actions taken during a formatting pass in an undo context so they can be undone and redone as a unit. - Change the file extension to oxt and add full extension metadata and a menu item. - Add support for formatting a selection rather than the entire document, with a minor limitation. - Make the rules for recognizing sequences slightly more conservative. --- diff --git a/.gitignore b/.gitignore new file mode 100644 index 0000000..7cc7f81 --- /dev/null +++ b/.gitignore @@ -0,0 +1 @@ +/SuperbChemistry.oxt diff --git a/Makefile b/Makefile new file mode 100644 index 0000000..59b2de9 --- /dev/null +++ b/Makefile @@ -0,0 +1,11 @@ +OXT:=SuperbChemistry.oxt + +all: $(OXT) + +# No dependencies declared. Clean in order to make it again. +$(OXT): + rm -f $@ + cd extension && zip -r ../$(OXT) * + +clean: + rm -f $(OXT) diff --git a/README.md b/README.md new file mode 100644 index 0000000..38e420c --- /dev/null +++ b/README.md @@ -0,0 +1,61 @@ +# SuperbChemistry + +This is an extension for [OpenOffice](https://www.openoffice.org/) and +[LibreOffice](https://www.libreoffice.org/) that applies superscript and +subscript formatting to chemical formulas in bulk in Writer documents. See [the +web page](https://mattmccutchen.net/superbchemistry/) for user documentation. + +The rest of this document refers to LibreOffice, but the same remarks apply to +OpenOffice. + +## Development + +Run `make clean all` to generate `SuperbChemistry.oxt`, which you can install in +LibreOffice via "Tools" -> "Extension Manager". + +By default, the library is installed read-only to prevent users from making +changes that would be lost without warning if they uninstall the extension. For +development, you can change `library:readonly="true"` to +`library:readonly="false"` in `extension/SuperbChemistry/*.xlb` so you can edit +the library in LibreOffice. Then you can copy the edited `Main.xba` from your +profile (a typical location is +`~/.config/libreoffice/4/user/uno_packages/cache/uno_packages/*.tmp_/SuperbChemistry.oxt/SuperbChemistry/Main.xba`, +but this will depend on your operating system and application version) back to +this working tree. + +`SuperbChemistry-test.odt` is a test suite that exercises most cases of the +formatting rules. + +## Formatting rules + +- An _item_ is an element symbol (`[A-Z][a-z]?`) or text that starts or ends + with a group symbol `()[]{}` (depending on which side of the item we're + looking at). +- A _charge symbol_ is `+`, `-` (U+002D hyphen-minus), or `−` (U+2212 minus + sign). +- We recognize the following sequences: + - An item followed by one or more digits and one of the following: another + item, space or tab, closing group symbol `)]}`, sentence punctuation + `.,:;?!'"`, or end of line. + - An item followed by zero or more digits, a charge symbol, and any of the + things allowed in the previous case except for another item (to avoid + matching compound words like `A-B`). This means we won't match something + like `Na+Cl-` , but formulas are rarely written that way. +- A recognized charge symbol of `-` is replaced with `−`. +- In a recognized sequence with no charge symbol, all of the digits are treated + as a quantity. +- In a recognized sequence with a charge symbol, the last digit is treated as a + charge amount and the rest are treated as a quantity. For example, in `Fe3+`, + the digit should be a charge amount. Exception: if there is only one digit, + it is treated as a quantity in two cases: + - If the item is `H`, `O`, `F`, `Cl`, `Br`, or `I`, because these elements + often occur in quantities greater than 1 and rarely have a charge greater + than 1. + - If the item ends in a closing group symbol, because it probably wouldn't + have been enclosed in group symbols if its quantity were 1. + +## License + +I, Matt McCutchen, the sole author of SuperbChemistry, waive my copyright to it, +placing it in the public domain. SuperbChemistry comes with absolutely no +warranty. diff --git a/SuperbChemistry-test.odt b/SuperbChemistry-test.odt index 9740bbc..33f2e1c 100644 Binary files a/SuperbChemistry-test.odt and b/SuperbChemistry-test.odt differ diff --git a/extension/Addons.xcu b/extension/Addons.xcu new file mode 100644 index 0000000..91cd233 --- /dev/null +++ b/extension/Addons.xcu @@ -0,0 +1,20 @@ + + + + + + + + vnd.sun.star.script:SuperbChemistry.Main.FormatSelectionOrDocument?language=Basic&location=application + + + SuperbChemistry: Format selection or document + + + com.sun.star.text.TextDocument + + + + + diff --git a/extension/META-INF/manifest.xml b/extension/META-INF/manifest.xml index 184035e..e414e74 100644 --- a/extension/META-INF/manifest.xml +++ b/extension/META-INF/manifest.xml @@ -1,4 +1,5 @@ - \ No newline at end of file + + diff --git a/extension/SuperbChemistry/Main.xba b/extension/SuperbChemistry/Main.xba index 530cf8d..960dc6c 100644 --- a/extension/SuperbChemistry/Main.xba +++ b/extension/SuperbChemistry/Main.xba @@ -1,147 +1,224 @@ -' SuperbChemistry version 2.2 -' http://mattmccutchen.net/schem/ -' Written and maintained by Matt McCutchen <matt@mattmccutchen.net> -' -' Applies superscript and subscript formatting to chemical formulas in -' OpenOffice.org Writer documents. -' -' Rules: -' - Quantities [0-9]+ and charges [0-9]*[-+−] are recognized after an element -' symbol [A-Z][a-z]? or a closing delimiter [\])}] . Hyphens are converted -' into real minus signs. -' - A charge sign [-+−] is ignored if it is followed by a letter, digit, -' opening delimiter, or [<>] . (Charges should appear only at the end of a -' formula, and we want to avoid matching ordinary hyphens in text.) -' - When digits followed by a charge sign are recognized, the last digit -' becomes part of the charge and the remaining digits become the quantity. -' (Charges rarely have absolute value more than 9.) -' - In cases like X2-, we have to guess whether the digit is an atom/group -' quantity or a charge amount. We guess atom/group quantity if X is H (NH4+), -' O (NO3-), a halogen (SbF6-, AlCl4-, etc.), or a closing parenthesis -' (Fe(OH)2+; the group likely would not have been parenthesized unless it had -' a quantity). Otherwise we guess charge amount (Fe3+). This heuristic -' should be right most of the time. -' -' Examples: -' C12345 ==> C_{12345} -' H+ ==> H^+ -' Cl- ==> Cl^- -' Fe3+ ==> Fe^{3+} -' SO42- ==> SO_4^{2-} -' C1232+ ==> C_{123}^{2+} -' N3- ==> N^{3-} -' N|_3^- not recognized (| represents "no-width no break") -' NH4+ ==> NH_4^+ -' NO3- ==> NO_3^- -' AlCl4- => AlCl_4^- -' Fe(OH)2+ ==> Fe(OH)_2^+ -' O12 ==> O_{12} -' y4- not recognized -' x2 not recognized -' Foo2 not recognized -' TI-89 not recognized -' -' To format the current document, run the FormatDocument macro: go to Tools -> -' Macros -> Run Macro... -> My Macros -> SuperbChemistry -> Main -> -' FormatDocument -> Run. I realize that this is ugly. I tried to make the -' package install a menu item to format the document, but the resulting package -' caused OpenOffice.org to crash regularly (I didn't investigate why), so I -' abandoned that idea. Note that you can add a menu item as a user -' customization (Tools -> Customize), and I recommend it if you plan to use -' SuperbChemistry frequently. -' -' FormatDocument uses a sequence of regular expression find-and-replace -' operations since that was easy to implement and makes the rules easy to -' change. The operations appear in the undo history, so you can undo a -' formatting run by undoing the block of "Replace" entries at the top of the -' history. -' -' I would like to support formatting a selection, but the OpenOffice.org API -' does not appear to support replace-all within a selection. I could find -' within the selection and implement the replacing myself, but that is more -' work than I want to do. -' -' If SuperbChemistry makes a mistake (e.g., recognizes a "formula" that isn't -' or formats a formula incorrectly), you can correct the formatting yourself -' and prevent future runs of the macro from recognizing the offending text by -' inserting a "No-width no break" character in the middle of it. This character -' is available in the "Insert -> Formatting Mark" menu when "Tools -> Options -> -' Language Settings -> Languages -> Enhanced language support -> -' Enabled for complex text layout (CTL)" is enabled. - -' ============================================================================== - -' Regular expression replace in the document, -' creating superscripts if superb > 0 or subscripts if superb < 0. -' Used by FormatDocument. -sub SuperbReplace(doc as object, searchStr as string, replaceStr as string, superb as integer) - -dim rd as object -rd = doc.createReplaceDescriptor() - -rd.SearchCaseSensitive = true -rd.SearchRegularExpression = true -rd.setSearchString(searchStr) -rd.setReplaceString(replaceStr) - -if superb <> 0 then - dim replaceAttrs(1) as new com.sun.star.beans.PropertyValue - replaceAttrs(0).Name = "CharEscapement" - if superb > 0 then - replaceAttrs(0).Value = 33 - else - replaceAttrs(0).Value = -9 - end if - replaceAttrs(1).Name = "CharEscapementHeight" - replaceAttrs(1).Value = 58 - rd.setReplaceAttributes(replaceAttrs) -end if - -doc.replaceAll(rd) - -end sub - -' Formats the current document -sub FormatDocument - -' Idiom: Match something and tag it on the left or right with @x@ -' for further processing. If the replacement text could use -' backreferences, this would be easier. (I think backreferences were added -' since I originally wrote this code, but I see no need to rewrite it to take -' advantage of them. - Matt 2008-10-26) - -' Tag candidate charges following symbols or ), but not in compound words, etc. -' Acceptable next character. (Has to be before end of line to avoid matching @g@ tag itself.) -SuperbReplace(ThisComponent, "([A-Z][a-z]?|[\])}])[0-9]*[-+−][^[({A-Za-z0-9<>]", "&@G@", 0) -' Retag in front. -SuperbReplace(ThisComponent, ".@G@", "@g@&", 0) -' End of line. -SuperbReplace(ThisComponent, "([A-Z][a-z]?|[\])}])[0-9]*[-+−]$", "&@g@", 0) - -' Some groups grab a single following digit as a quantity rather than a charge amount. -' See detailed rationale above. -SuperbReplace(ThisComponent, "(H|O|F|Cl|Br|I|\))[0-9]", "&@n@", 0) - -' Real minus signs in charges. -SuperbReplace(ThisComponent, "-@g@", "−@g@", 0) - -' Make charges: at most one digit. -SuperbReplace(ThisComponent, "[0-9]?[−+]@g@", "@q@&", 1) - -' Remove the O and ) markers in case of O57. -SuperbReplace(ThisComponent, "@n@", "", 0) - -' Tag quantities: as many digits as we can still grab. -SuperbReplace(ThisComponent, "([A-Z][a-z]?|[\])}])[0-9]+", "&@n@", 0) - -' Make quantities. -SuperbReplace(ThisComponent, "[0-9]+@n@", "&", -1) - -' Clean up all markers. -SuperbReplace(ThisComponent, "@[gGnq]@", "", 0) - -end sub +Option Explicit + +Function HaveSelection(doc As Object) + Dim sel as Object + sel = doc.CurrentController.Selection + If sel.Count > 1 Then + HaveSelection = True + Exit Function + End If + Dim s0 + s0 = sel.getByIndex(0) + If s0.Text.compareRegionStarts(s0.Start, s0.End) = 0 Then + HaveSelection = False + Else + HaveSelection = True + End If +End Function + +' Regular expression replace in the document, creating superscripts if +' superb > 0 or subscripts if superb < 0. +Sub SuperbReplace(doc As Object, searchRegex As String, replacePattern As String, superb As Integer) + + Dim rd As Object + rd = doc.createReplaceDescriptor() + + rd.SearchCaseSensitive = true + rd.SearchRegularExpression = true + rd.setSearchString(searchRegex) + rd.setReplaceString(replacePattern) + + If superb <> 0 Then + Dim replaceAttrs(1) As New com.sun.star.beans.PropertyValue + replaceAttrs(0).Name = "CharEscapement" + If superb > 0 Then + replaceAttrs(0).Value = 33 + Else + ' The default escapement for subscripts is -33, which looked bad to me in + ' chemical formulas. This looks better. Modify to your taste. + replaceAttrs(0).Value = -9 + End If + replaceAttrs(1).Name = "CharEscapementHeight" + replaceAttrs(1).Value = 58 + rd.setReplaceAttributes(replaceAttrs) + End If + + doc.replaceAll(rd) + +End Sub + +Sub ReplaceInSelection(doc As Object, searchRegex As String, replacePattern As String) + + Dim frame As Object, dispatcher As Object + frame = doc.CurrentController.Frame + dispatcher = createUnoService("com.sun.star.frame.DispatchHelper") + + Dim args(6) As New com.sun.star.beans.PropertyValue + args(0).Name = "SearchItem.AlgorithmType" + args(0).Value = 1 + args(1).Name = "SearchItem.SearchFlags" + args(1).Value = &H1800 ' Search in selection + args(2).Name = "SearchItem.SearchString" + args(2).Value = searchRegex + args(3).Name = "SearchItem.ReplaceString" + args(3).Value = replacePattern + args(4).Name = "SearchItem.Command" + args(4).Value = 3 + args(5).Name = "SearchItem.AlgorithmType2" + args(5).Value = 2 + args(6).Name = "Quiet" + args(6).Value = true + + dispatcher.executeDispatch(frame, ".uno:ExecuteSearch", "", 0, args()) + +End Sub + +Global formatSelectionWarningShown As Boolean +' I haven't found a way to initialize the variable, but it looks like its +' default value is treated as false in an if statement. +'formatSelectionWarningShown = False + +Sub FormatSelectionOrDocumentDebug() + ' Replacing with an empty replacement string triggers a bug in LibreOffice + ' (https://bugs.documentfoundation.org/show_bug.cgi?id=136577), so we must + ' avoid it. Fortunately, avoiding it is pretty straightforward. + + ' Insert @m@ between an item and a number or charge. + If HaveSelection(ThisComponent) Then + ' doc.replaceAll is not capable of searching a selection, while the + ' dispatch-based replace API uses the current format options in the + ' "Find & Replace" dialog and does not let us change them, which means that + ' (1) existing settings can break things and (2) we cannot make superscripts + ' and subscripts. We do the best we can by using one dispatch-based replace + ' to tag all the sequences we want to operate on (which loses the selection in + ' OpenOffice anyway) and then proceed with replaceAll calls on the whole + ' document, which will only operate on the already tagged sequences. + If Not formatSelectionWarningShown Then + MsgBox "Due to limitations in the OpenOffice/LibreOffice API, the " & _ + """Format selection"" command may not process some chemical formulas " & _ + "in the selection or may apply incorrect formatting if any format " & _ + "options are active in the ""Find & Replace"" dialog box. If this " & _ + "happens, just undo the command (if any changes were made), clear the " & _ + "format options in the ""Find & Replace"" dialog box (focus the " & _ + """Find"" field, click ""No Format"", and repeat for the ""Replace"" " & _ + "field), and run ""Format selection"" again." & Chr$(13) & Chr$(13) & _ + "This message is always shown on the first ""Format selection"" " & _ + "command in each OpenOffice/LibreOffice session because " & _ + "SuperbChemistry has no way to detect whether format options are " & _ + "active in ""Find & Replace"".", _ + 0, "SuperbChemistry ""Format selection"" notice" + formatSelectionWarningShown = True + End If + ReplaceInSelection(ThisComponent, "(?<=[A-Z][a-z]?|[\])}])[-+−0-9]+", "@m@&") + Else + SuperbReplace(ThisComponent, "(?<=[A-Z][a-z]?|[\])}])[-+−0-9]+", "@m@&", 0) + End If + + ' Insert @c@ after a charge. + SuperbReplace(ThisComponent, "(?<=@m@)([0-9]*[-+−])(?=[ \t\])}.,:;?!'""]|$)", "&@c@", 0) + + ' Real minus signs in charges. + SuperbReplace(ThisComponent, "-@c@", "−@c@", 0) + + ' Some groups grab a single following digit as a quantity rather than a charge amount. + ' Insert @sq@ marker to prevent the charge from grabbing the digit. + SuperbReplace(ThisComponent, "(?<=(H|O|F|Cl|Br|I|[\])}])@m@)[0-9]", "&@sq@", 0) + + ' Each charge grabs at most one digit and moves the @c@ in front to prevent the + ' quantity from grabbing the digit. + SuperbReplace(ThisComponent, "([0-9]?[−+])@c@", "@c@$1", 1) + + ' Remove any @sq@ markers so items can grab all the digits that follow for the quantity. + SuperbReplace(ThisComponent, "(.)@sq@", "$1", 0) + + ' At this point, we have only @m@ and @c@ markers left. + + ' Format quantities: as many digits as we can still grab. + ' We have to allow @ as a following character for our own @c@ tag. + SuperbReplace(ThisComponent, "(?<=@m@)[0-9]+(?=[@A-Z \t\])}.,:;?!'""]|$)", "&", -1) + + ' Clean up @c@ markers. We know there is a charge sign after each. + SuperbReplace(ThisComponent, "@c@(.)", "$1", 0) + + ' Clean up @m@ markers. We know there is some character before each. + SuperbReplace(ThisComponent, "(.)@m@", "$1", 0) + +End Sub + +Dim madeChanges As Boolean + +Sub UndoListener_undoActionAdded() +End Sub +Sub UndoListener_actionUndone() +End Sub +Sub UndoListener_actionRedone() +End Sub +Sub UndoListener_allActionsCleared() +End Sub +Sub UndoListener_redoActionsCleared() +End Sub +Sub UndoListener_resetAll() +End Sub +Sub UndoListener_enteredContext() +End Sub +Sub UndoListener_enteredHiddenContext() +End Sub +Sub UndoListener_leftContext() + madeChanges = True +End Sub +Sub UndoListener_leftHiddenContext() +End Sub +Sub UndoListener_cancelledContext() +End Sub + +Sub FormatSelectionOrDocument() + + Dim undoActionName As String + If HaveSelection(ThisComponent) Then + undoActionName = "SuperbChemistry: Format selection" + Else + undoActionName = "SuperbChemistry: Format document" + End If + ThisComponent.UndoManager.enterUndoContext(undoActionName) + + On Error Goto ErrorHandler + FormatSelectionOrDocumentDebug + On Error Goto 0 + + ThisComponent.UndoManager.leaveUndoContext() + Exit Sub + +ErrorHandler: + + ' If our undo context is nonempty, we want to undo the generated action. + ' If not, we do not want to undo as that would undo the user's previous + ' action. If we just check whether the title of the last undoable action + ' is "SuperbChemistry: Format (selection|document)", that might be wrong if + ' the user ran FormatDocument twice in a row: probably unlikely, but the + ' completely correct check is not that hard. + Dim listener As Object + listener = CreateUnoListener("UndoListener_", "com.sun.star.document.XUndoManagerListener") + madeChanges = False + ThisComponent.UndoManager.addUndoManagerListener(listener) + ThisComponent.UndoManager.leaveUndoContext() + ThisComponent.UndoManager.removeUndoManagerListener(listener) + If madeChanges Then + ThisComponent.UndoManager.undo() + ThisComponent.UndoManager.clearRedo() + End If + + MsgBox "SuperbChemistry encountered an unexpected error:" & Chr$(13) & Chr$(13) & _ + "Code " & Err & ": " & Error$ & Chr$(13) & Chr$(13) & _ + "Any changes made so far have been undone." & Chr$(13) & Chr$(13) & _ + "SuperbChemistry needed to catch the error in order to leave your undo " & _ + "history in a consistent state. If the problem is reproducible and you " & _ + "want to see the precise error location, run the FormatDocumentOrSelectionDebug " & _ + "macro, but be advised that it may generate multiple entries in the undo " & _ + "history and will not undo them on error.", _ + 0, "SuperbChemistry internal error" + On Error Goto 0 +End Sub \ No newline at end of file diff --git a/extension/SuperbChemistry/dialog.xlb b/extension/SuperbChemistry/dialog.xlb index 8d924f4..7609d23 100644 --- a/extension/SuperbChemistry/dialog.xlb +++ b/extension/SuperbChemistry/dialog.xlb @@ -1,3 +1,3 @@ - \ No newline at end of file + \ No newline at end of file diff --git a/extension/SuperbChemistry/script.xlb b/extension/SuperbChemistry/script.xlb index 815dca2..c3d89f3 100644 --- a/extension/SuperbChemistry/script.xlb +++ b/extension/SuperbChemistry/script.xlb @@ -1,5 +1,5 @@ - + \ No newline at end of file diff --git a/extension/description.xml b/extension/description.xml new file mode 100644 index 0000000..7ca5367 --- /dev/null +++ b/extension/description.xml @@ -0,0 +1,16 @@ + + + + + + Matt McCutchen + + + SuperbChemistry + + + + + diff --git a/extension/extension-description.txt b/extension/extension-description.txt new file mode 100644 index 0000000..c9b4c2e --- /dev/null +++ b/extension/extension-description.txt @@ -0,0 +1 @@ +Applies superscript and subscript formatting to chemical formulas in bulk in Writer documents. \ No newline at end of file