Stylometry Authorship Analysis

In this article I discuss a recent technique for determining authorship and its application to the Book of Mormon. I will leave most of the details for later articles if there is any interest.

Recently, computer analysis techniques have been used to establish the authorship of several disputed documents. One example is the Federalist Papers. Although they were published anonymously, the authorship of 73 of them was established: John Jay wrote 5, and the rest were divided between Alexander Hamilton and James Madison. The remaining twelve were left open to question. Using the frequency of usage of small filler words, the statisticians Frederick Mosteller and David Wallace found overwhelming evidence favoring Madison as the author of all twelve disputed papers.

A second example deals with a novel that Jane Austen left unfinished when she died in 1817. A skilled author later completed the novel and had it published. Although this second author duplicated Austen's style, she failed to duplicate the subconscious habits of detail. When these habit patterns were examined, the difference was clearly evident.

The noncontextual words which have been most successful in discriminating among authors are the filler words of the language such as prepositions and conjunctions, and sometimes adjectives and adverbs. Authors differ in their rates of usage of these filler words.
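
As a rough illustration of what a rate of usage means here, the following sketch counts a handful of filler words and expresses each count per 1000 words of text. The word list, the function name filler_rates, and the two sample sentences are all made up for the example; they are not the word list or the texts used in the study.

```python
import re
from collections import Counter

# Illustrative filler words only; NOT the word list used in the study.
FILLER_WORDS = ["and", "the", "of", "that", "to", "in", "it", "for"]

def filler_rates(text, words=FILLER_WORDS):
    """Return occurrences per 1000 tokens for each word in `words`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    return {w: 1000.0 * counts[w] / total for w in words}

# Two made-up sentences standing in for samples from two writers.
sample_a = "And it came to pass that the people of the land went to the city."
sample_b = "He walked home in the rain, for he had lost the key to his car."

rates_a = filler_rates(sample_a)
rates_b = filler_rates(sample_b)
for w in FILLER_WORDS:
    print(f"{w:5s} {rates_a[w]:7.1f} {rates_b[w]:7.1f}")
```

With real texts the samples would be thousands of words long, so that the rates are stable enough to compare between writers.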

Three different types of "wordprints," or stylometric measures, were used in examining the authors of the Book of Mormon: (1) frequency of letters, (2) frequency of commonly occurring noncontextual words, and (3) frequency of rarely occurring noncontextual words. Three types of statistical methods were used with these data: multivariate analysis of variance (MANOVA), cluster analysis, and discriminant (classification) analysis.
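
The first measure, frequency of letters, can be sketched as follows, assuming the text is first cut into equal-sized blocks of words. The helper names split_into_blocks and letter_frequencies, the block size, and the toy text are assumptions made for this example only and are not taken from the study.

```python
import re
import string
from collections import Counter

def split_into_blocks(text, block_size):
    """Split text into consecutive blocks of `block_size` words; drop any remainder."""
    tokens = re.findall(r"[a-z']+", text.lower())
    n_blocks = len(tokens) // block_size
    return [tokens[i * block_size:(i + 1) * block_size] for i in range(n_blocks)]

def letter_frequencies(tokens):
    """Relative frequency of each letter a-z within one block of tokens."""
    counts = Counter(ch for tok in tokens for ch in tok
                     if ch in string.ascii_lowercase)
    total = sum(counts.values())
    return [counts[ch] / total if total else 0.0
            for ch in string.ascii_lowercase]

# Toy text; a real analysis would use blocks of a thousand words or more.
text = "the cat sat on the mat and the dog sat on the log " * 3
blocks = split_into_blocks(text, block_size=10)
vectors = [letter_frequencies(b) for b in blocks]

t_index = string.ascii_lowercase.index("t")
print(len(blocks), "blocks; frequency of 't' in first block:",
      round(vectors[0][t_index], 3))
```

Each block becomes a vector of 26 numbers, which is the kind of input the statistical methods above operate on.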

Most of the Book of Mormon was abridged by Mormon and his son Moroni. A section called the small plates of Nephi contains mainly the writing of Nephi and Jacob. In addition, several sections appear to quote other authors, and these are counted as additional authors. The result is a total of 22 authors, each represented by at least 1000 words.

A comparison of the 10 most frequent words (and, the, of, that, to, unto, in, it, for, and be) put the statistical odds of a single author at 1 in 100 billion. "However, this number should not be taken too literally. It depends on several assumptions, one of which is that we have a random sample of each author's writings." A similar result was obtained using all 38 frequently occurring words, the 42 uncommon words, and the frequency of letters.

Writings of Joseph Smith and his contemporaries were also included. Ninety blocks of words were taken from Joseph Smith, W. W. Phelps, Oliver Cowdery, Parley P. Pratt, Sidney Rigdon, and Solomon Spaulding. Two important points came out of this comparison: (1) There is some evidence of a wordprint time trend within the Book of Mormon; i.e., writers are more similar to their contemporaries than to writers in other time periods. (2) None of the Book of Mormon selections resembled the writing of any of the suggested nineteenth-century authors. This remained true even when formal words such as hath and unto were removed from the analysis. The results depended as much on words such as [and, for, of] as on any of the other words.

The preceding results were derived using MANOVA. Cluster analysis groups items that are similar across many variables at once. Using the 9 words that discriminated best in the MANOVA, the cluster analysis yielded, among others, the following groupings: (1) Nephi's word blocks paired with those of his father, Lehi, and together they paired with those of Nephi's brother Jacob and of Isaiah, the prophet most quoted by Nephi and Jacob; (2) Alma's word blocks grouped with those of Amulek, his missionary companion, and both paired with those of Abinadi, the man who converted Alma's father; (3) the word blocks of Samuel the Lamanite and of Nephi, son of Helaman, who were contemporaries, grouped together. When the nineteenth-century authors were added, in general, word blocks of Book of Mormon authors clustered with those of Book of Mormon authors, and word blocks of non-Book of Mormon authors clustered with those of non-Book of Mormon authors.
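
To give a feel for how such pairings arise, here is a minimal sketch of one common form of cluster analysis, agglomerative clustering with average linkage, which repeatedly merges the two closest groups. The source does not say which clustering method was used, so the method is an assumption, and the block labels and word rates below are invented for the example.

```python
import math

def average_linkage(c1, c2, points):
    """Mean pairwise Euclidean distance between two clusters of labels."""
    dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
    return sum(dists) / len(dists)

def agglomerate(points, n_clusters):
    """Merge the two closest clusters repeatedly until n_clusters remain."""
    clusters = [[label] for label in points]
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_linkage(clusters[i], clusters[j], points)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Invented rates per 1000 words for three filler words; A1/A2 and B1/B2
# stand for two word blocks each from two hypothetical writers.
blocks = {
    "A1": [62.0, 41.0, 30.0],
    "A2": [60.0, 43.0, 29.0],
    "B1": [45.0, 55.0, 18.0],
    "B2": [47.0, 53.0, 20.0],
}
result = sorted(agglomerate(blocks, n_clusters=2))
print(result)
```

Blocks with similar word rates end up in the same group without the method ever being told who wrote them, which is what makes the groupings informative.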

Discriminant analysis was applied to blocks of 2000 words, comparing 18 words. It correctly identified 93.3 percent of the blocks. This is very high for this many authors and was unexpected. When each block was left out while the classification functions were computed and then classified, the results were still in the 70 and 80 percent range.
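
The leave-one-out check described above can be sketched as follows. Note that this example swaps in a much simpler classifier, nearest centroid, in place of the discriminant functions used in the study; only the hold-out procedure is the point. The function names and the sample data are invented for the illustration.

```python
import math

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def nearest_author(vector, centroids):
    """Return the author whose centroid is closest to `vector`."""
    return min(centroids, key=lambda a: math.dist(vector, centroids[a]))

def leave_one_out_accuracy(samples):
    """samples: list of (author, vector) pairs.

    Each block is held out in turn, the centroids are recomputed from the
    remaining blocks, and the held-out block is then classified.
    """
    correct = 0
    for k, (true_author, vector) in enumerate(samples):
        training = samples[:k] + samples[k + 1:]
        by_author = {}
        for author, v in training:
            by_author.setdefault(author, []).append(v)
        centroids = {a: centroid(vs) for a, vs in by_author.items()}
        if nearest_author(vector, centroids) == true_author:
            correct += 1
    return correct / len(samples)

# Invented word rates for two hypothetical writers, three blocks each.
samples = [
    ("A", [62.0, 41.0]), ("A", [60.0, 43.0]), ("A", [61.0, 40.0]),
    ("B", [45.0, 55.0]), ("B", [47.0, 53.0]), ("B", [46.0, 56.0]),
]
accuracy = leave_one_out_accuracy(samples)
print(f"leave-one-out accuracy: {accuracy:.2f}")
```

Because the held-out block plays no part in fitting, this accuracy is a more honest estimate than classifying the same blocks that were used to build the classifier, which is why the figures above drop when the block is left out.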

Another test computed two discriminant functions, which allowed the authors to be plotted in two dimensions. The Book of Mormon and non-Book of Mormon authors were clearly separate, and a straight line could be drawn between the two groups.

Joseph Smith was also compared with the four major Book of Mormon authors and plotted in two dimensions. Again, each author's blocks clustered together, and Joseph Smith's writing was very definitely distinct from that of the authors in the Book of Mormon.

