The Non-Traditional Case for the Authorship of the Twelve Disputed "Federalist" Papers: A Monument Built on Sand?

Joseph Rudman

jr20@andrew.cmu.edu

Carnegie Mellon


Introduction

This paper discusses the controversy over the authorship of twelve
of the "Federalist" papers as seen and studied by over twenty
non-traditional authorship attribution practitioners. The "Federalist"
papers were written during the years 1787 and 1788 by Alexander
Hamilton, John Jay, and James Madison. These 85 propaganda tracts
were intended to help get the U.S. Constitution ratified. They
were all published under the pseudonym "Publius." The general
consensus of traditional attribution scholars (although varying
from time to time) is that Hamilton wrote 51 of the papers,
Madison wrote 14, Jay wrote 5, 3 papers were written jointly
by Hamilton and Madison, and 12 papers have disputed authorship
(either Hamilton or Madison).

In 1964, Frederick Mosteller and David Wallace, building on the
earlier unpublished work of Frederick Williams and Frederick
Mosteller, published their non-traditional authorship attribution
study, "Inference and Disputed Authorship: The Federalist." It is
arguably the most famous and well-respected example from all of the
non-traditional attribution studies. It is the most statistically
sophisticated non-traditional study ever carried out. There has
even been a 40-page paper explicating the statistical techniques of
the Mosteller and Wallace study (Francis). Since then, hundreds
of papers have cited the Mosteller and Wallace work, and over two
dozen non-traditional attribution practitioners have analyzed
and/or conducted variations of the original study.

These practitioners wanted to test their statistical approaches
against the Mosteller and Wallace touchstone study. Mosteller and
Wallace set the boundary conditions for the subsequent work, e.g.,
not using the Jay articles as a control. Their experimental design
and overall report are never questioned. Most of these later
practitioners do not select or prepare the input text as rigorously
as Mosteller and Wallace, whose own selection and preparation
were not as rigorous and complete as they should have been.



Text Selection


(1) "Federalist" Papers

This section discusses the way the Federalist papers were
originally published (76 in newspapers and 8 in the book
compilation) and which editions the practitioners chose
for their non-traditional studies, including how 84 papers
became 85 and how some papers had different numbers in different
editions. The effect that the lack of Hamilton and Madison
holographs had on the studies is discussed. The choice of
edition has the potential of profoundly changing the results
of the studies.

"Project Gutenberg Etexts are usually created
from multiple editions, all of which are in the
Public Domain in the United States, unless a
copyright notice is included. Therefore we do NOT
keep these books in compliance with any particular
paper edition, usually otherwise." (Front Material of Gutenberg Etext #1404)

The compounding problem of downloading texts via the internet
is explicated, e.g., one of the texts includes every variant
of every paragraph. It is shown why none of the Federalist
studies used a valid text of the Federalist papers. The
question, "Does this incorrect input data invalidate the final
'answer'?" is discussed.
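
The sensitivity to edition can be made concrete by computing the rate of a
style-marker word in two transcriptions of the same passage. What follows is
only a minimal sketch under stated assumptions: the two passages are invented
stand-ins rather than text from any edition, the marker words are chosen
purely for illustration, and the function names are hypothetical.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def rate_per_thousand(text, word):
    """Occurrences of `word` per 1,000 tokens of `text`."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return 1000.0 * Counter(tokens)[word] / len(tokens)

# Invented stand-ins for two editions of one passage (not real text).
edition_a = "Whilst the people deliberate upon the plan the states wait"
edition_b = "While the people deliberate on the plan the states wait"

# Print the marker rates side by side so edition differences are visible.
for marker in ("whilst", "while", "upon", "on"):
    a = rate_per_thousand(edition_a, marker)
    b = rate_per_thousand(edition_b, marker)
    print(f"{marker:>6}: edition A {a:6.1f}  edition B {b:6.1f}")
```

Even a single editorial substitution moves a marker's rate from zero to a
non-zero value in a short passage, which is why the edition behind each
machine-readable text needs to be known before rates are compared.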



(2) The Control Texts


(a) The "Known" Hamilton Sample

This sample cannot contain questionable Hamilton writings.
              This sample must also fulfill the other criteria of a
              valid sample — e.g., same genre, same constricted time 
              frame. There also should be a sub-set of this sample set
              aside for later analysis in order to guard against the 
              charge of cherry picking the style-markers. This is not 
                 the same as the Mosteller and Wallace "training sample."
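
The idea of setting aside a sub-set before any style-markers are chosen can
be sketched as follows. This is an illustration only, not the procedure of
any of the studies discussed: the document identifiers, the 25 percent
fraction, and the function name `split_holdout` are all assumptions.

```python
import random

def split_holdout(doc_ids, holdout_fraction=0.25, seed=0):
    """Return (selection_ids, holdout_ids).

    The holdout set is fixed before any style-markers are examined and
    is only analyzed after the markers have been chosen.
    """
    ids = sorted(doc_ids)
    rng = random.Random(seed)  # fixed seed so the split can be reproduced
    rng.shuffle(ids)
    n_holdout = max(1, round(len(ids) * holdout_fraction))
    holdout = sorted(ids[:n_holdout])
    selection = sorted(ids[n_holdout:])
    return selection, holdout

# Hypothetical identifiers for undisputed writings by one author.
known_docs = [f"doc_{i:02d}" for i in range(1, 21)]
selection, holdout = split_holdout(known_docs)
print(len(selection), "documents for choosing style-markers")
print(len(holdout), "documents reserved for later analysis")
```

Recording the seed and the resulting lists is what lets a later reader check
that the reserved documents played no part in choosing the markers.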



(b) The "Known" Madison Sample

In addition to discussing the way the Madison sample was
              constructed, what was said about the Hamilton sample will 
                 be applied here.




Does the lopsided number of Hamilton papers over Madison
               papers (51 to 14) pose a problem for the studies? Were the 
               Hamilton and Madison control texts from outside the Federalist 
               papers chosen correctly? Why are these "outside" controls not 
               used by most of the other practitioners? This section goes on 
               to discuss the control problems that arose with the Mosteller 
               and Wallace study and have been perpetuated through the 
               subsequent studies. This section also discusses the other 
               control problems introduced in these studies.




Text Unediting, De-editing, and Editing

"The cumulative effect of NEARLY A THOUSAND 
                 SMALL CHANGES [emphasis mine] has been to 
                 improve the clarity and readability of the 
                 text without changing its original argument." (Scigliano, lii)


(1) The "Little Book of Decisions"

In the Mosteller and Wallace study, a "little book of decisions"
is mentioned. This "book," originally constructed by Williams and
Mosteller, contained an extensive list of items that Mosteller
and Wallace unedited, de-edited, and edited before beginning the
statistical analysis of the texts, items such as quotations and
numerals. Unfortunately, neither Williams and Mosteller nor Mosteller
and Wallace published the contents of this "little book of decisions";
they mention only five of their many decisions in the published work
[Mosteller and Wallace 7, 16, 38-41]. The little book has been lost
and cannot be recovered or even reconstructed [Mosteller]. This paper
goes on to discuss the many ramifications of the "little book" for
their study and the subsequent studies. It also discusses how the
loss of the "little book" casts a shadow of "scientific invalidity"
over the Mosteller and Wallace work, i.e., it cannot be replicated.
Their "little book" was not used by any of the subsequent studies,
which makes meaningful comparisons among them impossible.



(2) Other Decisions

This section goes on to list many of the unediting, de-editing,
and editing items that need to be considered. It lists several of
the mistakes made by the many practitioners and what these mistakes
mean to the validity of the studies, for example:

1. Wrong letters
2. Quotes, e.g., 131 words of Federalist 5 are a quote from Queen Anne, 334 words of Federalist 9 are a quote from Montesquieu
3. Footnotes, both the author's and the editors'
4. Numbers
5. Foreign languages
6. Spelling
7. Homographic forms
8. Contracted forms
9. Hyphenation
10. Word determination
11. Disambiguation
12. Editorial intervention, both internal (e.g., Hamilton on Madison) and external (e.g., from the first newspaper copy editor to present day editors)
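
A few of the items above (quotes, footnotes, numbers, and hyphenation) can
be sketched as explicit, recorded text transformations. This is a minimal
sketch under stated assumptions, not a reconstruction of the lost "little
book of decisions": the `<q>` and `<fn>` markup, the sample passage, and
the function name `de_edit` are all hypothetical.

```python
import re

def de_edit(text):
    """Apply a few illustrative de-editing steps and return the result."""
    # Drop quoted material, assumed to be marked up as <q>...</q>.
    text = re.sub(r"<q>.*?</q>", " ", text, flags=re.DOTALL)
    # Drop footnotes, assumed to be marked up as <fn>...</fn>.
    text = re.sub(r"<fn>.*?</fn>", " ", text, flags=re.DOTALL)
    # Rejoin words that were hyphenated across a line break.
    text = re.sub(r"(\w)-\s*\n\s*(\w)", r"\1\2", text)
    # Remove numbers written as digits.
    text = re.sub(r"\b\d+(?:[.,]\d+)*\b", " ", text)
    # Collapse runs of whitespace left behind by the removals.
    return re.sub(r"\s+", " ", text).strip()

# Invented sample passage (not taken from any Federalist paper).
sample = (
    "The union of 13 states is im-\n"
    "portant.<fn>See the editor's note.</fn> "
    "<q>A quoted sentence from another writer.</q> So it was argued."
)
print(de_edit(sample))
```

Each rule embodies a decision that affects word counts. For instance, the
hyphenation rule would also fuse a genuine compound that happens to break at
a line end, so every such decision needs to be published alongside the
results if the study is to be replicated.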




Conclusion


(1) Acceptance of Results by Non-Traditional Practitioners

Are practitioners (statisticians and non-statisticians) so 
          blinded by the statistical sophistication that the other 
          elements of a valid non-traditional authorship study are 
             ignored?



(2) Acceptance of Results by History Scholars

Do professional historians accept, deny, or show indifference
          to the body of work that supports the Mosteller and Wallace
          study? Why did I spend hours searching for a Mosteller and
            LAWRENCE study of the Federalist papers?



(3) Do the multiple flaws in all of these non-traditional studies invalidate the results?

Is the case put forth by Mosteller
and Wallace and buttressed by the other non-traditional
practitioners nothing but a "Monument" built on sand? What
effect does showing the flaws in the Federalist studies have
on non-traditional studies in general, i.e., if the best is
suspect, what about the rest?




Bibliography



Avalon Project
Yale Law School

Adair, Douglass
The Authorship of the Disputed Federalist Papers
The William and Mary Quarterly
1.2 Part I and 1.3 Part II
97-122 and 235-264
1944

Bosch, Robert A.
Smith, Jason A.
Separating Hyperplanes and the Authorship of the Disputed Federalist Papers
The American Mathematical Monthly
105.7
601-607
1998

Bourne, E.G.
The Authorship of the Federalist
The American Historical Review
2.3
443-460
1897

Collins, Jeff, et al.
Detecting Collaborations in Text: Comparing the Authors' Rhetorical Language Choices in the Federalist Papers
Computers and the Humanities
38.1
15-36
2004


constitution.org

Cooke, Jacob E.
The Federalist
Meridian Books (The World Publishing Company)
Cleveland
1956

Davis, George
RE: Gutenberg edition of Federalist
Private E-mail
20 November 2003 18:46:51

Engeman, Thomas S., et al.
The Federalist Concordance
Wesleyan University Press
Middletown, Connecticut
1980

Farringdon, Jill
Analysing for Authorship
Cardiff
The University of Wales Press
1996

Farringdon, Michael G.
Morton, Andrew Q.
Fielding and the Federalist
Department of Computing Science Research Report
University of Glasgow
Glasgow
1990/R6

Forsyth, Richard S.
Stylistic Structures: A Computational Approach to Text Classification
Diss. University of Nottingham
1995

Francis, Ivor S.
An Exposition of a Statistical Approach to the Federalist Dispute
Leed, Jacob
The Computer and Literary Style
Kent State University Press
Kent, Ohio
1966
38-78

Fung, Glenn
The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization
Proceedings of the 2003 Conference on Diversity in Computing
Atlanta, Georgia
2003
42-46

Fung, Glenn
CS 635 Project
Spring Semester 1999

Fung, Glenn
Mangasarian, Olvi L.
The Disputed Federalist Papers: SVM Feature Selection via Concave Minimization
Paper delivered at the CSNA 2002 Conference, Madison, Wisconsin
15 June 2002

Hamilton, Alexander, et al.
Scigliano, Robert
The Federalist: A Commentary on the Constitution of the United States
The Modern Library (Random House)
New York
2000

Hart, Michael
RE: Gutenberg edition of Federalist
Private E-mail
21 November 2003 12:59:08

Hilton, Michael L.
Holmes, David I.
An Assessment of Cumulative Sum Charts for Authorship Attribution
Literary and Linguistic Computing
8.2
73-80
1993

Holmes, David I.
Forsyth, Richard S.
The Federalist Revisited: New Directions in Authorship Attribution
Literary and Linguistic Computing
10.2
111-127
1995

Khmelev, Dimitri V.
Tweedie, Fiona J.
Using Markov Chains for Identification of Writers
Literary and Linguistic Computing
16.3
299-307
2001

Kjell, Bradley
Authorship Determination Using Letter Pair Frequency Features with Neural Network Classifiers
Literary and Linguistic Computing
9.2
119-124
1994

Kjell, Bradley, et al.
Discrimination of Authorship Using Visualization
Information Processing & Management
30.1
141-150
1994

Martindale, Colin
McKenzie, Dean
On the Utility of Content Analysis in Author Attribution: The Federalist
Computers and the Humanities
29
259-270
1995

McColly, William
Weier, Dennis
Literary Attribution and Likelihood-Ratio Tests: The Case of the Middle English Pearl-Poems
Computers and the Humanities
17
65-75
1983

Merriam, Thomas
An Experiment with the Federalist Papers
Computers and the Humanities
23.3
251-254
1989

Mitchell, Ann F.S.
Payne, Clive D.
A Conservative Confidence Interval for a Likelihood Ratio
Journal of the American Statistical Association
66.336
861-866
1971

Mosteller, Frederick
Wallace, David L.
Applied Bayesian and Classical Inference: The Case of the Federalist Papers
Springer-Verlag
New York
1984

Mosteller, Frederick
Wallace, David L.
Inference in an Authorship Problem. A Comparative Study of Discrimination Methods Applied to the Federalist Papers
Journal of the American Statistical Association
58
275-309
1963

Mosteller, Frederick
Wallace, David L.
Notes on an Authorship Problem
Proceedings of a Harvard Symposium on Digital Computers and their Applications
Harvard University Press
Cambridge, Massachusetts
163-197
1962

Piaia, Jesse
[For Frederick Mosteller]
Private E-mail
Tuesday 22 July 2003, 11:48:04

Piaia, Jesse
[For Frederick Mosteller]
Private E-mail
Tuesday 22 July 2003, 10:57:38

Pennebaker, James W.
[no title]
Private E-mail
Wednesday 09 July 2003, 15:32:59

Pennebaker, James W.
[no title]
Private E-mail
Wednesday 09 July 2003, 14:45:34

Pennebaker, James W.
The Federalist
Unpublished preliminary work


Project Gutenberg

Rokeach, Milton, et al.
A Value Analysis of the Disputed Federalist Papers
Journal of Personality and Social Psychology
16.2
245-250
1970

Roland, Jon
RE: The Federalist on constitution.org
Private E-mail
11 September 2003, 10:24:36

Rudman, Joseph
Unediting, De-Editing, and Editing in Nontraditional Authorship Attribution Studies: With an Emphasis on the Canon of Daniel Defoe
Papers of the Bibliographical Society of America
99.1
March 2005

Sarndal, Carl-Erik
On Deciding Cases of Disputed Authorship
Applied Statistics
16.3
251-268
1967

Stamatatos, E.
Fakotakis, N.
Kokkinakis, G.
Computer-Based Authorship Attribution Without Lexical Measures
Computers and the Humanities
35
193-214
2001

Stamatatos, E.
Fakotakis, N.
Kokkinakis, G.
Text Genre Detection Using Common Word Frequencies
COLING 2000: Proceedings of the 18th International Conference on Computational Linguistics
II
808-814
2000

Tankard, Jim
The Literary Detective
BYTE
11.2
231-238
1986

Tweedie, Fiona J.
Singh, S.
Holmes, D.I.
Neural Network Applications in Stylometry: The Federalist Papers
Computers and the Humanities
30.1
1-10
1996

Wachal, Robert Stanley
Linguistic Evidence, Statistical Inference, and Disputed Authorship
Dissertation, University of Wisconsin
1966

Waugh, Sam
Adams, Anthony
Tweedie, Fiona
Computational Stylistics Using Artificial Neural Networks
Literary and Linguistic Computing
15.2
187-197
2000

Yang, Albert C.C., et al.
Information Categorization Approach to Literary Authorship Disputes
PHYSICA A
To be published