17Aug16: JT has created a new diagnostics page (http://jenkins.hcmc.uvic.ca/job/moses/lastSuccessfulBuild/artifact/trunk/utilities/diagnostics.html) which looks for the following errors in complete, edited, and light-edited files:
-entries with glosses in cits
-entries with no gloss at all (and no name, persName, placeName, orgName or label)
-entries with no def
-entries containing more than one xr (These need to be merged into a single xr containing multiple refs.)
-pron:segs with parentheses in them (These need to be analyzed by a human, and made into either two seg type=n's, or two separate form elements, as necessary.)
-use of n-CTL t-TR Ø-OBJ n-SUBJ or n-CTL t-TR Ø-OBJ s-SUBJ on a word-final -n or -s if preceded by other transitive morphology (n-CTL, t-TR, stu, xit, ɬ-DIR, ɬ-EP, min, nun, tuɬ).
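Sketch only (the real checks live in the diagnostics XSLT, not in Python): the "more than one xr" check above could look roughly like this. The element names entry, xr and ref come from the list above; the sample markup, the lack of a namespace, and the function name are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:id under the predefined XML namespace.
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical sample data: e1 has two xr elements, e2 has one xr with two refs.
SAMPLE = """
<body>
  <entry xml:id="e1"><xr><ref>a</ref></xr><xr><ref>b</ref></xr></entry>
  <entry xml:id="e2"><xr><ref>a</ref><ref>b</ref></xr></entry>
</body>
"""

def entries_with_multiple_xr(xml_text):
    """Return the xml:id of every entry that contains more than one xr."""
    root = ET.fromstring(xml_text.strip())
    return [
        entry.get(XML_ID)
        for entry in root.iter("entry")
        if len(entry.findall(".//xr")) > 1
    ]

print(entries_with_multiple_xr(SAMPLE))
```

Entry e2 is the target shape after fixing: one xr holding several refs.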
26Sep16: MDH has added the following additional diagnostics:
-entries with the same string gloss-tagged more than once
-placeName entries that "duplicate" regular entries in <form> (e.g. entry for "deadfall"), so SMK and ECH can review them and make sure they are handled consistently.
-refs in xrs containing non-phonemic characters, that is, anything other than these characters:
ʔ a á à ạ c ʼ ə h ḥ ʷ i í ì ị k l ḷ ˀ ɬ ƛ m n p q r s ṣ t u ú ù ụ w x y ʕ, combining acute accent, combining grave accent, combining dot below, whitespace.
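Sketch only: a quick way to test a ref string against the character list above. The function name is made up, and normalizing to NFD (so that a precomposed accented letter and the same letter plus a combining mark are treated alike) is an assumption; the actual diagnostic may compare characters literally.

```python
import unicodedata

# Letters copied from the list above; spaces are only separators here.
PHONEMIC = "ʔ a á à ạ c ʼ ə h ḥ ʷ i í ì ị k l ḷ ˀ ɬ ƛ m n p q r s ṣ t u ú ù ụ w x y ʕ"
# Combining acute accent, combining grave accent, combining dot below.
COMBINING = "\u0301\u0300\u0323"

# Assumption: compare in NFD, so accented letters become base letter + mark.
ALLOWED = set(unicodedata.normalize("NFD", PHONEMIC.replace(" ", ""))) | set(COMBINING)

def non_phonemic_chars(text):
    """Return the disallowed characters found in text, in order, without repeats."""
    bad = []
    for ch in unicodedata.normalize("NFD", text):
        if ch.isspace() or ch in ALLOWED:
            continue
        if ch not in bad:
            bad.append(ch)
    return bad

print(non_phonemic_chars("k'a"))  # the ASCII apostrophe is not in the list
```

Note that under this NFD assumption any listed base letter may carry any of the three combining marks, which is slightly more permissive than a literal reading of the list.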
We also need to remember to deal with glosses which end in the English inflections -ing, -ed, -en or -s, as well as ablaut forms like "blow/blew". We can check these with the Find function once all the files are edited, and as we are proofing the print dictionary.
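Sketch only: if we ever want to pre-screen glosses instead of relying on Find, a suffix pattern along these lines would flag candidates. The function name and sample glosses are made up. It will over-flag ordinary words that happen to end in these letters (e.g. "glass", "bed"), and it cannot detect ablaut pairs like "blow/blew", so a human still has to review the results.

```python
import re

# Matches glosses whose last word ends in -ing, -ed, -en or -s.
INFLECTION = re.compile(r"(ing|ed|en|s)$", re.IGNORECASE)

def flag_inflected(glosses):
    """Return the glosses that end in one of the English inflections."""
    return [g for g in glosses if INFLECTION.search(g.strip())]

print(flag_inflected(["blow", "blowing", "eaten", "dogs", "walked", "deer"]))
```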
MDH had also asked about diagnostics for what makes an entry "complete", to improve the statistics report. The answer is simply:
-no Not Yet Edited Comment, AND either
-a completed hyph with no m:UNASSIGNED, OR
-a root or stem <fs>
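Sketch only: the rule above written out as a boolean test, to make the AND/OR grouping explicit. The class and field names are hypothetical; how each fact is actually read off an entry (e.g. how the Not Yet Edited Comment is encoded) is left open here.

```python
from dataclasses import dataclass

@dataclass
class EntryStatus:
    # Hypothetical flags standing for facts observed on one entry.
    has_not_yet_edited_comment: bool
    has_completed_hyph: bool
    hyph_has_unassigned: bool   # any m:UNASSIGNED inside the hyph
    has_root_or_stem_fs: bool

def is_complete(e: EntryStatus) -> bool:
    """Complete = no Not Yet Edited Comment AND (clean completed hyph OR root/stem fs)."""
    if e.has_not_yet_edited_comment:
        return False
    hyph_ok = e.has_completed_hyph and not e.hyph_has_unassigned
    return hyph_ok or e.has_root_or_stem_fs

print(is_complete(EntryStatus(False, True, False, False)))
```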
For the time being (26Sep16), MDH has just moved the counts of total entries and entries with no Not Yet Edited Comment into the Statistics section of the new diagnostics page.
If EJD needs to edit the diagnostics further in the future, the file is: trunk/utilities/diagnostics.xsl
Once SMK and EJD have completed a first pass of editing all the files, and resolved all the problems identified by the diagnostics, MDH can add the diagnostics to the Schematron so that we don't introduce new mistakes later.