Revisiting normalization
Posted by mholmes on 23 May 2012 in Activity log
Tested out Franscriptor.com with some sample text from our content, to see what it does and to try to deduce how it works (it's a black box). It offers to "dissimiler" and "détilder" the text, but it's not clear exactly what that means. This is what I've learned:
- It does nothing with long s, so that has to be normalized before submission.
- It expands ligatures such as œ.
- It does quite a good job with u/v normalization, although it failed with "oeuures".
- Many anachronistic spellings survive unchanged ("luy", "bastir", "tousjours"), so it's clearly not trying to do modernization.
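Since long s has to be dealt with before submission, a pre-processing pass along these lines might do the job. This is only a minimal sketch: the function name `normalize_long_s` is made up for illustration, and it assumes the text is already a Unicode string in which long s appears as U+017F (ſ) or as part of the U+FB05 (ﬅ) ligature. It does not touch u/v, ligatures such as œ, or spelling, since the tool appears to handle or ignore those itself.

```python
# Hypothetical helper: replace long s before sending text to the tool.
# Assumption: long s is encoded as U+017F, or inside the U+FB05 ligature.
LONG_S_TABLE = str.maketrans({
    "\u017f": "s",   # LATIN SMALL LETTER LONG S
    "\ufb05": "st",  # LATIN SMALL LIGATURE LONG S T
})


def normalize_long_s(text: str) -> str:
    """Return text with every long s replaced by a regular s."""
    return text.translate(LONG_S_TABLE)


if __name__ == "__main__":
    sample = "il faut pa\u017f\u017fer tousjours"
    print(normalize_long_s(sample))
```

Everything else in the string is left as it is, so spellings like "tousjours" go through unchanged.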
This entry was posted by Martin and filed under Activity log.