root/BADataMunger/trunk/wordnormalizer.py


Change  Rev    Chgset  Date               Author   Log Message
(edit)  @1427  [1427]  09/12/08 15:45:31  thomase  sticking a fork in the aspect of BADataMunger that is the creation of …
(edit)  @1302  [1302]  05/03/08 12:03:35  thomase  no need for doctype here; etree will not serialize to output anyway
(edit)  @1299  [1299]  05/02/08 17:16:57  thomase  added standard headers and improved/added docstrings
(edit)  @1297  [1297]  05/02/08 16:43:35  thomase  working tests for character normalization
(edit)  @812   [812]   05/23/07 13:13:23  thomase  added horizontal ellipsis to the list of things that gets normalized, and …
(add)   @784   [784]   04/16/07 17:39:26  thomase  Strip unwanted Word formatting. Normalize non-breaking hyphens and spaces …