Changeset 812

Timestamp:
05/23/07 13:13:23 (2 years ago)
Author:
thomase
Message:

Added horizontal ellipsis to the list of things that get normalized, and commented each replacement pair to say what it is for, for the benefit of humans.

Files:

  • BADataMunger/trunk/wordnormalizer.py

    --- wordnormalizer.py (r784)
    +++ wordnormalizer.py (r812)
    @@ -5,7 +5,8 @@
     
     norms = [
    -    ('‑','-'),
    -    (' ',' '),
    -    (' ',' ')
    +    ('‑','-'),    # non-breaking hyphen
    +    (' ',' '),     # non-breaking space
    +    (' ',' '),     # non-breaking space bis
    +    ('…','...')   # horizontal ellipsis
     ]
     
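
The changeset shows only the `norms` table, not the code that consumes it. As a minimal sketch of how such a list of (source, replacement) pairs could be applied, assuming a plain in-order string replacement: the function name `normalize` and the constant name `NORMS` are hypothetical, and the code point chosen for the "non-breaking space bis" entry (U+202F, narrow no-break space) is a guess, since the page does not identify that character.

```python
# Hypothetical sketch; the real wordnormalizer.py may apply `norms` differently.
# Escapes are used so that the invisible characters are unambiguous.
NORMS = [
    ("\u2011", "-"),    # non-breaking hyphen
    ("\u00a0", " "),    # non-breaking space
    ("\u202f", " "),    # narrow no-break space (assumed second variant)
    ("\u2026", "..."),  # horizontal ellipsis
]


def normalize(text, norms=NORMS):
    """Return `text` with each source string replaced, pair by pair, in order."""
    for source, replacement in norms:
        text = text.replace(source, replacement)
    return text


if __name__ == "__main__":
    sample = "wait\u2026 a well\u2011known\u00a0word"
    print(normalize(sample))
```

Keeping the pairs in a list rather than a dict makes the replacement order explicit, which matters if one replacement ever produces text that a later pair should also catch.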