• Jump To … +
    case_ignore.foma fallback.EN.foma fallback.ES.foma hyphenate.EN.foma hyphenate.ES.foma morfo.EN.foma morfo.ES.foma normalise.foma tokenize.foma verbs.ES.foma
  • ¶

    English Fallback

  • ¶

    This script provides a series of fallback transducers that try to tag unknown entities in the input.

  • ¶

    We detect proper nouns NNP, NNPS because they start with a capital letter

    define Capital [A|B|C|D|E|F|G|H|I|J|K|L|M|N|O|P|Q|R|S|T|U|V|W|X|Y|Z];
    regex Capital ?* "+NNP":0;
    regex Capital ?* "+NNPS":s;
  • ¶

    We tag compound words as adjectives JJ, because in English when a word cluster is used as an adjective it should be joined by a hyphen (e.g seven-year)

    regex ?+ "-" ?+ "+JJ":0;
  • ¶

    If it ends in sh we tag it as an adjective, since it’s a common suffix

    regex ?* s h "+JJ":0;
  • ¶

    Words ending in ly we tag as adverbs RB

    regex ?* l y "+RB":0;
  • ¶

    The comparative and superlative endings for adjectives and adverbs are the same, so we just output an ambiguous guess

    regex ?* "+JJR":{er};
    regex ?* "+RBR":{er};
    regex ?* "+JJS":{est};
    regex ?* "+RBS":{est};
  • ¶

    If it uses a common verb ending, then we assume it’s an inflected verb form VBD, VBG

    regex ?* "+VBD":{ed};
    regex ?* "+VBG":{ing};
  • ¶

    We’re running out of option, so if it ends in an s, guess it’s either a plural noun NNS or a third-person singular VBZ

    regex ?* "+NNS":s;
    regex ?* "+VBZ":s;
  • ¶

    If everything else fails, tag it as a noun NN, since it’s the most common open class of words

    regex ?* "+NN":0;