  1. Fixing XMonad startup time

    I am a big fan of XMonad and I use it as my window manager inside GNOME. My only problem was a terrible startup time: it took me more than a minute to log into my session. I knew it was a problem with my configuration and I finally ...

    Tagged as : xmonad
  2. Evaluating Tika language detection on tweets

    Tika language detection is not designed for short texts like tweets or Facebook statuses, as acknowledged in its documentation[1]. Nonetheless, I wanted to know what to expect when detecting the language of short documents like tweets.

    In a nutshell

    I compared the language identified by Twitter to the language ...

    Tagged as : tika nlp
  3. Using Ruta in a maven project

    For those who are unfamiliar with UIMA and its ecosystem, Ruta (for RUle-Based Text Annotation) is a tool for rule-based information extraction. For example, a very simple date extractor could look like:

    PACKAGE com.textjuicer.ruta.date;
    DECLARE Date;
    DECLARE Day;
    DECLARE Month;
    DECLARE Year;
    // A date is a month ...

    Tagged as : ruta uima uimafit
  4. Words lists in a shell

    I often want to manipulate sets of words that I want to compare. This post presents some of the one-liners that I frequently use to manipulate such lists.

    Get a set of words from a text file

    If you start from a text file, the following command will convert ...

    Tagged as : shell
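
    As a rough illustration of the kind of one-liner this post is about, here is a minimal sketch of one common way to turn a text file into a set of words. It is not necessarily the command used in the post itself; the function name `word_set` is hypothetical, and the sketch assumes that a "word" is simply a run of alphabetic characters.

```shell
# word_set FILE: print the distinct lowercase words of FILE, one per line.
# The name word_set is hypothetical and used for illustration only.
word_set() {
    # 1. Replace every run of non-letters with a single newline.
    # 2. Fold upper case to lower case.
    # 3. Sort and keep one copy of each word.
    tr -cs '[:alpha:]' '\n' < "$1" | tr '[:upper:]' '[:lower:]' | sort -u
}
```

    Calling `word_set notes.txt` (with any text file in place of the hypothetical `notes.txt`) prints one word per line in sorted order, which is the form that tools such as `comm` expect when two lists are compared.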

