<?xml version="1.0" encoding="utf-8"?>
<feed xmlns="http://www.w3.org/2005/Atom">

  <title>Alexandre Patry</title>
  <link href="http://textjuicer.com/atom.xml" rel="self"/>
  <link href="http://textjuicer.com/"/>
  <updated>2012-11-10T00:49:18-05:00</updated>
  <id>http://textjuicer.com/</id>
  <author>
    <name>Alexandre Patry</name>
    
      <email>alex@textjuicer.com</email>
    
  </author>

  
  <entry>
    <title>Building a tweet corpus</title>
    <link href="http://textjuicer.com/blog/2012/11/10/building-a-tweet-corpus/"/>
    <updated>2012-11-10T00:01:00-05:00</updated>
    <id>http://textjuicer.com/blog/2012/11/10/building-a-tweet-corpus</id>
    <content type="html">&lt;p&gt;I wanted to play around with some tweets, but I quickly discovered
that getting my hands on a corpus is not that easy because of
&lt;a href=&quot;https://dev.twitter.com/terms/api-terms&quot;&gt;Twitter's terms of service&lt;/a&gt;. It is up
to everyone to create their own corpus.&lt;/p&gt;

&lt;p&gt;Luckily, Twitter has an
&lt;a href=&quot;https://dev.twitter.com/docs/api/1.1/get/statuses/sample&quot;&gt;API to sample tweets randomly&lt;/a&gt;. I
created a
&lt;a href=&quot;https://github.com/apatry/twitter-sampler&quot;&gt;small application&lt;/a&gt; on top of it
that can be used by following these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Register a Twitter application at
&lt;a href=&quot;https://dev.twitter.com/apps/new&quot;&gt;https://dev.twitter.com/apps/new&lt;/a&gt;. The application name
is not important; you only need its credentials.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download the latest version of
&lt;a href=&quot;https://github.com/apatry/twitter-sampler/downloads&quot;&gt;twitter-sampler&lt;/a&gt;.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Download
&lt;a href=&quot;https://raw.github.com/apatry/twitter-sampler/master/credentials.clj&quot;&gt;credentials.clj&lt;/a&gt;
and fill in the blanks with the credentials of your application.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Run the following command:&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;&lt;code&gt;
java -jar twitter-sampler-1.0.0-SNAPSHOT-standalone.jar -c credentials.clj -n 1000 tweets.json
&lt;/code&gt;&lt;/p&gt;

&lt;p&gt;where &lt;code&gt;credentials.clj&lt;/code&gt; is the file containing your credentials,
&lt;code&gt;1000&lt;/code&gt; is the number of tweets you want to download and &lt;code&gt;tweets.json&lt;/code&gt;
is the file where the tweets should be saved.&lt;/p&gt;

&lt;p&gt;You should now have a corpus of tweets to play with.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Word lists in a shell</title>
    <link href="http://textjuicer.com/blog/2012/03/03/words-lists-in-a-shell/"/>
    <updated>2012-03-03T01:39:00-05:00</updated>
    <id>http://textjuicer.com/blog/2012/03/03/words-lists-in-a-shell</id>
    <content type="html">&lt;p&gt;I often need to manipulate and compare sets of words. This
post presents some of the one-liners that I frequently use to manipulate
such lists.&lt;/p&gt;

&lt;h2&gt;Get a set of words from a text file&lt;/h2&gt;

&lt;p&gt;If you start from a text file, the following command will convert it
to a list of words:&lt;/p&gt;

&lt;pre&gt;
cat input.txt | sed 's/\&gt;/\n/g' | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//' | grep -v &quot;^$&quot; | sort | uniq &gt; output.txt
&lt;/pre&gt;


&lt;p&gt;If you run OSX, use this command instead (notice the newline in the middle of the command):&lt;/p&gt;

&lt;pre&gt;
cat input.txt | sed 's/[[:&gt;:]]/\
/g' | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//' | grep -v &quot;^$&quot; | sort | uniq &gt; output.txt
&lt;/pre&gt;


&lt;p&gt;Both of these commands insert a newline at the end of every word, trim
surrounding whitespace, drop empty lines, and then write a sorted list of unique words.&lt;/p&gt;

&lt;h2&gt;Intersection&lt;/h2&gt;

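&lt;p&gt;A first one-liner to find the elements that are common to two lists. Like the &lt;code&gt;uniq -u&lt;/code&gt; one-liners further down, it assumes that neither file contains duplicate lines, which is the case for lists built with the commands above:&lt;/p&gt;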

&lt;pre&gt;
cat file1.txt file2.txt | sort | uniq -d
&lt;/pre&gt;


&lt;h2&gt;Union&lt;/h2&gt;

&lt;p&gt;A similar one-liner to find the elements that are in one set or the
other:&lt;/p&gt;

&lt;pre&gt;
cat file1.txt file2.txt | sort | uniq
&lt;/pre&gt;


&lt;h2&gt;Union minus intersection&lt;/h2&gt;

&lt;p&gt;To get the words that are in either file1.txt or file2.txt, but not
in both:&lt;/p&gt;

&lt;pre&gt;
cat file1.txt file2.txt | sort | uniq -u
&lt;/pre&gt;


&lt;h2&gt;Difference&lt;/h2&gt;

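&lt;p&gt;To get the elements that are in file1.txt, but not in file2.txt, list file2.txt twice so that each of its words appears at least twice and is dropped by &lt;code&gt;uniq -u&lt;/code&gt;:&lt;/p&gt;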

&lt;pre&gt;
cat file1.txt file2.txt file2.txt | sort | uniq -u
&lt;/pre&gt;
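
&lt;p&gt;To check these four one-liners on a tiny input, here is a minimal sketch. The two word lists are made-up sample data, and each one is assumed to be free of duplicates:&lt;/p&gt;

&lt;pre&gt;
# Two small, already deduplicated word lists (hypothetical sample data).
printf 'apple\nbanana\ncherry\n' | sort -o file1.txt
printf 'banana\ncherry\ndate\n' | sort -o file2.txt

# Intersection: banana, cherry
cat file1.txt file2.txt | sort | uniq -d
# Union: apple, banana, cherry, date
cat file1.txt file2.txt | sort | uniq
# Union minus intersection: apple, date
cat file1.txt file2.txt | sort | uniq -u
# Difference (file1.txt minus file2.txt): apple
cat file1.txt file2.txt file2.txt | sort | uniq -u
&lt;/pre&gt;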


&lt;h2&gt;Histogram of words&lt;/h2&gt;

&lt;p&gt;As a bonus, we can tweak our first command to get a histogram of
words:&lt;/p&gt;

&lt;pre&gt;
cat input.txt | sed 's/\&gt;/\n/g' | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//' | grep -v &quot;^$&quot; | sort | uniq -c | sort -nr
&lt;/pre&gt;


&lt;p&gt;The following variation prints the words appearing at least 10 times:&lt;/p&gt;

&lt;pre&gt;
cat input.txt | sed 's/\&gt;/\n/g' | sed 's/^[[:space:]]*//' | sed 's/[[:space:]]*$//' | grep -v &quot;^$&quot; | sort | uniq -c | sort -nr | awk '$1 &gt;= 10 {print $2}'
&lt;/pre&gt;
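
&lt;p&gt;To try the histogram on a tiny input, here is a sketch that swaps the &lt;code&gt;sed&lt;/code&gt; steps for &lt;code&gt;tr&lt;/code&gt;. That substitution is an assumption of this example, made so the same line runs unchanged on Linux and OSX. It splits on anything that is not a letter or a digit, so punctuation is discarded, which differs slightly from the commands above. The sample sentence is made up:&lt;/p&gt;

&lt;pre&gt;
# tr -cs turns every run of non-alphanumeric characters into a single
# newline, which leaves one word per line (hypothetical sample text).
printf 'the cat and the hat\n' | tr -cs '[:alnum:]' '\n' | sort | uniq -c | sort -nr | head -n 1
&lt;/pre&gt;

&lt;p&gt;The only line printed should be the count 2 followed by &lt;code&gt;the&lt;/code&gt;, the most frequent word.&lt;/p&gt;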

</content>
  </entry>
  
  <entry>
    <title>Forer Effect</title>
    <link href="http://textjuicer.com/blog/2012/03/03/forer-effect/"/>
    <updated>2012-03-03T00:43:00-05:00</updated>
    <id>http://textjuicer.com/blog/2012/03/03/forer-effect</id>
    <content type="html">&lt;p&gt;I recently learned about
the &lt;a href=&quot;http://en.wikipedia.org/wiki/Forer_effect&quot;&gt;Forer effect&lt;/a&gt;, or how
people can take general statements and make them their own. In a
&lt;a href=&quot;http://apsychoserver.psych.arizona.edu/JJBAReprints/PSYC621/Forer_The%20fallacy%20of%20personal%20validation_1949.pdf&quot;&gt;classic experiment&lt;/a&gt;,
Forer asked his students to fill out a personality test. A week later,
he handed each student a personality analysis and asked them to
rate its accuracy on a scale of 0 (poor) to 5 (perfect). The
analyses seemed so targeted that only one out of 39 students rated the
results lower than 4.&lt;/p&gt;

&lt;p&gt;As it turned out, the results were not good; they were only &lt;em&gt;perceived&lt;/em&gt; as
good. Everyone received the exact same excerpt from an astrology
book. Students believed the analyses were targeted because they were
made of universally valid statements:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;A universally valid statement, then, is one which applies equally
well to the majority or the totality of the population.  A universally
valid statement is true for the individual, but it lacks the
quantitative specification and the proper focus which are necessary
for differential diagnosis.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;Some universal statements taken from Forer's paper are:&lt;/p&gt;

&lt;blockquote&gt;&lt;p&gt;You have a great need for other people to like and admire you.&lt;/p&gt;

&lt;p&gt;You have a tendency to be critical of yourself.&lt;/p&gt;

&lt;p&gt;Some of your aspirations tend to be pretty unrealistic.&lt;/p&gt;&lt;/blockquote&gt;

&lt;p&gt;One of Forer's conclusions is that people are really bad at assessing
information about
themselves. &lt;a href=&quot;http://denisdutton.com/cold_reading.htm&quot;&gt;Denis Dutton&lt;/a&gt;
explains very nicely how this weakness is used by
mentalists to deceive people with
&lt;a href=&quot;http://en.wikipedia.org/wiki/Cold_reading&quot;&gt;cold reading&lt;/a&gt;.&lt;/p&gt;

&lt;iframe width=&quot;420&quot; height=&quot;315&quot; src=&quot;http://www.youtube.com/embed/qPCsCiOqmXA&quot; frameborder=&quot;0&quot; allowfullscreen&gt;&lt;/iframe&gt;

</content>
  </entry>
  
  <entry>
    <title>Alt Car in Emacs for OSX</title>
    <link href="http://textjuicer.com/blog/2011/08/14/altcar-in-emacs-on-osx/"/>
    <updated>2011-08-14T17:14:00-04:00</updated>
    <id>http://textjuicer.com/blog/2011/08/14/altcar-in-emacs-on-osx</id>
    <content type="html">&lt;p&gt;When I installed &lt;a href=&quot;http://emacsformacosx.com/&quot;&gt;Emacs for OSX&lt;/a&gt;, the right
option key acted as Meta instead of my beloved alt-car. It can be
fixed with these steps:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;If you use Emacs 23 or earlier, install &lt;a href=&quot;http://tromey.com/elpa/&quot;&gt;package.el&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;Add &lt;a href=&quot;http://marmalade-repo.org&quot;&gt;marmalade&lt;/a&gt; to the list of repositories in your &lt;code&gt;.emacs&lt;/code&gt;:
&lt;pre&gt;
;; Adds marmalade to package.el
(require 'package)
(add-to-list 'package-archives '(&quot;marmalade&quot; . &quot;http://marmalade-repo.org/packages/&quot;))
(package-initialize)
&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;Refresh your package index using &lt;code&gt;M-x package-refresh-contents&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Install &lt;a href=&quot;http://marmalade-repo.org/packages/mac-key-mode&quot;&gt;mac-key-mode&lt;/a&gt; using &lt;code&gt;M-x package-install mac-key-mode&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Configure the right option key to act like alt-car using the following lines in your &lt;code&gt;.emacs&lt;/code&gt;:
&lt;pre&gt;
(require 'mac-key-mode)
(setq mac-option-key-is-meta t)
(setq mac-right-option-modifier nil)
&lt;/pre&gt;&lt;/li&gt;
&lt;li&gt;Restart emacs&lt;/li&gt;
&lt;/ol&gt;


&lt;p&gt;You should now be able to use the right option key as alt-car to
type characters like @ and } on a French Canadian keyboard.&lt;/p&gt;
</content>
  </entry>
  
  <entry>
    <title>Welcome to My Blog</title>
    <link href="http://textjuicer.com/blog/2011/07/24/welcome-to-my-blog/"/>
    <updated>2011-07-24T18:06:00-04:00</updated>
    <id>http://textjuicer.com/blog/2011/07/24/welcome-to-my-blog</id>
    <content type="html">&lt;p&gt;Welcome to my blog, a place where I will put information that I hope will be useful to others or future me.&lt;/p&gt;
</content>
  </entry>
  
</feed>
