\documentclass[british,ps,darkblue,slideColor,colorBG,final]{prosper}

 %-- Packages ----------------------------------------------------------------

 \usepackage[latin1]{inputenc}
 \usepackage[english]{babel}
 \usepackage[T1]{fontenc}
 \usepackage{url}

 \usepackage{amsmath}
 \usepackage{amssymb}
 \usepackage{epsfig}
 \usepackage{makeidx}

 %----------------------------------------------------------------------------

\begin{document}
  \begin{slide}
    {Overture}
    
    \vspace{-5mm}

    \begin{center}
      {\Large How to create a free spell-checking dictionary for the
        Sardinian language}

      \epsfig{file=../../Projekter/speling.org/images/www.speling.org.ps,
              angle=90,
              width=60mm}

      Jacob Sparre Andersen <sparre@crs4.it>
    \end{center}

    \vspace{4mm}

    {\small Please interrupt me {\em whenever} I speak too fast or am
      otherwise difficult to understand.}

    {\small Questions are welcome at {\em any time} during the talk.}
  \end{slide}

  \begin{slide}
    {Overview}

    \vspace{-4mm}
    \begin{itemize}
    \item {\bf The principles behind the system.}

    \item How to work as a proof-reader on a ``speling.org'' based
      dictionary.

    \item How to manage the technical and editorial work on a
      ``speling.org'' based dictionary.

    \item The to-do list of the ``speling.org'' developers.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {The principles behind the system}

    \begin{center}
      \verb|speling.org|
    \end{center}

    \begin{itemize}
    \item ... is a system for creating electronic dictionaries.
    \item ... is based on an Open Source style of cooperation.
    \item ... accepts that individual proof-readers can make
      mistakes\footnote{The most active Danish proof-readers have an
        estimated error rate of about one in a thousand.}.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {The principles behind the system}

    \begin{itemize}
    \item Since we assume that the individual creators of the
      dictionary entries are imperfect, we do not work with dictionary
      entries that are edited directly whenever somebody finds a
      mistake (the Wikipedia style).

    \item Instead we work with a voting system: what the creators
      work on directly are votes for or against assigning a given
      piece of information to an entry in the dictionary.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {The principles behind the system}

    Benefits:
    \begin{itemize}
    \item All proof-reading results are stored -- both the positive
      and the negative ones.

    \item There is less back-and-forth editing of the entries.

    \item The quality of the dictionary will gradually improve, and
      mistakes by individual contributors have less effect.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {The principles behind the system}

    Drawbacks:
    \begin{itemize}
    \item Since proof-readers have to vote even when the entries are
      correct, it takes more work to create a dictionary.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {The principles behind the system}

    \begin{center}
      Creating a spell-checking word-list from the proof-reading
      results (votes):
    \end{center}

    \begin{itemize}
    \item The system counts how many positive and negative votes
      each string has received.
    \item A threshold number of votes is selected, such that strings
      with at least that many more positive than negative votes are
      considered correctly spelled words.
    \end{itemize}
  \end{slide}
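
  \begin{slide}
    {The principles behind the system}

    A minimal Python sketch of the counting rule above, assuming each
    vote arrives as a pair (string, $+1$ or $-1$).  The function
    \verb|word_list| is hypothetical, not part of \verb|speling.org|.

{\scriptsize
\begin{verbatim}
from collections import Counter

def word_list(votes, threshold):
    # Hypothetical helper: keep the strings whose positive votes
    # exceed their negative votes by at least `threshold`.
    balance = Counter()
    for string, vote in votes:        # vote is +1 or -1
        balance[string] += vote
    return sorted(s for s, b in balance.items() if b >= threshold)

votes = [("filuferru", +1), ("filuferru", +1), ("snaps", -1),
         ("snaps", +1), ("domu", +1)]
print(word_list(votes, 1))   # ['domu', 'filuferru']
print(word_list(votes, 2))   # ['filuferru']
\end{verbatim}
}

    Raising the threshold from 1 to 2 drops \verb|domu|, which so far
    has only a single positive vote.
  \end{slide}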

  \begin{slide}
    {The principles behind the system}

    \begin{center}
      Gradually improving quality
    \end{center}

    \vspace{3mm}

    For a spell-checking word list to be useful, it has to fulfil
    two requirements:

    \begin{itemize}
    \item It should contain all (as many as possible) of the words the
      writer uses.

    \item It should contain no (as few as possible) misspelled words.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {The principles behind the system}

    \begin{center}
      Gradually improving quality
    \end{center}

    \vspace{3mm}

    As each string in the dictionary is proof-read more and more
    times, we can increase the threshold number of votes without
    shrinking the resulting word-list, while increasing our certainty
    that the strings {\bf in} the list are correctly spelled words.
  \end{slide}
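
  \begin{slide}
    {The principles behind the system}

    A rough estimate of that certainty, {\em assuming} that each
    proof-reader errs independently with probability $\varepsilon$.
    A misspelled string with $n$ votes receives $k$ wrong ($+$) and
    $n-k$ correct ($-$) votes, so it passes a threshold $T$ when
    $2k - n \geq T$:
    \[
      P(\text{accepted}) = \sum_{k=\lceil (n+T)/2 \rceil}^{n}
        \binom{n}{k}\,\varepsilon^{k}(1-\varepsilon)^{n-k}
    \]
    For $n = T$ this reduces to $\varepsilon^{T}$: with the Danish
    estimate $\varepsilon \approx 10^{-3}$, raising $T$ from 1 to 2
    lowers it from about $10^{-3}$ to about $10^{-6}$.
  \end{slide}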

  \begin{slide}
    {Overview}

    \vspace{-4mm}
    \begin{itemize}
    \item The principles behind the system.

    \item {\bf How to work as a proof-reader on a ``speling.org'' based
        dictionary.}

    \item How to manage the technical and editorial work on a
      ``speling.org'' based dictionary.

    \item The to-do list of the ``speling.org'' developers.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {How to work as a proof-reader ...}

    Proof-reading is primarily done by e-mail:
    \begin{itemize}
    \item You subscribe to a daily collection of words to be proof-read,
    \item you proof-read the words, and
    \item you send them back to the \verb|speling.org| server,
    \item which adds up the votes and updates the word-list.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {How to work as a proof-reader ...}

    \vspace{-5mm}

    Example proof-reading e-mail:
\begin{verbatim}
Reply-To: proof-reading@sc.speling.org
Subject: [SC] Words for proof-reading

# Proof-reading key: 892253cde850f90b5ba451385d03617a  -

WORD: filuferru
STATUS: ?
EDITOR: Jacob Sparre Andersen

WORD: snaps
STATUS: ?
EDITOR: Jacob Sparre Andersen

\end{verbatim}
  \end{slide}

  \begin{slide}
    {How to work as a proof-reader ...}

    \vspace{-5mm}

    We reply, changing the \verb|?| to \verb|+| for correctly spelled
    words and to \verb|-| for misspelled words:
\begin{verbatim}
To: proof-reading@sc.speling.org
Subject: Re: [SC] Words for proof-reading

> # Proof-reading key: 892253cde850f90b5ba451385d03617a  -
>
> WORD: filuferru
> STATUS: +
> EDITOR: Jacob Sparre Andersen
>
> WORD: snaps
> STATUS: -
> EDITOR: Jacob Sparre Andersen
>
\end{verbatim}
  \end{slide}

  \begin{slide}
    {How to work as a proof-reader ...}

    \vspace{-5mm}

    Our reply will then count as one extra vote for the string
    \verb|filuferru| and one extra vote against \verb|snaps|.

    \begin{itemize}
    \item If \verb|filuferru| now has enough positive votes, the word
      will be included in the next version of the word-list.
    \item If \verb|snaps| (similar to filuferru, but Danish) does not
      have enough positive votes, the word will be removed from the
      next version of the word-list (if it ever appeared there in the
      first place).
    \end{itemize}
  \end{slide}
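
  \begin{slide}
    {How to work as a proof-reader ...}

    A hypothetical Python sketch (not the parser \verb|speling.org|
    uses) of how a reply like the one above becomes votes.  It reads
    only \verb|WORD| and \verb|STATUS|, and treats \verb|?| as no vote.

{\scriptsize
\begin{verbatim}
def votes_from_reply(text):
    # Hypothetical parser: returns (word, +1/-1) pairs.
    votes, word = [], None
    for line in text.splitlines():
        line = line.lstrip("> ").strip()   # drop "> " quoting
        if line.startswith("WORD:"):
            word = line[len("WORD:"):].strip()
        elif line.startswith("STATUS:") and word is not None:
            status = line[len("STATUS:"):].strip()
            if status in ("+", "-"):       # "?" = not proof-read
                votes.append((word, 1 if status == "+" else -1))
            word = None
    return votes

reply = """> WORD: filuferru
> STATUS: +
> EDITOR: Jacob Sparre Andersen
>
> WORD: snaps
> STATUS: -
"""
print(votes_from_reply(reply))  # [('filuferru', 1), ('snaps', -1)]
\end{verbatim}
}
  \end{slide}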

  \begin{slide}
    {How to work as a proof-reader ...}

    Unless the word-list is produced with a threshold of one more
    positive than negative vote, a single proof-reader cannot add a
    word to the dictionary on his/her own.

    \vspace{3mm}

    This is of course a bit annoying, but it is a consequence of
    taking into account that proof-readers can make mistakes.
  \end{slide}

  \begin{slide}
    {How to work as a proof-reader ...}

    The proof-reading messages can contain many more fields than the
    three I showed in the example\footnote{ ANTONYM, AUTHORITY,
      CATEGORY, CLASS, COMMENT, COMPOSITE-WORD, CONJUGATION,
      CONJUGATION-RULE, CORRECTION, DATE, DESCRIPTION, EXAMPLE,
      HYPHENATION, ROOT, SOURCE, SOURCE-YEAR, SYNONYM,
      TRANSLATION-DE-WORD, TRANSLATION-EN-WORD, TRANSLATION-FO-WORD,
      TRANSLATION-FR-WORD, TRANSLATION-IT-WORD, TRANSLATION-NO-WORD
      and TRANSLATION-SV-WORD.}, but since the handling of information
    beyond raw word-lists for spell-checking is likely to change soon,
    I will not cover the full extent of the format here.
  \end{slide}

  \begin{slide}
    {How to work as a proof-reader ...}

    Alternate proof-reading methods:
    \begin{itemize}
    \item There is a prototype Gtk+ based proof-reading tool (which I
      presented in my last GULCh talk).
    \item Automated harvesting of user additions to the Aspell and
      Ispell dictionaries
      ({\small\url{http://www.speling.org/#dictionary_feedback}}).
    \item Palm Pilot interface.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {Overview}

    \vspace{-4mm}
    \begin{itemize}
    \item The principles behind the system.

    \item How to work as a proof-reader on a ``speling.org'' based
      dictionary.

    \item {\bf How to manage the technical and editorial work on a
        ``speling.org'' based dictionary.}

    \item The to-do list of the ``speling.org'' developers.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {How to manage the technical ...}

    The first step in managing the technical and editorial work on a
    \verb|speling.org| based dictionary is of course to install the
    software.

    \begin{itemize}
    \item Download the source code from \url{http://www.speling.org/}.
    \item Unpack the source code and change into the newly created
      directory.
    \item Configure: \verb|PREFIX=/opt/speling.org ./configure|
    \end{itemize}
  \end{slide}

  \begin{slide}
    {How to manage the technical ...}

    (install continued)
    \begin{itemize}
    \item Make: \verb|make|
    \item Install: \verb|sudo make install|
    \item Fix \verb|PATH|: \verb|export PATH=/opt/speling.org/bin:$PATH|
    \end{itemize}
  \end{slide}

  \begin{slide}
    {How to manage the technical ...}

    Once the system is installed, a dictionary should be created and
    configured with \verb|make_new_dictionary sc|\footnote{``sc'' is
      the ISO 639-1 two-letter language code for Sardinian.} and then
    populated, for example from a text corpus:
\begin{verbatim}
cat corpus/* | tr -s '[:space:]' '\n' | sort -u \
 | words_to_ds \
 > /var/speling.org/sc/incoming.ds/corpus
\end{verbatim}
  \end{slide}
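
  \begin{slide}
    {How to manage the technical ...}

    The \verb|tr| stage above splits only on white space, so
    punctuation stays attached to words.  A hypothetical Python
    replacement (the name \verb|candidates.py| and the Latin-1
    encoding are assumptions) keeps runs of letters, optionally joined
    by apostrophes or hyphens:

{\scriptsize
\begin{verbatim}
import re
import sys

# Letters (any script), optionally joined by ' or -; digits and
# other punctuation act as separators.
TOKEN = re.compile(r"[^\W\d_]+(?:['-][^\W\d_]+)*")

def candidate_words(texts):
    words = set()
    for text in texts:
        words.update(TOKEN.findall(text))
    return sorted(words)

if __name__ == "__main__":
    # Usage (hypothetical):
    #   python3 candidates.py corpus/* | words_to_ds \
    #     > /var/speling.org/sc/incoming.ds/corpus
    texts = []
    for path in sys.argv[1:]:
        with open(path, encoding="latin-1") as f:  # assumed encoding
            texts.append(f.read())
    for word in candidate_words(texts):
        print(word)
\end{verbatim}
}
  \end{slide}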

  \begin{slide}
    {How to manage the technical ...}

    Dictionary data are by default stored under
    {\small\verb|/var/speling.org/<language code>/|}.

    \vspace{3mm}

    New editor proof-reading reports (in .ds format) should be put in
    {\small\verb|/var/speling.org/<language code>/incoming.ds/|}.

    \vspace{3mm}

    The program {\small\verb|update_dictionaries|} reads the new
    proof-reading reports and generates updated word-lists.
  \end{slide}

  \begin{slide}
    {How to manage the technical ...}

    The program {\small\verb|send_words_to_proof-reading|} is used to
    send proof-reading e-mails out to the subscribing proof-readers.

    \vspace{3mm}

    I use Procmail to intercept, filter and archive the proof-reading
    reports as they arrive.  The file {\small\verb|dot.procmailrc|} is
    an example of how this can be done.
  \end{slide}

  \begin{slide}
    {How to manage the technical ...}

    \vspace{-2mm}

    \begin{center}
      Getting words from the World Wide Web \\
      Crúbadán.
    \end{center}

    \vspace{3mm}

    Once you have set up a system for receiving proof-reading messages
    by e-mail, you might want to get in touch with Kevin Patrick
    Scannell, who runs Crúbadán
    ({\small\url{http://borel.slu.edu/crubadan/}}).  He may already
    have data for your language in Crúbadán, and if so, he can set the
    system up to e-mail you possible words for your dictionary as it
    finds them on the web.
  \end{slide}

  \begin{slide}
    {How to manage the ... editorial ...}

    \vspace{-2mm}

    \begin{center}
      Using authoritative sources
    \end{center}

    \vspace{3mm}

    As an editor of a \verb|speling.org| dictionary, you have the
    option of using the \verb|AUTHORITY| field in the proof-reading
    format to cite authoritative sources (commonly recognized
    dictionaries, experts, etc.) of the information you report to the
    system:
\begin{verbatim}
WORD: husholdning
STATUS: +
AUTHORITY: Retskrivningsordbogen, 3. udgave, 2001
EDITOR: Jacob Sparre Andersen
\end{verbatim}
  \end{slide}

  \begin{slide}
    {Overview}

    \vspace{-4mm}
    \begin{itemize}
    \item The principles behind the system.

    \item How to work as a proof-reader on a ``speling.org'' based
      dictionary.

    \item How to manage the technical and editorial work on a
      ``speling.org'' based dictionary.

    \item {\bf The to-do list of the ``speling.org'' developers.}
    \end{itemize}
  \end{slide}

  \begin{slide}
    {To-do list for ``speling.org''}

    \vspace{3mm}

    The current version of \verb|speling.org| is fine for creating a
    plain word-list for spell-checking, but it is insufficient when it
    comes to creating a proper dictionary with grammatical
    information, synonyms, explanations of words, etc.

    \vspace{3mm}

    The problem is that the current format is only expressive enough
    to {\bf add} this extra information, not to correct it if it is
    wrong.
  \end{slide}

  \begin{slide}
    {To-do list for ``speling.org''}

    \begin{itemize}
    \item Make a Debian package of \verb|speling.org|.

    \item Define a more expressive format for adding and correcting
      extra information.

    \item Write a tool for converting from the old to the new format.

    \item Reimplement the system with the new source format.

    \item Write web and graphical client-side tools for the
      proof-readers.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {Credits}

    \begin{itemize}
    \item The \verb|speling.org| system was developed in cooperation
      with Henrik Christian Grove and Peter Makholm. 
    \item The \verb|speling.org| logo was designed by Hans Schou. \\
      \hspace{20mm}
      \epsfig{file=../../Projekter/speling.org/images/www.speling.org.ps,
        angle=90, width=60mm}
    \item Dansk Sprognævn (The Danish Language Council) provoked me to
      start the project.
    \end{itemize}
  \end{slide}

  \begin{slide}
    {Links}

    \vspace{3mm}

    \begin{itemize}
    \item Source code for the \verb|speling.org| system:
      {\small\url{http://www.speling.org/}}
    \item A running \verb|speling.org| system:
      {\small\url{http://da.speling.org/}}
    \item Crúbadán - a source for minority language corpora:
      {\small\url{http://borel.slu.edu/crubadan/}}
    \item Wikipedia - a different, open way of creating dictionaries:
      {\small\url{http://it.wikipedia.org/wiki/Wikipedia}}
    \end{itemize}
  \end{slide}
\end{document}

