(1) Literature Accompanying:
Correspondence Analysis and Data Coding
with R and Java
- Correspondence Analysis of the movie Casablanca, for
displaying and tracking emotion
on YouTube.
For published work on this, see the "narrativization" (analysis and synthesis of narrative) site.
- Les Cahiers de l'Analyse des Données, Tome 1, no. 1, 1976 to
Tome 22, no. 4, 1997: scanned copies of all 22 volumes of this journal,
with 4 issues annually, from 1976 to 1997.
- F. Murtagh,
"The Correspondence Analysis platform for uncovering deep
structure in data and information", The Computer Journal, 53 (3), 304-315, 2010.
6th Annual Public Boole Lecture in Informatics, 2008, Boole Centre for Research in Informatics,
University College Cork.
- J.P. Benzécri,
"Si j'avais un laboratoire" (If I had a lab),
La Revue MODULAD -- Le Monde des Utilisateurs de l'Analyse des Données, No. 38, 2008.
- J.P. Benzécri,
"L'Analyse des données :
histoire, bilan, projets, perspective" (Data analysis: history, balance-sheet, projects, outlook),
La Revue MODULAD -- Le Monde des Utilisateurs de l'Analyse des Données, No. 35, 2006.
- J.P. Benzécri,
English translation of
part of "Si j'avais un laboratoire".
- J.P. Benzécri,
"Choriogenesis: the dynamical genesis of space and its dimensions, controlled by correspondence analysis",
chapter 6, pp. 63-76, in
M. Gettler Summa, L. Bottou, B. Goldfarb, F. Murtagh, C. Pardoux and M. Touati, Eds.,
Statistical Learning and Data Science, Chapman and Hall, 2011.
- Special issue, "About the History of Multivariate Exploratory Data
Analysis",
with many articles dealing with the history and development of
Benzécri's approach to analysis of data and science.
Electronic Journal for History of Probability and Statistics,
Vol. 4/2, December 2008.
- Included in the foregoing is an English version of the article
"L'âme au bout d'un rasoir, The
soul at the razor's edge", originally published in the journal
Les Cahiers de l'Analyse des Données, vol. V, no. 2, 1980, pp. 229-242.
Updated version (10 May 2009) in English of this article,
"L'âme au bout d'un rasoir, The soul at the razor's edge".
- H. Rouanet, W. Ackermann and B. Le Roux,
"The geometric analysis of questionnaires: the lesson of Bourdieu's La Distinction",
updated version 2004, Bulletin de Méthodologie Sociologique, 2000, 65, 5-15.
- F. Murtagh, "Mathematics, science, and the role of data analysis",
translated freely from: J.P. Benzécri,
"L'avenir de l'analyse des
données", Behaviormetrika, 10, 1-11 (1983).
- Brigitte Le Roux's pages, with details of "Geometric Data Analysis Workshop"
("Ateliers d'Analyse Géométrique des Données") training courses;
B. Le Roux and H. Rouanet, Multiple Correspondence Analysis, SAGE, 2010;
B. Le Roux and H. Rouanet, Geometric Data Analysis: From Correspondence
Analysis to Structured Data Analysis, Kluwer, 2004.
(2) Software Accompanying:
Correspondence Analysis and Data Coding
with R and Java
The software and data presented here accompany the book
Correspondence Analysis and Data Coding with R and Java, by Fionn
Murtagh, Chapman & Hall/CRC, 2005, 250+xviii pp.
J.P. Benzécri, from Foreword:
"Physics progresses, mainly, by constituting corpora of rare
phenomena among immense sets of ordinary cases. The simple
observation of one of these ordinary cases requires detection
apparatus based on millions of small elementary detectors.
Yet physics is, in part, a computational science, as evidenced
by the conclusion of a paper on the theory of generalized zeta
functions: 'Our results are secure, numerically, yet appear very
hard to prove by analysis'.
I repeat: the statistician has to be modest. The work of my generation
has been exalting. A new statistical and data analysis is there to be
invented, now that one has inexpensive means of computation that could
not be dreamed of just thirty years ago."
Some of the programs, in particular
the R and C ones, are ASCII text files. Others
are binary (e.g. the clustering DLL program, and the Java class files).
The Java code and the data sets are collected together in tar files, to
be extracted using WinZip, tar, or a similar utility.
1. Software in R
R can be obtained for most computer platforms from
The R Project for
Statistical Computing.
- Correspondence analysis
- Hierarchical clustering
- Interpretation aids
- Utilities and data
2. Text Processing
The text processing support programs are all in C.
- Analysis of multiple text files.
- aviation-reports-data.tar,
47 aviation accident reports, the list of these files, programs
txtanalysis.c and xtabulate.c, and output files words.txt, words0.txt,
and xtabulate.txt, used as examples in the following.
- txtanalysis.c program, that is run
as follows: txtanalysis filelist.txt [words.txt]
- xtabulate.c program, that is run as
follows: xtabulate words.txt filelist.txt [xtabulate.txt]
- word_analysis.c, a program that checks
for a sufficient number of occurrences of words in all texts and
hence yields a common word-list. This common word-list can be
used by xtabulate.
- Analysis of a single (large) text file.
- arist10.txt, Aristotle's Categories.
Note: we removed the legal information (so as not to influence the
analysis) to yield the file arist10x.txt.
- docanalysis.c program, to produce a word
list from a single text file. Example of use: docanalysis
arist10x.txt words.txt. For the Categories, 1260 words are found.
It is best to filter or cull these (or else the cross-tabulations
that follow will be very large).
- doctabulate.c program, to produce a
cross-tabulation for each of the chapter and section levels in the
Categories. Use: doctabulate arist10x.txt words.txt out. This
produces the cross-tabulations, or contingency tables, out1.txt,
out2.txt, out3.txt, out4.txt, corresponding to the different section
levels in this book.
- Notes: programs txtanalysis and xtabulate should handle acceptably
(i) accented characters, and (ii) use in a Mac OS X environment. (On
the latter point: memory allocation is already catered for on Mac OS X,
so the line "#include <malloc.h>" at the
start of the file should be removed.)
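As a rough illustration of what the word-list and cross-tabulation steps above compute, the following C sketch counts how often each word of a word list occurs in each text, and checks whether a word occurs often enough in every text to belong to a common word-list. It is not the code of txtanalysis.c, xtabulate.c or word_analysis.c: the function names count_word, cross_tabulate and is_common_word are hypothetical, texts are passed as in-memory strings rather than read from a file list, and only ASCII letters are treated as word characters (so accented characters are not handled). Memory allocation is declared through <stdlib.h>, which avoids the <malloc.h> include mentioned in the notes above.

```c
#include <ctype.h>
#include <stdlib.h>   /* malloc and free are declared here */
#include <string.h>

/* Count case-insensitive occurrences of `word` in `text`.
   A word is a maximal run of ASCII letters (assumption of this sketch). */
int count_word(const char *text, const char *word)
{
    size_t wlen = strlen(word);
    int count = 0;
    const char *p = text;

    while (*p != '\0') {
        /* Skip separators (anything that is not a letter). */
        while (*p != '\0' && !isalpha((unsigned char)*p))
            p++;
        const char *start = p;
        /* Advance to the end of the current word. */
        while (isalpha((unsigned char)*p))
            p++;
        size_t len = (size_t)(p - start);
        if (len > 0 && len == wlen) {
            size_t k = 0;
            while (k < len &&
                   tolower((unsigned char)start[k]) ==
                   tolower((unsigned char)word[k]))
                k++;
            if (k == len)
                count++;
        }
    }
    return count;
}

/* Build a texts-by-words contingency table, stored row by row:
   table[i * nwords + j] is the count of words[j] in texts[i].
   Returns NULL if allocation fails; the caller frees the result. */
int *cross_tabulate(const char *const texts[], size_t ntexts,
                    const char *const words[], size_t nwords)
{
    int *table = malloc(ntexts * nwords * sizeof *table);
    if (table == NULL)
        return NULL;
    for (size_t i = 0; i < ntexts; i++)
        for (size_t j = 0; j < nwords; j++)
            table[i * nwords + j] = count_word(texts[i], words[j]);
    return table;
}

/* Return 1 if word j occurs at least min_count times in every text,
   i.e. it qualifies for a common word-list; otherwise return 0. */
int is_common_word(const int *table, size_t ntexts, size_t nwords,
                   size_t j, int min_count)
{
    for (size_t i = 0; i < ntexts; i++)
        if (table[i * nwords + j] < min_count)
            return 0;
    return 1;
}
```

A caller would pass an array of text strings and an array of lower- or mixed-case words to cross_tabulate, read counts from the returned table, test candidate words with is_common_word, and release the table with free when finished.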
3. Software in Java
To install JDK or JRE (see below), check
Sunsoft Sun Developer Network Site.
4. Updates to the Book
Errata
- P. 35, 2 lines above expressions (2.4),
change j \in J to: i \in I.
- P. 37, line -14, change f_K to
f_I. In the paragraphs on lines -11 and -10, delete the opening words and
terms so that the sentence begins: We can right-multiply the
eigen-equation above ....
- P. 113, line 6 of text, 40% should be 48%.
- See above for changes to the C program for hierarchical clustering
(minimum variance/Ward's method, with weighting of rows/cases), and the
associated R calling script.
Updates
- A new version of facor, with an example of
use at the start of the program. Input data set:
casa2.prn (text file), a characterization of the 77 scenes of the film
Casablanca by 13 person and place attributes.
5. Book Reviews and Survey Papers
- "Detailed examples of its application to data are drawn from an astonishingly wide variety of fields; astronomy,
financial modeling and forecasting, comparisons of prehistoric and modern groups of dogs, ancient goblets and measurements
on ancient Egyptian skulls. ... All in all this book can be recommended as a succinct reference on all aspects of correspondence
analysis, theoretical, computational, and practical." - J.M. Juritz, Short Book Reviews of the ISI.
- "This book plays an important role in bridging the gap between learning a method and actually implementing it ... could
serve as either a text for an introductory course on CA or as a supplementary text to a more advanced graduate course in CA or
multivariate techniques in general ... The author should be commended for bringing these issues to the forefront."
- Douglas Steinley, University of Missouri-Columbia, in Psychometrika, Vol. 74, No. 1, 2007.
- F. Murtagh, Review, Journal of
Classification, 25, 137-141, 2008, of
Brigitte Le Roux and Henry Rouanet,
Geometric Data Analysis, From Correspondence Analysis to Structured
Data Analysis, Kluwer, Dordrecht, 2004.
- F. Murtagh,
reply to Jan de Leeuw regarding this book,
focusing on the continuing ground-breaking innovation underlying data
coding in the correspondence analysis and associated data analysis
framework.
- F. Murtagh,
Origins of modern data analysis
linked to the beginnings and early development of computer science and
information engineering, Electronic Journal for History of Probability and Statistics,
vol. 4, no. 2, Dec. 2008.
6. Other Data Analysis and Signal Processing Software
- Multivariate
data analysis resources, code by F. Murtagh, in C, Fortran, R/S-Plus,
and Java, for cluster analysis, and other purposes. Features:
O(n^2) hierarchical clustering, wavelet transform on a hierarchy, online
books, datasets. Code used in: CLUSTAN; R; by American Airlines Pricing
Systems; many others.
- MR, a large suite of
programs for wavelet transform analysis of images and signals,
together with other multiresolution transform analysis approaches
(curvelet and ridgelet transform), and general image and signal
processing (edge detection, fractal analysis, etc.). For filtering
and noise modeling, compression, deconvolution, visualization, and other
related applications.
- Code in Matlab and IDL
accompanying the book J.L. Starck, F. Murtagh and J. Fadili,
Sparse Image and Signal Processing: Wavelets, Curvelets, Morphological
Diversity, Cambridge University Press, 2010.
7. Linnaeus, Huyghens, Laplace
From J.-P. Benzécri et coll., L'Analyse des Données.
Tome I, Taxinomie. Tome II, Correspondances.
Dunod, 1973 (2nd edn., 1976).
- Linnaeus
- Huyghens
- Laplace
Author's homepage.
Contact: f murtagh at acm dot org (user name: one word)