Emacs Lisp Stat - data(attitude)
目次
1 Abstract
- http://sugano-nu.github.io/emacs-lisp-stat-attitude/
- R and Emacs Lisp for Data Analysis
- R is Domain Specific Language for data analysis.
- Emacs Lisp is so much flexible.
- References
2 R: DSL for Statistical Data Analysis
2.1 attitude data in R
help(attitude)
The Chatterjee-Price Attitude Data
Description:
From a survey of the clerical employees of a large financial
organization, the data are aggregated from the questionnaires of
the approximately 35 employees for each of 30 (randomly selected)
departments. The numbers give the percent proportion of
favourable responses to seven questions in each department.
Usage:
attitude
Format:
A data frame with 30 observations on 7 variables. The first column
are the short names from the reference, the second one the
variable names in the data frame:
Y rating numeric Overall rating
X[1] complaints numeric Handling of employee complaints
X[2] privileges numeric Does not allow special privileges
X[3] learning numeric Opportunity to learn
X[4] raises numeric Raises based on performance
X[5] critical numeric Too critical
X[6] advancel numeric Advancement
Source:
Chatterjee, S. and Price, B. (1977) _Regression Analysis by
Example_. New York: Wiley. (Section 3.7, p.68ff of 2nd
ed.(1991).)
Examples:
require(stats); require(graphics)
pairs(attitude, main = "attitude data")
summary(attitude)
summary(fm1 <- lm(rating ~ ., data = attitude))
opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
mar = c(4.1, 4.1, 2.1, 1.1))
plot(fm1)
summary(fm2 <- lm(rating ~ complaints, data = attitude))
plot(fm2)
par(opar)
data(attitude) attitude[1:5, ]
| rating | complaints | privileges | learning | raises | critical | advance | |
|---|---|---|---|---|---|---|---|
| 1 | 43 | 51 | 30 | 39 | 61 | 92 | 45 |
| 2 | 63 | 64 | 51 | 54 | 63 | 73 | 47 |
| 3 | 71 | 70 | 68 | 69 | 76 | 86 | 48 |
| 4 | 61 | 63 | 45 | 47 | 54 | 84 | 35 |
| 5 | 81 | 78 | 56 | 66 | 71 | 83 | 47 |
Here are the correlations.
round(cor(attitude),2)
| rating | complaints | privileges | learning | raises | critical | advance | |
|---|---|---|---|---|---|---|---|
| rating | 1 | 0.83 | 0.43 | 0.62 | 0.59 | 0.16 | 0.16 |
| complaints | 0.83 | 1 | 0.56 | 0.6 | 0.67 | 0.19 | 0.22 |
| privileges | 0.43 | 0.56 | 1 | 0.49 | 0.45 | 0.15 | 0.34 |
| learning | 0.62 | 0.6 | 0.49 | 1 | 0.64 | 0.12 | 0.53 |
| raises | 0.59 | 0.67 | 0.45 | 0.64 | 1 | 0.38 | 0.57 |
| critical | 0.16 | 0.19 | 0.15 | 0.12 | 0.38 | 1 | 0.28 |
| advance | 0.16 | 0.22 | 0.34 | 0.53 | 0.57 | 0.28 | 1 |
2.2 scatterplot by pairs()
par(family = "HiraKakuProN-W3") ## ← Windowsでは実行しない pairs(attitude, panel = panel.smooth, names(attitude))
図1: emacs-lisp-stat-attitude-R-pairs-01
2.3 Scatterplot by pairs.panels()
par(family = "HiraKakuProN-W3") ## ← Windowsでは実行しない library(psych) pairs.panels(attitude, scale = TRUE)
図2: emacs-lisp-stat-attitude-R-pairs-panels-01
2.4 Regression Analysis by lm()
attitude.lm1 <- lm(rating ~ ., data = attitude)
summary(attitude.lm1)
Call:
lm(formula = rating ~ ., data = attitude)
Residuals:
Min 1Q Median 3Q Max
-10.9418 -4.3555 0.3158 5.5425 11.5990
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 10.78708 11.58926 0.931 0.361634
complaints 0.61319 0.16098 3.809 0.000903 ***
privileges -0.07305 0.13572 -0.538 0.595594
learning 0.32033 0.16852 1.901 0.069925 .
raises 0.08173 0.22148 0.369 0.715480
critical 0.03838 0.14700 0.261 0.796334
advance -0.21706 0.17821 -1.218 0.235577
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 7.068 on 23 degrees of freedom
Multiple R-squared: 0.7326, Adjusted R-squared: 0.6628
F-statistic: 10.5 on 6 and 23 DF, p-value: 0.0000124
2.5 Variable Selection
attitude.lm2 <- step(attitude.lm1)
Start: AIC=123.36
rating ~ complaints + privileges + learning + raises + critical +
advance
Df Sum of Sq RSS AIC
- critical 1 3.41 1152.4 121.45
- raises 1 6.80 1155.8 121.54
- privileges 1 14.47 1163.5 121.74
- advance 1 74.11 1223.1 123.24
<none> 1149.0 123.36
- learning 1 180.50 1329.5 125.74
- complaints 1 724.80 1873.8 136.04
Step: AIC=121.45
rating ~ complaints + privileges + learning + raises + advance
Df Sum of Sq RSS AIC
- raises 1 10.61 1163.0 119.73
- privileges 1 14.16 1166.6 119.82
- advance 1 71.27 1223.7 121.25
<none> 1152.4 121.45
- learning 1 177.74 1330.1 123.75
- complaints 1 724.70 1877.1 134.09
Step: AIC=119.73
rating ~ complaints + privileges + learning + advance
Df Sum of Sq RSS AIC
- privileges 1 16.10 1179.1 118.14
- advance 1 61.60 1224.6 119.28
<none> 1163.0 119.73
- learning 1 197.03 1360.0 122.42
- complaints 1 1165.94 2328.9 138.56
Step: AIC=118.14
rating ~ complaints + learning + advance
Df Sum of Sq RSS AIC
- advance 1 75.54 1254.7 118.00
<none> 1179.1 118.14
- learning 1 186.12 1365.2 120.54
- complaints 1 1259.91 2439.0 137.94
Step: AIC=118
rating ~ complaints + learning
Df Sum of Sq RSS AIC
<none> 1254.7 118.00
- learning 1 114.73 1369.4 118.63
- complaints 1 1370.91 2625.6 138.16
2.6 Results of Regression Analysis
summary(attitude.lm2)
Call:
lm(formula = rating ~ complaints + learning, data = attitude)
Residuals:
Min 1Q Median 3Q Max
-11.5568 -5.7331 0.6701 6.5341 10.3610
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 9.8709 7.0612 1.398 0.174
complaints 0.6435 0.1185 5.432 0.00000957 ***
learning 0.2112 0.1344 1.571 0.128
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 6.817 on 27 degrees of freedom
Multiple R-squared: 0.708, Adjusted R-squared: 0.6864
F-statistic: 32.74 on 2 and 27 DF, p-value: 0.00000006058
Regression Equation is:
rating = 9.8709 + 0.6435 × complaints + 0.2112 × earning
2.7 Regression Diagnosis
par(family = "HiraKakuProN-W3") ## ← Windowsでは実行しない par(mfrow = c(2, 2)) plot(attitude.lm2)
図3: emacs-lisp-stat-attitude-R-plot-lm
3 Emacs Lisp: It's a LISP.
3.1 Making CSV Data
-rw-r--r--@ 1 sugano staff 850 2 3 19:59 attitude.csv
31 31 850 attitude.csv
3.2 list of attitude
(setq dd
(with-temp-buffer
(org-table-import (expand-file-name file) nil)
(org-table-to-lisp)))
(setq LIST-VAR (car dd))
(setq dd (cdr dd))
3.3 variable list
LIST-VAR
| rating | complaints | privileges | learning | raises | critical | advance |
3.4 Data
- first line of data
(car dd)
("1" "43" "51" "30" "39" "61" "92" "45")
- last line of data
(car (reverse dd))
("30" "82" "82" "39" "59" "64" "78" "39")
3.5 Emacs Lisp Stat Function
(require 'cl-lib) (require 'calc) (defun els-mean (N LIST-OF-DATA) "Calculate an arithmetic mean of N th column of LIST-OF-DATA. The first column is 0 th." (setq LIST (mapcar* #'(lambda (x) (nth N x)) LIST-OF-DATA)) (string-to-number (math-format-number (calcFunc-vmean (cons 'vec (mapcar* #'(lambda (X) (math-read-number X)) LIST)))))) (defun els-varp (N LIST-OF-DATA) "Calculate a population variance of N th column of LIST-OF-DATA. The first column is 0 th." (setq LIST (mapcar* #'(lambda (x) (nth N x)) LIST-OF-DATA)) (string-to-number (math-format-number (calcFunc-vpvar (cons 'vec (mapcar* #'(lambda (X) (math-read-number X)) LIST)))))) (defun els-var (N LIST-OF-DATA) "Calculate an unbiased variance of N th column of LIST-OF-DATA. The first column is 0 th." (setq LIST (mapcar* #'(lambda (x) (nth N x)) LIST-OF-DATA)) (string-to-number (math-format-number (calcFunc-vvar (cons 'vec (mapcar* #'(lambda (X) (math-read-number X)) LIST)))))) (defun els-sd (N LIST-OF-DATA) "Calculate an unbiased standard deviation of N th column of LIST-OF-DATA. The first column is 0 th." (setq LIST (mapcar* #'(lambda (x) (nth N x)) LIST-OF-DATA)) (sqrt (string-to-number (math-format-number (calcFunc-vvar (cons 'vec (mapcar* #'(lambda (X) (math-read-number X)) LIST))))))) (defun els-cor (N1 N2 LIST-OF-DATA) "Calculate a correlation of coefficient of N1 and N2 th column of LIST-OF-DATA. The first column is 0 th." (setq LISTX (mapcar* #'(lambda (x) (nth N1 x)) LIST-OF-DATA)) (setq LISTY (mapcar* #'(lambda (y) (nth N2 y)) LIST-OF-DATA)) (string-to-number (math-format-number (calcFunc-vcorr (cons 'vec (mapcar* #'(lambda (X) (math-read-number X)) LISTX)) (cons 'vec (mapcar* #'(lambda (Y) (math-read-number Y)) LISTY)))))) (defun els-cov (N1 N2 LIST-OF-DATA) "Calculate a covariance of N1 and N2 th column of LIST-OF-DATA. The first column is 0 th." (setq LISTX (mapcar* #'(lambda (x) (nth N1 x)) LIST-OF-DATA)) (setq LISTY (mapcar* #'(lambda (y) (nth N2 y)) LIST-OF-DATA)) (string-to-number (math-format-number (calcFunc-vcov (cons 'vec (mapcar* #'(lambda (X) (math-read-number X)) LISTX)) (cons 'vec (mapcar* #'(lambda (Y) (math-read-number Y)) LISTY)))))) (defun els-round (VALUE N) "Rounds the numeric value to the specified number of decimal places." (/ (* 1.0 (round (* VALUE (expt 10 N)))) (expt 10 N)))
els-round
3.6 Mean
LIST-VAR
("" "rating" "complaints" "privileges" "learning" "raises" "critical" "advance")
(els-mean 1 dd)
64.6333333333
3.7 Multiple Means: Programming with Emacs Lisp
I had to struggle for a while to get the same result as in R.
List data structure use here in Emacs Lisp is very simple for the moment.
- No variable labels.
- No factor or value labels.
(number-sequence 1 7)
(1 2 3 4 5 6 7)
(mapcar* #'(lambda (N) (els-mean N dd)) (number-sequence 1 7))
(64.6333333333 66.6 53.1333333333 56.3666666667 64.6333333333 74.7666666667 42.9333333333)
(mapcar* #'(lambda (N) (els-round (els-mean N dd) 2)) (number-sequence 1 7))
| 64.63 | 66.6 | 53.13 | 56.37 | 64.63 | 74.77 | 42.93 |
- Means with Title of Variable Names
(list (cdr LIST-VAR) 'hline (mapcar* #'(lambda (N) (els-round (els-mean N dd) 2)) (number-sequence 1 7)))
| rating | complaints | privileges | learning | raises | critical | advance |
|---|---|---|---|---|---|---|
| 64.63 | 66.6 | 53.13 | 56.37 | 64.63 | 74.77 | 42.93 |
3.8 Multiple Means: Just too easy for R
Ah, everything is so easy in R environment.
t(round(colMeans(attitude), 2))
| rating | complaints | privileges | learning | raises | critical | advance |
|---|---|---|---|---|---|---|
| 64.63 | 66.6 | 53.13 | 56.37 | 64.63 | 74.77 | 42.93 |
4 GNUPLOT
'((0 0.1) (0.1 1) (0.5 10)))
| 0 | 0.1 |
| 0.1 | 1 |
| 0.5 | 10 |
(defun transpose (a) (apply #'mapcar* #'list a))
transpose
(transpose (list (second (transpose dd)) (third (transpose dd))))
| 43 | 51 |
| 63 | 64 |
| 71 | 70 |
| 61 | 63 |
| 81 | 78 |
| 43 | 55 |
| 58 | 67 |
| 71 | 75 |
| 72 | 82 |
| 67 | 61 |
| 64 | 53 |
| 67 | 60 |
| 69 | 62 |
| 68 | 83 |
| 77 | 77 |
| 81 | 90 |
| 74 | 85 |
| 65 | 60 |
| 65 | 70 |
| 50 | 58 |
| 50 | 40 |
| 64 | 61 |
| 53 | 66 |
| 40 | 37 |
| 63 | 54 |
| 66 | 77 |
| 78 | 75 |
| 48 | 57 |
| 85 | 85 |
| 82 | 82 |
| 43 | 51 |
| 63 | 64 |
| 71 | 70 |
| 61 | 63 |
| 81 | 78 |
| 43 | 55 |
| 58 | 67 |
| 71 | 75 |
| 72 | 82 |
| 67 | 61 |
| 64 | 53 |
| 67 | 60 |
| 69 | 62 |
| 68 | 83 |
| 77 | 77 |
| 81 | 90 |
| 74 | 85 |
| 65 | 60 |
| 65 | 70 |
| 50 | 58 |
| 50 | 40 |
| 64 | 61 |
| 53 | 66 |
| 40 | 37 |
| 63 | 54 |
| 66 | 77 |
| 78 | 75 |
| 48 | 57 |
| 85 | 85 |
| 82 | 82 |
LIST-VAR
("" "rating" "complaints" "privileges" "learning" "raises" "critical" "advance")
set grid lw 2 plot data w p title "rating and complaints"
図4: emacs-lisp-stat-attitude-gnuplot-01
5 Conclusion for the Moment
I had to realize that:
- What on the earth why R is so well built!
- How easy it is to anlayze data with R.
But ONE MORE THING:
- Even for a Sunday programmer, How flexible Emacs Lisp is!