Emacs Lisp Stat - data(attitude)

1 Abstract

2 R: DSL for Statistical Data Analysis

2.1 attitude data in R

help(attitude)
The Chatterjee-Price Attitude Data

Description:

     From a survey of the clerical employees of a large financial
     organization, the data are aggregated from the questionnaires of
     the approximately 35 employees for each of 30 (randomly selected)
     departments.  The numbers give the percent proportion of
     favourable responses to seven questions in each department.

Usage:

     attitude
     
Format:

     A data frame with 30 observations on 7 variables. The first column
     are the short names from the reference, the second one the
     variable names in the data frame:

          Y  rating      numeric  Overall rating                    
       X[1]  complaints  numeric  Handling of employee complaints   
       X[2]  privileges  numeric  Does not allow special privileges 
       X[3]  learning    numeric  Opportunity to learn              
       X[4]  raises      numeric  Raises based on performance       
       X[5]  critical    numeric  Too critical                      
       X[6]  advance     numeric  Advancement                       
      
Source:

     Chatterjee, S. and Price, B. (1977) _Regression Analysis by
     Example_.  New York: Wiley.  (Section 3.7, p.68ff of 2nd
     ed.(1991).)

Examples:

     require(stats); require(graphics)
     pairs(attitude, main = "attitude data")
     summary(attitude)
     summary(fm1 <- lm(rating ~ ., data = attitude))
     opar <- par(mfrow = c(2, 2), oma = c(0, 0, 1.1, 0),
                 mar = c(4.1, 4.1, 2.1, 1.1))
     plot(fm1)
     summary(fm2 <- lm(rating ~ complaints, data = attitude))
     plot(fm2)
     par(opar)
data(attitude)
attitude[1:5, ]
  rating complaints privileges learning raises critical advance
1     43         51         30       39     61       92      45
2     63         64         51       54     63       73      47
3     71         70         68       69     76       86      48
4     61         63         45       47     54       84      35
5     81         78         56       66     71       83      47

Here are the correlations.

round(cor(attitude),2)
           rating complaints privileges learning raises critical advance
rating       1.00       0.83       0.43     0.62   0.59     0.16    0.16
complaints   0.83       1.00       0.56     0.60   0.67     0.19    0.22
privileges   0.43       0.56       1.00     0.49   0.45     0.15    0.34
learning     0.62       0.60       0.49     1.00   0.64     0.12    0.53
raises       0.59       0.67       0.45     0.64   1.00     0.38    0.57
critical     0.16       0.19       0.15     0.12   0.38     1.00    0.28
advance      0.16       0.22       0.34     0.53   0.57     0.28    1.00
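
Since this note later redoes the statistics in Emacs Lisp, it may help to see what a single entry of that matrix involves. The following is only a sketch in plain Emacs Lisp: `sketch-pearson` is a hypothetical helper that does not appear in the original note, and it assumes two equal-length lists of numbers with non-zero variance.

```emacs-lisp
;; Hypothetical helper for illustration; not part of the original note.
(defun sketch-pearson (xs ys)
  "Return the Pearson correlation of the number lists XS and YS.
Both lists are assumed to have the same length (at least two
elements) and non-zero variance."
  (let* ((n (float (length xs)))
         (mx (/ (apply #'+ xs) n))
         (my (/ (apply #'+ ys) n))
         (sxy 0.0) (sxx 0.0) (syy 0.0))
    ;; Accumulate cross-products and squares of the deviations.
    (while xs
      (let ((dx (- (car xs) mx))
            (dy (- (car ys) my)))
        (setq sxy (+ sxy (* dx dy))
              sxx (+ sxx (* dx dx))
              syy (+ syy (* dy dy))))
      (setq xs (cdr xs)
            ys (cdr ys)))
    (/ sxy (sqrt (* sxx syy)))))

;; rating and complaints for the five departments printed above
(sketch-pearson '(43 63 71 61 81) '(51 64 70 63 78))
```

With only five departments the value is not expected to match the 0.83 computed from all 30 rows.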

2.2 scatterplot by pairs()

par(family = "HiraKakuProN-W3") ## do not run this line on Windows
pairs(attitude, panel = panel.smooth, labels = names(attitude))

emacs-lisp-stat-attitude-R-pairs-01.png

Figure 1: emacs-lisp-stat-attitude-R-pairs-01

2.3 Scatterplot by pairs.panels()

par(family = "HiraKakuProN-W3") ## do not run this line on Windows
library(psych)
pairs.panels(attitude, scale = TRUE)

emacs-lisp-stat-attitude-R-pairs-panels-01.png

Figure 2: emacs-lisp-stat-attitude-R-pairs-panels-01

2.4 Regression Analysis by lm()

attitude.lm1 <- lm(rating ~ ., data = attitude)
summary(attitude.lm1)
Call:
lm(formula = rating ~ ., data = attitude)

Residuals:
     Min       1Q   Median       3Q      Max 
-10.9418  -4.3555   0.3158   5.5425  11.5990 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.78708   11.58926   0.931 0.361634    
complaints   0.61319    0.16098   3.809 0.000903 ***
privileges  -0.07305    0.13572  -0.538 0.595594    
learning     0.32033    0.16852   1.901 0.069925 .  
raises       0.08173    0.22148   0.369 0.715480    
critical     0.03838    0.14700   0.261 0.796334    
advance     -0.21706    0.17821  -1.218 0.235577    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 7.068 on 23 degrees of freedom
Multiple R-squared:  0.7326,	Adjusted R-squared:  0.6628 
F-statistic:  10.5 on 6 and 23 DF,  p-value: 0.0000124

2.5 Variable Selection

attitude.lm2 <- step(attitude.lm1)
Start:  AIC=123.36
rating ~ complaints + privileges + learning + raises + critical + 
    advance

             Df Sum of Sq    RSS    AIC
- critical    1      3.41 1152.4 121.45
- raises      1      6.80 1155.8 121.54
- privileges  1     14.47 1163.5 121.74
- advance     1     74.11 1223.1 123.24
<none>                    1149.0 123.36
- learning    1    180.50 1329.5 125.74
- complaints  1    724.80 1873.8 136.04

Step:  AIC=121.45
rating ~ complaints + privileges + learning + raises + advance

             Df Sum of Sq    RSS    AIC
- raises      1     10.61 1163.0 119.73
- privileges  1     14.16 1166.6 119.82
- advance     1     71.27 1223.7 121.25
<none>                    1152.4 121.45
- learning    1    177.74 1330.1 123.75
- complaints  1    724.70 1877.1 134.09

Step:  AIC=119.73
rating ~ complaints + privileges + learning + advance

             Df Sum of Sq    RSS    AIC
- privileges  1     16.10 1179.1 118.14
- advance     1     61.60 1224.6 119.28
<none>                    1163.0 119.73
- learning    1    197.03 1360.0 122.42
- complaints  1   1165.94 2328.9 138.56

Step:  AIC=118.14
rating ~ complaints + learning + advance

             Df Sum of Sq    RSS    AIC
- advance     1     75.54 1254.7 118.00
<none>                    1179.1 118.14
- learning    1    186.12 1365.2 120.54
- complaints  1   1259.91 2439.0 137.94

Step:  AIC=118
rating ~ complaints + learning

             Df Sum of Sq    RSS    AIC
<none>                    1254.7 118.00
- learning    1    114.73 1369.4 118.63
- complaints  1   1370.91 2625.6 138.16
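
The AIC column in the trace above can be reproduced by hand. For lm fits, step() reports n·log(RSS/n) + 2k, where n is the number of observations and k the number of estimated coefficients including the intercept. The sketch below uses a hypothetical function name, `sketch-step-aic`, and feeds it the rounded RSS values printed in the trace, so agreement is only expected to about two decimals.

```emacs-lisp
;; Hypothetical helper for illustration; not part of the original note.
(defun sketch-step-aic (rss n k)
  "Return N*log(RSS/N) + 2*K, the AIC that R's step() prints for lm fits.
RSS is the residual sum of squares, N the number of observations,
and K the number of coefficients including the intercept."
  (+ (* n (log (/ (float rss) n)))
     (* 2 k)))

;; Full model: 6 predictors + intercept, RSS = 1149.0 (printed as AIC=123.36)
(sketch-step-aic 1149.0 30 7)

;; Final model: complaints + learning + intercept, RSS = 1254.7 (AIC=118)
(sketch-step-aic 1254.7 30 3)
```

Dropping a variable raises the RSS but lowers the 2k penalty, which is why the search stops once no single removal lowers the total.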

2.6 Results of Regression Analysis

summary(attitude.lm2)
Call:
lm(formula = rating ~ complaints + learning, data = attitude)

Residuals:
     Min       1Q   Median       3Q      Max 
-11.5568  -5.7331   0.6701   6.5341  10.3610 

Coefficients:
            Estimate Std. Error t value   Pr(>|t|)    
(Intercept)   9.8709     7.0612   1.398      0.174    
complaints    0.6435     0.1185   5.432 0.00000957 ***
learning      0.2112     0.1344   1.571      0.128    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 6.817 on 27 degrees of freedom
Multiple R-squared:  0.708,	Adjusted R-squared:  0.6864 
F-statistic: 32.74 on 2 and 27 DF,  p-value: 0.00000006058

The regression equation is:

rating = 9.8709 + 0.6435 × complaints + 0.2112 × learning
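
As a quick check of the equation, the fitted value for one department can be computed directly. This is a sketch using the rounded coefficients from the summary; `sketch-predict-rating` is a hypothetical name, not something defined in the original note.

```emacs-lisp
;; Hypothetical helper for illustration; not part of the original note.
(defun sketch-predict-rating (complaints learning)
  "Return the fitted rating from the rounded coefficients of attitude.lm2."
  (+ 9.8709 (* 0.6435 complaints) (* 0.2112 learning)))

;; Department 1: complaints = 51, learning = 39, observed rating = 43
(sketch-predict-rating 51 39)          ; about 50.93
(- 43 (sketch-predict-rating 51 39))   ; residual, about -7.93
```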

2.7 Regression Diagnosis

par(family = "HiraKakuProN-W3") ## do not run this line on Windows
par(mfrow = c(2, 2))
plot(attitude.lm2)

emacs-lisp-stat-attitude-R-plot-lm.png

Figure 3: emacs-lisp-stat-attitude-R-plot-lm

3 Emacs Lisp: It's a LISP.

3.1 Making CSV Data

-rw-r--r--@ 1 sugano  staff  850  2  3 19:59 attitude.csv
      31      31     850 attitude.csv

3.2 list of attitude

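;; `file' is assumed to hold the path to the CSV file made in 3.1
;; (for example, passed in as an Org Babel variable).
(require 'org-table)
(setq dd
      (with-temp-buffer
        (org-table-import (expand-file-name file) nil)
        (org-table-to-lisp)))
(setq LIST-VAR (car dd))  ; header row: the variable names
(setq dd (cdr dd))        ; remaining rows: the 30 observations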

3.3 variable list

LIST-VAR
  rating complaints privileges learning raises critical advance

3.4 Data

  • first line of data
(car dd)
("1" "43" "51" "30" "39" "61" "92" "45")
  • last line of data
(car (reverse dd))
("30" "82" "82" "39" "59" "64" "78" "39")
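
As the two rows above show, every cell comes back from the import as a string. A minimal sketch of converting one row to numbers with the built-in `string-to-number` follows; `sketch-row-to-numbers` is a hypothetical name used only for this illustration.

```emacs-lisp
;; Hypothetical helper for illustration; not part of the original note.
(defun sketch-row-to-numbers (row)
  "Convert ROW, a list of numeric strings, to a list of numbers."
  (mapcar #'string-to-number row))

(sketch-row-to-numbers '("1" "43" "51" "30" "39" "61" "92" "45"))
;; => (1 43 51 30 39 61 92 45)
```

The first element is the row number from the CSV, which is why the statistics later in this note start at column 1 rather than column 0.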

3.5 Emacs Lisp Stat Function

(require 'cl-lib)
(require 'calc)
(require 'calc-ext)  ; makes Calc's statistics functions (calcFunc-v*) available

;; Each function extracts one column of strings, converts it to a Calc
;; vector, and passes it to the matching Calc statistics function.
;; Local variables use `let' so that no global variables are created.

(defun els-mean (N LIST-OF-DATA)
  "Calculate the arithmetic mean of the Nth column of LIST-OF-DATA.
The first column is column 0."
  (let ((column (mapcar (lambda (row) (nth N row)) LIST-OF-DATA)))
    (string-to-number
     (math-format-number
      (calcFunc-vmean
       (cons 'vec (mapcar #'math-read-number column)))))))

(defun els-varp (N LIST-OF-DATA)
  "Calculate the population variance of the Nth column of LIST-OF-DATA.
The first column is column 0."
  (let ((column (mapcar (lambda (row) (nth N row)) LIST-OF-DATA)))
    (string-to-number
     (math-format-number
      (calcFunc-vpvar
       (cons 'vec (mapcar #'math-read-number column)))))))

(defun els-var (N LIST-OF-DATA)
  "Calculate the unbiased variance of the Nth column of LIST-OF-DATA.
The first column is column 0."
  (let ((column (mapcar (lambda (row) (nth N row)) LIST-OF-DATA)))
    (string-to-number
     (math-format-number
      (calcFunc-vvar
       (cons 'vec (mapcar #'math-read-number column)))))))

(defun els-sd (N LIST-OF-DATA)
  "Calculate the standard deviation of the Nth column of LIST-OF-DATA.
This is the square root of the unbiased variance.
The first column is column 0."
  (sqrt (els-var N LIST-OF-DATA)))

(defun els-cor (N1 N2 LIST-OF-DATA)
  "Calculate the correlation coefficient of columns N1 and N2 of LIST-OF-DATA.
The first column is column 0."
  (let ((xs (mapcar (lambda (row) (nth N1 row)) LIST-OF-DATA))
        (ys (mapcar (lambda (row) (nth N2 row)) LIST-OF-DATA)))
    (string-to-number
     (math-format-number
      (calcFunc-vcorr
       (cons 'vec (mapcar #'math-read-number xs))
       (cons 'vec (mapcar #'math-read-number ys)))))))

(defun els-cov (N1 N2 LIST-OF-DATA)
  "Calculate the covariance of columns N1 and N2 of LIST-OF-DATA.
The first column is column 0."
  (let ((xs (mapcar (lambda (row) (nth N1 row)) LIST-OF-DATA))
        (ys (mapcar (lambda (row) (nth N2 row)) LIST-OF-DATA)))
    (string-to-number
     (math-format-number
      (calcFunc-vcov
       (cons 'vec (mapcar #'math-read-number xs))
       (cons 'vec (mapcar #'math-read-number ys)))))))

(defun els-round (VALUE N)
  "Round VALUE to N decimal places."
  (/ (* 1.0 (round (* VALUE (expt 10 N))))
     (expt 10 N)))
els-round
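
The functions above delegate the arithmetic to Calc. As a cross-check, the same quantities can be sketched in plain floating-point Emacs Lisp. The names `sketch-mean`, `sketch-var` and `sketch-sd` are hypothetical and take a list of numbers rather than a column index, so this is not a drop-in replacement for the els- functions.

```emacs-lisp
;; Hypothetical helpers for illustration; not part of the original note.
(defun sketch-mean (xs)
  "Return the arithmetic mean of the number list XS as a float."
  (/ (apply #'+ xs) (float (length xs))))

(defun sketch-var (xs)
  "Return the unbiased (n - 1) variance of the number list XS."
  (let ((m (sketch-mean xs))
        (ss 0.0))
    ;; Sum of squared deviations from the mean.
    (dolist (x xs)
      (setq ss (+ ss (expt (- x m) 2))))
    (/ ss (1- (length xs)))))

(defun sketch-sd (xs)
  "Return the standard deviation of XS, the square root of `sketch-var'."
  (sqrt (sketch-var xs)))

;; rating for the first five departments
(sketch-mean '(43 63 71 61 81))   ; about 63.8
(sketch-var  '(43 63 71 61 81))   ; about 197.2
```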

3.6 Mean

LIST-VAR
("" "rating" "complaints" "privileges" "learning" "raises" "critical" "advance")
(els-mean 1 dd)
64.6333333333

3.7 Multiple Means: Programming with Emacs Lisp

I had to struggle for a while to get the same result as in R.

The list data structure used here in Emacs Lisp is very simple for the moment:

  • No variable labels.
  • No factor or value labels.
(number-sequence 1 7)
(1 2 3 4 5 6 7)
(mapcar (lambda (N) (els-mean N dd))
        (number-sequence 1 7))
(64.6333333333 66.6 53.1333333333 56.3666666667 64.6333333333 74.7666666667 42.9333333333)
(mapcar (lambda (N) (els-round (els-mean N dd) 2))
        (number-sequence 1 7))
64.63 66.6 53.13 56.37 64.63 74.77 42.93
  • Means with the variable names as a header row
(list
 (cdr LIST-VAR)
 'hline
 (mapcar (lambda (N) (els-round (els-mean N dd) 2))
         (number-sequence 1 7)))
rating complaints privileges learning raises critical advance
 64.63       66.6      53.13    56.37  64.63    74.77   42.93
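
One way to work around the missing variable labels is to pair each name with its mean in an association list. The sketch below assumes rows of numeric strings with exactly one cell per name (no leading row-number column); `sketch-column-means` is a hypothetical function, not part of the original note.

```emacs-lisp
(require 'cl-lib)

;; Hypothetical helper for illustration; not part of the original note.
(defun sketch-column-means (header rows)
  "Return an alist of (NAME . MEAN) for each column of ROWS.
HEADER is a list of column names.  ROWS is a list of rows of
numeric strings, with one cell per name in HEADER."
  (let ((n (float (length rows))))
    (cl-mapcar
     (lambda (name index)
       (cons name
             (/ (apply #'+ (mapcar (lambda (row)
                                     (string-to-number (nth index row)))
                                   rows))
                n)))
     header
     (number-sequence 0 (1- (length header))))))

;; rating and complaints of departments 1 and 30
(sketch-column-means '("rating" "complaints")
                     '(("43" "51") ("82" "82")))
;; => (("rating" . 62.5) ("complaints" . 66.5))
```

A mean can then be looked up by name, for example with `(cdr (assoc "rating" result))`.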

3.8 Multiple Means: Just too easy for R

Ah, everything is so easy in the R environment.

t(round(colMeans(attitude), 2))
rating complaints privileges learning raises critical advance
 64.63       66.6      53.13    56.37  64.63    74.77   42.93

4 GNUPLOT

'((0 0.1) (0.1 1) (0.5 10))
0 0.1
0.1 1
0.5 10
(defun transpose (a)
  "Swap the rows and columns of A, a list of equal-length lists."
  (apply #'cl-mapcar #'list a))
transpose
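
The transpose idiom is compact enough to be worth a small worked example. `apply` spreads the rows out as separate arguments, `cl-mapcar` walks them in parallel, and `list` collects the k-th element of every row into the k-th output row. The sketch uses a hypothetical name, `sketch-transpose`, so that it does not clash with the definition above, and it assumes all rows have the same length.

```emacs-lisp
(require 'cl-lib)

;; Hypothetical helper for illustration; not part of the original note.
(defun sketch-transpose (rows)
  "Return ROWS, a list of equal-length lists, with rows and columns swapped."
  (apply #'cl-mapcar #'list rows))

(sketch-transpose '((1 2 3) (4 5 6)))
;; => ((1 4) (2 5) (3 6))
```

Transposing twice returns the original rows, which is what lets the code in this section pick out two columns and turn them back into rows for plotting.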
(transpose
 (list
  (cl-second (transpose dd))
  (cl-third (transpose dd))))
43 51
63 64
71 70
61 63
81 78
43 55
58 67
71 75
72 82
67 61
64 53
67 60
69 62
68 83
77 77
81 90
74 85
65 60
65 70
50 58
50 40
64 61
53 66
40 37
63 54
66 77
78 75
48 57
85 85
82 82
LIST-VAR
("" "rating" "complaints" "privileges" "learning" "raises" "critical" "advance")
set grid lw 2
plot data w p title "rating and complaints"

emacs-lisp-stat-attitude-gnuplot-01.png

Figure 4: emacs-lisp-stat-attitude-gnuplot-01

5 Conclusion for the Moment

I came to realize:

  • How remarkably well built R is!
  • How easy it is to analyze data with R.

But ONE MORE THING:

  • Even for a Sunday programmer, how flexible Emacs Lisp is!

Author: sugano

Created: 2016-02-04 Thu 01:56
