Quantitative Data
2.1 Introduction
This chapter covers some basic numerical and graphical summaries of data. Different numerical summaries and graphical displays are appropriate for different types of data. A variable may be classified as one of the following types:
- Quantitative (numeric or integer)
- Ordinal (ordered, like integers)
- Qualitative (categorical, nominal, or factor)
A data frame may contain several variables of possibly different types. The data may also have some structure, as time series data do, since they carry a time index. In this chapter we present examples of selected numerical and graphical summaries for various types of data. Chapter 3 covers summaries of categorical data in more detail. A natural continuation of Chapters 2 and 3 might be Chapter 5, "Exploratory Data Analysis."
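The variable types listed above map onto familiar R objects. The following is a minimal sketch using hypothetical toy values; the names height, children, size, eye, and quarterly are illustrative only and do not come from the book's data sets.

```r
# Hypothetical toy variables, one for each type of variable.
height   <- c(172.5, 160.2, 181.0)                  # quantitative (numeric)
children <- c(0L, 2L, 1L)                           # quantitative (integer)
size <- factor(c("small", "large", "medium"),
               levels = c("small", "medium", "large"),
               ordered = TRUE)                      # ordinal (ordered factor)
eye <- factor(c("brown", "blue", "brown"))          # qualitative (unordered factor)

# A data frame holds variables of different types side by side.
df <- data.frame(height, children, size, eye)
str(df)

# Ordered factors support comparisons that follow the level order.
size[2] > size[1]   # "large" > "small"

# Time series data carry a time index; ts() attaches one to a vector.
quarterly <- ts(c(12, 15, 14, 18), start = c(2011, 1), frequency = 4)
time(quarterly)
```

Comparisons such as `<` are only meaningful for ordered factors; applying them to an unordered factor like eye returns NA with a warning.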
2.2 Bivariate Data: Two Quantitative Variables
Our first example is a bivariate data set with two numeric variables, the body and brain size of mammals. We use it to illustrate some basic statistics, graphics, and operations on the data.
J. Albert and M. Rizzo, R by Example, Use R, DOI 10.1007/978-1-4614-1365-3__2, © Springer Science+Business Media, LLC 2012
2.2.1 Exploring the data
Body and brain size of mammals. Many data sets are included with the R distribution. A list of the available data sets can be displayed with the data() command. MASS [50] is one of the recommended packages distributed with R, so it should already be installed. To use the data sets or functions in MASS, first load MASS with the command
> library(MASS)   #load the package
> data()          #display available datasets
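Because data() lists the data sets of every attached package, the listing can be long. A minimal sketch that restricts it to MASS uses the package argument of data(); the variable name mass_data is illustrative.

```r
library(MASS)   # attach the recommended package MASS

# data() with a package argument lists only the data sets in that package.
mass_data <- data(package = "MASS")
head(mass_data$results[, c("Item", "Title")])

# Check that the mammals data set is among them.
"mammals" %in% mass_data$results[, "Item"]
```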
After the MASS package is loaded, the data sets in MASS are included in the list of available data sets generated by the data() command.
Example 2.1 (mammals). In the result of the data() command, under the heading "Data sets in package MASS:", there is a data set named mammals. The command
> ?mammals
displays information about the mammals data. These data contain the brain size and body size of 62 species of land mammals. Typing the name of the data set prints the data at the console. The listing is rather long, so here we display only the first few observations using head.
> head(mammals)
                   body brain
Arctic fox        3.385  44.5
Owl monkey        0.480  15.5
Mountain beaver   1.350   8.1
Cow             465.000 423.0
Grey wolf        36.330 119.5
Goat             27.660 115.0
These data consist of two numeric variables: body (body weight in kg) and brain (brain weight in g).
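A few quick checks confirm the size and types of the mammals data without printing all 62 rows. This is a minimal sketch using base R functions; tail() is simply the counterpart of head().

```r
library(MASS)

dim(mammals)                 # number of rows (species) and columns (variables)
sapply(mammals, class)       # type of each variable
head(rownames(mammals), 3)   # species names are stored as row names
tail(mammals, 3)             # last few observations
```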
R note 2.1. In the display above, it is not obvious whether mammals is a matrix or a data frame. One way to check is:
> is.matrix(mammals)
[1] FALSE
> is.data.frame(mammals)
[1] TRUE
If an analysis requires a matrix, mammals can be converted with as.matrix(mammals).
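The conversion described in this note can be checked directly. This is a minimal sketch; the name m is illustrative. Because both columns are numeric, as.matrix() returns a numeric matrix, and the species names carry over as row names.

```r
library(MASS)

class(mammals)            # "data.frame"
m <- as.matrix(mammals)   # numeric matrix; row names become rownames(m)
class(m)                  # "matrix" "array" in R 4.0.0 and later
m[1:3, ]                  # first three rows, indexed as a matrix
is.numeric(m)
```

If a data frame mixes numeric and character or factor columns, as.matrix() produces a character matrix instead, so this conversion is most useful for all-numeric data such as mammals.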
Some basic statistics and plots
The summary method computes a five number summary and mean for each numeric variable in the data frame.
> summary(mammals)
      body
 Min.   :  0.005
 1st Qu.:  0.600
 Median :  3.342
 Mean   :198.79
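To see where the numbers in summary() come from, the quantiles and mean of one variable can be computed separately. This is a minimal sketch, assuming a current version of R, in which summary() takes its quartiles from quantile() with the default type 7. The variable name body_wt is illustrative.

```r
library(MASS)

body_wt <- mammals$body   # body weight in kg
quantile(body_wt)         # minimum, lower quartile, median, upper quartile, maximum
mean(body_wt)             # arithmetic mean
summary(body_wt)          # the same five quantiles, with the mean after the median
```

Tukey's fivenum() computes the quartiles as hinges, so its lower and upper values can differ slightly from those reported by summary().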