Package 'YaleToolkit' reference manual

Title:	Data Exploration Tools from Yale University
Description:	This collection of data exploration tools was developed at Yale University for the graphical exploration of complex multivariate data; barcode and gpairs now have their own packages. The big.read.table() function provided here may be useful for large files when only a subset is needed (but please see the note in the help page for this function).
Authors:	John W. Emerson and Walton A. Green
Maintainer:	John W. Emerson <[email protected]>
License:	LGPL-3
Version:	4.2.3
Built:	2025-02-20 04:59:32 UTC
Source:	https://github.com/cran/YaleToolkit

Help Index

Read in chunks from a large file with row/column filtering to obtain a reasonable-sized data.frame.

Description

Read in chunks from a large file with row/column filtering to obtain a reasonable-sized data.frame.

Usage

big.read.table(
  file,
  nrows = 1e+05,
  sep = ",",
  header = TRUE,
  row.names = NULL,
  cols = NULL,
  rowfilter = NULL,
  as.is = TRUE,
  estimate = FALSE
)

Arguments

`file`	the name of the file, obviously
`nrows`	the chunk size; consider reducing this if there are lots of columns
`sep`	by default we expect a CSV file
`header`	is `TRUE` by default
`row.names`	I really dislike row names
`cols`	for filtering columns by name or number (supporting negative indexing)
`rowfilter`	a function that is assumed to take a chunk as a data frame and return a smaller data frame (with fewer rows), separately from the column filtering.
`as.is`	`TRUE` by default
`estimate`	do a preliminary estimation of the work to be done, and then have a chance to bail out if it looks like a bad idea
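
The chunk-and-filter idea behind the `nrows`, `cols`, and `rowfilter` arguments can be sketched in base R as follows. This is only an illustration under stated assumptions, not the package's implementation: `read_in_chunks()` and its `path` argument are hypothetical names, and applying the row filter before the column selection is a choice made here so that the filter can refer to any column.

```r
# Hypothetical helper (not part of YaleToolkit): read a delimited file in
# chunks of `nrows` lines, filter each chunk, and stack the survivors.
read_in_chunks <- function(path, nrows = 1e5, sep = ",", cols = NULL,
                           rowfilter = NULL) {
  con <- file(path, open = "r")
  on.exit(close(con))
  # Read the header line once so that every chunk gets the same column names.
  header_line <- readLines(con, n = 1)
  col_names <- scan(text = header_line, what = "", sep = sep, quiet = TRUE)
  pieces <- list()
  repeat {
    # read.table() signals an error once no lines remain on the connection.
    # Note that this sketch treats any read error as "end of file".
    chunk <- tryCatch(
      read.table(con, nrows = nrows, sep = sep, header = FALSE,
                 col.names = col_names, as.is = TRUE),
      error = function(e) NULL
    )
    if (is.null(chunk) || nrow(chunk) == 0) break
    # Assumption: filter rows first, then select columns.
    if (!is.null(rowfilter)) chunk <- rowfilter(chunk)
    if (!is.null(cols)) chunk <- chunk[, cols, drop = FALSE]
    pieces[[length(pieces) + 1]] <- chunk
  }
  out <- do.call(rbind, pieces)
  rownames(out) <- NULL
  out
}

# Usage: keep three columns and only the rows with conc > 500.
tmp <- tempfile(fileext = ".csv")
write.csv(CO2, tmp, row.names = FALSE)
res <- read_in_chunks(tmp, nrows = 10,
                      cols = c("Plant", "conc", "uptake"),
                      rowfilter = function(d) d[d$conc > 500, ])
unlink(tmp)
```

Only one chunk plus the accumulated filtered rows is held in memory at a time, which is why a smaller `nrows` helps when the file has many columns.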

Note

This is very much 'in development' and could be buggy. I put it here because I used it in an example in one of my courses, but then I needed to update the package to keep CRAN happy. So here it is. Buyer Beware. - Jay

Examples

data(CO2)
write.csv(CO2, "CO2.csv", row.names=FALSE)
x <- big.read.table("CO2.csv", nrows=10)
unlink("CO2.csv")
head(x)

Get the number of rows of the file

Description

Use iterators to avoid the memory overhead of obtaining the number of rows of a file.

Usage

getnrows(file, n = 10000)

Arguments

`file`	the name of a file (possibly with a path)
`n`	the size of the chunks used by the iterator

Value

an integer
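
As a rough illustration of counting rows without loading the whole file, the sketch below reads `n` lines at a time from an open connection. It uses plain `readLines()` rather than the iterators the description mentions, and `count_lines()` is a hypothetical name, not a package function. It counts every line, including any header; whether `getnrows()` counts the header is not stated here.

```r
# Hypothetical helper (not the package's getnrows()): count the lines of a
# file while holding at most `n` of them in memory at once.
count_lines <- function(path, n = 10000) {
  con <- file(path, open = "r")
  on.exit(close(con))
  total <- 0L
  repeat {
    got <- length(readLines(con, n = n))
    if (got == 0L) break          # nothing left to read
    total <- total + got
  }
  total
}

tmp <- tempfile(fileext = ".csv")
write.csv(CO2, tmp, row.names = FALSE)
n_lines <- count_lines(tmp, n = 25)   # header line plus one line per row
unlink(tmp)
```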

Examples

data(CO2)
write.csv(CO2, "CO2.csv", row.names=FALSE)
getnrows("CO2.csv")
unlink("CO2.csv")

Pressure and High Cloud Cover Spatially Distributed Time Series

Description

Six years of monthly pressure and high cloud cover measurements over a regular grid of the Americas, from NASA's poster competition at the 2006 Joint Statistical Meetings (JSM).

Usage

data(nasa)

Format

This NASA data set is stored as a list of 3 components: data (containing the pressure and high cloud cover measurements), elev (the elevation data), and coast (the coastline data). To see the structure, type str(nasa), and see Details and Source for more information, below.

Details

The data are a subset of some geographic and atmospheric measurements on a coarse 24 by 24 grid covering Central America. The variables included are elevation, air pressure, and high cloud cover. With the exception of elevation, the variables are monthly averages, with observations for Jan., 1995 to Dec., 2000. These data were obtained from the NASA Langley Research Center Atmospheric Sciences Data Center.

Source

NASA Langley Research Center Atmospheric Sciences Data Center, with permission. The JSM poster competition was announced at:

http://www.amstat-online.org/sections/graphics/dataexpo/2006.php

Examples

# See sparkmat().

Draws a sparkline

Description

Draws a time series or ‘sparkline’ in a compact iconic fashion suitable for inclusion in more complex graphics or text.

Usage

sparkline(s, times = NULL, ylim = NULL, buffer = unit(0, "lines"),
          margins = NULL, IQR = NULL, yaxis = FALSE, xaxis = FALSE,
          ptopts = list(points = NULL, labels = NULL, labels.ch = NULL,
          gp = NULL, just = NULL, pch = NULL), margin.pars = NULL,
          buffer.pars = NULL, frame.pars = NULL, line.pars = gpar(lwd = 1),
          main = NULL, sub = NULL, xlab = NULL, ylab = NULL, new = TRUE)

Arguments

`s`	a vector or time series (class `"ts"` or `"zoo"`) giving the data to be plotted. If `s` is a time series, the `start`, `end`, and `frequency` found in `attributes(s)$tsp` are automatically converted into an argument to `times`.
`times`	the times at which to plot the data; if `NULL` (the default), equal spacing is assumed, equivalent to setting `times = 1:length(s)`.
`ylim`	the maximum and minimum value on the y-axis; if `NULL`, defaults to the actual maximum and minimum of the data.
`buffer`	a buffer above the maximum and below the minimum values attained by the sparkline. Defaults to `unit(0, 'lines')`.
`margins`	margins around the sparkline-plus-buffer area. `NULL` (the default) provides no margins; otherwise the value passed must be a 4-vector of units giving the bottom, left, top and right margins, in that order.
`IQR`	a list of graphics parameters to shade or otherwise delineate the interquartile range of the sparkline. `NULL` (the default) does not show the IQR. See `Details` for more information.
`yaxis`	draws a vertical axis if `TRUE`; defaults to `FALSE` in which case no axis is drawn.
`xaxis`	`'interior'` draws a horizontal axis inside the plotting frame; `'exterior'` outside the plotting frame (in the margins); defaults to `FALSE`, in which case no axis is drawn.
`ptopts`	a list of graphics parameters describing the points on the sparkline that are plotted and labelled. In particular the first and last or minimum and maximum points are labeled if `ptopts$labels` is `'first.last'` or `'min.max'`. In addition to `labels`, other relevant parameters from `gpar` should be valid. See `Details` for more information.
`margin.pars`	a list of graphics parameters describing the margin area. See `Details` for more information.
`buffer.pars`	a list of graphics parameters describing the buffer area. See `Details` for more information.
`frame.pars`	a list of graphics parameters describing the exact area taken up by the plotted sparkline. See `Details` for more information.
`line.pars`	a list of graphics parameters describing the sparkline. See `Details` for more information.
`main`	a main title, above the sparkline.
`sub`	a subtitle, to the right of the sparkline.
`xlab`	a string to label the x-axis.
`ylab`	a string to label the y-axis.
`new`	defaults to `TRUE`, which creates a new, empty page; otherwise adds the sparkline to an existing plot.
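
The description of `s` says that the `tsp` attribute of a time series is converted into `times`. A minimal base-R sketch of that kind of conversion is shown below; how `sparkline()` performs it internally is not documented here, so the use of `seq()` with `length.out` is an assumption.

```r
# tsp holds c(start, end, frequency) for a "ts" object.
tsp_vals <- attributes(nhtemp)$tsp

# Rebuild an evenly spaced time axis with one value per observation.
times <- seq(from = tsp_vals[1], to = tsp_vals[2],
             length.out = length(nhtemp))
```

The resulting vector matches `as.numeric(time(nhtemp))` and is the sort of value that could be passed explicitly as `times` when `s` is a plain numeric vector.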

Details

In all the cases where a list of graphics parameters is needed, the valid parameter names are the same as would be valid when passed to gpar in the appropriate call. That is, passing list(fill = 'blue', col = 'red') to margin.pars gives a margin that is blue with a red border; but adding fontface = 'bold' will have no effect, just as it would have no effect in a call to grid.rect(). In particular, note that ptopts takes the following non-standard parameters: labels, a vector indexing the points to label or the string 'min.max' or 'first.last'; labels.ch, a vector of strings giving the labels; and points, a vector indexing the points at which points should be plotted. Passing 'min.max' or 'first.last' to ptopts$labels overrides any values of ptopts$labels.ch.
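
To make the non-standard `ptopts` entries concrete, the sketch below builds an explicit list that labels the minimum and maximum of a series by index. It uses only base R; whether the result renders identically to the `'min.max'` shorthand has not been verified against the package, and the label text produced by `format()` is an arbitrary choice.

```r
s <- as.numeric(Nile)

# Indices that the 'first.last' and 'min.max' shorthands refer to.
first_last <- c(1, length(s))
min_max <- c(which.min(s), which.max(s))

# An explicit ptopts list: plot and label the extreme points.
ptopts <- list(points = min_max,
               labels = min_max,
               labels.ch = format(s[min_max]))
```

Such a list would be passed as `sparkline(Nile, ptopts = ptopts)`.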

Note

This is primarily intended to be called by other functions (sparklines() and sparkmat()), but it can also be used as an alternative to ts.plot(). Thanks to Gabor Grothendieck for suggesting the generalization that provides support of "zoo" objects.

Author(s)

John W. Emerson, Walton Green

References

Tufte, E. R. (2006) Beautiful Evidence. Cheshire, Connecticut: Graphics Press.

Examples


### sparkline examples
data(nhtemp)

## The default behaviour of sparkline

sparkline(nhtemp)

## Creating stand-alone plots

sparkline(rnorm(10),
                buffer = unit(1, "lines"),
                ptopts = 'first.last',
                margins = unit(c(1,1,1,1), 'inches'),
                yaxis = TRUE, xaxis=TRUE,
                IQR = gpar(fill = 'grey', col = 'grey'),
                main = "Ten Random Standard Normal Numbers",
                sub = '...plotted here')

data(YaleEnergy)
y <- YaleEnergy[YaleEnergy$name==YaleEnergy$name[2],]
sparkline(y$ELSQFT, times=y$year+y$month/12,
          xaxis=TRUE, yaxis=TRUE, main="Branford College Electrical Consumption",
          buffer=unit(1, "lines"), margins = unit(c(1, 1, 1, 1), 'inches'))

sparkline(Nile,
                buffer = unit(1, "lines"),
                ptopts = list(labels = 'min.max'),
                margin.pars = gpar(fill = 'lightblue'),
                buffer.pars = gpar(fill = 'lightgreen'),
                frame.pars = gpar(fill = 'lightyellow'),
                yaxis = TRUE, xaxis=TRUE,
                IQR = gpar(fill = 'grey', col = 'grey'),
                main="Nile Discharge between 1871 and 1970",
                sub='In what units?')

## Adding a sparkline to an existing plot

grid.newpage()
pushViewport(viewport(w = 0.8, h = 0.8))
sparkline(rnorm(10),
                buffer = unit(1, "lines"),
                margins = unit(c(4,4,4,4),'points'),
                ptopts = list(labels = 'min.max'),
                margin.pars = gpar(fill = 'lightblue'),
                buffer.pars = gpar(fill = 'lightgreen'),
                frame.pars = gpar(fill = 'lightyellow'),
                yaxis = TRUE, xaxis=TRUE,
                IQR = gpar(fill = 'grey', col = 'grey'),
                main="Title (plotted OUTSIDE the viewport)", new = FALSE)
popViewport()


Draws a panel of vertically stacked sparklines

Description

Draws a panel of vertically stacked, aligned sparklines, or time series.

Usage

sparklines(ss, times = NULL, overlap = FALSE, yscale = NULL,
           buffer = unit(0, "lines"), buffer.pars = NULL, IQR = NULL,
           ptopts = NULL, yaxis = TRUE, xaxis = "exterior",
           labeled.points = NULL, point.labels = NULL,
           label.just = c(1.2, 0.5), frame.pars = NULL,
           line.pars = gpar(lwd = 1), 
           outer.margin = unit(c(5, 4, 4, 2), "lines"), 
           outer.margin.pars = NULL, main = NULL, sub = NULL, 
           xlab = NULL, ylab = NULL, lcol = NULL, new = TRUE)

Arguments

`ss`	a data frame whose columns give the time series to be plotted
`overlap`	`FALSE` for stacked sparklines; `TRUE` for all plotted on the same y-axis.
`times`	the times at which to plot the data; if `NULL` (the default), equal spacing is assumed. All the sparklines must share the same `times` argument. If unaligned time series must be plotted, multiple calls to `sparklines()` are required.
`yscale`	either a vector of length 2 giving the y-limits for all sparklines, or a list having the same length as the number of columns in `ss` (each component of which is a 2-vector giving the associated sparkline scales). Defaults to `NULL`, in which case the scales for each sparkline are set to the sparkline's minimum and maximum values.
`buffer`	a buffer above the maximum and below the minimum values attained by the sparkline. Defaults to `unit(0, 'lines')`.
`buffer.pars`	a list of graphics parameters describing the buffer area. See `Details` for more information.
`IQR`	a list of graphics parameters to shade or otherwise delineate the interquartile range of the sparkline. Defaults to `NULL`, in which case the IQR is not shown. See `Details` for more information.
`ptopts`	a list of graphics parameters describing the points on the sparkline that are plotted and labelled. In particular the first and last or minimum and maximum points are labeled if `ptopts$labels` is `'first.last'` or `'min.max'`.
`yaxis`	draws a vertical axis if `TRUE` (the default shown in Usage); if `FALSE`, no axis is drawn.
`xaxis`	`'interior'` draws horizontal axes inside the plotting frame (for each sparkline); `'exterior'` (the default shown in Usage) draws the common axis for all the sparklines outside the plotting frame; `FALSE` draws no axis.
`labeled.points`	not implemented. See `ptopts`.
`point.labels`	not implemented. See `ptopts`.
`label.just`	not implemented. See `ptopts`.
`frame.pars`	a list of graphics parameters describing the exact area taken up by the plotted sparkline. See `Details` for more information.
`line.pars`	a list of graphics parameters describing the sparkline. See `Details` for more information.
`outer.margin`	a vector of 4 units (bottom, left, top, right) giving the outer margin sizes in order (around the entire panel of sparklines). Defaults to `unit(c(5, 4, 4, 2), 'lines')`, as shown in Usage.
`outer.margin.pars`	a list of graphics parameters describing the outer margin. See `Details` for more information.
`main`	a main title, above the stack of sparklines.
`sub`	a character vector the length of `length(ss)` providing titles for the individual sparklines, printed to the right of the sparklines.
`xlab`	a string providing the label for the common x-axis or (probably a useless feature) a character vector the length of `length(ss)` providing x-axis labels for the individual sparklines.
`ylab`	a character vector the length of `length(ss)` providing y-axis labels for the individual sparklines.
`lcol`	a vector of colors, the same length as the number of columns in `ss`, used to color the lines. As in base graphics, it can be a vector of strings giving the color names, a numeric vector referring to the current palette, or the output of functions like `hsv` or `rgb`.
`new`	defaults to `TRUE`, which creates a new, empty page; otherwise adds the sparkline to the existing plot.
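
The two accepted forms of `yscale` can be built in base R as sketched below. The choice of the `temp` and `activ` columns of `beaver1` is only for illustration.

```r
ss <- beaver1[, c("temp", "activ")]

# Form 1: one list component per column, each a 2-vector c(min, max).
# This mirrors what the NULL default is described as doing.
yscale_list <- lapply(ss, range, na.rm = TRUE)

# Form 2: a single 2-vector shared by all sparklines.
yscale_shared <- range(unlist(ss), na.rm = TRUE)
```

Either object could then be supplied as `sparklines(ss, yscale = ...)`. A shared scale makes magnitudes comparable across series, while per-series scales emphasize each series' own variation.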

Note

We do not support non-aligned time series plots such as ts.plot(airmiles, Nile, nhtemp).

Author(s)

John W. Emerson, Walton Green

References

Tufte, E. R. (2006) Beautiful Evidence. Cheshire, Connecticut: Graphics Press.

Examples


### sparkline examples
data(beaver1)

## The default behaviour of sparklines
sparklines(beaver1)

sparklines(beaver1,
           outer.margin = unit(c(2,4,4,5), 'lines'),
           outer.margin.pars = gpar(fill = 'lightblue'),
           buffer = unit(1, "lines"),
           frame.pars = gpar(fill = 'lightyellow'),
           buffer.pars = gpar(fill = 'lightgreen'),
           yaxis = TRUE, xaxis=FALSE,
           IQR = gpar(fill = 'grey', col = 'grey'),
           main = 'Beaver 1')

data(YaleEnergy)
y <- YaleEnergy[YaleEnergy$name==YaleEnergy$name[2],]
sparklines(y[,c("ELSQFT", "STEAM")], times=y$year+y$month/12,
           main="Branford Electric and Steam Consumption")

## Adding a pair of sparklines to an existing plot

grid.newpage()
pushViewport(viewport(w = 0.8, h = 0.8))
sparklines(data.frame(x = rnorm(10), y = rnorm(10, mean=5)), new = FALSE)
popViewport()

grid.newpage()
pushViewport(viewport(w = 0.8, h = 0.8))
sparklines(data.frame(x = rnorm(10), y = rnorm(10, mean=2)),
                buffer = unit(1, "lines"),
                frame.pars = gpar(fill = 'lightyellow'),
                yaxis = TRUE, xaxis=FALSE,
                IQR = gpar(fill = 'grey', col = 'grey'), new = FALSE)
popViewport()


Draws a sparkmat

Description

Draws multiple time series (or sparklines) at given locations.

Usage

sparkmat(x, locs = NULL, w = NULL, h = NULL, lcol = NULL,
         yscales = NULL, tile.shading = NULL, 
         tile.margin = unit(c(0, 0, 0, 0), "points"),
         tile.pars = NULL, just = c("right", "top"),
         new = TRUE, ...)

Arguments

`x`	a list of data frames, all with the same dimensions, one for each panel of vertically aligned sparklines.
`locs`	a data frame with x-coordinates in the first variable and y-coordinates in the second variable, giving locations of each of the `length(x)` sparkline panels.
`w`	vector of unit widths (or native widths if not specified as units).
`h`	vector of unit heights (or native heights if not specified as units).
`lcol`	vector of `ncol(x[[1]])` line colors, one for each sparkline in each panel.
`yscales`	either a vector of length 2 giving the y-limits for all sparklines, or a list with one component per column of the data frames in `x` (each component of which is a 2-vector giving the scale for the corresponding sparkline). Defaults to `NULL`, in which case the scales for each sparkline are set to its minimum and maximum value within the panel.
`tile.shading`	vector of background shadings for the panels.
`tile.margin`	an outer margin around each tile (panel of sparklines). A 4-vector of units giving the bottom, left, top and right margins; defaults to `unit(c(0,0,0,0), 'points')`.
`tile.pars`	a list of graphics parameters describing the buffer area. See `Details` for more information.
`just`	default is `c("right", "top")`; controls the justification of the sparklines relative to the provided location coordinates.
`new`	defaults to `TRUE`, which creates a new, empty page; otherwise adds the sparkline to the existing plot.
`...`	for arguments to be passed through to `sparklines()`.
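
The shapes expected for `x`, `locs`, and `yscales` can be illustrated with synthetic data. Everything in this sketch (the panel count, column names `a` and `b`, and the coordinates) is made up for illustration; it only checks that the pieces fit the descriptions above.

```r
set.seed(1)
n_panels <- 4    # number of locations
n_times <- 12    # observations per series

# x: one data frame per panel, all with identical dimensions.
x <- lapply(seq_len(n_panels), function(k) {
  data.frame(a = rnorm(n_times), b = rnorm(n_times, mean = 5))
})

# locs: one row per panel, x-coordinate first, y-coordinate second.
locs <- data.frame(lon = c(0, 1, 0, 1), lat = c(0, 0, 1, 1))

# yscales: one 2-vector per column, computed across all panels.
yscales <- lapply(names(x[[1]]), function(v) {
  range(unlist(lapply(x, `[[`, v)))
})

same_dims <- all(vapply(x, function(d) identical(dim(d), dim(x[[1]])),
                        logical(1)))
```

With these objects, a call of the form `sparkmat(x, locs = locs, yscales = yscales, ...)` would place one two-line panel at each of the four locations; suitable `w` and `h` values depend on the viewport scale.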

Details

In all the cases where a list of graphics parameters is needed, the valid parameter names are the same as would be valid when passed to gpar in the appropriate call. That is, passing list(fill = 'blue', col = 'red') to margin gives a margin that is blue with a red border; but adding fontface = 'bold' will have no effect, just as it would have no effect in a call to grid.rect().

Author(s)

John W. Emerson, Walton Green

References

Tufte, E. R. (2006) Beautiful Evidence. Cheshire, Connecticut: Graphics Press.

Examples


# An example with a time series of energy consumption at Yale colleges.
data(YaleEnergy)
y <- YaleEnergy

# Need list of 12 data frames, each with one time series.

z <- list(data.frame(y[y$name==y$name[1],"ELSQFT"]),
          data.frame(y[y$name==y$name[2],"ELSQFT"]),
          data.frame(y[y$name==y$name[3],"ELSQFT"]),
          data.frame(y[y$name==y$name[4],"ELSQFT"]),
          data.frame(y[y$name==y$name[5],"ELSQFT"]),
          data.frame(y[y$name==y$name[6],"ELSQFT"]),
          data.frame(y[y$name==y$name[7],"ELSQFT"]),
          data.frame(y[y$name==y$name[8],"ELSQFT"]),
          data.frame(y[y$name==y$name[9],"ELSQFT"]),
          data.frame(y[y$name==y$name[10],"ELSQFT"]),
          data.frame(y[y$name==y$name[11],"ELSQFT"]),
          data.frame(y[y$name==y$name[12],"ELSQFT"]))

sparkmat(z, locs=data.frame(y$lon, y$lat), new=TRUE,
         w=0.002, h=0.0002, just=c("left", "top"))
grid.text(y[1:12,1], unit(y$lon[1:12]+0.001, "native"),
          unit(y$lat[1:12]+0.00003, "native"),
          just=c("center", "bottom"), gp=gpar(cex=0.7))
grid.text("Degrees Longitude", 0.5, unit(-2.5, "lines"))
grid.text("Degrees Latitude", unit(-4.5, "lines"), 0.5, rot=90)
grid.text("Monthly Electrical Consumption (KwH/SqFt)",
          0.5, 0.82, gp=gpar(cex=1, font=2))
grid.text("of Yale Residential Colleges",
          0.5, 0.77, gp=gpar(cex=1, font=2))
grid.text("July 1999 - July 2006",
          0.5, 0.72, gp=gpar(cex=1, font=2))

# An example with pressure and high cloud cover over a regular grid of the
# Americas, provided by NASA.

runexample <- FALSE
if (runexample) {

data(nasa)

grid.newpage()
pushViewport(viewport(w = unit(1, "npc")-unit(2, "inches"),
                      h = unit(1, "npc")-unit(2, "inches")))
v <- viewport(xscale = c(-115, -55),
              yscale = c(-22.5, 37.5))
pushViewport(v)

y <- vector(mode="list", length=24*24)
locs <- as.data.frame(matrix(0, 24*24, 2))
tile.shading <- rep(0, 24*24)
for(i in 1:24) {     # Latitudes
  for(j in 1:24) {   # Longitudes
    y[[(i-1)*24+j]] <- as.data.frame(t(nasa$data[,,i,j]))
    locs[(i-1)*24+j,] <- c(as.numeric(dimnames(nasa$data)$lon[j]),
                           as.numeric(dimnames(nasa$data)$lat[i]))
    tile.shading[(i-1)*24+j] <- gray( 1-.5*(nasa$elev[i,j]/max(nasa$elev)) )
  }
}

yscales <- list(quantile(nasa$data["pressure",,,], c(0.01, 0.99), na.rm=TRUE),
                quantile(nasa$data["cloudhigh",,,], c(0.01, 0.99), na.rm=TRUE))

sparkmat(y, locs=locs, just='center', w=2.5, h=2.5,
         tile.shading=tile.shading, lcol=c(6,3), yscales=yscales,
         tile.margin = unit(c(2,2,2,2), 'points'), new=FALSE)

grid.xaxis(gp=gpar(fontface=2, fontsize=14))
grid.yaxis(gp=gpar(fontface=2, fontsize=14))
grid.rect()

grid.text("Degrees Latitude", x=unit(-0.75, "inches"), y=0.5, rot=90,
          gp=gpar(fontface=2, fontsize=14))
grid.text("Degrees Longitude", x=0.5, y=unit(-0.75, "inches"), rot=0,
          gp=gpar(fontface=2, fontsize=14))
grid.text("Grayscale shading reflects",
          x=unit(1, "npc")+unit(0.6, "inches"), y=0.5, rot=270,
          gp=gpar(fontface=2, fontsize=14))
grid.text("average elevation above sea level",
          x=unit(1, "npc")+unit(0.3, "inches"), y=0.5, rot=270,
          gp=gpar(fontface=2, fontsize=14))

grid.lines(nasa$coast[,1], nasa$coast[,2], default.units = 'native',
           gp = gpar(col = 'black', lwd = 1))

grid.text("Pressure",
          x=0.25, y=unit(1, "npc")+unit(1.25, "lines"),
          gp=gpar(fontface=2, fontsize=14))
grid.rect(x=0.25, y=unit(1, "npc") + unit(0.5, "lines"),
          width=0.4, height=unit(0.05, "inches"), gp=gpar(col=6, fill=6))
grid.text("High Cloud",
          x=0.75, y=unit(1, "npc")+unit(1.25, "lines"),
          gp=gpar(fontface=2, fontsize=14))
grid.rect(x=0.75, y=unit(1, "npc") + unit(0.5, "lines"),
          width=0.4, height=unit(0.05, "inches"), gp=gpar(col=3, fill=3))
}

Data frame summary

Description

Summarize the characteristics of variables (columns) in a data frame.

Usage

whatis(x, var.name.truncate = 20, type.truncate = 14)

Arguments

`x`	a data frame
`var.name.truncate`	maximum length (in characters) for truncation of variable names. The default is 20; anything less than 12 is less than the column label in the resulting data frame and is a waste of information.
`type.truncate`	maximum length (in characters) for truncation of variable type; `14` is the full width, but `4` works well if space is at a premium.

Details

The function whatis() provides a basic examination of some characteristics of each variable (column) in a data frame.

Value

A list of characteristics describing the variables in the data frame, x. Each component of the list has length(x) values, one for each variable in the data frame x.

variable.name: from the names(x) attribute, possibly truncated to var.name.truncate characters in length.
type: the possibilities include "pure factor", "mixed factor", "ordered factor", "character", and "numeric"; whatis() considers the possibility that a factor or a vector could contain character and/or numeric values. If both character and numeric values are present, and if the variable is a factor, then it is called a mixed factor. If the levels of a factor are purely character or numeric (but not both), it is a pure factor. Non-factors must then be either character or numeric.
missing: the number of NAs in the variable.
distinct.values: the number of distinct values in the variable, equal to length(table(variable)).
precision: the number of decimal places of precision.
min: the minimum value (if numeric) or the first value (alphabetically) as appropriate.
max: the maximum value (if numeric) or the last value (alphabetically) as appropriate.
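
The definitions of missing, distinct.values, and the pure/mixed factor distinction can be approximated in base R. The following is a minimal sketch on a made-up data frame, not the whatis() source code; the helper classify_levels() is hypothetical and only illustrates the rule stated above (a factor whose levels include both numeric-looking and non-numeric strings is "mixed").

```r
# Base-R approximation of three of the summaries described above.
# This is NOT the whatis() implementation; classify_levels() is a
# hypothetical helper introduced only for illustration.
df <- data.frame(
  a = c(1.5, NA, 3.25, 1.5),
  c = factor(c("Apple", "8", "Apple", NA))
)

# missing: the number of NAs in each variable
n_missing <- sapply(df, function(v) sum(is.na(v)))

# distinct.values: length(table(variable)), as stated in the text
n_distinct <- sapply(df, function(v) length(table(v)))

# A factor is treated as "mixed" when some levels parse as numbers
# and others do not; otherwise it is treated as "pure".
classify_levels <- function(f) {
  is_num <- !is.na(suppressWarnings(as.numeric(levels(f))))
  if (any(is_num) && any(!is_num)) "mixed factor" else "pure factor"
}

print(n_missing)
print(n_distinct)
print(classify_levels(df$c))
```

Note that table() drops NAs by default and, for a factor, counts every level (including unused ones), so this count may differ from length(unique(variable)).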

Author(s)

John W. Emerson, Walton Green

References

Special thanks to John Hartigan and the students of 'Statistical Case Studies' of 2004 for their help troubleshooting and developing the function whatis().

Examples

mydf <- data.frame(a=rnorm(100),
                   b=sample(c("Cat", "Dog"), 100, replace=TRUE), 
                   c=sample(c("Apple", "Orange", "8"), 100, replace=TRUE),
                   d=sample(c("Blue", "Red"), 100, replace=TRUE))
mydf$d <- as.character(mydf$d)
whatis(mydf)

data(iris)
whatis(iris)

Monthly energy consumption of Yale residential colleges.

Description

The data set contains monthly energy time series for the Yale residential colleges, from July 1999 through July 2006.

Usage

data(YaleEnergy)

Format

A data frame with 1020 observations on the following 18 variables.

name: a factor with levels BERKELEY BRANFORD CALHOUN DAVENPORT EZRA STILES JONATHAN EDWARDS MORSE PIERSON SAYBROOK SILLIMAN TIMOTHY DWIGHT TRUMBULL
address: a factor with levels 189 ELM ST. 205 ELM ST. 241 ELM ST. 242 ELM ST. 248 YORK ST. 261 PARK ST. 302 YORK ST. 345 TEMPLE ST. 505 COLLEGE ST. 70 HIGH ST. 74 HIGH ST.
gsf: gross square footage of the college
EL: electrical consumption in kilowatt hours
ELSQFT: electrical consumption per square foot
CHW: chilled water consumption in tons
SQFTCHW: square feet per ton of chilled water
STEAM: steam consumption in pounds
STEAMSQFT: steam per square foot
MBTU: million British Thermal Units (BTU) from chilled water and steam
MBTUSQFT: million BTUs per square foot
year: year of the record
month: month of the record
lon: degrees longitude of the college
lat: degrees latitude

Source

John W. Emerson, Yale University

Examples

data(YaleEnergy)
whatis(YaleEnergy)

y <- YaleEnergy             # This is just for convenience.
esqft <- list(data.frame(y[y$name==y$name[1],"ELSQFT"]),
              data.frame(y[y$name==y$name[2],"ELSQFT"]),
              data.frame(y[y$name==y$name[3],"ELSQFT"]),
              data.frame(y[y$name==y$name[4],"ELSQFT"]),
              data.frame(y[y$name==y$name[5],"ELSQFT"]),
              data.frame(y[y$name==y$name[6],"ELSQFT"]),
              data.frame(y[y$name==y$name[7],"ELSQFT"]),
              data.frame(y[y$name==y$name[8],"ELSQFT"]),
              data.frame(y[y$name==y$name[9],"ELSQFT"]),
              data.frame(y[y$name==y$name[10],"ELSQFT"]),
              data.frame(y[y$name==y$name[11],"ELSQFT"]),
              data.frame(y[y$name==y$name[12],"ELSQFT"]))

# The sparkmat() command does most of the work:
sparkmat(esqft, locs=data.frame(y$lon, y$lat), new=TRUE,
         w=0.002, h=0.0002, just=c("left", "top"))

# We'll add some text for a nice finished product:
grid.text(y[1:12,1], unit(y$lon[1:12]+0.001, "native"),
          unit(y$lat[1:12]+0.00003, "native"),
          just=c("center", "bottom"), gp=gpar(cex=0.7))
grid.text("Degrees Longitude", 0.5, unit(-2.5, "lines"))
grid.text("Degrees Latitude", unit(-4.5, "lines"), 0.5, rot=90)
grid.text("Monthly Electrical Consumption (KwH/SqFt) of Yale Colleges",
          0.5, 0.8, gp=gpar(cex=1, font=2))
grid.text("July 1999 - July 2006",
          0.5, 0.74, gp=gpar(cex=1, font=2))
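
The twelve explicit data.frame() calls used to build esqft can probably be written more compactly with split(). The sketch below uses a small made-up data frame (toy, with only the name and ELSQFT columns) rather than YaleEnergy itself, so the values are illustrative only. One assumption to keep in mind: split() orders the list by the levels of the factor, not by first appearance in the data, so any location data passed alongside would need to follow the same order.

```r
# Toy stand-in for YaleEnergy: two colleges, three months each.
# The values are made up for illustration.
toy <- data.frame(
  name   = factor(rep(c("BERKELEY", "BRANFORD"), each = 3)),
  ELSQFT = c(1.1, 1.2, 1.3, 2.1, 2.2, 2.3)
)

# split() on a one-column data frame returns a named list of data frames,
# one per level of 'name', in the order of levels(toy$name).
esqft_toy <- split(toy["ELSQFT"], toy$name)

str(esqft_toy)
```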

Data exploration tools from the Department of Statistics at Yale University

Description

This collection of data exploration tools was developed at Yale University for the graphical exploration of complex multivariate data. The main functions are barcode(), gpairs(), whatis(), and sparkmat(), although barcode() and gpairs() are now provided by separate packages of the same names.

Details

The package also includes several data sets. For more information, please see the help files for nasa and YaleEnergy. Please get in touch with us if you note any problems.

Author(s)

John W. Emerson, Walton Green

References

Chambers, J.M., Cleveland, W.S., Kleiner, B., and Tukey, P.A. (1983), Graphical Methods for Data Analysis, Belmont, CA: Wadsworth.
Friendly, M. (2002) 'Corrgrams: Exploratory displays for correlation matrices' American Statistician 56(4), 316–324.
Tufte, Edward R. (2006) Beautiful Evidence The Graphics Press, Cheshire, Connecticut. See https://www.edwardtufte.com for this and other references.
