As it was already mentioned, mdatools has its own functions for plotting with several extra options not available in basic plot tools. These functions are used to make all plots in the models and results (e.g. scores, loadings, predictions, etc.) therefore it can be useful to spend some time and learn the new features (e.g. coloring data points with a vector of values or using manual ticks for axes). But if you are going to make all plots manually (e.g. using ggplot2) you can skip this and the next sections.
In this section we will look at how to make simple plots from your data objects. Simple plots are scatter (
type = "p"), density-scatter (
type = "d"), line (
type = "l"), line-scatter (
type = "b"), bar (
type = "h") or errorbar (
type = "e") plots made for a one set of objects. All plots can be created using the same method
mdaplot() by providing a whole dataset as a main argument. Depending on a plot type, the method “treats” the data values differently.
This table below contains a list of parameters for
mdaplot(), which are not available for traditional R plots. In this section we will describe most of the details using simple examples.
||a vector of values (same as number of rows in data) used to colorize plot objects with a color gradient.|
||color map for the color gradient (possible values are
||when color grouping is used,
||logical parameter showing labels beside plot objects (points, lines, etc).
Size and color of labels can be adjusted using parameters
||parameter telling what to use as labels (by default row names, but can also be indices or manual values).|
||color for the labels.|
||font size for the labels (as a scale factor).|
||vector with numeric values to show the x-axis ticks at.|
||vector with numeric values to show the y-axis ticks at.|
||vector with labels (numbers or text) for the x-ticks.|
||vector with labels (numbers or text) for the y-ticks.|
||an integer between 0 and 3 telling at which angle the x-tick labels have to be shown.|
||an integer between 0 and 3 telling at which angle the y-tick labels have to be shown.|
||a vector with two numbers — position of horizontal and vertical lines on a plot (e.g. coordinate axes).|
||logical, show or not a grid. It places grid behind the plot object in contrast
||logical, show or not points or lines corresponded to the excluded rows.|
||opacity of colors in range 0…1 (applied to all colors of current plot).|
We will use
people dataset for illustration how scatter plots work (see
?people for details).
For scatter plots the method takes first two columns of a dataset as x and y vectors. If only one column is available
mdaplot() uses it for y-values and generate x-values as an index for each value.
All parameters, available for the standard
points() method will work with
mdaplot() as well. Besides that, you can colorize points according to some values using a color gradient. By default, the gradient is generated using one of the diverging color schemes from colorbrewer2.org, but this can be changed using parameter
colmap as it is shown below.
par(mfrow = c(2, 2)) mdaplot(people, type = "p", cgroup = people[, "Beer"]) mdaplot(people, type = "p", cgroup = people[, "Beer"], show.colorbar = FALSE) mdaplot(people, type = "p", cgroup = people[, "Beer"], colmap = "gray") mdaplot(people, type = "p", cgroup = people[, "Beer"], colmap = c("red", "yellow", "green"))
If the vector with values for color grouping is a factor, level labels will be shown on a colorbar legend and there will be a small margin between bars.
If you use point characters from 21 to 25 (the ones which allow to specify both color of border and
background of the marker symbol) you can use
pch.colinv to apply color grouping to background
instead of border. See an example below
Another useful option is adding labels to the data points. By default row names will be taken for the labels but you can specify a parameter
labels, which can be either a text (
"indices") or a vector with values to show as labels. Color and size of the labels can be adjusted.
par(mfrow = c(2, 2)) mdaplot(people, type = "p", show.labels = TRUE) mdaplot(people, type = "p", show.labels = TRUE, labels = "indices") mdaplot(people, type = "p", show.labels = TRUE, labels = "names", lab.col = "red", lab.cex = 0.5) mdaplot(people, type = "p", show.labels = TRUE, labels = paste0("O", seq_len(nrow(people))))
To avoid any problems with arguments when you make a subset, use
mda.subset() instead of the traditional ways. As you can see in the example below, if we take first 16 rows, information about excluded objects (as well as all other uder defined arguments, e.g.
"name") disappear and they are show in the plot as normal. But if we use
mda.subset() it will take the subset without excluded rows as it is shown below. The subset can be created using logical expressions as well as indices or names of the rows.
weight = people[, "Weight"] par(mfrow = c(2, 2)) mdaplot(people[1:16, ], show.labels = TRUE) mdaplot(mda.subset(people, subset = 1:16), show.labels = TRUE) mdaplot(mda.subset(people, subset = c("Lisa", "Benito", "Federico")), show.labels = TRUE) mdaplot(mda.subset(people, subset = weight > 70), show.labels = TRUE)
You can also manually specify axis ticks and tick labels. The labels can be rotated using parameters
ylas, see the examples below.
par(mfrow = c(2, 2)) mdaplot(people, xticks = c(165, 175, 185), xticklabels = c("Small", "Medium", "Hight")) mdaplot(people, yticks = c(55, 70, 85), yticklabels = c("Light", "Medium", "Heavy")) mdaplot(people, xticks = c(165, 175, 185), xticklabels = c("Small", "Medium", "Hight"), xlas = 2) mdaplot(people, yticks = c(55, 70, 85), yticklabels = c("Light", "Medium", "Heavy"), ylas = 2)
If both axis labels and rotated axis ticks have to be shown, you can adjust plot margins and position of the label using
par() function and
mtext() for positioning axis label manually.
par(mfrow = c(1, 2)) # change margin for bottom part par(mar = c(6, 4, 4, 2) + 0.1) mdaplot(people, xticks = c(165, 175, 185), xticklabels = c("Small", "Medium", "Hight"), xlas = 2, xlab = "") mtext("Height", side = 1, line = 5) # change margin for left part par(mar = c(5, 6, 4, 1) + 0.1) mdaplot(people, yticks = c(55, 70, 85), yticklabels = c("Light", "Medium", "Heavy"), ylas = 2, ylab = "") mtext("Weight", side = 2, line = 5)
There is also a couple of other parameters, allowing to show/hide grid as well as show horizontal and vertical lines on the plot (axes limits will be adjusted correspondingly).
From version 0.10.0 function
mdaplot() returns plot series data, which can be used for
extra options. For example, in case of scatter plot you can add confidence ellipse or convex
hull for data points. To do this, points must be color grouped by a factor as shown below. For confidence ellipse you can specify the confidence level (default 0.95).
In case when number of data points is large (e.g. when dealing with images, where every pixel is a data point), using density plot is a good alternative to conventional scatter plots. The plot does not show all data points but instead split the whole plot space into small hexagonal regions and use color gradient for illustration a density of the points in each region. This approach is known as hexagonal binning. To create a density plot simply use
type="d". You can also specify color map and number of bins along each axes (
The code below show an example of using density plots for 100000 data points with x and y values taken from normally distributed population.
When line plot is created, the
mdatools() shows a line plot for every row. So if data set has more than one row, the plot will show a banch of lines having same properties (color, type, etc). This is particularly useful when working with signals and spectroscopic data. In this subsection we will use simulated UV/Vis spectra from
Here are simple examples of how to make the line plots.
Most of the parameters described for scatter plots will work for the line plots as well. For example, you can colorise the lines by using a vector with some values (in the example below I use concentration of one of the chemical components).
One of the new features, appeared first in version 0.8.0, is a special attribute, allowing to provide manual x-values —
'xaxis.values' (similar parameter for y-values is
'yaxis.values'). In the example below we show the spectra using wavelength in nm and wavenumbers in inverse cm.
When you provide such data to any model methods (e.g. PCA, PLS, etc), then all variable related results (loadings, regression coefficients, etc.) will inherit this attribute and use it for making line plots.
Bar and errorbar plots
Bar plot is perhaps the simplest as it shows values for the first row of the data as bars. Let us get back to the people data, calculate mean for all variables and show the calculated values as a bar plot (excluding column with Income as it has much bigger values comparing to the others) — in the simplest form as well as with some extra parameters.
m = matrix(apply(people, 2, mean), nrow = 1) colnames(m) = colnames(people) m = mda.exclcols(m, "Income") attr(m, "name") = "People (means)" par(mfrow = c(2, 1)) mdaplot(m, type = "h") mdaplot(m, type = "h", xticks = 1:12, xticklabels = colnames(people), col = "red", show.labels = TRUE, labels = "values")
Errorbar plot always expect data to have two or three rows. The first row is a origin points of the error bars, secod row is the size of the bottom part and the third row is the size of the top part. If data has only two rows the both parts will be symmetric related to the origin. In the example below we show mean and standard deviation of the people data as an error bar.
d = rbind(apply(people, 2, mean), apply(people, 2, sd)) rownames(d) = c("Mean", "Std") colnames(d) = colnames(people) attr(d, 'name') = "Statistics" d = mda.exclcols(d, "Income") par(mfrow = c(2, 1)) mdaplot(d, type = "e") mdaplot(d, type = "e", xticks = 1:12, xticklabels = colnames(people), col = "red")
All simple plots can be combined together on the same axes. In this case, first plot is created as usual and all other plots have to be created with option
show.axes = FALSE as it is shown below. It must be noted that in this case axes limits have to be set manually when creating the first plot.
In the next section we will discuss plots for several groups of objects (rows).