PgR6 with Methods. End users should use pagoo
instead, since it is easier to understand.
Inherits: PgR6
Super class
pagoo::PgR6 -> PgR6M
Methods
Method new()
Create a PgR6M object.
Usage
PgR6M$new(
data,
org_meta,
cluster_meta,
core_level = 95,
sep = "__",
verbose = TRUE,
DF,
group_meta
)
Arguments
data: A data.frame or DataFrame containing at least the following columns: gene (gene name), org (name of the organism the gene belongs to), and cluster (group of orthologs the gene belongs to). More columns can be added as metadata for each gene.
org_meta: (optional) A data.frame or DataFrame containing additional metadata for organisms. This data.frame must have a column named "org" with valid organism names (that is, they should match those provided in data, column org); additional columns will be used as metadata. Each row should correspond to one organism.
cluster_meta: (optional) A data.frame or DataFrame containing additional metadata for clusters. This data.frame must have a column named "cluster" with valid cluster names (that is, they should match those provided in data, column cluster); additional columns will be used as metadata. Each row should correspond to one cluster.
core_level: The initial core_level (that is, the percentage of organisms a cluster must be present in to be considered part of the core genome). Must be a number between 85 and 100 (default: 95). You can change it later through the $core_level field once the object has been created.
sep: A separator. By default it is '__' (two underscores). It will be used to create a unique gid (gene identifier) for each gene. gids are created by pasting org to gene, separated by sep.
verbose: logical. Whether to display progress messages when loading the class.
DF: Deprecated. Use data instead.
group_meta: Deprecated. Use cluster_meta instead.
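To make the expected input shape concrete, here is a minimal base R sketch of a data table, matching organism metadata, the way gids are built from org, gene and sep, and the percentage that core_level is compared against. It does not call the package. All values are invented, the "country" column is a hypothetical metadata field, and the ">=" comparison is an assumption rather than the package's confirmed rule.

```r
# Toy input in the shape described for `data` (all values are made up).
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g2", "g1"),
  org     = c("orgA", "orgA", "orgB", "orgB", "orgC"),
  cluster = c("cl1", "cl2", "cl1", "cl2", "cl1"),
  stringsAsFactors = FALSE
)

# Optional organism metadata: needs an "org" column matching data$org.
# "country" is a hypothetical extra metadata column.
org_meta <- data.frame(
  org     = c("orgA", "orgB", "orgC"),
  country = c("UY", "AR", "BR"),
  stringsAsFactors = FALSE
)
stopifnot(all(org_meta$org %in% data$org))

# A gid is org pasted to gene, separated by `sep`.
sep <- "__"
data$gid <- paste(data$org, data$gene, sep = sep)

# Percentage of organisms in which each cluster is present.
core_level <- 95
n_org    <- length(unique(data$org))
presence <- tapply(data$org, data$cluster, function(o) length(unique(o)))
pct      <- 100 * presence / n_org

# Assumption: a cluster is "core" when its percentage reaches core_level.
core_clusters <- names(pct)[pct >= core_level]
```

Here cl1 is present in all three organisms and cl2 in only two of them, so only cl1 passes a core_level of 95.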
Method rarefact()
Rarefy the pangenome or coregenome. Computes the number of genes that belong to the pangenome or to the coregenome, for a number of random permutations of increasingly larger samples of genomes.
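The procedure described above can be sketched in base R as follows. This is an illustration of the idea, not the package's implementation: rarefy_sketch(), its n.perm argument and the random toy panmatrix are all hypothetical.

```r
set.seed(1)
# Toy presence/absence panmatrix: rows = organisms, columns = clusters.
pm <- matrix(rbinom(6 * 20, 1, 0.6), nrow = 6,
             dimnames = list(paste0("org", 1:6), paste0("cl", 1:20)))

# Hypothetical helper: rows of the result are sample sizes (1..n genomes),
# columns are random permutations of the genome order.
rarefy_sketch <- function(pm, what = c("pangenome", "coregenome"), n.perm = 10) {
  what <- match.arg(what)
  n <- nrow(pm)
  out <- matrix(NA_integer_, nrow = n, ncol = n.perm)
  for (p in seq_len(n.perm)) {
    ord <- sample(n)  # random order in which genomes are added
    for (k in seq_len(n)) {
      # Number of sampled genomes in which each cluster is present.
      counts <- colSums(pm[ord[seq_len(k)], , drop = FALSE] > 0)
      out[k, p] <- if (what == "pangenome") {
        sum(counts > 0)   # clusters seen in at least one sampled genome
      } else {
        sum(counts == k)  # clusters seen in every sampled genome
      }
    }
  }
  out
}

pan  <- rarefy_sketch(pm, "pangenome")
core <- rarefy_sketch(pm, "coregenome")
```

Within each permutation the pangenome count can only grow and the coregenome count can only shrink as genomes are added, which is what the fitted curves further below rely on.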
Method dist()
Compute the distance between all pairs of genomes. The default method is
"bray" (Bray-Curtis distance). Another commonly used method is "jaccard",
but you should set binary = FALSE (see below) to obtain a meaningful result.
This is just a wrapper function; see vegdist for details.
Usage
PgR6M$dist(
method = "bray",
binary = FALSE,
diag = FALSE,
upper = FALSE,
na.rm = FALSE,
...
)
Arguments
method: The distance method to use. See vegdist for available methods, and details for each one.
binary: Transform abundance matrix into a presence/absence matrix before computing distance.
diag: Compute diagonals.
upper: Return only the upper diagonal.
na.rm: Pairwise deletion of missing observations when computing dissimilarities.
...: Other parameters. See vegdist for details.
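To show what these distances measure, the following base R sketch computes Bray-Curtis and two Jaccard variants by hand on a toy abundance panmatrix. It does not call the package or vegdist. The pairwise() helper and the data are hypothetical, and the quantitative Jaccard uses the relation 2B/(1+B), with B the Bray-Curtis dissimilarity, as given in the vegdist documentation.

```r
# Toy abundance panmatrix: rows = organisms, columns = clusters (gene counts).
pm <- rbind(
  orgA = c(1, 2, 0, 1),
  orgB = c(1, 0, 1, 1),
  orgC = c(0, 1, 1, 3)
)

# Bray-Curtis dissimilarity between two abundance vectors.
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Hypothetical helper: apply a pairwise function to all rows, return a dist.
pairwise <- function(m, f) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) d[i, j] <- f(m[i, ], m[j, ])
  as.dist(d)
}

d_bray <- pairwise(pm, bray)

# Quantitative Jaccard derived from Bray-Curtis: 2B / (1 + B).
d_jacc_quant <- 2 * d_bray / (1 + d_bray)

# Presence/absence transformation (the effect of binary = TRUE), followed by
# the classic Jaccard distance: 1 - |intersection| / |union|.
pa <- (pm > 0) * 1
d_jacc_binary <- pairwise(pa, function(x, y) 1 - sum(x & y) / sum(x | y))
```

On presence/absence data this classic Jaccard distance coincides with base R's dist(pa, method = "binary"), which gives a quick cross-check.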
Method pan_pca()
Performs a principal components analysis on the panmatrix.
Arguments
center: a logical value indicating whether the variables should be shifted to be zero centered. Alternately, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.
scale.: a logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is TRUE.
...: Other arguments. See prcomp.
Returns
Returns a list with class "prcomp". See prcomp for more information.
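As an illustration of the kind of object returned, here is a minimal sketch that runs stats::prcomp() directly on a toy panmatrix. The random matrix is invented, and dropping constant columns first is a choice made for this sketch (prcomp cannot rescale zero-variance columns when scale. = TRUE); it is not necessarily what the method does internally.

```r
set.seed(42)
# Toy presence/absence panmatrix: rows = organisms, columns = clusters.
pm <- matrix(rbinom(8 * 15, 1, 0.5), nrow = 8,
             dimnames = list(paste0("org", 1:8), paste0("cl", 1:15)))

# Constant columns (e.g. clusters present in every organism) have zero
# variance and cannot be scaled, so they are dropped here.
keep <- apply(pm, 2, function(col) length(unique(col)) > 1)

pca <- prcomp(pm[, keep, drop = FALSE], center = TRUE, scale. = TRUE)

class(pca)                      # "prcomp"
summary(pca)$importance[, 1:2]  # variance explained by PC1 and PC2
head(pca$x[, 1:2])              # organism coordinates on PC1 and PC2
```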
Method pg_power_law_fit()
Fits a power law curve for the pangenome rarefaction simulation.
Method cg_exp_decay_fit()
Fits an exponential decay curve for the coregenome rarefaction simulation.
Arguments
raref: (Optional) A rarefaction matrix, as returned by rarefact().
pcounts: An integer number of pseudo-counts. This is used to better fit the function at small numbers, as the linearization method requires subtracting a constant C, which is the coregenome size, from y. As y gets closer to the coregenome size, the difference tends to 0 and its logarithm diverges towards negative infinity. By default, pcounts = 10.
...: Further arguments to be passed to rarefact(). If raref is missing, it will be computed with default arguments, or with the ones provided here.
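The two fits and the role of the pseudo-counts can be sketched with plain lm() on linearized data. This is an assumption-laden illustration rather than the package's code: the curves are synthetic and noiseless, the parameter names K, gamma, A, B and C are chosen for this sketch, and C is estimated here as the smallest observed coregenome size.

```r
# Synthetic, noiseless curves (all numbers are made up).
n    <- 1:20
pan  <- 1500 * n^0.35               # pangenome grows as a power law
core <- 800 * exp(-0.4 * n) + 1000  # coregenome decays towards ~1000 genes

# Power law: y = K * n^gamma  =>  log(y) = log(K) + gamma * log(n)
pl    <- lm(log(pan) ~ log(n))
K     <- exp(coef(pl)[[1]])
gamma <- coef(pl)[[2]]

# Exponential decay: y = A * exp(B * n) + C  =>  log(y - C) = log(A) + B * n
# With C estimated as min(y), y - C reaches 0 and log() returns -Inf,
# so a pseudo-count is added before taking the logarithm.
pcounts <- 10
C  <- min(core)
ed <- lm(log(core - C + pcounts) ~ n)
A  <- exp(coef(ed)[[1]])
B  <- coef(ed)[[2]]

c(K = K, gamma = gamma, A = A, B = B, C = C)
```

On noiseless data the power-law parameters are recovered essentially exactly, while the decay parameters are only approximate because the pseudo-counts bias the linearization.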
Method gg_binmap()
Plot a pangenome binary map representing the presence/absence of each gene within each organism.
Returns
A binary map (ggplot2::geom_raster()), and a gg object (ggplot2
package) invisibly.
Method gg_dist()
Plot a heatmap showing the computed distance between all pairs of organisms.
Arguments
method: Distance method. One of "Jaccard" (default), or "Manhattan", see above.
...: More arguments to be passed to distManhattan.
Returns
A heatmap (ggplot2::geom_tile()), and a gg object (ggplot2
package) invisibly.
Method gg_pca()
Plot a scatter plot of a Principal Components Analysis.
Arguments
colour: The name of the column in the $organisms field from which points will take colour (if provided). NULL (default) renders black points.
...: More arguments to be passed to ggplot2::autoplot().
Returns
A scatter plot (ggplot2::autoplot()), and a gg object (ggplot2
package) invisibly.
Method gg_pie()
Plot a pie chart showing the number of clusters of each pangenome category: core, shell, or cloud.
Method gg_curves()
Plot pangenome and/or coregenome curves with the fitted functions returned by pg_power_law_fit()
and cg_exp_decay_fit(). You can add points by adding + geom_point(), from the ggplot2 package.
Usage
PgR6M$gg_curves(what = c("pangenome", "coregenome"), ...)
Method runShinyApp()
Launch an interactive shiny app. It contains a sidebar with controls and switches to interact with the pagoo object. You can drop/recover organisms from the dataset, modify the core_level, visualize statistics, plots, and browse cluster and gene information. In the main body, it contains 2 tabs to switch between summary statistics plots and core genome information on one side, and accessory genome plots and information on the other.
The lower part of each tab contains two tables, side by side. On the "Summary" tab, the left one contains information about core clusters, with one cluster per row. When one of them is selected (clicked), the one on the right is updated to show information about its genes (if provided), one gene per row. The "Accessory" tab has a similar layout, but in this case only accessory clusters/genes are displayed. A slider on the sidebar lets you select the accessory frequency range to display.
Give it a try!
Keep in mind that big pangenomes can slow down the app. With more than 50-70 organisms, plots and tables often update with a delay.