PgR6 with methods. End users should use pagoo instead, since it is easier to understand. Inherits: PgR6

Super class

pagoo::PgR6 -> PgR6M

Methods

Inherited methods


Method new()

Create a PgR6M object.

Usage

PgR6M$new(
  data,
  org_meta,
  cluster_meta,
  core_level = 95,
  sep = "__",
  verbose = TRUE,
  DF,
  group_meta
)

Arguments

data

A data.frame or DataFrame containing at least the following columns: gene (gene name), org (name of the organism to which the gene belongs), and cluster (group of orthologs to which the gene belongs). More columns can be added as metadata for each gene.

org_meta

(optional) A data.frame or DataFrame containing additional metadata for organisms. This data.frame must have a column named "org" with valid organism names (that is, they should match those provided in the org column of data); additional columns will be used as metadata. Each row should correspond to one organism.

cluster_meta

(optional) A data.frame or DataFrame containing additional metadata for clusters. This data.frame must have a column named "cluster" with valid cluster names (that is, they should match those provided in the cluster column of data); additional columns will be used as metadata. Each row should correspond to one cluster.

core_level

The initial core_level, that is, the percentage of organisms in which a cluster must be present to be considered part of the core genome. Must be a number between 85 and 100 (default: 95). You can change it later through the $core_level field once the object has been created.

sep

A separator. The default is '__' (two underscores). It is used to create a unique gid (gene identifier) for each gene. gids are created by pasting org to gene, separated by sep.

verbose

logical. Whether to display progress messages when loading the class.

DF

Deprecated. Use data instead.

group_meta

Deprecated. Use cluster_meta instead.

Returns

An R6 object of class PgR6M. It contains basic fields and methods for analyzing a pangenome, additional statistical methods, and methods to make basic exploratory plots.
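As a rough illustration of the inputs described above, the following base R sketch builds a toy data table plus the two optional metadata tables, checks the documented requirements, and shows how a gid and a core cluster could be derived. All organism, gene, and cluster names are made up, and the core-cluster computation is only one plausible reading of core_level; pagoo's own rounding rules may differ.

```r
# Toy gene table with the three required columns (all names are illustrative).
data <- data.frame(
  gene    = c("g1", "g2", "g1", "g2", "g1", "g3"),
  org     = c("orgA", "orgA", "orgB", "orgB", "orgC", "orgC"),
  cluster = c("cl1", "cl2", "cl1", "cl2", "cl1", "cl3"),
  stringsAsFactors = FALSE
)

# Optional metadata: one row per organism and one row per cluster.
org_meta <- data.frame(org = c("orgA", "orgB", "orgC"),
                       country = c("UY", "AR", "BR"))
cluster_meta <- data.frame(cluster = c("cl1", "cl2", "cl3"),
                           product = c("kinase", "permease", "hypothetical"))

# Sanity checks mirroring the documented requirements.
stopifnot(all(c("gene", "org", "cluster") %in% colnames(data)))
stopifnot(all(org_meta$org %in% data$org))
stopifnot(all(cluster_meta$cluster %in% data$cluster))

# gids as described for `sep`: org pasted to gene.
sep <- "__"
gid <- paste(data$org, data$gene, sep = sep)
stopifnot(!anyDuplicated(gid))
print(gid)

# Assumed reading of core_level: percentage of organisms carrying each cluster.
core_level <- 95
pct <- 100 * tapply(data$org, data$cluster, function(o) length(unique(o))) /
  length(unique(data$org))
core <- names(pct)[pct >= core_level]
print(core)

# With pagoo installed, the object would then be created along these lines:
# p <- pagoo::PgR6M$new(data = data, org_meta = org_meta,
#                       cluster_meta = cluster_meta, core_level = 95)
```

The checks are cheap to run before constructing the object and catch mismatched organism or cluster names early.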


Method rarefact()

Rarefy the pangenome or coregenome. Computes the number of genes that belong to the pangenome or to the coregenome over a number of random permutations of increasingly larger samples of genomes.

Usage

PgR6M$rarefact(what = "pangenome", n.perm = 10)

Arguments

what

One of "pangenome" or "coregenome".

n.perm

The number of permutations to compute (default: 10).

Returns

A matrix in which rows are the number of genomes added, columns are permutations, and each cell holds the number of genes in the chosen category.
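The procedure can be sketched in base R on a toy panmatrix. This is not pagoo's implementation: the helper rarefy() and the toy matrix are hypothetical, and the coregenome here is the strict intersection of the sampled genomes rather than a core_level-dependent set.

```r
set.seed(1)  # make the random permutations reproducible

# Toy panmatrix: 5 genomes (rows) x 8 clusters (columns), presence/absence.
pm <- matrix(rbinom(40, 1, 0.6), nrow = 5,
             dimnames = list(paste0("org", 1:5), paste0("cl", 1:8)))
pm[, 1] <- 1  # force one cluster to be present everywhere

# Hypothetical helper, not part of pagoo.
rarefy <- function(pm, what = c("pangenome", "coregenome"), n.perm = 10) {
  what <- match.arg(what)
  n <- nrow(pm)
  out <- matrix(NA_integer_, nrow = n, ncol = n.perm,
                dimnames = list(seq_len(n), paste0("permut_", seq_len(n.perm))))
  for (p in seq_len(n.perm)) {
    ord <- sample(n)  # random order in which genomes are added
    for (i in seq_len(n)) {
      present <- colSums(pm[ord[seq_len(i)], , drop = FALSE] > 0)
      out[i, p] <- if (what == "pangenome") {
        sum(present >= 1L)   # clusters seen in at least one sampled genome
      } else {
        sum(present == i)    # clusters seen in every sampled genome
      }
    }
  }
  out
}

pg <- rarefy(pm, "pangenome", n.perm = 10)
cg <- rarefy(pm, "coregenome", n.perm = 10)
dim(pg)  # rows: genomes added, columns: permutations
```

Within each permutation the pangenome count can only grow and the coregenome count can only shrink as genomes are added, which is the shape the curve-fitting methods below rely on.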


Method dist()

Compute the distance between all pairs of genomes. The default dist method is "bray" (Bray-Curtis distance). Another commonly used distance method is "jaccard", but you should set binary = TRUE (see below) to obtain a meaningful result. This is just a wrapper function; see vegdist for details.

Usage

PgR6M$dist(
  method = "bray",
  binary = FALSE,
  diag = FALSE,
  upper = FALSE,
  na.rm = FALSE,
  ...
)

Arguments

method

The distance method to use. See vegdist for available methods, and details for each one.

binary

Transform abundance matrix into a presence/absence matrix before computing distance.

diag

Compute diagonals.

upper

Return only the upper diagonal.

na.rm

Pairwise deletion of missing observations when computing dissimilarities.

...

Other parameters. See vegdist for details.

Returns

A dist object containing all pairwise dissimilarities between genomes.
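To make the two measures named above concrete, here is a minimal base R sketch that computes Bray-Curtis on abundances and Jaccard on presence/absence by hand. The functions bray(), jaccard_binary(), and pairwise() are illustrative helpers, not part of pagoo or vegan, and the toy matrix is made up.

```r
# Toy abundance matrix: genomes in rows, clusters in columns.
pm <- rbind(
  orgA = c(2, 1, 0, 1),
  orgB = c(1, 1, 1, 0),
  orgC = c(0, 1, 1, 1)
)

# Bray-Curtis dissimilarity on abundances.
bray <- function(x, y) sum(abs(x - y)) / sum(x + y)

# Jaccard dissimilarity on presence/absence.
jaccard_binary <- function(x, y) {
  x <- x > 0
  y <- y > 0
  1 - sum(x & y) / sum(x | y)
}

# All pairwise dissimilarities, returned as a `dist` object.
pairwise <- function(m, f) {
  n <- nrow(m)
  d <- matrix(0, n, n, dimnames = list(rownames(m), rownames(m)))
  for (i in seq_len(n)) for (j in seq_len(n)) d[i, j] <- f(m[i, ], m[j, ])
  as.dist(d)
}

d_bray <- pairwise(pm, bray)
d_jacc <- pairwise(pm, jaccard_binary)

# Base R's "binary" distance is the same presence/absence Jaccard measure.
d_base <- dist((pm > 0) + 0, method = "binary")
all.equal(as.vector(d_jacc), as.vector(d_base))
```

Bray-Curtis uses the copy numbers, so orgA's two copies of the first cluster contribute to its distance, whereas the presence/absence Jaccard only sees which clusters are shared.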


Method pan_pca()

Performs a principal components analysis on the panmatrix.

Usage

PgR6M$pan_pca(center = TRUE, scale. = FALSE, ...)

Arguments

center

A logical value indicating whether the variables should be shifted to be zero centered. Alternatively, a vector of length equal to the number of columns of x can be supplied. The value is passed to scale.

scale.

A logical value indicating whether the variables should be scaled to have unit variance before the analysis takes place. The default is FALSE.

...

Other arguments. See prcomp.

Returns

Returns a list with class "prcomp". See prcomp for more information.
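A small sketch of the underlying prcomp() call on a toy panmatrix, with the same center and scale. settings as the usage above. The matrix is made up. The second call shows one plausible reason for leaving scale. off: a cluster present in every genome has zero variance, and prcomp() refuses to rescale such a column.

```r
set.seed(42)

# Toy panmatrix: 6 genomes (rows) x 10 clusters (columns).
pm <- matrix(rbinom(6 * 10, 1, 0.5), nrow = 6,
             dimnames = list(paste0("org", 1:6), paste0("cl", 1:10)))
pm[, 1] <- 1  # a core cluster: present in every genome, zero variance

# Same settings as the documented defaults of pan_pca().
pca <- prcomp(pm, center = TRUE, scale. = FALSE)
class(pca)
dim(pca$x)  # one row of scores per genome

# Scaling to unit variance fails when a column is constant.
scaled <- tryCatch(prcomp(pm, center = TRUE, scale. = TRUE),
                   error = function(e) conditionMessage(e))
scaled
```

The scores in pca$x are what a scatter plot of the first two components would display, one point per genome.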


Method pg_power_law_fit()

Fits a power law curve for the pangenome rarefaction simulation.

Usage

PgR6M$pg_power_law_fit(raref, ...)

Arguments

raref

(Optional) A rarefaction matrix, as returned by rarefact().

...

Further arguments to be passed to rarefact(). If raref is missing, it will be computed with default arguments, or with the ones provided here.

Returns

A list of two elements: $formula with a fitted function, and $params with fitted parameters. An attribute "alpha" is also returned (if alpha > 1, the pangenome is closed; otherwise it is open).
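A power law fit of this kind can be sketched with a log-log linear regression. The parameterization below, y = K * x^gamma with alpha = 1 - gamma, is the common Heaps' law formulation and is an assumption here; the source does not state which form pagoo uses. The rarefaction matrix is synthetic.

```r
set.seed(7)

# Synthetic rarefaction matrix: rows = genomes added, columns = permutations.
n <- 20
n.perm <- 5
K_true <- 1500
gamma_true <- 0.3
raref <- sapply(seq_len(n.perm), function(i) {
  K_true * seq_len(n)^gamma_true * exp(rnorm(n, sd = 0.01))
})

# Long format: one (number of genomes, pangenome size) pair per cell.
# as.vector() walks the matrix column by column, so x repeats 1..n per column.
x <- rep(seq_len(n), times = n.perm)
y <- as.vector(raref)

fit <- lm(log(y) ~ log(x))   # log y = log K + gamma * log x
K <- exp(coef(fit)[[1]])
gamma <- coef(fit)[[2]]
alpha <- 1 - gamma           # assumed relation between exponent and alpha

fitted_fun <- function(x) K * x^gamma
c(K = K, gamma = gamma, alpha = alpha)
```

With these synthetic data the estimated alpha is below 1, which under the rule stated above would indicate an open pangenome.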


Method cg_exp_decay_fit()

Fits an exponential decay curve for the coregenome rarefaction simulation.

Usage

PgR6M$cg_exp_decay_fit(raref, pcounts = 10, ...)

Arguments

raref

(Optional) A rarefaction matrix, as returned by rarefact().

pcounts

An integer number of pseudo-counts. This is used to better fit the function at small numbers, as the linearization method requires subtracting a constant C, which is the coregenome size, from y. As y approaches the coregenome size, this difference tends to 0 and its logarithm diverges to minus infinity. By default pcounts = 10.

...

Further arguments to be passed to rarefact(). If raref is missing, it will be computed with default arguments, or with the ones provided here.

Returns

A list of two elements: $formula with a fitted function, and $params with fitted intercept and decay parameters.
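The role of pcounts can be sketched as follows, assuming a model of the form y = A * exp(B * x) + C that is linearized as log(y - C + pcounts) = log(A) + B * x. Both the model form and the choice of C as the smallest observed value are assumptions for illustration, not pagoo's confirmed implementation; the data are synthetic.

```r
set.seed(3)

# Synthetic coregenome rarefaction matrix: rows = genomes added,
# columns = permutations.
n <- 20
n.perm <- 5
A_true <- 800
B_true <- -0.4
C_true <- 1200
raref <- sapply(seq_len(n.perm), function(i) {
  round(A_true * exp(B_true * seq_len(n)) + C_true + rnorm(n, sd = 2))
})

x <- rep(seq_len(n), times = n.perm)
y <- as.vector(raref)

C <- min(y)      # assumed estimate of the final coregenome size
pcounts <- 10

# Without pseudo-counts the log is not finite wherever y equals C.
naive_ok <- all(is.finite(log(y - C)))

# Pseudo-counts keep every value strictly positive before taking the log.
fit <- lm(log(y - C + pcounts) ~ x)
A <- exp(coef(fit)[[1]])   # fitted intercept parameter
B <- coef(fit)[[2]]        # fitted decay parameter (negative)

fitted_fun <- function(x) A * exp(B * x) + C
c(A = A, B = B, C = C)
```

Adding a constant before the log biases the estimates somewhat, so the sketch illustrates why the pseudo-counts are needed rather than how well the parameters are recovered.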


Method gg_barplot()

Plot a barplot with the frequency of genes within the total number of genomes.

Usage

PgR6M$gg_barplot()

Returns

A barplot, and a gg object (ggplot2 package) invisibly.
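The quantity behind such a barplot can be tabulated in base R: for each cluster, count the genomes it occurs in, then count how many clusters fall at each value. The toy panmatrix is made up, and this is only a sketch of the data being plotted, not of pagoo's plotting code.

```r
# Toy presence/absence panmatrix: genomes in rows, clusters in columns.
pm <- rbind(
  orgA = c(1, 1, 0, 1, 0),
  orgB = c(1, 1, 1, 0, 0),
  orgC = c(1, 0, 1, 0, 1)
)
colnames(pm) <- paste0("cl", 1:5)

# Number of genomes in which each cluster occurs.
n_genomes <- colSums(pm > 0)

# Number of clusters found in exactly 1, 2, ..., N genomes.
freq <- table(factor(n_genomes, levels = seq_len(nrow(pm))))
freq
```

Here two clusters occur in a single genome, two in two genomes, and one in all three.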


Method gg_binmap()

Plot a pangenome binary map representing the presence/absence of each gene within each organism.

Usage

PgR6M$gg_binmap()

Returns

A binary map (ggplot2::geom_raster()), and a gg object (ggplot2 package) invisibly.


Method gg_dist()

Plot a heatmap showing the computed distance between all pairs of organisms.

Usage

PgR6M$gg_dist(method = "bray", ...)

Arguments

method

Distance method. The default is "bray"; "jaccard" is another common choice. See the dist() method above and vegdist for the available methods.

...

More arguments to be passed to the dist() method described above.

Returns

A heatmap (ggplot2::geom_tile()), and a gg object (ggplot2 package) invisibly.


Method gg_pca()

Plot a scatter plot of a Principal Components Analysis.

Usage

PgR6M$gg_pca(colour = NULL, ...)

Arguments

colour

The name of the column in $organisms field from which points will take colour (if provided). NULL (default) renders black points.

...

More arguments to be passed to ggplot2::autoplot().

Returns

A scatter plot (ggplot2::autoplot()), and a gg object (ggplot2 package) invisibly.


Method gg_pie()

Plot a pie chart showing the number of clusters of each pangenome category: core, shell, or cloud.

Usage

PgR6M$gg_pie()

Returns

A pie chart (ggplot2::geom_bar() + coord_polar()), and a gg object (ggplot2 package) invisibly.


Method gg_curves()

Plot pangenome and/or coregenome curves with the fitted functions returned by pg_power_law_fit() and cg_exp_decay_fit(). You can add points by adding + geom_point() from the ggplot2 package.

Usage

PgR6M$gg_curves(what = c("pangenome", "coregenome"), ...)

Arguments

what

"pangenome", "coregenome", or both (default).

...

????

Returns

A scatter plot, and a gg object (ggplot2 package) invisibly.


Method runShinyApp()

Launch an interactive shiny app. It contains a sidebar with controls and switches to interact with the pagoo object. You can drop/recover organisms from the dataset, modify the core_level, visualize statistics, plots, and browse cluster and gene information. In the main body, it contains 2 tabs to switch between summary statistics plots and core genome information on one side, and accessory genome plots and information on the other.

The lower part of each tab contains two tables, side by side. On the "Summary" tab, the left one contains information about core clusters, with one cluster per row. When one of them is selected (click), the one on the right is updated to show information about its genes (if provided), one gene per row. On the "Accessory" tab, a similar configuration is shown, but in this case only accessory clusters/genes are displayed. There is a slider on the sidebar where one can select the accessory frequency range to display.

Give it a try!

Take into account that big pangenomes can slow down the app. More than 50-70 organisms often lead to a delay in the update of the plots and tables.

Usage

PgR6M$runShinyApp()


Method clone()

The objects of this class are cloneable with this method.

Usage

PgR6M$clone(deep = FALSE)

Arguments

deep

Whether to make a deep clone.