vignette_pkggraph

Quickstart

suppressPackageStartupMessages(library("dplyr"))          # for tidy data manipulations
suppressPackageStartupMessages(library("magrittr"))       # for friendly piping
suppressPackageStartupMessages(library("network"))        # for plotting
suppressPackageStartupMessages(library("sna"))            # for plotting
suppressPackageStartupMessages(library("statnet.common")) # for plotting
suppressPackageStartupMessages(library("networkD3"))      # for plotting
suppressPackageStartupMessages(library("igraph"))         # for graph computations
suppressPackageStartupMessages(library("pkggraph"))       # attach the package
suppressMessages(init(local = TRUE))                      # initiate the package
get_neighborhood("mlr") # a tibble, every row indicates a dependency
## Warning: `arrange_()` was deprecated in dplyr 0.7.0.
## ℹ Please use `arrange()` instead.
## ℹ See vignette('programming') for more help
## ℹ The deprecated feature was likely used in the pkggraph package.
##   Please report the issue at <https://github.com/talegari/pkggraph/issues>.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.
## # A tibble: 445 × 3
##    pkg_1       relation pkg_2       
##    <chr>       <fct>    <chr>       
##  1 ada         Depends  rpart       
##  2 adabag      Depends  rpart       
##  3 adabag      Depends  mlbench     
##  4 adabag      Depends  caret       
##  5 bartMachine Depends  randomForest
##  6 batchtools  Depends  data.table  
##  7 bst         Depends  gbm         
##  8 caret       Depends  ggplot2     
##  9 clusterSim  Depends  cluster     
## 10 clusterSim  Depends  MASS        
## # ℹ 435 more rows
# observe only 'Imports' and reverse 'Imports'
neighborhood_graph("mlr", relation = "Imports") %>% 
  plot()

# observe the neighborhood of 'tidytext' package
get_neighborhood("tidytext") %>% 
  make_neighborhood_graph() %>% 
  plot()

# interact with the neighborhood of 'tm' package
# the legend is not rendered in the vignette, but it appears when the code is run directly
neighborhood_graph("tm") %>% 
  plotd3(700, 700)
# which packages act as 'authorities' in the neighborhood of 'tidytext'
neighborhood_graph("tidytext", type = "igraph") %>% 
  extract2(1) %>% 
  authority_score() %>% 
  extract2("vector") %>% 
  tibble(package = names(.), score = .) %>% 
  top_n(10, score) %>% 
  ggplot(aes(reorder(package, score), score)) + 
    geom_bar(stat = "identity") +
    xlab("package") +
    ylab("score") +
    coord_flip()
## Warning: `authority_score()` was deprecated in igraph 2.1.0.
## ℹ Please use `hits_scores()` instead.
## This warning is displayed once every 8 hours.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

Introduction

The package pkggraph aims to provide a consistent and intuitive platform to explore the dependencies of packages in CRAN-like repositories.

The package attempts to strike a balance between two aspects:

  • Understanding the characteristics of the repository, at repository level (relating to the ‘forest’)
  • Discovering relevant packages and their contribution (relating to the ‘trees’)

This way, we neither miss the forest for the trees nor see only the forest!

Important Features

The important features of pkggraph are:

  • Most functions return a three-column tibble (pkg_1, relation, pkg_2). The first row in the table below indicates that the dplyr package ‘Imports’ the assertthat package.
get_imports(c("dplyr", "tidyr"))
## # A tibble: 20 × 3
##    pkg_1 relation pkg_2     
##    <chr> <fct>    <chr>     
##  1 dplyr Imports  assertthat
##  2 dplyr Imports  bindrcpp  
##  3 dplyr Imports  glue      
##  4 dplyr Imports  magrittr  
##  5 dplyr Imports  methods   
##  6 dplyr Imports  pkgconfig 
##  7 dplyr Imports  rlang     
##  8 dplyr Imports  R6        
##  9 dplyr Imports  Rcpp      
## 10 dplyr Imports  tibble    
## 11 dplyr Imports  utils     
## 12 tidyr Imports  dplyr     
## 13 tidyr Imports  glue      
## 14 tidyr Imports  magrittr  
## 15 tidyr Imports  purrr     
## 16 tidyr Imports  rlang     
## 17 tidyr Imports  Rcpp      
## 18 tidyr Imports  stringi   
## 19 tidyr Imports  tibble    
## 20 tidyr Imports  tidyselect
  • There are three function families:

    • get family: These functions return a tibble, e.g. get_reverse_depends.
    • neighborhood family: These functions return a pkggraph object containing a network or an igraph object, e.g. neighborhood_graph.
    • relies family: These functions capture recursive dependencies.
  • The plot method uses the ggnetwork package to generate a static plot.

  • The plotd3 function uses networkD3 to produce an interactive D3 plot.

The five different types of dependencies a package can have over another are: Depends, Imports, LinkingTo, Suggests and Enhances.
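
To make the three-column format concrete, here is a minimal base R sketch. The table `toy_deps` and its package names are made up for illustration and are not taken from any repository; the real functions return a tibble, whereas this uses a plain data frame.

```r
# Hypothetical toy table in the (pkg_1, relation, pkg_2) shape.
# Each row reads: pkg_1 has the given relation to pkg_2.
toy_deps <- data.frame(
  pkg_1    = c("A", "A", "B", "C"),
  relation = factor(
    c("Imports", "Suggests", "Depends", "Imports"),
    levels = c("Depends", "Imports", "LinkingTo", "Suggests", "Enhances")
  ),
  pkg_2    = c("B", "C", "D", "D"),
  stringsAsFactors = FALSE
)

# Keep only the rows whose relation is 'Imports'
toy_imports <- toy_deps[toy_deps$relation == "Imports", ]
print(toy_imports)
```

Because every result shares this shape, ordinary data frame or dplyr operations can be applied to it directly.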

init

Always begin with init(). This creates two variables, deptable and packmeta, in the environment where it is called. The variables are either loaded from a local copy or computed after downloading from the internet (when local = FALSE, the default value). Using init(local = FALSE) is suggested to get up-to-date dependencies.

library("pkggraph")
init(local = FALSE)

The repository argument takes CRAN, bioconductor and omegahat repositories. For other CRAN-like repositories not listed in repository, an additional argument named repos is required.

get family

  • These functions return a tibble.
  • All of them take packages as their first argument.
  • All of them take a level argument (the default value is 1).
get_imports("ggplot2")
## # A tibble: 10 × 3
##    pkg_1   relation pkg_2   
##    <chr>   <fct>    <chr>   
##  1 ggplot2 Imports  digest  
##  2 ggplot2 Imports  grid    
##  3 ggplot2 Imports  gtable  
##  4 ggplot2 Imports  MASS    
##  5 ggplot2 Imports  plyr    
##  6 ggplot2 Imports  reshape2
##  7 ggplot2 Imports  scales  
##  8 ggplot2 Imports  stats   
##  9 ggplot2 Imports  tibble  
## 10 ggplot2 Imports  lazyeval

Let's observe the packages that ‘Suggest’ knitr.

get_reverse_suggests("knitr", level = 1)
## # A tibble: 2,213 × 3
##    pkg_1           relation pkg_2
##    <chr>           <fct>    <chr>
##  1 abbyyR          Suggests knitr
##  2 ABC.RAP         Suggests knitr
##  3 ABHgenotypeR    Suggests knitr
##  4 AbSim           Suggests knitr
##  5 ACMEeqtl        Suggests knitr
##  6 acmeR           Suggests knitr
##  7 acnr            Suggests knitr
##  8 ACSNMineR       Suggests knitr
##  9 adaptiveGPCA    Suggests knitr
## 10 additivityTests Suggests knitr
## # ℹ 2,203 more rows

By setting level = 2, observe that the packages from the first level (first column of the previous table) and the packages that suggest them are captured.

get_reverse_suggests("knitr", level = 2)
## # A tibble: 5,387 × 3
##    pkg_1        relation pkg_2  
##    <chr>        <fct>    <chr>  
##  1 abbyyR       Suggests knitr  
##  2 ABCoptim     Suggests covr   
##  3 ABC.RAP      Suggests knitr  
##  4 abctools     Suggests ggplot2
##  5 abd          Suggests ggplot2
##  6 abd          Suggests Hmisc  
##  7 ABHgenotypeR Suggests knitr  
##  8 AbSim        Suggests knitr  
##  9 acebayes     Suggests R.rsp  
## 10 ACMEeqtl     Suggests knitr  
## # ℹ 5,377 more rows
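
The effect of level can be sketched in base R on a toy table. Everything here is an assumption made for illustration: `toy_deps`, the package names and the helper `toy_reverse` are hypothetical, and this is not how pkggraph is implemented internally.

```r
# Hypothetical toy table: B and C suggest A, D suggests B, E imports A
toy_deps <- data.frame(
  pkg_1    = c("B", "C", "D", "E"),
  relation = c("Suggests", "Suggests", "Suggests", "Imports"),
  pkg_2    = c("A", "A", "B", "A"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: collect reverse dependencies of one type, level by level
toy_reverse <- function(deps, packages, relation, level = 1L) {
  edges    <- deps[deps$relation == relation, ]
  result   <- edges[0, ]   # empty table with the same columns
  frontier <- packages     # packages whose dependents are looked up next
  for (i in seq_len(level)) {
    hits     <- edges[edges$pkg_2 %in% frontier, ]
    result   <- unique(rbind(result, hits))
    frontier <- unique(hits$pkg_1)
  }
  result
}

level_1 <- toy_reverse(toy_deps, "A", "Suggests", level = 1)
level_2 <- toy_reverse(toy_deps, "A", "Suggests", level = 2)
print(level_2)
```

At level 1 only B and C appear in the first column; at level 2, D is added because it suggests B.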

What if we need to capture dependencies of more than one type, say both Depends and Imports?

get_all_dependencies and get_all_reverse_dependencies

These functions capture direct and reverse dependencies up to the specified level for any subset of dependency types.

get_all_dependencies("mlr", relation = c("Depends", "Imports"))
## # A tibble: 9 × 3
##   pkg_1 relation pkg_2       
##   <chr> <fct>    <chr>       
## 1 mlr   Depends  ParamHelpers
## 2 mlr   Imports  BBmisc      
## 3 mlr   Imports  backports   
## 4 mlr   Imports  ggplot2     
## 5 mlr   Imports  stringi     
## 6 mlr   Imports  checkmate   
## 7 mlr   Imports  data.table  
## 8 mlr   Imports  parallelMap 
## 9 mlr   Imports  survival
get_all_dependencies("mlr", relation = c("Depends", "Imports"), level = 2)
## # A tibble: 303 × 3
##    pkg_1       relation pkg_2          
##    <chr>       <fct>    <chr>          
##  1 ada         Depends  rpart          
##  2 adabag      Depends  rpart          
##  3 adabag      Depends  mlbench        
##  4 adabag      Depends  caret          
##  5 bartMachine Depends  rJava          
##  6 bartMachine Depends  bartMachineJARs
##  7 bartMachine Depends  car            
##  8 bartMachine Depends  randomForest   
##  9 bartMachine Depends  missForest     
## 10 batchtools  Depends  data.table     
## # ℹ 293 more rows

Observe that ada ‘Depends’ on rpart.

Sometimes, we would like to capture only the specified dependencies recursively. In this case, at the second level, say we would like to capture only the ‘Depends’ and ‘Imports’ of the packages which mlr itself ‘Depends’ on or ‘Imports’. Then, set strict = TRUE.

get_all_dependencies("mlr"
                     , relation = c("Depends", "Imports")
                     , level    = 2
                     , strict   = TRUE)
## # A tibble: 28 × 3
##    pkg_1     relation pkg_2       
##    <chr>     <fct>    <chr>       
##  1 mlr       Depends  ParamHelpers
##  2 BBmisc    Imports  checkmate   
##  3 checkmate Imports  backports   
##  4 ggplot2   Imports  digest      
##  5 ggplot2   Imports  grid        
##  6 ggplot2   Imports  gtable      
##  7 ggplot2   Imports  MASS        
##  8 ggplot2   Imports  plyr        
##  9 ggplot2   Imports  reshape2    
## 10 ggplot2   Imports  scales      
## # ℹ 18 more rows

Notice that ada is ‘Suggested’ by mlr. That is why it appeared when strict was FALSE (the default).
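
One reading of the difference between the two modes can be sketched on a toy table in base R. The table, the package names and the helper `toy_all_deps` are hypothetical, and the traversal rule is an assumption inferred from the outputs above rather than pkggraph's actual code: without strict, packages are reached through any relation type and the rows are filtered afterwards; with strict, only the requested relation types are followed.

```r
# Hypothetical toy table: M depends on P and suggests S
toy_deps <- data.frame(
  pkg_1    = c("M", "M", "S", "P", "P"),
  relation = c("Depends", "Suggests", "Depends", "Imports", "Suggests"),
  pkg_2    = c("P", "S", "R", "Q", "T"),
  stringsAsFactors = FALSE
)

# Hypothetical helper illustrating the assumed meaning of 'strict'
toy_all_deps <- function(deps, packages, relation, level = 1L, strict = FALSE) {
  wanted   <- deps$relation %in% relation
  picked   <- rep(FALSE, nrow(deps))
  frontier <- packages
  for (i in seq_len(level)) {
    from_frontier <- deps$pkg_1 %in% frontier
    # strict: follow requested relations only; otherwise follow every relation
    walk     <- if (strict) from_frontier & wanted else from_frontier
    picked   <- picked | walk
    frontier <- unique(deps$pkg_2[walk])
  }
  # in both modes, report only rows of the requested relation types
  deps[picked & wanted, ]
}

loose  <- toy_all_deps(toy_deps, "M", c("Depends", "Imports"), level = 2)
strict <- toy_all_deps(toy_deps, "M", c("Depends", "Imports"), level = 2,
                       strict = TRUE)
print(loose)
print(strict)
```

In this toy case the row "S Depends R" shows up only in the non-strict result, because S is reached from M through a ‘Suggests’ relation.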

What if we need to capture both dependencies and reverse dependencies up to a specified level?

get_neighborhood

This function captures both dependencies and reverse dependencies up to a specified level for a given subset of dependency types.

get_neighborhood("hash", level = 2)
## # A tibble: 62 × 3
##    pkg_1       relation pkg_2 
##    <chr>       <fct>    <chr> 
##  1 BOG         Depends  hash  
##  2 COMBIA      Depends  hash  
##  3 GABi        Depends  hash  
##  4 HAP.ROR     Depends  hash  
##  5 neuroim     Depends  hash  
##  6 orderbook   Depends  hash  
##  7 rpartitions Depends  hash  
##  8 Rtextrankr  Depends  KoNLP 
##  9 CITAN       Imports  hash  
## 10 covr        Imports  crayon
## # ℹ 52 more rows
get_neighborhood("hash", level = 2) %>% 
  make_neighborhood_graph %>% 
  plot()

Observe that the testthat family appears due to Suggests. Let's look at Depends and Imports only:

get_neighborhood("hash"
                 , level = 2
                 , relation = c("Imports", "Depends")
                 , strict = TRUE) %>% 
  make_neighborhood_graph %>% 
  plot()

Observe that the graph below captures the fact that parallelMap ‘Imports’ BBmisc.

get_neighborhood("mlr", relation = "Imports") %>% 
  make_neighborhood_graph() %>% 
  plot()

get_neighborhood checks whether any of the packages found up to the specified level depend on each other at one level higher. This can be turned off by setting interconnect = FALSE.

get_neighborhood("mlr", relation = "Imports", interconnect = FALSE) %>% 
  make_neighborhood_graph() %>% 
  plot()
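
A rough base R sketch of what interconnect adds, restricted to level 1 for brevity. The toy table, the package names and the helper `toy_neighborhood` are hypothetical, and the rule used (keep every edge whose two endpoints are both in the neighborhood) is an assumption about the behaviour described above, not pkggraph's implementation.

```r
# Hypothetical toy table: M imports A and B, B imports A, C imports M
toy_deps <- data.frame(
  pkg_1    = c("M", "M", "B", "C", "A"),
  relation = rep("Imports", 5),
  pkg_2    = c("A", "B", "A", "M", "Z"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: level 1 neighborhood with optional interconnection
toy_neighborhood <- function(deps, pkg, interconnect = TRUE) {
  # rows where pkg is either the dependent or the dependency
  direct <- deps$pkg_1 %in% pkg | deps$pkg_2 %in% pkg
  nodes  <- unique(c(deps$pkg_1[direct], deps$pkg_2[direct]))
  keep   <- direct
  if (interconnect) {
    # also keep edges between packages already in the neighborhood
    keep <- keep | (deps$pkg_1 %in% nodes & deps$pkg_2 %in% nodes)
  }
  deps[keep, ]
}

with_links    <- toy_neighborhood(toy_deps, "M")
without_links <- toy_neighborhood(toy_deps, "M", interconnect = FALSE)
print(with_links)
```

Here the edge "B Imports A" plays the role of "parallelMap ‘Imports’ BBmisc": it appears only when interconnect is TRUE.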

neighborhood_graph and make_neighborhood_graph

  • neighborhood_graph creates a graph object of class pkggraph for a set of packages. It takes the same arguments as get_neighborhood and, additionally, type. The type argument defaults to igraph; the alternative is network.
neighborhood_graph("caret", relation = "Imports") %>% 
  plot()

make_neighborhood_graph accepts the output of any get_* as input and produces a graph object.

Essentially, you can get the information from a get_* function after some trial and error, then create a graph object for further analysis or plotting.

get_all_reverse_dependencies("rpart", relation = "Imports") %>% 
  make_neighborhood_graph() %>% 
  plot()

Checking dependencies and relies

For quick dependency checks, one could use infix operators: %depends%, %imports%, %linkingto%, %suggests%, %enhances%.

"dplyr" %imports% "tibble"
## [1] TRUE

A package A is said to rely on package B if A either ‘Depends’, ‘Imports’ or ‘LinkingTo’ B, recursively. The relies function captures this.

relies("glmnet")[[1]]
##  [1] "Matrix"    "utils"     "foreach"   "methods"   "graphics"  "grid"     
##  [7] "stats"     "lattice"   "codetools" "iterators" "grDevices"
# level 1 dependencies of "glmnet" are:
get_all_dependencies("glmnet", relation = c("Imports", "Depends", "LinkingTo"))[[3]]
## [1] "Matrix"  "foreach"
"glmnet" %relies% "grid"
## [1] TRUE
reverse_relies("tokenizers")[[1]]
## [1] "covfefe"    "ptstem"     "tidytext"   "statquotes" "widyr"
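
The recursive definition of "relies" can be sketched in base R as a breadth-first walk over the ‘Depends’, ‘Imports’ and ‘LinkingTo’ rows of a toy table. The table, the package names, the helper `toy_relies` and the infix `%toy_relies%` are all hypothetical and only mirror the idea; they are not pkggraph's implementation.

```r
# Hypothetical toy table: G imports M and depends on F, M depends on L, ...
toy_deps <- data.frame(
  pkg_1    = c("G", "G", "M", "L", "F"),
  relation = c("Imports", "Depends", "Depends", "Imports", "Suggests"),
  pkg_2    = c("M", "F", "L", "R", "K"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: every package that pkg relies on, recursively
toy_relies <- function(deps, pkg) {
  hard     <- deps[deps$relation %in% c("Depends", "Imports", "LinkingTo"), ]
  found    <- character(0)
  frontier <- pkg
  while (length(frontier) > 0) {
    nxt      <- unique(hard$pkg_2[hard$pkg_1 %in% frontier])
    # drop packages already visited so that cycles cannot loop forever
    frontier <- setdiff(nxt, c(found, pkg))
    found    <- c(found, frontier)
  }
  found
}

# Hypothetical infix check built on the helper (uses the global toy_deps)
`%toy_relies%` <- function(a, b) b %in% toy_relies(toy_deps, a)

relied_on <- toy_relies(toy_deps, "G")
print(relied_on)
print("G" %toy_relies% "R")
```

K is not reached because it is connected only through a ‘Suggests’ relation, which the definition of "relies" excludes.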

plot and its handles

plot produces a static plot from a pkggraph object. The available handles are:

  • The default: the node size is based on both ‘in’ and ‘out’ degree.
pkggraph::neighborhood_graph("hash") %>%
  plot()

  • Let the node size depend on ‘in’ degree alone, with a white ‘background’:
pkggraph::neighborhood_graph("hash") %>%
  plot(nodeImportance = "in", background = "white")

  • Without variable node size, with a white ‘background’:
pkggraph::neighborhood_graph("hash") %>%
  plot(nodeImportance = "none", background = "white")
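
To see what ‘in’ and ‘out’ degree mean for such a graph, the following base R sketch counts them from a toy edge table. The table and package names are hypothetical, and it is an assumption here that an edge points from pkg_1 to pkg_2; how plot maps these counts to node sizes is not shown.

```r
# Hypothetical toy edges: A, B and C depend on H, and A also depends on B
toy_edges <- data.frame(
  pkg_1 = c("A", "B", "C", "A"),
  pkg_2 = c("H", "H", "H", "B"),
  stringsAsFactors = FALSE
)

nodes <- sort(unique(c(toy_edges$pkg_1, toy_edges$pkg_2)))

# 'out' degree: number of edges leaving a package (its dependencies)
out_degree <- table(factor(toy_edges$pkg_1, levels = nodes))
# 'in' degree: number of edges arriving at a package (its dependents)
in_degree  <- table(factor(toy_edges$pkg_2, levels = nodes))

degree <- data.frame(
  package    = nodes,
  in_degree  = as.integer(in_degree),
  out_degree = as.integer(out_degree),
  stringsAsFactors = FALSE
)
degree$both <- degree$in_degree + degree$out_degree
print(degree)
```

With ‘in’ degree alone, H stands out as the package that many others depend on; counting both directions also gives weight to packages such as A that have many dependencies of their own.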

plotd3

For interactive exploration of large graphs, plotd3 might be better than static plots. Note that:

  • Holding the mouse over a vertex highlights all related nodes and edges.
  • Clicking a vertex and dragging it changes the way the graph looks and gives a better view of the related ‘cluster’.
# the legend is not rendered in the vignette, but it appears when the code is run directly
plotd3(neighborhood_graph("tibble"), height = 1000, width = 1000)

Acknowledgement

Package authors Srikanth KS and Nikhil Singh would like to thank R core, Hadley Wickham for the tidyverse framework, and the fantastic R community!