
Programming with Big Data in R


pbdR
Paradigm: SPMD and MPMD
Designed by: Wei-Chen Chen, George Ostrouchov, Pragneshkumar Patel, and Drew Schmidt
Developer: pbdR Core Team
First appeared: September 2012
Preview release: Through GitHub at RBigData
Typing discipline: Dynamic
OS: Cross-platform
License: General Public License and Mozilla Public License
Website: www.r-pbd.org
Influenced by: R, C, Fortran, MPI, and ØMQ

Programming with Big Data in R (pbdR)[1] is a series of R packages and an environment for statistical computing with big data using high-performance statistical computation.[2][3] pbdR uses the same programming language as R, with S3/S4 classes and methods; R is widely used among statisticians and data miners for developing statistical software. The significant difference between pbdR and R code is that pbdR mainly focuses on distributed-memory systems, where data are distributed across several processors and analyzed in batch mode, and communication between processors is based on MPI, which is readily available on large high-performance computing (HPC) systems. The R system mainly focuses[citation needed] on single multi-core machines for data analysis in an interactive mode, such as through a GUI.

The two main MPI implementations in R are Rmpi[4] and pbdMPI, which is part of pbdR.

  • pbdR, built on pbdMPI, uses SPMD parallelism, where every processor is considered a worker and owns part of the data. SPMD parallelism, introduced in the mid-1980s, is particularly efficient in homogeneous computing environments for large data, for example, performing singular value decomposition on a large matrix or clustering analysis on large high-dimensional data. On the other hand, nothing prevents the use of manager/workers parallelism within an SPMD environment.
  • Rmpi[4] uses manager/workers parallelism, where one main processor (the manager) controls all other processors (the workers). Manager/workers parallelism, introduced around the early 2000s, is particularly efficient for large tasks on small clusters, for example, the bootstrap method and Monte Carlo simulation in applied statistics, since the i.i.d. assumption is commonly used in most statistical analysis. In particular, task-pull parallelism performs better for Rmpi in heterogeneous computing environments.
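The task-pull idea in the manager/workers model can be illustrated with a small simulation. The following is a minimal sketch in plain Python, not Rmpi or pbdMPI code: the function names (`bootstrap_mean`, `task_pull`) and the per-task cost of each worker are hypothetical, and the "cluster" is simulated with a priority queue of idle times rather than real processes. It shows that when workers pull the next task as soon as they become idle, faster workers naturally complete more bootstrap replicates.

```python
import heapq
import random


def bootstrap_mean(sample, seed):
    """One independent task: the mean of a bootstrap resample of `sample`."""
    rng = random.Random(seed)
    resample = [rng.choice(sample) for _ in sample]
    return sum(resample) / len(resample)


def task_pull(sample, n_tasks, worker_costs):
    """Simulate a manager handing out tasks to whichever worker is idle first.

    `worker_costs[w]` is the (assumed) simulated time worker `w` needs per task.
    Returns the task results, the number of tasks each worker completed,
    and the simulated time at which the last worker finishes.
    """
    # Heap of (time at which the worker becomes idle, worker id).
    idle = [(0.0, w) for w in range(len(worker_costs))]
    heapq.heapify(idle)

    results = []
    tasks_done = [0] * len(worker_costs)
    for task in range(n_tasks):
        free_at, w = heapq.heappop(idle)      # earliest idle worker pulls the task
        results.append(bootstrap_mean(sample, seed=task))
        tasks_done[w] += 1
        heapq.heappush(idle, (free_at + worker_costs[w], w))

    makespan = max(free_at for free_at, _ in idle)
    return results, tasks_done, makespan


if __name__ == "__main__":
    sample = [2.1, 3.4, 1.9, 5.0, 4.2]
    results, tasks_done, makespan = task_pull(sample, n_tasks=6,
                                              worker_costs=[1.0, 2.0])
    print(tasks_done, makespan)
```

With one worker twice as fast as the other, the faster worker ends up with twice as many of the six tasks, which is the load-balancing effect that makes task pull attractive in heterogeneous environments.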

The idea of SPMD parallelism is to let every processor do the same amount of work, but on different parts of a large data set. For example, a modern GPU is a large collection of slower co-processors that simply apply the same computation to different parts of relatively small data, yet SPMD parallelism ends up being an efficient way to obtain final solutions (i.e., time to solution is shorter).[5]
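The SPMD pattern described above can be sketched as follows. This is an illustrative plain-Python simulation, not pbdR code: the ranks are simulated with a loop instead of MPI processes, and the names `local_block`, `spmd_program`, `allreduce_sum`, and `distributed_mean` are hypothetical. Every simulated rank runs the same program on its own block of the data, and a reduction step standing in for an MPI-style allreduce combines the partial results into a global mean.

```python
def local_block(data, rank, size):
    """Return the contiguous block of `data` owned by `rank` out of `size` ranks.

    Blocks differ in length by at most one element, so the work is balanced.
    """
    base, extra = divmod(len(data), size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return data[start:stop]


def spmd_program(data, rank, size):
    """The single program every rank executes: a partial sum and a count."""
    block = local_block(data, rank, size)
    return sum(block), len(block)


def allreduce_sum(partials):
    """Stand-in for an allreduce: add up the partial sums and counts."""
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total, count


def distributed_mean(data, size):
    """Run the same program on every simulated rank, then combine the results."""
    partials = [spmd_program(data, rank, size) for rank in range(size)]
    total, count = allreduce_sum(partials)
    return total / count


if __name__ == "__main__":
    print(distributed_mean(list(range(1, 11)), size=4))
```

For example, `distributed_mean(list(range(1, 11)), size=4)` splits the ten values into blocks of 3, 3, 2, and 2 elements and returns 5.5, the same value a single process would compute. In a real distributed-memory setting each rank would hold only its own block, which is what allows the data set to exceed the memory of any one machine.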

  1. Ostrouchov, G., Chen, W.-C., Schmidt, D., Patel, P. (2012). "Programming with Big Data in R".
  2. Chen, W.-C. & Ostrouchov, G. (2011). "HPSC -- High Performance Statistical Computing for Data Intensive Research". Archived from the original on 2013-07-19. Retrieved 2013-06-25.
  3. "Basic Tutorials for R to Start Analyzing Data". 3 November 2022.
  4. Yu, H. (2002). "Rmpi: Parallel Statistical Computing in R". R News.
  5. Houston, M. "Folding@Home - GPGPU". Retrieved 2007-10-04.
