
MapReduce

  • MapReduce is a programming model and software system developed by Google.

  • Characteristics: client-server architecture, message-passing communication, and bulk synchronous parallel execution.

  • Apache Hadoop is an open-source implementation of MapReduce.

  • Apache Spark is an improved open-source implementation of the MapReduce idea.

Broadcast

The server broadcasts the same data (e.g., the current model parameters) to every worker node.

Map

Each worker applies a user-defined map function to the data held in its local memory, producing intermediate results in parallel.

Reduce

The intermediate results are aggregated (e.g., summed) into a single result, typically at the server.
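
To make Map and Reduce concrete, here is a minimal word-count sketch in plain Python; it only mimics the programming model in one process (real systems such as Hadoop distribute the map calls across machines, and the partitioning below is illustrative):

```python
from collections import defaultdict

# Input data, pre-partitioned across (simulated) worker nodes.
partitions = [
    ["the cat sat", "the dog ran"],
    ["the cat ran"],
]

def map_fn(line):
    # Map: emit a (word, 1) pair for every word in a line.
    return [(word, 1) for word in line.split()]

# Map phase: each worker processes the lines in its own partition.
intermediate = [pair for part in partitions for line in part for pair in map_fn(line)]

# Reduce phase: aggregate all values that share a key.
counts = defaultdict(int)
for word, n in intermediate:
    counts[word] += n

print(dict(counts))  # {'the': 3, 'cat': 2, 'sat': 1, 'dog': 1, 'ran': 2}
```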

Data Parallelism

Partition the data among the worker nodes, so that each node holds a subset of the data; see the sketch below.
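
A minimal sketch of such a partition, assuming the dataset is a single NumPy array (the array shape and worker count are illustrative):

```python
import numpy as np

m = 4                                 # number of (simulated) worker nodes
X = np.arange(12).reshape(12, 1)      # toy dataset: 12 samples

# Split the samples into m roughly equal shards, one per worker.
shards = np.array_split(X, m)
for rank, shard in enumerate(shards):
    print(f"worker {rank} holds {len(shard)} samples")
```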

Parallel Gradient Descent Using MapReduce

  • Broadcast: the server broadcasts the up-to-date parameters $w_t$ to the workers.

  • Map: workers do computation locally.

    • Map $(x_i, y_i, w_t)$ to $g_i = (x_i^T w_t - y_i)\, x_i$.

    • Obtain $n$ vectors: $g_1, g_2, g_3, \dots, g_n$.

  • Reduce: compute the sum $g = \sum_{i=1}^{n} g_i$.

    • Every worker sums all the $g_i$ stored in its local memory to get one vector.

    • Then the server sums the resulting $m$ vectors. (There are $m$ workers.)

  • Server updates the parameters: $w_{t+1} = w_t - \alpha \cdot g$. (A runnable sketch of one such iteration follows.)
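
A minimal single-process simulation of this loop for least-squares regression (the worker count, step size, and toy data are assumptions; the map/reduce split is mimicked with ordinary Python code):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 100, 5, 4                    # samples, features, workers
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + 0.01 * rng.normal(size=n)

# Data parallelism: each worker holds one shard of (X, y).
X_shards = np.array_split(X, m)
y_shards = np.array_split(y, m)

w = np.zeros(d)
alpha = 0.1 / n
for t in range(500):
    # Broadcast: every worker receives the current parameters w.
    # Map + local reduce: each worker sums its per-sample gradients
    # g_i = (x_i^T w - y_i) x_i over the shard it holds.
    local_grads = [Xs.T @ (Xs @ w - ys) for Xs, ys in zip(X_shards, y_shards)]
    # Reduce: the server sums the m worker vectors into g.
    g = np.sum(local_grads, axis=0)
    # Update: w_{t+1} = w_t - alpha * g.
    w = w - alpha * g

print(np.allclose(w, w_true, atol=1e-2))  # expect True for this toy setup
```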

Speedup Ratio

Using $m$ workers should ideally make each iteration $m$ times faster (linear speedup); in practice, communication and synchronization overhead make the achieved speedup sub-linear.
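
In symbols, where $T(m)$ denotes the wall-clock time of one iteration with $m$ workers:

```latex
\text{speedup}(m) \;=\; \frac{T(1)}{T(m)}, \qquad \text{ideal: } \text{speedup}(m) = m.
```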

Communication Cost

  • Communication complexity: how many words are transmitted between the server and the workers.

    • Proportional to the number of parameters.

    • Grows with the number of worker nodes.

  • Latency: how much time it takes for a packet of data to get from one point to another. (Determined by the computer network.)

  • Communication time: $\frac{\text{complexity}}{\text{bandwidth}} + \text{latency}$.
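
As a back-of-the-envelope illustration (all numbers below are assumptions for the sketch, not from the source):

```python
complexity = 1e8 * 4   # 10^8 parameters * 4 bytes each, in bytes
bandwidth = 10e9       # 10 GB/s link, in bytes per second
latency = 1e-3         # 1 ms latency, in seconds

comm_time = complexity / bandwidth + latency
print(f"{comm_time * 1e3:.1f} ms")  # ~41.0 ms per broadcast
```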

Bulk Synchronous

In the bulk synchronous parallel model, execution proceeds in rounds: every worker computes, results are communicated, and all workers wait at a synchronization barrier before the next round starts.
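
A minimal sketch of bulk synchronous rounds using Python threads, where threading.Barrier stands in for cluster-wide synchronization (worker count and sleep times are illustrative):

```python
import threading
import time, random

m = 4
barrier = threading.Barrier(m)  # all m workers must arrive before any proceeds

def worker(rank):
    for round_id in range(3):
        time.sleep(random.uniform(0.01, 0.05))  # local computation
        # ... communication with the server would happen here ...
        barrier.wait()                          # bulk synchronization point
        if rank == 0:
            print(f"round {round_id} finished")

threads = [threading.Thread(target=worker, args=(r,)) for r in range(m)]
for t in threads: t.start()
for t in threads: t.join()
```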

Synchronization Cost

Question: what happens if a node fails and then restarts?

  • This node will be much slower than all the others.

  • Such a node is called a straggler.

  • Straggler effect:

    • The wall-clock time is determined by the slowest node.

    • It is a consequence of synchronization; the toy simulation below shows the effect.
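
A toy simulation of the straggler effect (the per-worker times are made-up numbers): because each round ends only when the slowest worker finishes, a single 5x-slower node dominates the wall-clock time:

```python
import numpy as np

rng = np.random.default_rng(0)
m, rounds = 8, 1000

healthy = rng.uniform(1.0, 1.2, size=(rounds, m))      # per-worker compute time
print("no straggler :", healthy.max(axis=1).mean())    # ~1.18 per round

straggler = healthy.copy()
straggler[:, 0] *= 5                                   # worker 0 restarted, 5x slower
print("one straggler:", straggler.max(axis=1).mean())  # ~5.5 per round
```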

Footnote

Slides
Youtube