22. System Design

Table of Contents

A. Introduction

The three stages to the interview:

  • Problem: The interviewer gives you a purposely vague task.
  • Clarification: You show leadership and experience by asking meaningful questions to clarify requirements and performance priorities.
  • Presentation: You give an impromptu presentation on solutions (probably as you’re thinking of the solution.)

B. Methodology - Framework for System Design

  1. Always start with clarification!

    • Ask about the priorities, scale and restrictions of the system

      • How many users?
      • How much data?
      • How many requests per second?
      • Latency requirements?
  2. After understanding the requirements, start with a high level solution

    • Start abstract and simple to lay out an end to end solution
    • Draw an abstract diagram of the system
    • the components should be technology agnostic and abstract away scaling and concrete methods
    • eg: Checking whether the system prediction is real time, pre calculated batch or some hybrid
  3. Start filling in some details; architectural components

    • eg: How models are delivered to the product? What type of infrastructure is required?

      • Decribe the reasoning behind the design choices
    • add data models if necessary

  4. Start with concerete component implementations

    • Discuss the tradeoffs of the design choices
    • eg: Why did you choose a particular database? Why did you choose a particular language?
  5. Finishing touches

C. ML Model Development

  1. Providing wide array of model types for the problem; it’s good to cover some breadth instead of naming one solution.

  2. Offline training and evaluation

    • Data used for training
      • How to deal with data imbalances?
    • Metrics used to compare models- Precision, Recall, F1 Score, ROC AUC, etc.
    • Evaluation methods
      • K-fold cross validation
      • Training sample selection
  3. Online Evaluation

    • Evaluate the model performance via A/B testing
      • What metrics to use?
  4. Model lifecycle management

    • Monitoring the model to ensure its health
    • Operations to track to keep model performance
      • How to manage the model lifecycle?
      • How to deal with model drift?
      • How to deal with model decay?

I. ML Operational Diffculties

A. ML Drifts

ML Drifts
ML Drifts
  1. Concept drift - The learned distribution changes over time due to behavioural changes
  2. Data drift - The learned distribution changes over time due to changes in the data

What to do about it?

  1. Setup data integrity and outlier monitoring
    • Data errors slowly degrade
    • Missing values
    • Broken data pipelines due to bugs or API updates
  2. Setup model drift monitoring

D. Elements of Distributed Systems

Distributed system is an interconnected set of nodes that are linked by a network and share information between them.

I. Microservices

A design paradigm where an application is structured to collect several independent services. The characteristics of these services include:

  • Independent deployment
  • Highly scalable
  • Loosely coupled
  • Highly maintainable and testable

II. Service Discovery

A service discovery is a mechanism that allows services to find and communicate with each other without hardcoding the location of each other.

Eg: Apache zookeeper, consul, etcd

III. Load Balancing

A device that acts as a reverse proxy and distributes the network traffic across several different servers. There are two distinct types of load balancers:

  1. Layer 4
    • Act upon data found in network and transport layer protocols (IP, TCP, FTP, UDP)
  2. Layer 7
    • Distributes requests based on application data (HTTP, HTTPS, FTP, SMTP)

Standard algorithms for load balancing:

  • Round Robin
  • Least Connections
  • Least Response Time
  • Weighted Round Robin

IV. Databases

Refer to the databases page for more details.

V. Caches

A temporary storage with smaller capacity in memory with fast access time.

  1. Client Caching
  2. CDN Caching
  3. Web Server Caching
  4. Database Caching
  5. Application Caching

VI. File System Storage

A file is an unstructured collection of records. Two basic formats:

  • Block storage - Organize data in blocks on disk
  • Object storage
    • Amazon S3
    • Highlty scalable

VII. Message Queues

Lets services to communicate with each other asynchronously. It is a buffer between the producer and consumer.

  1. Producer - The service that produces the message and writes it to the queue
  2. Consumer - The service that consumes the message from the queue and reads it from the queue

Example: RabbitMQ, Kafka