Overview for some Azure Services (Part 1)

Event Hub

Characteristics

Big data event streaming service
To help in the process where you are continuously sending data to your service
Scalable up to terabytes of data and millions of events per second
Reliable with Zero data loss
Supports multiple protocols and SDKs

Basics

Event producers
- Application and services that will send an event to Event Hub
Event Hub divided into partitions from 1 to 32, partitions are not changeable after creation
Event Hub namespace is the logical container you create in the portal to hold several Event Hubs and it has shared properties like throughput, access policy, cost, etc.
Pricing tier
- Basic, only gives you 1 consumer group
- Standard gives you 20 consumer groups
Throughput units
- From 1 to 20
- It indicates the performance of the Event Hub or how many messages you can process for the Event Hub.
- You can request to increase that up to 40 or create an Event Hub dedicated cluster.
Shared access policy on Event Hub namespace will be shared across all Event Hubs in this Event Hub namespace but shared access policy for Event Hub will be only for this Event Hub

Sending Events

You can create a .Net app that uses Event Hub package or SDK so, you can send events (msgs) to Event Hub
You need to create an Access Policy with Send policy and use its connection string in the .Net app
You can use other services like Logic App, TSI to send Event to Event Hub

Event Consumer

A consumer group is a unique view of Event Hub data
You can create a .Net app that uses Event Hub package or SDK so, you can get events (msgs) from Event Hub
You need to create an Access Policy with Listen policy and use its connection string in the .Net app
Offset is the position on Event Hub
A checkpoint is progress of saving offset on the client-side
In the .NET app you can use Storage Account so you can save the checkpoints

Event Capture

Store the Event data on Azure storage more than the maximum retention policy duration (7 days)

Databricks

Components

Workspaces
Shared
- Folder 1
- Folder 2
Users
- User A
  - Folder 1
  - Folder 2
- User B
  - Folder 1
  - Folder 2

Inside the folders, you usually put the Notebooks.

Workflows
- We can organize scripts to call other scripts inside what we call workflows
Clusters
- Cluster A
- Cluster B
- Cluster C
Jobs
- Job A
- Job B
- Job C
Experiment
- Experiment A
- Experiment B
- Experiment C

When you create an Azure Databricks cluster, there is a new resource group created that holds 2 VMs as nodes for your cluster, one is a master node that orchestrator other nodes, and the second is the worker node that does the job. Each Notebook divided into sections called cmds or cells you can run each cmd/cell separately.

Azure Data Factory

Azure Data Factory is a cloud-based data integration service that allows creating workflows in the cloud for orchestrating and automating data movement and data transformation or ETL (Extracting, Transformation, and Loading). ADF does not store data. It just creates and designs workflows to orchestrate the movement of data between supported data stores and processing of data using compute.

Components

Pipelines
Link Services
Datasets
IR (Integration Runtime)
Triggers
Monitor pipelines execution