The Backend Engineering Show with Hussein Nasser

Hussein Nasser

About

Welcome to the Backend Engineering Show podcast with your host Hussein Nasser. If you like software engineering, you’ve come to the right place. I discuss all sorts of software engineering technologies and news, with a specific focus on the backend. All opinions are my own.

Most of my content in the podcast is an audio version of videos I post on my YouTube channel: http://www.youtube.com/c/HusseinNasser-software-engineering

Buy me a coffee
https://www.buymeacoffee.com/hnasr

🧑‍🏫 Courses I Teach
https://husseinnasser.com/courses

512 episodes

Google Patches Linux kernel with 40% TCP performance

The Backend Engineering Show with Hussein Nasser

Get my backend course https://backend.win Google submitted a patch to Linux Kernel 6.8 that improves TCP performance by 40%. It does this by rearranging the TCP structures to make better use of CPU cache lines. I explore this here. 0:00 Intro 0:30 Google improves Linux Kernel TCP by 40% 1:40 How CPU Cache Line Works 6:45 Reviewing the Google Patch https://www.phoronix.com/news/Linux-6.8-Networking https://lore.kernel.org/netdev/20231129072756.3684495-1-lixiaoyan@google.com/ Discovering Backend Bottlenecks: Unlocking Peak Performance https://performance.husseinnasser.com
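
To make the cache line idea concrete, here is a minimal sketch using Python's ctypes to compare two layouts of the same fields. The struct layouts, the 120-byte cold blobs, and the 64-byte cache line size are assumptions for illustration only; this is not the actual kernel struct or the actual patch. It also assumes the struct starts on a cache line boundary.

```python
import ctypes

CACHE_LINE = 64  # bytes; a common size on x86-64, assumed here


class ScatteredSock(ctypes.Structure):
    """Hypothetical layout: hot fields separated by rarely used ones."""
    _fields_ = [
        ("snd_nxt", ctypes.c_uint32),            # hot
        ("debug_stats", ctypes.c_uint8 * 120),   # cold (made-up field)
        ("rcv_nxt", ctypes.c_uint32),            # hot
        ("config_blob", ctypes.c_uint8 * 120),   # cold (made-up field)
        ("snd_una", ctypes.c_uint32),            # hot
    ]


class GroupedSock(ctypes.Structure):
    """Same fields, with the hot ones placed next to each other."""
    _fields_ = [
        ("snd_nxt", ctypes.c_uint32),
        ("rcv_nxt", ctypes.c_uint32),
        ("snd_una", ctypes.c_uint32),
        ("debug_stats", ctypes.c_uint8 * 120),
        ("config_blob", ctypes.c_uint8 * 120),
    ]


HOT_FIELDS = ["snd_nxt", "rcv_nxt", "snd_una"]


def cache_lines_touched(struct, hot_fields):
    """Return the sorted cache line indexes covered by the given fields."""
    lines = set()
    for name in hot_fields:
        field = getattr(struct, name)  # ctypes field descriptor
        first = field.offset // CACHE_LINE
        last = (field.offset + field.size - 1) // CACHE_LINE
        lines.update(range(first, last + 1))
    return sorted(lines)


if __name__ == "__main__":
    print(cache_lines_touched(ScatteredSock, HOT_FIELDS))  # [0, 1, 3]
    print(cache_lines_touched(GroupedSock, HOT_FIELDS))    # [0]
```

In the scattered layout the three hot fields live on three different cache lines, so touching them can cost three cache misses; in the grouped layout they share one line.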

13m

Mar 05, 2024

Database Torn pages

The Backend Engineering Show with Hussein Nasser

0:00 Intro 2:00 File System Block vs Database Pages 4:00 Torn pages or partial page 7:40 How Oracle Solves torn pages 8:40 MySQL InnoDB Doublewrite buffer 10:45 Postgres Full page writes

15m

Feb 29, 2024

Cloudflare Open sources Pingora (NGINX replacement)

The Backend Engineering Show with Hussein Nasser

Get my backend course https://backend.win Cloudflare has announced that they are open sourcing Pingora as a networking framework! Big news, let us discuss. 0:00 Intro 0:30 Reasons why Cloudflare built Pingora? 3:00 It is a framework! 7:30 What is in Pingora? 11:50 Security in Pingora 13:45 Multi-threading in Pingora 21:00 Customization vs Configuration 25:00 Summary https://blog.cloudflare.com/pingora-open-source/?utm_campaign=cf_blog&utm_content=20240228&utm_medium=organic_social&utm_source=twitter

31m

Feb 28, 2024

The Internals of MongoDB

The Backend Engineering Show with Hussein Nasser

https://backend.win https://databases.win I’m a big believer that database systems share similar core fundamentals at their storage layer, and understanding them allows one to compare different DBMSs objectively. For example, how documents are stored in MongoDB is no different from how MySQL or PostgreSQL store rows. Everything goes to pages of fixed size and those pages are flushed to disk. Each database defines its page size differently based on its workload; for example, the MongoDB default page size is 32KB, MySQL InnoDB is 16KB and PostgreSQL is 8KB. The trick is to fetch what you need from disk efficiently with as few I/Os as possible; the rest is API. In this video I discuss the evolution of MongoDB's internal architecture and how documents are stored and retrieved, focusing on the index storage representation. I assume the reader is well versed in the fundamentals of database engineering such as indexes, B+Trees, data files, WAL etc. You may pick up my database course to learn the skills. Let us get started.
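
As a rough illustration of why page size matters for I/O counts, the sketch below computes how many pages a sequential read of fixed-size records would touch under the three default page sizes quoted above. The 500-byte record size and the 1000-record workload are made-up numbers, and page headers, per-record overhead and compression are deliberately ignored.

```python
import math

# Default page sizes as quoted in the episode description.
PAGE_SIZES = {
    "MongoDB": 32 * 1024,
    "MySQL InnoDB": 16 * 1024,
    "PostgreSQL": 8 * 1024,
}


def records_per_page(page_size, record_size):
    """Whole records that fit in one page (headers ignored for simplicity)."""
    return page_size // record_size


def pages_to_read(n_records, page_size, record_size):
    """Pages that must be fetched to read n_records stored back to back."""
    return math.ceil(n_records / records_per_page(page_size, record_size))


if __name__ == "__main__":
    for name, size in PAGE_SIZES.items():
        print(name, pages_to_read(1000, size, 500))
```

Larger pages mean fewer I/Os for scans, but each I/O moves more bytes, so a point lookup for one small record pays for data it does not need.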

44m

Feb 19, 2024

The Beauty of Programming Languages

The Backend Engineering Show with Hussein Nasser

In this video I explore the types of programming languages: compiled, garbage collected, interpreted, JIT and more.

17m

Feb 19, 2024

The Danger of Defaults - A PostgreSQL Story

The Backend Engineering Show with Hussein Nasser

I talk about default values and how PostgreSQL 14 got slower when a default parameter changed. Mike's blog https://smalldatum.blogspot.com/2024/02/it-wasnt-performance-regression-in.html

11m

Feb 18, 2024

Database Background writing

The Backend Engineering Show with Hussein Nasser

Background writing is a process that writes dirty pages in the shared buffer to disk (more precisely, the pages go to the OS file cache and are then flushed to disk by the OS). I go into this process in this video.

Feb 16, 2024

The Cost of Memory Fragmentation

The Backend Engineering Show with Hussein Nasser

Fragmentation is a very interesting topic to me, especially when it comes to memory. While virtual memory does solve external fragmentation (you can still allocate logically contiguous memory in non-contiguous physical memory), it introduces performance delays as we jump all over physical memory to read what appears to us as, for example, a contiguous array in virtual memory. You see, DDR RAM consists of banks, rows and columns. Each row has around 1024 columns and each column has 64 bits, which makes a row around 8 KiB. The cost of accessing the RAM is the cost of “opening” a row and all its columns (around 50-100 ns). Once the row is opened, all the columns are opened and the 8 KiB is cached in the row buffer in the RAM. The CPU can ask for an address and transfer 64 bytes at a time (called bursts), so if the CPU (or the MMU to be exact) asks for the next 64 bytes, they come at no extra cost because the entire row is cached in the RAM. However, if the CPU sends a different address in a different row, the old row must be closed and a new row opened, taking an additional 50 ns hit. So spatial locality of access ensures efficiency, and fragmentation does hurt performance if the data you are accessing is not contiguous in physical memory (of course it doesn’t matter if it is contiguous in virtual memory). This kind of reminds me of the old days of HDDs and how the disk head physically travels across the disk to read one file, which prompted the need for “defragmentation”, although RAM access (and SSD NAND for that matter) isn’t as bad. Moreover, virtual memory introduces internal fragmentation because of the use of fixed-size blocks (called pages, often 4 KiB in size), which are mapped to frames in physical memory. So if you want to allocate a 32-bit integer (4 bytes) you get 4 KiB worth of memory, leaving a whopping 4092 bytes allocated to the process but unused, which cannot be used by the OS.
These little pockets of memory can add up across many processes. Another reason developers should take care when allocating memory for efficiency.
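
The arithmetic above can be checked with a small sketch. The row geometry, the 64-byte burst and the 50 ns row-open cost are the approximate figures from the description; the model itself is a deliberate simplification that assumes a single bank, ignores CPU caches, and treats addresses as physical.

```python
import math

COLUMNS_PER_ROW = 1024
BITS_PER_COLUMN = 64
ROW_BYTES = COLUMNS_PER_ROW * BITS_PER_COLUMN // 8  # 8192 bytes = 8 KiB
BURST_BYTES = 64
ROW_OPEN_NS = 50   # lower bound quoted in the description
PAGE_BYTES = 4096  # common virtual memory page size


def row_opens(physical_addresses):
    """Count how often a new DRAM row must be opened (single-bank model)."""
    opens = 0
    current_row = None
    for addr in physical_addresses:
        row = addr // ROW_BYTES
        if row != current_row:
            opens += 1
            current_row = row
    return opens


def internal_waste(requested_bytes, page_bytes=PAGE_BYTES):
    """Bytes left unused when an allocation is rounded up to whole pages."""
    pages = math.ceil(requested_bytes / page_bytes)
    return pages * page_bytes - requested_bytes


# 128 bursts that are contiguous in physical memory: one row stays open.
contiguous = list(range(0, ROW_BYTES, BURST_BYTES))
# 128 bursts that each land in a different row: every access reopens a row.
fragmented = [i * ROW_BYTES for i in range(len(contiguous))]

if __name__ == "__main__":
    print(ROW_BYTES)                            # 8192
    print(row_opens(contiguous) * ROW_OPEN_NS)  # 50 ns spent opening rows
    print(row_opens(fragmented) * ROW_OPEN_NS)  # 6400 ns spent opening rows
    print(internal_waste(4))                    # 4092
```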

39m

Jan 29, 2024

The Real Hidden Cost of a Request

The Backend Engineering Show with Hussein Nasser

In this video I explore the hidden costs of sending a request from the frontend to the backend. Medium article https://medium.com/@hnasr/the-journey-of-a-request-to-the-backend-c3de704de223

13m

Dec 13, 2023

Why create Index blocks writes

The Backend Engineering Show with Hussein Nasser

Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon) https://database.husseinnasser.com In this video I explore how CREATE INDEX works, why it blocks writes, and how CREATE INDEX CONCURRENTLY works while still allowing writes. 0:00 Intro 1:28 How Create Index works 4:45 Create Index blocking Writes 5:00 Create Index Concurrently

12m

Oct 28, 2023

Consider this before migrating the Backend to HTTP/3

The Backend Engineering Show with Hussein Nasser

HTTP/3 is getting popular in the cloud scene but before you migrate to HTTP/3 consider its cost. I explore it here. 0:00 Intro HTTP/3 is getting popular 3:40 HTTP/1.1 Cost 5:18 HTTP/2 Cost 6:30 HTTP/3 Cost https://blog.apnic.net/2023/09/25/why-http-3-is-eating-the-world/

12m

Oct 05, 2023

Encrypted Client Hello - The Pros & Cons

The Backend Engineering Show with Hussein Nasser

The Encrypted Client Hello or ECH is a new RFC that encrypts the TLS client hello to hide sensitive information like the SNI. In this video I go through the pros and cons of this new RFC. 0:00 Intro 2:00 SNI 4:00 Client Hello 8:40 Encrypted Client Hello 11:30 Inner Client Hello Encryption 18:00 Client-Facing Outer SNI 21:20 Decrypting Inner Client Hello 23:30 Disadvantages 26:00 Censorship vs Privacy ECH https://blog.cloudflare.com/announcing-encrypted-client-hello/ https://chromestatus.com/feature/6196703843581952

33m

Sep 29, 2023

The Journey of a Request to the Backend

The Backend Engineering Show with Hussein Nasser

From the frontend through the kernel to the backend process. When we send a request to a backend, most of us focus on the processing aspect of the request, which is really just the last step. There is so much more happening before a request is ready to be processed, and most of it happens in the kernel. I break this into 6 steps; each step can theoretically be executed by a dedicated thread or process. Pretty much all backends, web servers, proxies, frameworks and even databases have to do all these steps, and they all choose to do them differently. Grab my backend performance course https://performance.husseinnasser.com 0:00 Intro 3:50 What is a Request? 10:14 Step 1 - Accept 21:30 Step 2 - Read 29:30 Step 3 - Decrypt 34:00 Step 4 - Parse 40:36 Step 5 - Decode 43:14 Step 6 - Process Medium article https://medium.com/@hnasr/the-journey-of-a-request-to-the-backend-c3de704de223
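
The six steps can be sketched in a single function, as below. This is a minimal single-threaded illustration, not how any particular server does it: the request format (a small HTTP/1.1 POST with a JSON body), the `/sum` path and the `numbers` field are all made up, and the decrypt step is skipped because the sketch uses plaintext. Note that the client can connect and send before the server ever calls accept, because the kernel completes the handshake and queues the connection.

```python
import json
import socket


def serve_one(listener):
    # Step 1 - Accept: take a completed connection off the kernel accept queue.
    conn, _addr = listener.accept()
    with conn:
        # Step 2 - Read: copy bytes out of the kernel receive buffer.
        buf = b""
        while b"\r\n\r\n" not in buf:
            chunk = conn.recv(4096)
            if not chunk:
                break
            buf += chunk
        head, _, body = buf.partition(b"\r\n\r\n")

        # Step 3 - Decrypt: skipped, this sketch is plaintext (no TLS).

        # Step 4 - Parse: understand the protocol framing (request line, headers).
        request_line, *header_lines = head.decode("ascii").split("\r\n")
        method, path, _version = request_line.split(" ")
        headers = {}
        for line in header_lines:
            key, value = line.split(":", 1)
            headers[key.strip().lower()] = value.strip()
        length = int(headers.get("content-length", "0"))
        while len(body) < length:  # keep reading until the body is complete
            chunk = conn.recv(4096)
            if not chunk:
                break
            body += chunk

        # Step 5 - Decode: turn the raw body into application data.
        payload = json.loads(body.decode("utf-8"))

        # Step 6 - Process: the part most of us think of as "the request".
        result = {"method": method, "path": path, "sum": sum(payload["numbers"])}
        reply = json.dumps(result).encode("utf-8")
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: %d\r\n\r\n" % len(reply) + reply)
        return result


def demo():
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(8)
        with socket.create_connection(listener.getsockname()) as client:
            body = json.dumps({"numbers": [1, 2, 3]}).encode("utf-8")
            client.sendall(
                b"POST /sum HTTP/1.1\r\nHost: localhost\r\n"
                b"Content-Length: %d\r\n\r\n" % len(body) + body
            )
            return serve_one(listener)


if __name__ == "__main__":
    print(demo())
```

A real backend may split these steps across threads or processes, for example one thread accepting and a pool of workers reading and processing.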

52m

Aug 01, 2023

They Enabled Postgres Partitioning and their Backend fell apart

The Backend Engineering Show with Hussein Nasser

In a wonderful blog, Kyle explores the pains he faced managing a Postgres instance for a startup he works for, and how enabling partitioning significantly increased wait events, causing the backend and subsequently NGINX to throw 500 errors. We discuss this in this video/podcast https://www.kylehailey.com/post/postgres-partition-pains-lockmanager-waits

32m

Jun 24, 2023

WebTransport - A Backend Game Changer

The Backend Engineering Show with Hussein Nasser

WebTransport is a cutting-edge protocol framework designed to support multiplexed and secure transport over HTTP/2 and HTTP/3. It brings together the best of web and transport technologies, providing an all-in-one solution for real-time, bidirectional communication on the web. Watch full episode (subscribers only) https://spotifyanchor-web.app.link/e/cTSGkq5XuAb

15m

Jun 09, 2023

Your SSD lies but that's ok | Postgres fsync

The Backend Engineering Show with Hussein Nasser

fsync is a Linux system call that flushes all pages and metadata for a given file to the disk. It is indeed an expensive operation, but it is required for durability, especially for database systems. Regular writes that make it to the disk controller are often placed in the SSD's local cache to accumulate more writes before getting flushed to the NAND cells. However, when the disk controller receives this flush command it is required to immediately persist all of the data to the NAND cells. Some SSDs, however, don't do that because they don't trust the host, and they no-op the fsync. In this video I explain this in detail and go through the many options Postgres provides to fine-tune fsync. 0:00 Intro 1:00 A Write doesn’t write 2:00 File System Page Cache 6:00 Fsync 7:30 SSD Cache 9:20 SSD ignores the flush 9:30 15 Year old Firefox fsync bug 12:30 What happens if SSD loses power 15:00 What options does Postgres expose? 15:30 open_sync (O_SYNC) 16:15 open_datasync (O_DSYNC) 17:10 O_DIRECT 19:00 fsync 20:50 fdatasync 21:13 fsync = off 23:30 Don’t make your API simple 26:00 Database on metal?
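
A minimal sketch of the write-then-fsync pattern, using Python's thin wrappers over the system calls. The file name and payload are made up. The write call only hands the bytes to the OS page cache; the fsync call asks the kernel to push them to the device. Whether the device really persists them is exactly the question the episode raises, and nothing in this code can verify it.

```python
import os
import tempfile


def durable_write(path, data):
    """Write data to path and ask the OS to flush it to the storage device."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        written = 0
        while written < len(data):
            # os.write may write fewer bytes than asked; the data lands in
            # the OS page cache, not necessarily on disk.
            written += os.write(fd, data[written:])
        # Request that file data and metadata be flushed to the device.
        os.fsync(fd)
    finally:
        os.close(fd)


def demo():
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "wal.bin")  # hypothetical file name
        durable_write(path, b"commit record")
        with open(path, "rb") as f:
            return f.read()


if __name__ == "__main__":
    print(demo())
```

On Unix-like systems Python also exposes os.fdatasync and the os.O_SYNC and os.O_DSYNC open flags, which correspond to the other system-level mechanisms named in the chapter list.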

30m

May 25, 2023

The problem with software engineering

The Backend Engineering Show with Hussein Nasser

Ego is the main cause of a defective software product. The ego of the engineer or the tech lead seeps into the quality of the product. Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon) https://backend.husseinnasser.com

17m

May 21, 2023

2x Faster Reads and Writes with this MongoDB feature | Clustered Collections

The Backend Engineering Show with Hussein Nasser

Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon) https://database.husseinnasser.com In version 5.3, MongoDB introduced a feature called clustered collections, which stores documents in the _id index as opposed to the hidden WiredTiger index. This eliminates an entire B+tree seek for reads using the _id index and also removes the additional write to the hidden index, speeding up both reads and writes. However, as we know in software engineering, everything has a cost. This feature does come with a few costs that one must be aware of before using it. I discuss the feature and its costs in this video.
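
The saving described above can be modelled with a toy sketch. Plain dictionaries stand in for B+trees here, and the class names, the record-id counter and the returned operation counts are invented for illustration; this is not MongoDB's or WiredTiger's actual code.

```python
class RegularCollection:
    """Toy model: _id index points to a record id in a hidden index."""

    def __init__(self):
        self.id_index = {}      # _id -> record id
        self.hidden_index = {}  # record id -> document
        self._next_record_id = 1

    def insert(self, doc):
        record_id = self._next_record_id
        self._next_record_id += 1
        self.hidden_index[record_id] = doc     # write 1: the document
        self.id_index[doc["_id"]] = record_id  # write 2: the _id entry
        return 2  # index writes performed

    def find_by_id(self, _id):
        record_id = self.id_index[_id]      # seek 1: _id index
        doc = self.hidden_index[record_id]  # seek 2: hidden index
        return doc, 2  # index seeks performed


class ClusteredCollection:
    """Toy model: the document lives directly in the _id index."""

    def __init__(self):
        self.id_index = {}  # _id -> document

    def insert(self, doc):
        self.id_index[doc["_id"]] = doc
        return 1

    def find_by_id(self, _id):
        return self.id_index[_id], 1


if __name__ == "__main__":
    doc = {"_id": 42, "name": "example"}
    regular, clustered = RegularCollection(), ClusteredCollection()
    print(regular.insert(doc), regular.find_by_id(42)[1])      # 2 2
    print(clustered.insert(doc), clustered.find_by_id(42)[1])  # 1 1
```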

27m

May 11, 2023

Prime Video Swaps Microservices for Monolith: 90% Cost Reduction

The Backend Engineering Show with Hussein Nasser

The Prime Video engineering team has posted a blog detailing how they moved their live stream monitoring service from microservices to a monolith, reducing their cost by 90%. Let us discuss this. 0:00 Intro 2:00 Overview 10:35 Distributed System Overhead 21:30 From Microservices to Monolith 29:00 Scaling the Monolith 32:30 Takeaways https://www.primevideotech.com/video-streaming/scaling-up-the-prime-video-audio-video-monitoring-service-and-reducing-costs-by-90 Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon) https://backend.husseinnasser.com

35m

May 06, 2023

A Deep Dive in How Slow SELECT * is

The Backend Engineering Show with Hussein Nasser

Fundamentals of Database Engineering udemy course (link redirects to udemy with coupon) https://database.husseinnasser.com In a row-store database engine, rows are stored in units called pages. Each page has a fixed header and contains multiple rows, with each row having a record header followed by its respective columns. When the database fetches a page and places it in the shared buffer pool, we gain access to all rows and columns within that page. So, the question arises: if we have all the columns readily available in memory, why would SELECT * be slow and costly? Is it really as slow as people claim it to be? And if so why is it so? In this post, we will explore these questions and more. 0:00 Intro 1:49 Database Page Layout 5:00 How SELECT Works 10:49 No Index-Only Scans 18:00 Deserialization Cost 21:00 Not All Columns are Inline 28:00 Network Cost 36:00 Client Deserialization https://medium.com/@hnasr/how-slow-is-select-8d4308ca1f0c

39m

May 02, 2023

AWS Serverless Lambda Supports Response Streaming

The Backend Engineering Show with Hussein Nasser

Lambda now supports response payload streaming: you can flush data to the network socket as soon as it is available and it will be written to the client socket. I think this is a game-changing feature. 0:00 Intro 1:00 Traditional Lambda 3:00 Server Sent Events & Chunk-Encoding 5:00 What happens to clients? 6:00 Supported Regions 7:00 My thoughts Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon) https://backend.husseinnasser.com

13m

Apr 07, 2023

The Cloudflare mTLS vulnerability - A Deep Dive Analysis

The Backend Engineering Show with Hussein Nasser

Cloudflare released a blog detailing a vulnerability that has been in their system for nearly two years. It is related to mTLS, or mutual TLS, and specifically to client certificate revocation. I explore this in detail. 0:00 Intro 3:00 The Vulnerability 7:00 What happened? 8:50 Certificate Revocation 12:30 Rejecting certain endpoints 17:00 Certificate Authentication 20:30 Certificate serial number 24:00 Session Resumption (PSK) 35:00 The bug 37:00 How they addressed the problem Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon) https://backend.husseinnasser.com

43m

Apr 06, 2023

The Virgin Media ISP outage - What happened?

The Backend Engineering Show with Hussein Nasser

BGP (Border Gateway Protocol) withdrawals caused Virgin Media ISP customers to lose their Internet connection. I go into detail in this video. 0:00 Intro 2:00 What happened? 4:11 How BGP works? 11:50 Virgin Media withdrawals 15:00 Deep dive Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon) https://backend.husseinnasser.com

23m

Apr 06, 2023

GitHub SSH key is Leaked - How bad is this?

The Backend Engineering Show with Hussein Nasser

GitHub accidentally exposed their SSH RSA private key. This is the message you will get:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:uNiVztksCsDhcc0u9e8BujQXVUpKZIDTMczCvj3tD2s.
Please contact your system administrator.
Add correct host key in ~/.ssh/known_hosts to get rid of this message.
Host key for github.com has changed and you have requested strict checking.
Host key verification failed.
In this video I discuss how bad this is. 0:00 Intro 1:10 What happened? 3:00 SSH vs TLS Authentication 6:00 SSH Connect 7:45 How bad is the github leak? 15:00 What should you do? 18:50 Is ECDSA immune? https://github.blog/2023-03-23-we-updated-our-rsa-ssh-host-key/

21m

Mar 30, 2023

Cookie Hijacking - How Linus Tech Tips got Hacked

The Backend Engineering Show with Hussein Nasser

In this short video we explain how it was possible for the Linus Tech Tips channel to get hacked through cookie hijacking. 0:00 Intro 0:47 TLDR what happened 5:10 Cookies in Chrome 7:30 Cookies Hijacking 8:46 Session Tokens (Access/Refresh) 10:00 Remedies

13m

Mar 29, 2023

All Postgres Locks Explained | A Deep Dive

The Backend Engineering Show with Hussein Nasser

Get my database engineering course https://database.husseinnasser.com In this video I do a deep dive into all the locks obtained by Postgres. I learned a lot while making this video and hope you enjoy it. 0:00 Intro 2:30 What are Locks? 5:30 Overview of Postgres Locks 9:10 Table-Level Locks 11:40 ACCESS EXCLUSIVE 17:40 ACCESS SHARE 19:00 ROW SHARE 20:15 ROW EXCLUSIVE 21:15 SHARE UPDATE EXCLUSIVE 23:30 SHARE 24:50 SHARE ROW EXCLUSIVE 25:18 EXCLUSIVE 25:30 Table Lock Conflict Matrix 28:30 Row-Level Locks 30:00 FOR UPDATE 33:00 FOR NO KEY UPDATE 34:00 FOR SHARE 34:40 FOR KEY SHARE 35:10 Row Lock Conflict Matrix 39:25 Page-Level Locks 42:00 Deadlocks 46:00 Advisory Locks 47:20 Summary https://www.postgresql.org/docs/current/explicit-locking.html
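
The table lock conflict matrix covered in this episode can be written down as a small lookup, sketched below. The conflict sets are transcribed from the table-level lock modes section of the linked PostgreSQL documentation; the dictionary and the helper function are illustrative names, not anything Postgres exposes.

```python
# For each table-level lock mode, the set of modes it conflicts with.
CONFLICTS = {
    "ACCESS SHARE": {"ACCESS EXCLUSIVE"},
    "ROW SHARE": {"EXCLUSIVE", "ACCESS EXCLUSIVE"},
    "ROW EXCLUSIVE": {
        "SHARE", "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
    },
    "SHARE UPDATE EXCLUSIVE": {
        "SHARE UPDATE EXCLUSIVE", "SHARE", "SHARE ROW EXCLUSIVE",
        "EXCLUSIVE", "ACCESS EXCLUSIVE",
    },
    "SHARE": {
        "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE ROW EXCLUSIVE",
        "EXCLUSIVE", "ACCESS EXCLUSIVE",
    },
    "SHARE ROW EXCLUSIVE": {
        "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
        "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
    },
    "EXCLUSIVE": {
        "ROW SHARE", "ROW EXCLUSIVE", "SHARE UPDATE EXCLUSIVE", "SHARE",
        "SHARE ROW EXCLUSIVE", "EXCLUSIVE", "ACCESS EXCLUSIVE",
    },
    "ACCESS EXCLUSIVE": {
        "ACCESS SHARE", "ROW SHARE", "ROW EXCLUSIVE",
        "SHARE UPDATE EXCLUSIVE", "SHARE", "SHARE ROW EXCLUSIVE",
        "EXCLUSIVE", "ACCESS EXCLUSIVE",
    },
}


def conflicts(held, requested):
    """True if a request for `requested` must wait while `held` is held."""
    return requested in CONFLICTS[held]


if __name__ == "__main__":
    # A plain CREATE INDEX takes SHARE, which blocks writers (ROW EXCLUSIVE).
    print(conflicts("SHARE", "ROW EXCLUSIVE"))                   # True
    # CREATE INDEX CONCURRENTLY takes SHARE UPDATE EXCLUSIVE, which does not.
    print(conflicts("SHARE UPDATE EXCLUSIVE", "ROW EXCLUSIVE"))  # False
    # A plain SELECT (ACCESS SHARE) is only blocked by ACCESS EXCLUSIVE.
    print(conflicts("ACCESS EXCLUSIVE", "ACCESS SHARE"))         # True
```

The matrix is symmetric: if mode A conflicts with mode B, then B conflicts with A, which is a quick sanity check on any transcription of it.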

49m

Mar 19, 2023

Pinterest moves to HTTP/3

The Backend Engineering Show with Hussein Nasser

Pinterest moves to HTTP/3 on all their clients and edge CDNs this year. They witnessed interesting gains, but not without good lessons learned. The main one was the mismatch between alt-svc and DNS TTLs. I cover this on the next episode of the backend engineering course. 0:00 Intro 2:00 Moving h2 to h3 through alt-svc 5:00 Why HTTP/3 6:00 HTTP/1 vs HTTP/2 9:00 TCP Head of Line blocking in HTTP/2 11:00 How HTTP/3 addresses HOL 12:15 Connection Migration 13:30 Stream level congestion control 14:10 1-RTT - 0-RTT 15:41 Pinterest challenges moving HTTP/3 19:00 Migration 21:15 Future work 22:30 Summary article https://medium.com/pinterest-engineering/pinterest-is-now-on-http-3-608fb5581094 Fundamentals of Backend Engineering Design patterns udemy course (link redirects to udemy with coupon) https://backend.husseinnasser.com

25m

Mar 16, 2023

Why Loom Users got each others’ sessions on March 7th 2023

The Backend Engineering Show with Hussein Nasser

On March 7 2023, Loom users started seeing each other's data as a result of cookies getting leaked from the CDN. This Loom security breach is really critical. Let us discuss. 0:00 Intro 1:00 Why Cookies 2:00 How this happens 5:50 What caused it? 7:30 How Loom solved it? 8:20 Reading the RCA 10:30 Remedies

14m

Mar 14, 2023

How Discord Stores Trillions of Messages - A deep dive

The Backend Engineering Show with Hussein Nasser

Discord engineering goes into the details of how they migrated from Cassandra to ScyllaDB, improved the performance of their reads and writes and rearchitected their backend to support the new load. It is an interesting episode, let's get into it. 0:00 Intro 1:50 Relational vs Distributed 7:00 The Cassandra Troubles 11:00 SnowFlake vs UUID 14:30 B+Tree 19:20 B+Tree and SSDs 25:30 LSM Trees 31:00 Hot partitions 36:00 Cassandra Garbage Collector Pauses 40:00 Changing the Architecture 45:00 The Data Services 55:00 The Migration 1:02:00 Zoned Namespaces 1:04:00 Summary Article: How Discord Stores Trillions of Messages https://discord.com/blog/how-discord-stores-trillions-of-messages

1h 9m

Mar 11, 2023

Postgres Architecture | The Backend Engineering Show

The Backend Engineering Show with Hussein Nasser

Creating a listener on the backend application that accepts connections is simple. You listen on an address-port pair, and connection attempts to that address and port get added to an accept queue. The application accepts connections from the queue and starts reading the data stream sent on the connection. However, what part of your application does the accepting, what part does the reading and what part does the execution? You can architect your application in many ways based on your use cases. I have a Medium post just exploring the different options. In this video I explore the PostgreSQL process architecture in detail. Please note that the information here is derived from both the Postgres doc https://www.postgresql.org/docs/current/index.html and code https://github.com/postgres/postgres/tree/master/src/backend. Discussions about scalability and performance are solely based on my opinions. 0:00 Intro 1:30 Overview 3:30 Postgres MVCC 5:30 Processes vs Threads 7:40 Postmaster Process 8:00 Backend Processes 13:30 Shared Buffers 14:52 Background Workers 17:18 Auxiliary Processes 17:45 Background Writer 22:30 Checkpointer 23:40 Logger 24:06 Autovacuum Launcher and Workers 25:30 WAL Processes 28:53 Startup Process Read full article https://medium.com/@hnasr/postgresql-process-architecture-f21e16459907

34m

Feb 16, 2023
