DNS Resolution: Hành trình từ Domain Name đến IP Address


Giới thiệu

Bạn gõ google.com vào trình duyệt, nhấn Enter, và website xuất hiện trong chớp mắt. Có vẻ đơn giản phải không? Nhưng đằng sau đó là một trong những hệ thống phân tán phức tạp nhất từng được xây dựng - Domain Name System (DNS).

Mỗi giây, có hàng triệu DNS queries xảy ra trên toàn cầu. Nó là “danh bạ điện thoại của Internet” - dịch từ tên domain mà con người dễ nhớ sang địa chỉ IP mà máy tính hiểu được. Và điều kỳ diệu là: nó diễn ra trong chưa đầy 100ms, với availability 99.999%.

Trong 10 năm làm việc với infrastructure và DevOps, tôi đã gặp vô số incidents liên quan đến DNS - từ misconfiguration đơn giản cho đến sophisticated DDoS attacks. Hiểu DNS không chỉ giúp bạn debug nhanh hơn, mà còn design systems scalable hơn.

Hôm nay, chúng ta sẽ đi sâu vào từng bước của DNS resolution, từ browser cache cho đến root servers, và học cách optimize performance.

Khái niệm Cốt lõi

DNS: Hệ thống Phân cấp Phân tán

DNS (Domain Name System) là một distributed, hierarchical database dịch domain names thành IP addresses.

Analogy của Hệ thống Bưu điện:

Imagine bạn muốn gửi thư cho “John Smith” ở Việt Nam:

John Smith
123 Main Street
Ward 5, District 1
Ho Chi Minh City
Vietnam

Bưu điện xử lý từng level:

  1. Vietnam → Route đến Việt Nam (Root Server)
  2. Ho Chi Minh City → Route đến TP.HCM (TLD Server)
  3. District 1 → Route đến Quận 1 (Authoritative Server)
  4. Ward 5, 123 Main Street → Tìm địa chỉ chính xác
  5. John Smith → Giao thư

DNS hoạt động tương tự, nhưng ngược lại (từ phải sang trái)!

Cấu trúc Domain Name

www.example.com.
 │    │      │   └─ Root (.)
 │    │      └───── Top-Level Domain (TLD)
 │    └──────────── Second-Level Domain (SLD)
 └───────────────── Subdomain

Các loại TLD:

  • Generic TLD (gTLD): .com, .net, .org, .edu
  • Country Code TLD (ccTLD): .vn, .uk, .jp, .de
  • New gTLD: .tech, .dev, .app, .xyz

Fully Qualified Domain Name (FQDN): www.example.com. (chú ý dấu chấm cuối - root!)

DNS Record Types

Các loại DNS records phổ biến:

Record TypePurposeExample
AIPv4 addressexample.com → 93.184.216.34
AAAAIPv6 addressexample.com → 2606:2800:220:1:…
CNAMECanonical name (alias)www → example.com
MXMail servermail → mail.example.com (priority 10)
TXTText data (SPF, DKIM)“v=spf1 include:_spf.google.com”
NSName serverns1.example.com
SOAStart of AuthorityPrimary NS, admin email, serial
PTRReverse lookup34.216.184.93 → example.com
SRVService location_sip._tcp → server:port
CAACertificate Authorityissue “letsencrypt.org”

Technical Deep Dive

DNS Resolution Process: 8 Bước Chi tiết

User Browser                    Resolver Cache
    │                               │
    │  1. Query: google.com        │
    │─────────────────────────────>│
    │                               │  2. Check cache
    │                               │     Not found!
    │                               │
    │                               │  3. Query Root Server
    │                               │─────────────────────────>
    │                               │                         Root Server
    │                               │  4. Response: .com TLD   (.com, .net, .org...)
    │                               │<─────────────────────────┘
    │                               │
    │                               │  5. Query TLD Server
    │                               │─────────────────────────>
    │                               │                         TLD Server
    │                               │  6. Response: NS for     (Manages .com domains)
    │                               │     google.com
    │                               │<─────────────────────────┘
    │                               │
    │                               │  7. Query Authoritative
    │                               │─────────────────────────>
    │                               │                         Authoritative NS
    │                               │  8. Response: IP address (google.com servers)
    │                               │     142.250.185.46
    │                               │<─────────────────────────┘
    │                               │
    │  9. Return: 142.250.185.46   │
    │<─────────────────────────────┘

    │  10. Connect to IP
    │──────────────────────────────────────────────────>
                                                        Google Server

[Insert Diagram: DNS Resolution Flow]

Bước-by-Bước Breakdown

1. Browser Cache Check:

// Pseudo-code của browser DNS cache
const browserCache = new Map<string, DNSCacheEntry>();

interface DNSCacheEntry {
  domain: string;
  ip: string;
  ttl: number;
  timestamp: number;
}

function resolveDomain(domain: string): string | null {
  const cached = browserCache.get(domain);
  
  if (cached) {
    const age = Date.now() - cached.timestamp;
    if (age < cached.ttl * 1000) {
      console.log('✅ Browser cache hit!');
      return cached.ip;
    } else {
      console.log('⏰ Cache expired, removing...');
      browserCache.delete(domain);
    }
  }
  
  console.log('❌ Cache miss, querying OS...');
  return null;
}

Check browser cache trong Chrome:

chrome://net-internals/#dns

2. OS Cache Check:

# Linux/Mac - check system DNS cache
sudo systemd-resolve --statistics   # systemd-resolved
dscacheutil -cachedump -entries     # macOS

# Windows
ipconfig /displaydns

3-4. Root Server Query:

13 root server clusters trên thế giới (A-M):

a.root-servers.net  (Verisign)
b.root-servers.net  (USC-ISI)
c.root-servers.net  (Cogent)
...
m.root-servers.net  (WIDE Project)

Root server chỉ biết TLD servers, không biết google.com là gì!

Query: google.com?
Response: "Tôi không biết google.com, nhưng .com TLD server ở đây:
  a.gtld-servers.net (192.5.6.30)
  b.gtld-servers.net (192.33.14.30)
  ..."

5-6. TLD Server Query:

TLD server cho .com manage hàng triệu domains:

Query: google.com?
Response: "Authoritative NS cho google.com là:
  ns1.google.com (216.239.32.10)
  ns2.google.com (216.239.34.10)
  ns3.google.com (216.239.36.10)
  ns4.google.com (216.239.38.10)"

7-8. Authoritative Server Query:

Đây là server “chính thức” cho google.com:

Query: google.com A record?
Response: "142.250.185.46
  TTL: 300 seconds
  Additional: IPv6 address, MX records, etc."

DNS Query Types

1. Recursive Query: Client → Resolver: “Tìm google.com cho tôi, tôi sẽ đợi!”

2. Iterative Query: Resolver → Root/TLD: “Bạn biết google.com không?” Root: “Không, nhưng hỏi server này đi”

3. Non-recursive Query: Resolver có cache rồi, trả lời ngay!

DNS Packet Structure

DNS Header (12 bytes):

 0  1  2  3  4  5  6  7  8  9  10 11 12 13 14 15
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                      ID                       |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|QR|   Opcode  |AA|TC|RD|RA|   Z    |   RCODE   |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    QDCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ANCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    NSCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+
|                    ARCOUNT                    |
+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+--+

Fields:
- QR: Query (0) or Response (1)
- RD: Recursion Desired
- RA: Recursion Available
- RCODE: Response Code (0=success, 3=NXDOMAIN)

Code Implementation

Ví dụ 1: DNS Resolver từ đầu (Python)

# simple_dns_resolver.py
import socket
import struct
from typing import List, Tuple

class DNSResolver:
    def __init__(self, dns_server: str = '8.8.8.8'):
        """
        Initialize DNS resolver
        Default: Google Public DNS (8.8.8.8)
        """
        self.dns_server = dns_server
        self.dns_port = 53
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.settimeout(5.0)
    
    def build_dns_query(self, domain: str, query_type: int = 1) -> bytes:
        """
        Build DNS query packet
        query_type: 1=A, 28=AAAA, 15=MX, 16=TXT
        """
        # Transaction ID (random)
        transaction_id = struct.pack('!H', 0x1234)
        
        # Flags: Standard query, recursion desired
        flags = struct.pack('!H', 0x0100)
        
        # Questions: 1, Answers: 0, Authority: 0, Additional: 0
        qdcount = struct.pack('!H', 1)
        ancount = struct.pack('!H', 0)
        nscount = struct.pack('!H', 0)
        arcount = struct.pack('!H', 0)
        
        # DNS Header (12 bytes)
        header = transaction_id + flags + qdcount + ancount + nscount + arcount
        
        # Question Section
        question = b''
        for part in domain.split('.'):
            question += struct.pack('!B', len(part))
            question += part.encode()
        question += b'\x00'  # End of domain
        
        # Query type (A record) and class (IN)
        question += struct.pack('!HH', query_type, 1)
        
        return header + question
    
    def parse_dns_response(self, response: bytes) -> List[str]:
        """
        Parse DNS response packet
        """
        # Skip header (12 bytes)
        offset = 12
        
        # Skip question section
        while response[offset] != 0:
            offset += response[offset] + 1
        offset += 5  # Null byte + type + class
        
        # Parse answer section
        answers = []
        ancount = struct.unpack('!H', response[6:8])[0]
        
        for _ in range(ancount):
            # Skip name (compression pointer)
            if (response[offset] & 0xC0) == 0xC0:
                offset += 2
            else:
                while response[offset] != 0:
                    offset += response[offset] + 1
                offset += 1
            
            # Parse type, class, TTL, data length
            rtype, rclass, ttl, rdlength = struct.unpack(
                '!HHIH', 
                response[offset:offset+10]
            )
            offset += 10
            
            # Parse A record (IPv4)
            if rtype == 1 and rdlength == 4:
                ip = '.'.join(str(b) for b in response[offset:offset+4])
                answers.append(ip)
            
            offset += rdlength
        
        return answers
    
    def resolve(self, domain: str) -> List[str]:
        """
        Resolve domain to IP addresses
        """
        print(f"🔍 Resolving {domain}...")
        
        # Build query
        query = self.build_dns_query(domain)
        
        # Send query
        self.sock.sendto(query, (self.dns_server, self.dns_port))
        print(f"📤 Query sent to {self.dns_server}")
        
        # Receive response
        response, _ = self.sock.recvfrom(512)
        print(f"📩 Response received ({len(response)} bytes)")
        
        # Parse response
        ips = self.parse_dns_response(response)
        
        return ips
    
    def close(self):
        self.sock.close()

# Usage
if __name__ == "__main__":
    resolver = DNSResolver()
    
    domains = ['google.com', 'github.com', 'stackoverflow.com']
    
    for domain in domains:
        try:
            ips = resolver.resolve(domain)
            print(f"✅ {domain} resolves to:")
            for ip in ips:
                print(f"   - {ip}")
            print()
        except Exception as e:
            print(f"❌ Error resolving {domain}: {e}\n")
    
    resolver.close()

Ví dụ 2: DNS với Caching (TypeScript/Node.js)

// dns-resolver-with-cache.ts
import * as dns from 'dns';
import { promisify } from 'util';

const dnsResolve4 = promisify(dns.resolve4);
const dnsResolve6 = promisify(dns.resolve6);

interface DNSCacheEntry {
  ips: string[];
  timestamp: number;
  ttl: number;
}

class DNSResolverWithCache {
  private cache: Map<string, DNSCacheEntry> = new Map();
  private stats = {
    cacheHits: 0,
    cacheMisses: 0,
    totalQueries: 0,
  };

  async resolve(domain: string, ttl: number = 300): Promise<string[]> {
    this.stats.totalQueries++;

    // Check cache first
    const cached = this.cache.get(domain);
    if (cached) {
      const age = (Date.now() - cached.timestamp) / 1000;
      
      if (age < cached.ttl) {
        this.stats.cacheHits++;
        console.log(`✅ Cache hit for ${domain} (age: ${age.toFixed(1)}s)`);
        return cached.ips;
      } else {
        console.log(`⏰ Cache expired for ${domain}`);
        this.cache.delete(domain);
      }
    }

    // Cache miss - query DNS
    this.stats.cacheMisses++;
    console.log(`❌ Cache miss for ${domain}, querying DNS...`);

    try {
      const startTime = Date.now();
      
      // Try IPv4
      const ips = await dnsResolve4(domain);
      
      const queryTime = Date.now() - startTime;
      console.log(`📡 DNS query completed in ${queryTime}ms`);

      // Store in cache
      this.cache.set(domain, {
        ips,
        timestamp: Date.now(),
        ttl,
      });

      return ips;
    } catch (error) {
      throw new Error(`DNS resolution failed: ${error.message}`);
    }
  }

  async resolveMultiple(domains: string[]): Promise<Map<string, string[]>> {
    console.log(`\n🔍 Resolving ${domains.length} domains...\n`);

    const results = new Map<string, string[]>();

    // Resolve all domains in parallel
    await Promise.all(
      domains.map(async (domain) => {
        try {
          const ips = await this.resolve(domain);
          results.set(domain, ips);
        } catch (error) {
          console.error(`❌ Failed to resolve ${domain}: ${error.message}`);
          results.set(domain, []);
        }
      })
    );

    return results;
  }

  clearCache() {
    this.cache.clear();
    console.log('🧹 Cache cleared');
  }

  getStats() {
    const hitRate = this.stats.totalQueries > 0
      ? (this.stats.cacheHits / this.stats.totalQueries) * 100
      : 0;

    return {
      ...this.stats,
      hitRate: hitRate.toFixed(2) + '%',
      cacheSize: this.cache.size,
    };
  }

  printStats() {
    const stats = this.getStats();
    console.log('\n📊 DNS Resolver Statistics:');
    console.log(`   Total Queries: ${stats.totalQueries}`);
    console.log(`   Cache Hits: ${stats.cacheHits}`);
    console.log(`   Cache Misses: ${stats.cacheMisses}`);
    console.log(`   Hit Rate: ${stats.hitRate}`);
    console.log(`   Cache Size: ${stats.cacheSize} entries\n`);
  }
}

// Usage
async function main() {
  const resolver = new DNSResolverWithCache();

  const domains = [
    'google.com',
    'github.com',
    'stackoverflow.com',
    'reddit.com',
    'youtube.com',
  ];

  // First resolution (all cache misses)
  console.log('=== First Resolution ===');
  let results = await resolver.resolveMultiple(domains);
  
  for (const [domain, ips] of results) {
    console.log(`\n${domain}:`);
    ips.forEach(ip => console.log(`  - ${ip}`));
  }

  resolver.printStats();

  // Second resolution (all cache hits!)
  console.log('\n=== Second Resolution (should hit cache) ===');
  results = await resolver.resolveMultiple(domains);
  
  resolver.printStats();

  // Resolve again after 2 seconds
  console.log('\n⏳ Waiting 2 seconds...\n');
  await new Promise(resolve => setTimeout(resolve, 2000));
  
  console.log('=== Third Resolution ===');
  await resolver.resolve('google.com', 1); // TTL = 1 second
  
  resolver.printStats();
}

main().catch(console.error);

Ví dụ 3: DNS Load Testing

# dns_load_test.py
import time
import socket
import threading
from typing import List
import statistics

class DNSLoadTester:
    def __init__(self, domains: List[str], dns_server: str = '8.8.8.8'):
        self.domains = domains
        self.dns_server = dns_server
        self.results = []
        self.lock = threading.Lock()
    
    def query_dns(self, domain: str) -> float:
        """Single DNS query với timing"""
        start = time.perf_counter()
        
        try:
            socket.getaddrinfo(domain, None)
            elapsed = (time.perf_counter() - start) * 1000  # ms
            return elapsed
        except Exception as e:
            return -1
    
    def worker(self, worker_id: int, iterations: int):
        """Worker thread thực hiện queries"""
        for i in range(iterations):
            for domain in self.domains:
                latency = self.query_dns(domain)
                
                with self.lock:
                    self.results.append({
                        'worker': worker_id,
                        'domain': domain,
                        'latency': latency,
                        'timestamp': time.time()
                    })
    
    def run_test(self, num_workers: int = 10, iterations: int = 10):
        """Run load test"""
        print(f"🚀 Starting DNS load test")
        print(f"   Workers: {num_workers}")
        print(f"   Iterations per worker: {iterations}")
        print(f"   Domains: {len(self.domains)}")
        print(f"   Total queries: {num_workers * iterations * len(self.domains)}")
        print()
        
        start_time = time.time()
        
        # Create threads
        threads = []
        for i in range(num_workers):
            t = threading.Thread(target=self.worker, args=(i, iterations))
            threads.append(t)
            t.start()
        
        # Wait for completion
        for t in threads:
            t.join()
        
        elapsed = time.time() - start_time
        
        # Analyze results
        self.print_results(elapsed)
    
    def print_results(self, total_time: float):
        """Print test results"""
        successful = [r['latency'] for r in self.results if r['latency'] > 0]
        failed = len([r for r in self.results if r['latency'] < 0])
        
        if successful:
            print(f"\n📊 Load Test Results:")
            print(f"   Total Time: {total_time:.2f}s")
            print(f"   Total Queries: {len(self.results)}")
            print(f"   Successful: {len(successful)}")
            print(f"   Failed: {failed}")
            print(f"   QPS: {len(self.results) / total_time:.2f}")
            print()
            print(f"   Latency Stats:")
            print(f"     Min: {min(successful):.2f}ms")
            print(f"     Max: {max(successful):.2f}ms")
            print(f"     Avg: {statistics.mean(successful):.2f}ms")
            print(f"     Median: {statistics.median(successful):.2f}ms")
            print(f"     P95: {sorted(successful)[int(len(successful) * 0.95)]:.2f}ms")
            print(f"     P99: {sorted(successful)[int(len(successful) * 0.99)]:.2f}ms")

# Usage
if __name__ == "__main__":
    domains = [
        'google.com',
        'youtube.com',
        'facebook.com',
        'amazon.com',
        'wikipedia.org'
    ]
    
    tester = DNSLoadTester(domains)
    tester.run_test(num_workers=20, iterations=50)

Performance và Optimization

1. TTL (Time To Live) Strategy

# Check TTL của một domain
dig +nocmd google.com +noall +answer

# Output:
# google.com.    300    IN    A    142.250.185.46
#                ^^^--- TTL = 300 seconds (5 minutes)

TTL Trade-offs:

TTL ValueProsCons
Low (60s)Fast propagation, easy rollbackHigh DNS load, more latency
Medium (300s)BalancedStandard choice
High (86400s)Minimal DNS queriesSlow propagation (24h!)

2. DNS Prefetching

<!-- Prefetch DNS cho external resources -->
<link rel="dns-prefetch" href="//cdn.example.com">
<link rel="dns-prefetch" href="//api.example.com">
<link rel="dns-prefetch" href="//analytics.google.com">

<!-- Browser sẽ resolve các domains này sớm! -->

3. DNS over HTTPS (DoH)

import requests

def dns_query_doh(domain: str, dns_server: str = 'https://1.1.1.1/dns-query'):
    """
    Query DNS over HTTPS (encrypted!)
    """
    params = {
        'name': domain,
        'type': 'A'
    }
    
    headers = {
        'Accept': 'application/dns-json'
    }
    
    response = requests.get(dns_server, params=params, headers=headers)
    data = response.json()
    
    ips = [answer['data'] for answer in data.get('Answer', [])]
    return ips

# Usage
ips = dns_query_doh('google.com')
print(f"IPs via DoH: {ips}")

Benefits:

  • ✅ Encrypted (không thể sniff)
  • ✅ Bypass DNS censorship
  • ✅ Privacy protection

Kết luận

DNS là một trong những hệ thống critical nhất của Internet, nhưng cũng là một trong những hệ thống ít được chú ý nhất - cho đến khi nó break!

Key Takeaways:

  1. Hierarchical system: Root → TLD → Authoritative
  2. Multiple layers of caching: Browser → OS → Resolver → Authoritative
  3. DNS uses UDP port 53 - fast but unreliable (TCP fallback nếu > 512 bytes)
  4. TTL is critical: Balance between freshness and performance
  5. Security matters: DNSSEC, DoH, DoT để protect against attacks

Lời khuyên từ kinh nghiệm:

“DNS is often forgotten until it breaks. Monitor it, cache it, and have a backup plan. 90% of ‘website down’ issues I’ve seen started with DNS.”

Best Practices:

  • ✅ Use múltiple name servers (minimum 2, ideally 4+)
  • ✅ Monitor DNS resolution latency
  • ✅ Set appropriate TTLs (300s is good default)
  • ✅ Implement DNS caching ở application layer
  • ✅ Use CDN với anycast DNS
  • ✅ Consider DNS failover strategies
  • ✅ Enable DNSSEC for security

Remember: DNS là single point of failure. Invest time để hiểu và monitor nó!


Tài liệu tham khảo:

  • RFC 1034, 1035: Domain Names - Implementation and Specification
  • RFC 7871: Client Subnet in DNS Queries
  • DNS Performance Testing Tools: dnsperf, dnstop
  • Public DNS Servers: 8.8.8.8 (Google), 1.1.1.1 (Cloudflare)

Về tác giả: Senior Software Engineer với 10+ năm kinh nghiệm infrastructure và DevOps. Đam mê distributed systems và performance optimization.