Python Snippets

Efficient JSON Log Parser for Large Files

This snippet parses large JSON log files line by line, letting you process log data without loading the entire file into memory.

import json
from typing import Iterator, Dict

def parse_large_json_log(file_path: str) -> Iterator[Dict]:
    """
    Efficiently reads and parses a large JSON log file line by line.
    Each line should be a valid JSON object (common in log formats).

    Args:
        file_path (str): Path to the JSON log file.

    Yields:
        Dict: Parsed JSON object for each line.
    """
    with open(file_path, 'r', encoding='utf-8') as file:
        for line in file:
            line = line.strip()
            if not line:
                continue  # skip blank lines instead of flagging them as invalid
            try:
                yield json.loads(line)
            except json.JSONDecodeError:
                print(f"Skipping invalid JSON line: {line[:100]}...")

# Example usage:
if __name__ == "__main__":
    log_file = "server_logs.json"
    for log_entry in parse_large_json_log(log_file):
        print(log_entry.get('timestamp'), log_entry.get('message'))

Explanation

parse_large_json_log() opens the file and iterates over it one line at a time, parsing each line as JSON and yielding the resulting dictionary. Because it is a generator, only the current line is held in memory, and malformed lines are reported and skipped instead of aborting the run.

Why This Is Useful

Log files routinely grow to gigabytes. Reading such a file all at once pulls everything into memory; streaming line by line keeps memory usage flat regardless of file size.

Common Use Cases

Parsing newline-delimited JSON (JSONL) logs from servers and applications, filtering entries by level or timestamp, and feeding log data into analytics pipelines, as sketched below.

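Because the parser is a generator, it composes naturally with further generators. As a quick sketch of the filtering use case, the example below assumes the parser is importable from json_log_parser.py (the file name suggested under How to Run) and that entries carry the 'level' field shown in the example input:

# Sketch: filter the stream down to ERROR entries only.
# Assumes the parser lives in json_log_parser.py and entries have a 'level' field.
from typing import Iterator, Dict

from json_log_parser import parse_large_json_log

def iter_errors(file_path: str) -> Iterator[Dict]:
    """Yield only log entries whose 'level' field is ERROR."""
    for entry in parse_large_json_log(file_path):
        if entry.get('level') == 'ERROR':
            yield entry

for error in iter_errors("server_logs.json"):
    print(error['timestamp'], error['message'])
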
How to Run

  1. Save the code to a file (e.g., json_log_parser.py).
  2. Ensure your log file contains one JSON object per line (standard in many logging setups).
  3. Run the script, replacing server_logs.json with your log file path (or see the command-line variation below):

    python json_log_parser.py
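
If you prefer passing the log file path on the command line instead of editing the script, one possible variation of the usage block (an assumption, not part of the original snippet) is:

# Sketch: read the log file path from the command line, falling back to
# the default used above. Assumes this replaces the usage block at the
# bottom of the same json_log_parser.py file.
import sys

if __name__ == "__main__":
    log_file = sys.argv[1] if len(sys.argv) > 1 else "server_logs.json"
    for log_entry in parse_large_json_log(log_file):
        print(log_entry.get('timestamp'), log_entry.get('message'))

You can then run it as, for example, python json_log_parser.py my_logs.jsonl.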

Example Input/Output

Input (server_logs.json):

{"timestamp": "2024-01-01T12:00:00Z", "level": "INFO", "message": "Service started"}
{"timestamp": "2024-01-01T12:01:00Z", "level": "ERROR", "message": "Disk full"}

Output:

2024-01-01T12:00:00Z Service started  
2024-01-01T12:01:00Z Disk full  

Because the parser streams entries one at a time, this snippet handles real-world log files of any size and avoids common pitfalls like memory overuse.
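
To illustrate the streaming property, here is a short aggregation sketch: it counts entries per log level while holding only one entry in memory at a time. The import from json_log_parser is again an assumption based on the file name suggested above:

# Sketch: streaming aggregation over the generator. Counts entries per
# log level without ever materializing the full file in memory.
from collections import Counter

from json_log_parser import parse_large_json_log

level_counts = Counter(
    entry.get('level', 'UNKNOWN')
    for entry in parse_large_json_log("server_logs.json")
)
print(level_counts)  # e.g., Counter({'INFO': 1, 'ERROR': 1}) for the input above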