# C++ Function/Global Parser Tool - Database Output Summary

## Overview

This tool parses C++ source files with Tree-sitter to extract functions and global variables, together with the memory addresses recorded in their trailing comments. The extracted data is stored in an SQLite database for analysis and lookup.

## Database Schema

The tool creates an SQLite database (default: `gh.db`) with three main tables:

### 1. Functions Table

```sql
CREATE TABLE Functions (
    filepath TEXT,
    name TEXT,
    address TEXT,
    type INTEGER,
    PRIMARY KEY (name, filepath)
);
```

Where `type` is one of the following:

- 0: Auto
- 1: Fix
- 2: Stub
- 3: Ref

**Purpose**: Stores function definitions, that is, functions that have a body (an actual implementation)

- `filepath`: Source file path where the function is defined
- `name`: Function name (identifier)
- `address`: 8-character hexadecimal memory address extracted from comments
- **Primary Key**: Combination of name and filepath (allows the same function name in different files)

### 2. Imports Table

```sql
CREATE TABLE Imports (
    filepath TEXT,
    name TEXT,
    address TEXT,
    type INTEGER,
    PRIMARY KEY (name, filepath)
);
```

**Purpose**: Stores function declarations without bodies (imports/forward declarations)

- Same schema as the Functions table
- Keeps declarations separate from function definitions
- Useful for tracking external function references

### 3. Globals Table

```sql
CREATE TABLE Globals (
    filepath TEXT,
    name TEXT,
    address TEXT
);
```

**Purpose**: Stores global variable declarations marked with `extern`

- `filepath`: Source file path where the global is declared
- `name`: Global variable name (identifier)
- `address`: 8-character hexadecimal memory address from comments
- **No Primary Key**: Allows duplicate global names across files

## Address Format

The tool extracts addresses from C++ comments using this regex pattern:

```regex
//\s*([0-9a-fA-F]{8})
```

**Expected Comment Format**:

```cpp
void myFunction(); // 12345678
extern int globalVar; // ABCDEF00
```

- Addresses must be exactly 8 hexadecimal characters
- Can be uppercase or lowercase
- Must be in a C++ line comment (`//`)
- Whitespace after `//` is optional

## Tool Modes

### 1. Functions Mode (`-m functions`)

- **Default mode**
- Parses C++ files for function definitions and declarations
- Populates the `Functions` and `Imports` tables
- Distinguishes functions with bodies from declarations only

### 2. Globals Mode (`-m globals`)

- Parses C++ files for `extern` global variable declarations
- Populates the `Globals` table
- Only processes variables marked with the `extern` storage class

### 3. Duplicates Mode (`-m duplicates`)

- **Analysis mode** - doesn't process files
- Checks the existing database for duplicate addresses and names
- Reports conflicts across all tables
- Returns exit code 1 if duplicates are found, 0 if clean

### 4. Dump-Tree Mode (`-m dump-tree`)

- **Debug mode** - doesn't use the database
- Outputs the Tree-sitter AST for debugging parsing issues
- Useful for understanding how the parser interprets source code

## Data Quality Checks

The tool includes built-in validation:

### Duplicate Address Detection

- Scans all tables for addresses used multiple times
- Report format: `"DUPLICATE ADDRESS: {address} appears {count} times in: {entries}"`
- Cross-references the Functions, Imports, and Globals tables

### Duplicate Name Detection

- Checks for function names appearing in multiple files
- Checks for global names appearing in multiple files
- Helps identify naming conflicts

## Usage Examples

### Basic Function Extraction

```bash
./tool file1.cpp file2.cpp -d output.db -m functions
```

### Global Variable Extraction

```bash
./tool globals.h -d output.db -m globals
```

### Batch Processing with File List

```bash
./tool -l filelist.txt -d output.db -m functions
```

### Quality Assurance Check

```bash
./tool -d output.db -m duplicates
```

## Database Queries for Users

### Find Function by Name

```sql
SELECT * FROM Functions WHERE name = 'functionName';
SELECT * FROM Imports WHERE name = 'functionName';
```

### Find All Symbols at Address

```sql
SELECT 'Function' as type, name, filepath FROM Functions WHERE address = '12345678'
UNION ALL
SELECT 'Import' as type, name, filepath FROM Imports WHERE address = '12345678'
UNION ALL
SELECT 'Global' as type, name, filepath FROM Globals WHERE address = '12345678';
```

### List All Functions in File

```sql
SELECT name, address FROM Functions
WHERE filepath = 'path/to/file.cpp'
ORDER BY name;
```

### Find Functions Without Addresses

```sql
SELECT name, filepath FROM Functions
WHERE address = '' OR address IS NULL;
```

### Address Range Analysis

Addresses are stored as hex text, and SQLite's `CAST(address AS INTEGER)` does not parse hexadecimal strings (it reads `'12345678'` as a decimal number and `'ABCDEF00'` as 0). Because every address is a zero-padded 8-character string, a string comparison on a case-normalized value gives the correct numeric ordering:

```sql
SELECT name, address, filepath FROM Functions
WHERE upper(address) BETWEEN '10000000' AND '20000000'
ORDER BY upper(address);
```

## Integration Considerations

- **Database Format**: Standard SQLite3 - compatible with most tools and languages
- **File Paths**: Relative to the `game_re` folder in the repository root (the game source directory), so paths start with subfolders such as `gh_auto` or `gh_fix`
- **Address Format**: Always 8-character hex strings (32-bit addresses) - pad with leading zeros if needed
- **Case Sensitivity**: Function/global names are case-sensitive, as in C++
- **Unicode Support**: Handles UTF-8 encoded source files

This database serves as a symbol table for reverse engineering, debugging, and code analysis workflows.
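
Since the output is a standard SQLite3 file, it can be read from any language with an SQLite binding. The following is a minimal Python sketch of an address lookup across all three tables, not the tool's own code. It assumes the schema documented above; the sample rows, the file paths, and the helper names `extract_address`, `symbols_at`, and `build_demo_db` are hypothetical, and an in-memory database stands in for `gh.db`. It also assumes stored addresses may differ in letter case, so it compares them with `upper()`.

```python
import re
import sqlite3

# Schema as documented above; the rows inserted below are made-up examples.
SCHEMA = """
CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                        PRIMARY KEY (name, filepath));
CREATE TABLE Imports   (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                        PRIMARY KEY (name, filepath));
CREATE TABLE Globals   (filepath TEXT, name TEXT, address TEXT);
"""

# The address-comment pattern given in the "Address Format" section.
ADDRESS_RE = re.compile(r"//\s*([0-9a-fA-F]{8})")


def extract_address(line):
    """Return the 8-digit hex address from a // comment, or None if absent."""
    match = ADDRESS_RE.search(line)
    return match.group(1) if match else None


def symbols_at(conn, address):
    """Return (kind, name, filepath) rows from all three tables for one address."""
    query = """
        SELECT 'Function', name, filepath FROM Functions WHERE upper(address) = :a
        UNION ALL
        SELECT 'Import', name, filepath FROM Imports WHERE upper(address) = :a
        UNION ALL
        SELECT 'Global', name, filepath FROM Globals WHERE upper(address) = :a
    """
    return conn.execute(query, {"a": address.upper()}).fetchall()


def build_demo_db():
    """Create an in-memory database with a few hypothetical rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO Functions VALUES (?, ?, ?, ?)",
                 ("gh_fix/example.cpp", "exampleFunction", "0040abcd", 1))
    conn.execute("INSERT INTO Imports VALUES (?, ?, ?, ?)",
                 ("gh_auto/example.h", "exampleFunction", "0040ABCD", 0))
    conn.execute("INSERT INTO Globals VALUES (?, ?, ?)",
                 ("gh_auto/globals.h", "exampleGlobal", "00501000"))
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    addr = extract_address("void exampleFunction(); // 0040ABCD")
    for row in symbols_at(conn, addr):
        print(row)
```

To run the same lookup against real output, replace `build_demo_db()` with `sqlite3.connect("gh.db")` (or whatever path was passed via `-d`).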