C++ Function/Global Parser Tool - Database Output Summary
Overview
This tool parses C++ source files using Tree-sitter to extract function and global variable information along with their memory addresses from comments. The extracted data is stored in an SQLite database for analysis and lookup purposes.
Database Schema
The tool creates an SQLite database (default: gh.db) with three main tables:
1. Functions Table
CREATE TABLE Functions (
filepath TEXT,
name TEXT,
address TEXT,
type INTEGER,
PRIMARY KEY (name, filepath)
);
Where type is one of the following:
- 0: Auto
- 1: Fix
- 2: Stub
- 3: Ref
Purpose: Stores function definitions that have function bodies (actual implementations)
- filepath: Source file path where the function is defined
- name: Function name (identifier)
- address: 8-character hexadecimal memory address extracted from comments
- Primary Key: Combination of name and filepath (allows same function name in different files)
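To make the composite key and the type codes concrete, here is a minimal Python sketch using the standard sqlite3 module. The SymbolType enum name, the in-memory database, and the sample paths and addresses are illustrative assumptions, not part of the tool. How the tool itself reacts to a repeated (name, filepath) pair is not stated in this summary; the sketch only shows what the schema enforces.

```python
import sqlite3
from enum import IntEnum


class SymbolType(IntEnum):
    """Values of the `type` column as listed above (the class name is hypothetical)."""
    AUTO = 0
    FIX = 1
    STUB = 2
    REF = 3


conn = sqlite3.connect(":memory:")  # in-memory stand-in for gh.db
conn.execute(
    """CREATE TABLE Functions (
           filepath TEXT, name TEXT, address TEXT, type INTEGER,
           PRIMARY KEY (name, filepath))"""
)

# Same name in two different files: accepted by the composite key.
rows = [
    ("gh_auto/a.cpp", "init", "00401000", int(SymbolType.AUTO)),
    ("gh_fix/a.cpp", "init", "00401000", int(SymbolType.FIX)),
]
conn.executemany("INSERT INTO Functions VALUES (?, ?, ?, ?)", rows)

# Same name in the same file: rejected by the primary key.
try:
    conn.execute(
        "INSERT INTO Functions VALUES (?, ?, ?, ?)",
        ("gh_auto/a.cpp", "init", "00402000", int(SymbolType.STUB)),
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

stored = conn.execute("SELECT COUNT(*) FROM Functions").fetchone()[0]
print(duplicate_rejected, stored)
```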
2. Imports Table
CREATE TABLE Imports (
filepath TEXT,
name TEXT,
address TEXT,
type INTEGER,
PRIMARY KEY (name, filepath)
);
Purpose: Stores function declarations without bodies (imports/forward declarations)
- Same schema as Functions table
- Distinguishes between function definitions and declarations
- Useful for tracking external function references
3. Globals Table
CREATE TABLE Globals (
filepath TEXT,
name TEXT,
address TEXT
);
Purpose: Stores global variable declarations marked with extern
- filepath: Source file path where the global is declared
- name: Global variable name (identifier)
- address: 8-character hexadecimal memory address from comments
- No Primary Key: Allows duplicate global names across files
Address Format
The tool extracts addresses from C++ comments using this regex pattern:
//\s*([0-9a-fA-F]{8})
Expected Comment Format:
void myFunction(); // 12345678
extern int globalVar; // ABCDEF00
- Addresses must be exactly 8 hexadecimal characters
- Can be uppercase or lowercase
- Must be in a C++ line comment (//)
- Whitespace after // is optional
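The rules above can be checked directly against the quoted pattern. The following Python sketch applies the same regex to a few sample lines. The helper name extract_address and the samples are hypothetical, and using a search anywhere in the line is an assumption about how the tool applies the pattern.

```python
import re

# Pattern quoted in this section: "//", optional whitespace, 8 hex digits.
ADDRESS_RE = re.compile(r"//\s*([0-9a-fA-F]{8})")


def extract_address(line):
    """Return the 8-digit hex address found in a line comment, or None."""
    match = ADDRESS_RE.search(line)
    return match.group(1) if match else None


samples = [
    "void myFunction(); // 12345678",    # whitespace after //
    "extern int globalVar; //ABCDEF00",  # no whitespace after //
    "void noAddress(); // TODO",         # no hex run at all
    "void tooShort(); // 1234",          # fewer than 8 hex digits
]
results = [extract_address(s) for s in samples]
print(results)
```

Note that the pattern as quoted is not anchored at the end, so a comment holding more than eight hex digits would still match on its first eight.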
Tool Modes
1. Functions Mode (-m functions)
- Default mode
- Parses C++ files for function definitions and declarations
- Populates Functions and Imports tables
- Distinguishes between functions with bodies vs. declarations only
2. Globals Mode (-m globals)
- Parses C++ files for extern global variable declarations
- Populates Globals table
- Only processes variables marked with extern storage class
3. Duplicates Mode (-m duplicates)
- Analysis mode - doesn't process files
- Checks existing database for duplicate addresses and names
- Reports conflicts across all tables
- Returns exit code 1 if duplicates found, 0 if clean
4. Dump-Tree Mode (-m dump-tree)
- Debug mode - doesn't use database
- Outputs Tree-sitter AST for debugging parsing issues
- Useful for understanding how the parser interprets source code
Data Quality Checks
The tool includes built-in validation:
Duplicate Address Detection
- Scans all tables for addresses used multiple times
- Report format: "DUPLICATE ADDRESS: {address} appears {count} times in: {entries}"
- Cross-references Functions, Imports, and Globals tables
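A cross-table check of this kind could look roughly like the Python sketch below. It is not the tool's implementation: the function name find_duplicate_addresses, the layout of each entry, the case-insensitive grouping, and the sample rows are all assumptions made for illustration. Only the table names and the report prefix come from this summary.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")  # in-memory stand-in for gh.db
conn.executescript(
    """
    CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                            PRIMARY KEY (name, filepath));
    CREATE TABLE Imports   (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                            PRIMARY KEY (name, filepath));
    CREATE TABLE Globals   (filepath TEXT, name TEXT, address TEXT);
    INSERT INTO Functions VALUES ('gh_auto/a.cpp', 'init',    '00401000', 0);
    INSERT INTO Imports   VALUES ('gh_auto/b.h',   'start',   '00401000', 0);
    INSERT INTO Globals   VALUES ('gh_auto/g.h',   'g_state', '00500000');
    """
)


def find_duplicate_addresses(conn):
    """Map each address used more than once to the entries that use it."""
    by_address = defaultdict(list)
    for table in ("Functions", "Imports", "Globals"):
        # `address <> ''` also skips NULL addresses.
        query = f"SELECT address, name, filepath FROM {table} WHERE address <> ''"
        for address, name, filepath in conn.execute(query):
            # Grouping case-insensitively is an assumption of this sketch.
            by_address[address.upper()].append(f"{table}:{name} ({filepath})")
    return {addr: entries for addr, entries in by_address.items() if len(entries) > 1}


duplicates = find_duplicate_addresses(conn)
for address, entries in sorted(duplicates.items()):
    print(f"DUPLICATE ADDRESS: {address} appears {len(entries)} times in: "
          + ", ".join(entries))
```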
Duplicate Name Detection
- Checks for function names appearing in multiple files
- Checks for global names appearing in multiple files
- Helps identify naming conflicts and potential issues
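One way to express the name check is a GROUP BY query that counts distinct file paths per name. The sketch below runs it from Python against the Globals table; the query text and sample rows are illustrative assumptions, and the same query shape would apply to Functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for gh.db
conn.executescript(
    """
    CREATE TABLE Globals (filepath TEXT, name TEXT, address TEXT);
    INSERT INTO Globals VALUES ('gh_auto/a.h', 'g_state', '00500000');
    INSERT INTO Globals VALUES ('gh_fix/a.h',  'g_state', '00500000');
    INSERT INTO Globals VALUES ('gh_auto/a.h', 'g_count', '00500004');
    """
)

# Names that occur in more than one distinct file.
DUPLICATE_NAMES_SQL = """
    SELECT name, COUNT(DISTINCT filepath) AS files
    FROM Globals
    GROUP BY name
    HAVING COUNT(DISTINCT filepath) > 1
    ORDER BY name
"""
duplicate_names = conn.execute(DUPLICATE_NAMES_SQL).fetchall()
print(duplicate_names)
```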
Usage Examples
Basic Function Extraction
./tool file1.cpp file2.cpp -d output.db -m functions
Global Variable Extraction
./tool globals.h -d output.db -m globals
Batch Processing with File List
./tool -l filelist.txt -d output.db -m functions
Quality Assurance Check
./tool -d output.db -m duplicates
Database Queries for Users
Find Function by Name
SELECT * FROM Functions WHERE name = 'functionName';
SELECT * FROM Imports WHERE name = 'functionName';
Find All Symbols at Address
SELECT 'Function' as type, name, filepath FROM Functions WHERE address = '12345678'
UNION ALL
SELECT 'Import' as type, name, filepath FROM Imports WHERE address = '12345678'
UNION ALL
SELECT 'Global' as type, name, filepath FROM Globals WHERE address = '12345678';
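When the lookup is issued from a program, binding the address as a parameter avoids repeating the literal three times. A minimal Python sketch follows, assuming the schema above; the helper name symbols_at, the kind column alias, and the sample data are hypothetical.

```python
import sqlite3

# Same UNION ALL lookup as above, with the address bound once as :addr.
LOOKUP_SQL = """
    SELECT 'Function' AS kind, name, filepath FROM Functions WHERE address = :addr
    UNION ALL
    SELECT 'Import'   AS kind, name, filepath FROM Imports   WHERE address = :addr
    UNION ALL
    SELECT 'Global'   AS kind, name, filepath FROM Globals   WHERE address = :addr
"""


def symbols_at(conn, address):
    """Return (kind, name, filepath) rows for every symbol stored at `address`."""
    return conn.execute(LOOKUP_SQL, {"addr": address}).fetchall()


conn = sqlite3.connect(":memory:")  # in-memory stand-in for gh.db
conn.executescript(
    """
    CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                            PRIMARY KEY (name, filepath));
    CREATE TABLE Imports   (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                            PRIMARY KEY (name, filepath));
    CREATE TABLE Globals   (filepath TEXT, name TEXT, address TEXT);
    INSERT INTO Functions VALUES ('gh_auto/a.cpp', 'init', '12345678', 0);
    INSERT INTO Imports   VALUES ('gh_auto/a.h',   'init', '12345678', 0);
    INSERT INTO Globals   VALUES ('gh_auto/g.h',   'g_state', 'ABCDEF00');
    """
)
hits = symbols_at(conn, "12345678")
print(hits)
```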
List All Functions in File
SELECT name, address FROM Functions WHERE filepath = 'path/to/file.cpp'
ORDER BY name;
Find Functions Without Addresses
SELECT name, filepath FROM Functions WHERE address = '' OR address IS NULL;
Address Range Analysis
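-- address is fixed-width hex text, so compare it as a string:
-- CAST(address AS INTEGER) does not parse hexadecimal digits.
SELECT name, address, filepath FROM Functions
WHERE upper(address) BETWEEN '10000000' AND '20000000'
ORDER BY upper(address);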
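Because the address column holds hex text, numeric comparisons need a hex-aware conversion; SQLite's CAST(... AS INTEGER) reads only a leading decimal prefix of a text value. One alternative is to convert in client code. The Python sketch below does this with int(address, 16); the bounds match the query above, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # in-memory stand-in for gh.db
conn.executescript(
    """
    CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                            PRIMARY KEY (name, filepath));
    INSERT INTO Functions VALUES ('gh_auto/a.cpp', 'f1', '0FFFFFFF', 0);
    INSERT INTO Functions VALUES ('gh_auto/a.cpp', 'f2', '1abcdef0', 0);
    INSERT INTO Functions VALUES ('gh_auto/a.cpp', 'f3', '10000000', 0);
    INSERT INTO Functions VALUES ('gh_auto/a.cpp', 'f4', '20000001', 0);
    INSERT INTO Functions VALUES ('gh_auto/a.cpp', 'f5', '', 0);
    """
)

LOW, HIGH = 0x10000000, 0x20000000

# Skip rows without an address, then filter and sort numerically.
rows = conn.execute(
    "SELECT name, address, filepath FROM Functions WHERE address <> ''"
).fetchall()
in_range = sorted(
    (row for row in rows if LOW <= int(row[1], 16) <= HIGH),
    key=lambda row: int(row[1], 16),
)
print([name for name, _, _ in in_range])
```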
Integration Considerations
- Database Format: Standard SQLite3 - compatible with most tools and languages
- File Paths: Relative to the game source directory (the game_re folder in the repo root), so paths start with subfolders such as gh_auto or gh_fix
- Address Format: Always 8-character hex strings (32-bit addresses) - pad with leading zeros if needed
- Case Sensitivity: Function/global names are case-sensitive as per C++ standards
- Unicode Support: Handles UTF-8 encoded source files
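For callers that hold an address as an integer or as an unpadded string, a small helper can produce the 8-character form before querying. This is a sketch only: the name normalize_address is hypothetical, and emitting uppercase is an arbitrary choice made here. If the stored case follows the source comment, lookups may also need a case-insensitive comparison.

```python
def normalize_address(value):
    """Return a 32-bit address as an 8-character, zero-padded hex string."""
    # Strings are read as hex (an optional 0x prefix is accepted by int()).
    number = int(value, 16) if isinstance(value, str) else int(value)
    if not 0 <= number <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit address: {value!r}")
    return f"{number:08X}"  # uppercase is an assumption of this sketch


examples = [normalize_address(v) for v in (0x401000, "401000", "0x401000", "abcdef00")]
print(examples)
```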
This database serves as a comprehensive symbol table for reverse engineering, debugging, and code analysis workflows.