# C++ Function/Global Parser Tool - Database Output Summary

## Overview

This tool parses C++ source files using Tree-sitter to extract function and global variable information along with their memory addresses from comments. The extracted data is stored in an SQLite database for analysis and lookup purposes.

## Database Schema

The tool creates an SQLite database (default: `gh.db`) with three main tables:

### 1. Functions Table

```sql
CREATE TABLE Functions (
    filepath TEXT,
    name     TEXT,
    address  TEXT,
    type     INTEGER,
    PRIMARY KEY (name, filepath)
);
```

`type` is one of the following:

- 0: Auto
- 1: Fix
- 2: Stub
- 3: Ref

**Purpose**: Stores function definitions, i.e. functions that have bodies (actual implementations)

- `filepath`: Source file path where the function is defined
- `name`: Function name (identifier)
- `address`: 8-character hexadecimal memory address extracted from comments
- `type`: Integer category code (values listed above)
- **Primary Key**: Combination of `name` and `filepath` (allows the same function name in different files)

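
The following sketch uses Python's standard `sqlite3` module and an in-memory database to show the composite primary key and the `type` codes in action. The `SymbolType` enum and the `insert_function` helper are hypothetical names for illustration. Only the table definition and the numeric codes come from the schema above. The same name can be stored once per file, and SQLite rejects a second row with the same `(name, filepath)` pair.

```python
import sqlite3
from enum import IntEnum


class SymbolType(IntEnum):
    """Hypothetical names for the documented `type` codes."""
    AUTO = 0
    FIX = 1
    STUB = 2
    REF = 3


SCHEMA = """
CREATE TABLE Functions (
    filepath TEXT,
    name     TEXT,
    address  TEXT,
    type     INTEGER,
    PRIMARY KEY (name, filepath)
);
"""


def insert_function(conn, filepath, name, address, kind):
    """Insert one row; return False if (name, filepath) already exists."""
    try:
        conn.execute(
            "INSERT INTO Functions (filepath, name, address, type) VALUES (?, ?, ?, ?)",
            (filepath, name, address, int(kind)),
        )
        return True
    except sqlite3.IntegrityError:
        # The composite primary key rejects a second row for the same pair.
        return False


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
print(insert_function(conn, "src/a.cpp", "myFunction", "12345678", SymbolType.AUTO))  # True
print(insert_function(conn, "src/b.cpp", "myFunction", "12345679", SymbolType.FIX))   # True: other file
print(insert_function(conn, "src/a.cpp", "myFunction", "0000ABCD", SymbolType.STUB))  # False: same pair
```
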
### 2. Imports Table

```sql
CREATE TABLE Imports (
    filepath TEXT,
    name     TEXT,
    address  TEXT,
    type     INTEGER,
    PRIMARY KEY (name, filepath)
);
```

**Purpose**: Stores function declarations without bodies (imports/forward declarations)

- Same schema as the Functions table
- Distinguishes between function definitions and declarations
- Useful for tracking external function references

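
Declarations and definitions live in separate tables, so a `LEFT JOIN ... IS NULL` can list names that are declared but never defined in any parsed file. This is one way to track external references. The sketch below uses Python's `sqlite3` with made-up sample rows. The `unresolved_imports` helper is hypothetical. Matching on `name` alone and ignoring `filepath` is an assumption about what counts as "defined".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Both tables share the documented schema.
for table in ("Functions", "Imports"):
    conn.execute(
        f"CREATE TABLE {table} (filepath TEXT, name TEXT, address TEXT, "
        "type INTEGER, PRIMARY KEY (name, filepath))"
    )

# Hypothetical sample data.
conn.executemany(
    "INSERT INTO Functions VALUES (?, ?, ?, ?)",
    [("src/a.cpp", "initGame", "00401000", 0)],
)
conn.executemany(
    "INSERT INTO Imports VALUES (?, ?, ?, ?)",
    [
        ("src/b.cpp", "initGame", "00401000", 0),   # declared and defined
        ("src/b.cpp", "playSound", "00402000", 0),  # declared only
    ],
)

UNRESOLVED_IMPORTS = """
SELECT DISTINCT i.name, i.address
FROM Imports AS i
LEFT JOIN Functions AS f ON f.name = i.name
WHERE f.name IS NULL
ORDER BY i.name
"""


def unresolved_imports(conn):
    """Return (name, address) pairs declared in Imports but absent from Functions."""
    return conn.execute(UNRESOLVED_IMPORTS).fetchall()


print(unresolved_imports(conn))  # [('playSound', '00402000')]
```
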
### 3. Globals Table

```sql
CREATE TABLE Globals (
    filepath TEXT,
    name     TEXT,
    address  TEXT
);
```

**Purpose**: Stores global variable declarations marked with `extern`

- `filepath`: Source file path where the global is declared
- `name`: Global variable name (identifier)
- `address`: 8-character hexadecimal memory address from comments
- **No Primary Key**: Allows the same global name in different files; unlike the function tables, nothing enforces uniqueness here

## Address Format

The tool extracts addresses from C++ comments using this regex pattern:

```regex
//\s*([0-9a-fA-F]{8})
```

**Expected Comment Format**:

```cpp
void myFunction(); // 12345678
extern int globalVar; // ABCDEF00
```

- Addresses must be exactly 8 hexadecimal characters
- Can be uppercase or lowercase
- Must be in a C++ line comment (`//`)
- Whitespace after `//` is optional

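
The pattern can be tried out directly with Python's `re` module. This is a sketch, not the tool's own code. The `extract_address` helper is hypothetical, and it assumes the pattern is applied with `re.search` to a whole source line.

```python
import re

# Pattern from the documentation: "//", optional whitespace, 8 hex digits.
ADDRESS_RE = re.compile(r"//\s*([0-9a-fA-F]{8})")


def extract_address(line):
    """Return the first commented 8-digit hex address on a line, or None."""
    match = ADDRESS_RE.search(line)
    return match.group(1) if match else None


for source_line in [
    "void myFunction(); // 12345678",
    "extern int globalVar; // ABCDEF00",
    "int noSpace(); //deadbeef",
    "int tooShort(); // 1234",
]:
    print(extract_address(source_line))
# Prints 12345678, ABCDEF00, deadbeef and None, one per line.
```

As written, the pattern has no boundary after the eighth digit. When used with `re.search`, a comment such as `// 123456789` still yields `12345678`. Rejecting longer runs would need something like a `(?![0-9a-fA-F])` lookahead after the group.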
## Tool Modes

### 1. Functions Mode (`-m functions`)

- **Default mode**
- Parses C++ files for function definitions and declarations
- Populates the `Functions` and `Imports` tables
- Distinguishes functions with bodies from declarations only

### 2. Globals Mode (`-m globals`)

- Parses C++ files for `extern` global variable declarations
- Populates the `Globals` table
- Only processes variables with the `extern` storage class

### 3. Duplicates Mode (`-m duplicates`)

- **Analysis mode**: does not process source files
- Checks the existing database for duplicate addresses and names
- Reports conflicts across all tables
- Exits with status 1 if duplicates are found, 0 if the database is clean

### 4. Dump-Tree Mode (`-m dump-tree`)

- **Debug mode**: does not use the database
- Prints the Tree-sitter AST to help debug parsing issues
- Useful for understanding how the parser interprets source code

## Data Quality Checks

The tool includes the following built-in validation checks, which run in duplicates mode (`-m duplicates`).

### Duplicate Address Detection

- Scans all tables for addresses used more than once
- Report format: `"DUPLICATE ADDRESS: {address} appears {count} times in: {entries}"`
- Cross-references the Functions, Imports, and Globals tables

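
One way to build the duplicate-address report is to combine the three tables with `UNION ALL` and group the rows by address. The Python `sqlite3` sketch below follows that approach. Several details are assumptions, because only the outer report format is documented: the `duplicate_address_reports` and `duplicates_exit_code` helpers are hypothetical, as are the `Table:name (filepath)` entry format and the skipping of empty addresses. The exit-code helper mirrors the documented 1/0 status of duplicates mode.

```python
import sqlite3
from collections import defaultdict

# Every symbol from the three tables, tagged with its table name.
ALL_SYMBOLS = """
SELECT 'Functions', name, filepath, address FROM Functions
UNION ALL
SELECT 'Imports', name, filepath, address FROM Imports
UNION ALL
SELECT 'Globals', name, filepath, address FROM Globals
"""


def duplicate_address_reports(conn):
    """Return one report line per address that is used more than once."""
    entries = defaultdict(list)
    for table, name, filepath, address in conn.execute(ALL_SYMBOLS):
        if address:  # assumption: rows without an address are skipped
            entries[address].append(f"{table}:{name} ({filepath})")
    return [
        f"DUPLICATE ADDRESS: {address} appears {len(items)} times in: "
        + ", ".join(sorted(items))
        for address, items in sorted(entries.items())
        if len(items) > 1
    ]


def duplicates_exit_code(conn):
    """Mirror the documented status: 1 if duplicates were found, 0 if clean."""
    return 1 if duplicate_address_reports(conn) else 0


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                        PRIMARY KEY (name, filepath));
CREATE TABLE Imports (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                      PRIMARY KEY (name, filepath));
CREATE TABLE Globals (filepath TEXT, name TEXT, address TEXT);
INSERT INTO Functions VALUES ('src/a.cpp', 'update', '00401000', 0);
INSERT INTO Globals VALUES ('src/g.h', 'gState', '00401000');
INSERT INTO Globals VALUES ('src/g.h', 'gFlags', '00500000');
""")
for report in duplicate_address_reports(conn):
    print(report)
# DUPLICATE ADDRESS: 00401000 appears 2 times in: Functions:update (src/a.cpp), Globals:gState (src/g.h)
print(duplicates_exit_code(conn))  # 1
```
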
### Duplicate Name Detection

- Checks for function names that appear in multiple files
- Checks for global names that appear in multiple files
- Helps identify naming conflicts between source files

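
A `GROUP BY` query along these lines could implement the name check. This is a Python `sqlite3` sketch, not the tool's own code. The `duplicate_names` helper is hypothetical. Counting distinct files is an assumption based on the "multiple files" wording above, and it means a name repeated within a single file is not flagged. Table names cannot be bound as SQL parameters, so the helper whitelists them before formatting the query.

```python
import sqlite3

# Names that occur in more than one distinct file within a table.
DUPLICATE_NAMES = """
SELECT name, COUNT(DISTINCT filepath) AS files
FROM {table}
GROUP BY name
HAVING COUNT(DISTINCT filepath) > 1
ORDER BY name
"""


def duplicate_names(conn, table):
    """Return (name, file_count) pairs for names found in several files."""
    if table not in ("Functions", "Globals"):
        raise ValueError(f"unexpected table: {table}")
    return conn.execute(DUPLICATE_NAMES.format(table=table)).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                        PRIMARY KEY (name, filepath));
CREATE TABLE Globals (filepath TEXT, name TEXT, address TEXT);
INSERT INTO Functions VALUES ('src/a.cpp', 'init', '00401000', 0);
INSERT INTO Functions VALUES ('src/b.cpp', 'init', '00402000', 1);
INSERT INTO Functions VALUES ('src/b.cpp', 'draw', '00403000', 0);
INSERT INTO Globals VALUES ('src/g.h', 'gCount', '00600000');
INSERT INTO Globals VALUES ('src/g.h', 'gCount', '00600000');
""")
print(duplicate_names(conn, "Functions"))  # [('init', 2)]
print(duplicate_names(conn, "Globals"))    # [] (same file twice is not counted)
```
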
## Usage Examples

### Basic Function Extraction

```bash
./tool file1.cpp file2.cpp -d output.db -m functions
```

### Global Variable Extraction

```bash
./tool globals.h -d output.db -m globals
```

### Batch Processing with File List

```bash
./tool -l filelist.txt -d output.db -m functions
```

### Quality Assurance Check

```bash
./tool -d output.db -m duplicates
```

## Database Queries for Users

### Find Function by Name

```sql
SELECT * FROM Functions WHERE name = 'functionName';
SELECT * FROM Imports WHERE name = 'functionName';
```

### Find All Symbols at Address

```sql
SELECT 'Function' AS type, name, filepath FROM Functions WHERE address = '12345678'
UNION ALL
SELECT 'Import' AS type, name, filepath FROM Imports WHERE address = '12345678'
UNION ALL
SELECT 'Global' AS type, name, filepath FROM Globals WHERE address = '12345678';
```

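
When running the lookup from code, a named placeholder avoids pasting the address into the SQL three times. The Python `sqlite3` sketch below uses one. The `symbols_at` helper is hypothetical. The comment format accepts either case, and the documentation does not say whether stored addresses are normalized, so the sketch compares upper-cased values. It also renames the label column to `kind` to avoid confusion with the tables' `type` column.

```python
import sqlite3

# A named parameter can appear several times but is bound only once.
SYMBOLS_AT = """
SELECT 'Function' AS kind, name, filepath FROM Functions WHERE upper(address) = :addr
UNION ALL
SELECT 'Import' AS kind, name, filepath FROM Imports WHERE upper(address) = :addr
UNION ALL
SELECT 'Global' AS kind, name, filepath FROM Globals WHERE upper(address) = :addr
ORDER BY kind, name
"""


def symbols_at(conn, address):
    """Return (kind, name, filepath) rows for an address, ignoring hex case."""
    return conn.execute(SYMBOLS_AT, {"addr": address.upper()}).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                        PRIMARY KEY (name, filepath));
CREATE TABLE Imports (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                      PRIMARY KEY (name, filepath));
CREATE TABLE Globals (filepath TEXT, name TEXT, address TEXT);
INSERT INTO Functions VALUES ('src/a.cpp', 'update', 'abcdef00', 0);
INSERT INTO Imports VALUES ('src/b.cpp', 'update', 'ABCDEF00', 0);
INSERT INTO Globals VALUES ('src/g.h', 'gState', '00500000');
""")
print(symbols_at(conn, "AbCdEf00"))
# [('Function', 'update', 'src/a.cpp'), ('Import', 'update', 'src/b.cpp')]
```
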
### List All Functions in File

```sql
SELECT name, address FROM Functions WHERE filepath = 'path/to/file.cpp'
ORDER BY name;
```

### Find Functions Without Addresses

```sql
SELECT name, filepath FROM Functions WHERE address = '' OR address IS NULL;
```

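### Address Range Analysis

`address` is stored as `TEXT`, so `CAST(address AS INTEGER)` reads only a leading run of decimal digits. For example, `'12345678'` becomes the decimal number 12345678 and `'ABCDEF00'` becomes `0`, which makes the cast unusable for a hex range. Addresses are fixed-width, zero-padded hex strings, so comparing them as upper-cased text gives the same order as comparing their numeric values:

```sql
-- Fixed-width hex strings sort numerically once the case is normalized.
SELECT name, address, filepath FROM Functions
WHERE upper(address) BETWEEN '10000000' AND '20000000'
ORDER BY upper(address);
```
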
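
If true numeric comparison is preferred over string ordering, Python's `sqlite3` module can register a small conversion function with `Connection.create_function`. The SQL function name `hex_to_int` is a hypothetical choice for this sketch. Empty or NULL addresses become NULL, so `BETWEEN` drops them instead of raising an error inside the query.

```python
import sqlite3


def hex_to_int(text):
    """Convert a hex address string to an int; empty or NULL becomes NULL."""
    if not text:
        return None
    return int(text, 16)


conn = sqlite3.connect(":memory:")
# Register the Python function under a (hypothetical) SQL name.
conn.create_function("hex_to_int", 1, hex_to_int)
conn.executescript("""
CREATE TABLE Functions (filepath TEXT, name TEXT, address TEXT, type INTEGER,
                        PRIMARY KEY (name, filepath));
INSERT INTO Functions VALUES ('src/a.cpp', 'low', '0FFFFFFF', 0);
INSERT INTO Functions VALUES ('src/a.cpp', 'mid', '1a2b3c4d', 0);
INSERT INTO Functions VALUES ('src/b.cpp', 'high', '20000000', 0);
INSERT INTO Functions VALUES ('src/b.cpp', 'none', '', 0);
""")

RANGE_QUERY = """
SELECT name, address, filepath FROM Functions
WHERE hex_to_int(address) BETWEEN ? AND ?
ORDER BY hex_to_int(address)
"""

rows = conn.execute(RANGE_QUERY, (0x10000000, 0x20000000)).fetchall()
print(rows)  # [('mid', '1a2b3c4d', 'src/a.cpp'), ('high', '20000000', 'src/b.cpp')]
```

This approach handles mixed-case addresses without `upper()`. However, the function only exists on connections that register it, so other SQLite clients cannot run the query.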
## Integration Considerations

- **Database Format**: Standard SQLite 3, readable from most tools and languages
- **File Paths**: Relative to the game source directory (the `game_re` folder in the repository root), so paths include subfolders such as `gh_auto` and `gh_fix`
- **Address Format**: Always 8-character hex strings (32-bit addresses), zero-padded on the left when needed
- **Case Sensitivity**: Function and global names are case-sensitive, as in C++
- **Unicode Support**: Handles UTF-8 encoded source files

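
Zero-padding is easy to get wrong when addresses come from other tools. Below is a minimal Python sketch of a normalizer. `normalize_address` is a hypothetical helper. Emitting uppercase digits is an arbitrary choice, since the comment format accepts either case.

```python
def normalize_address(value):
    """Return a 32-bit address as an 8-character, zero-padded hex string.

    Accepts an int or a hex string (with or without a "0x" prefix).
    Uppercase output is an arbitrary choice for this sketch.
    """
    if isinstance(value, str):
        value = int(value, 16)
    if not 0 <= value <= 0xFFFFFFFF:
        raise ValueError(f"not a 32-bit address: {value:#x}")
    return f"{value:08X}"


print(normalize_address(0x401000))      # 00401000
print(normalize_address("abcd"))        # 0000ABCD
print(normalize_address("0x1A2B3C4D"))  # 1A2B3C4D
```
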
This database serves as a comprehensive symbol table for reverse engineering, debugging, and code analysis workflows.