# C++ Function/Global Parser Tool - Database Output Summary

## Overview

This tool parses C++ source files with Tree-sitter to extract functions and global variables, together with the memory addresses recorded for them in comments. The extracted data is stored in an SQLite database for analysis and lookup.

## Database Schema

The tool creates an SQLite database (default: `gh.db`) with three tables:

### 1. Functions Table

```sql
CREATE TABLE Functions (
    filepath TEXT,
    name TEXT,
    address TEXT,
    PRIMARY KEY (name, filepath)
);
```

**Purpose**: Stores function definitions that have a body (actual implementations).

- `filepath`: Source file path where the function is defined
- `name`: Function name (identifier)
- `address`: 8-character hexadecimal memory address extracted from comments
- **Primary Key**: Combination of name and filepath (allows the same function name in different files)

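To make the composite key concrete, here is a small in-memory model. It is a sketch only: `FunctionTable` and `insert` are hypothetical names, not part of the tool. A `std::set` of `(name, filepath)` pairs accepts or rejects rows the same way the `PRIMARY KEY (name, filepath)` constraint does:

```cpp
#include <set>
#include <string>
#include <utility>

// Hypothetical in-memory model of the Functions table's composite key.
// Only the (name, filepath) pair has to be unique, mirroring PRIMARY KEY (name, filepath).
class FunctionTable {
public:
    // Returns false when the (name, filepath) pair is already present,
    // i.e. the case SQLite would reject with a constraint violation.
    bool insert(const std::string& name, const std::string& filepath) {
        return keys_.emplace(name, filepath).second;
    }

private:
    std::set<std::pair<std::string, std::string>> keys_;
};
```

For example, `insert("init", "gh_auto/a.cpp")` followed by `insert("init", "gh_fix/a.cpp")` succeeds both times, while repeating the first call returns `false`. Under the same key, two overloads of one name in the same file also map to a single key.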
### 2. Imports Table

```sql
CREATE TABLE Imports (
    filepath TEXT,
    name TEXT,
    address TEXT,
    PRIMARY KEY (name, filepath)
);
```

**Purpose**: Stores function declarations without a body (imports and forward declarations).

- Same schema as the Functions table
- Distinguishes between function definitions and declarations
- Useful for tracking external function references

### 3. Globals Table

```sql
CREATE TABLE Globals (
    filepath TEXT,
    name TEXT,
    address TEXT
);
```

**Purpose**: Stores global variable declarations marked with `extern`.

- `filepath`: Source file path where the global is declared
- `name`: Global variable name (identifier)
- `address`: 8-character hexadecimal memory address extracted from comments
- **No Primary Key**: The same global name can be stored for several files, and duplicate rows are not rejected

## Address Format

The tool extracts addresses from C++ comments using this regex pattern:

```regex
//\s*([0-9a-fA-F]{8})
```

**Expected Comment Format**:

```cpp
void myFunction(); // 12345678
extern int globalVar; // ABCDEF00
```

- Addresses must be exactly 8 hexadecimal characters
- Can be uppercase or lowercase
- Must be in a C++ line comment (`//`)
- Whitespace after `//` is optional

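A minimal C++ sketch of the same extraction with `std::regex` (default ECMAScript syntax, which supports `\s`). The function name `extract_address` is illustrative and not taken from the tool:

```cpp
#include <optional>
#include <regex>
#include <string>

// Returns the first 8-digit hex address found after `//` on this line,
// using the documented pattern //\s*([0-9a-fA-F]{8}); std::nullopt if none.
std::optional<std::string> extract_address(const std::string& line) {
    static const std::regex pattern(R"(//\s*([0-9a-fA-F]{8}))");
    std::smatch match;
    if (std::regex_search(line, match, pattern)) {
        return match[1].str();  // capture group 1 holds the eight hex digits
    }
    return std::nullopt;
}
```

Because the pattern has no boundary after the eighth digit, a comment such as `// 123456789` still yields `12345678`, so the 8-character rule is a convention for comment authors rather than something the pattern itself enforces.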
## Tool Modes

### 1. Functions Mode (`-m functions`)

- **Default mode**
- Parses C++ files for function definitions and declarations
- Populates the `Functions` and `Imports` tables
- Distinguishes functions with bodies from declarations only

### 2. Globals Mode (`-m globals`)

- Parses C++ files for `extern` global variable declarations
- Populates the `Globals` table
- Only processes variables declared with the `extern` storage class

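The split between the `Functions`, `Imports`, and `Globals` tables can be pictured as a routing decision per syntax node. The sketch below assumes a simplified, hypothetical `NodeInfo` summary of a Tree-sitter node. In the tree-sitter-cpp grammar, definitions appear as `function_definition` nodes and prototypes as `declaration` nodes containing a `function_declarator`, but the tool's exact checks are not documented here:

```cpp
#include <optional>
#include <string>

// Hypothetical, simplified view of a Tree-sitter node (illustrative fields only).
struct NodeInfo {
    std::string type;                      // e.g. "function_definition" or "declaration"
    bool has_function_declarator = false;  // declarator is a function_declarator
    bool is_extern = false;                // carries an `extern` storage class specifier
};

enum class Table { Functions, Imports, Globals };

// Picks the target table following the mode descriptions:
// functions mode: definitions -> Functions, body-less prototypes -> Imports;
// globals mode: extern variable declarations -> Globals.
std::optional<Table> route(const NodeInfo& node, bool globals_mode) {
    if (globals_mode) {
        if (node.type == "declaration" && node.is_extern && !node.has_function_declarator) {
            return Table::Globals;
        }
        return std::nullopt;
    }
    if (node.type == "function_definition") {
        return Table::Functions;
    }
    if (node.type == "declaration" && node.has_function_declarator) {
        return Table::Imports;
    }
    return std::nullopt;  // anything else is ignored in this mode
}
```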
### 3. Duplicates Mode (`-m duplicates`)

- **Analysis mode** - does not process source files
- Checks the existing database for duplicate addresses and names
- Reports conflicts across all tables
- Returns exit code 1 if duplicates are found, 0 if clean

### 4. Dump-Tree Mode (`-m dump-tree`)

- **Debug mode** - does not use the database
- Outputs the Tree-sitter AST for debugging parsing issues
- Useful for understanding how the parser interprets source code

## Data Quality Checks

The tool includes built-in validation:

### Duplicate Address Detection

- Scans all tables for addresses used more than once
- Report format: `"DUPLICATE ADDRESS: {address} appears {count} times in: {entries}"`
- Cross-references the Functions, Imports, and Globals tables

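A sketch of the duplicate-address check over rows already loaded from the three tables. `SymbolRow`, `duplicate_address_report`, the `Table:name (filepath)` layout of each entry, and upper-casing before comparison are all assumptions made for illustration; only the report prefix follows the documented format:

```cpp
#include <cctype>
#include <map>
#include <string>
#include <vector>

// One row from any of the three tables (hypothetical in-memory form).
struct SymbolRow {
    std::string table;     // "Functions", "Imports" or "Globals"
    std::string name;
    std::string filepath;
    std::string address;   // 8 hex characters, possibly empty
};

// Groups rows by address and emits one report line per address used more than once.
// Addresses are upper-cased first so "abcdef00" and "ABCDEF00" collide (an assumption).
std::vector<std::string> duplicate_address_report(const std::vector<SymbolRow>& rows) {
    std::map<std::string, std::vector<const SymbolRow*>> by_address;
    for (const SymbolRow& row : rows) {
        if (row.address.empty()) {
            continue;  // rows without an address cannot collide
        }
        std::string key = row.address;
        for (char& c : key) {
            c = static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
        }
        by_address[key].push_back(&row);
    }

    std::vector<std::string> report;
    for (const auto& [address, entries] : by_address) {
        if (entries.size() < 2) {
            continue;
        }
        std::string joined;
        for (const SymbolRow* entry : entries) {
            if (!joined.empty()) {
                joined += ", ";
            }
            joined += entry->table + ":" + entry->name + " (" + entry->filepath + ")";
        }
        report.push_back("DUPLICATE ADDRESS: " + address + " appears " +
                         std::to_string(entries.size()) + " times in: " + joined);
    }
    return report;
}
```

In this sketch, a non-empty result corresponds to the case where duplicates mode returns exit code 1.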
### Duplicate Name Detection

- Checks for function names appearing in multiple files
- Checks for global names appearing in multiple files
- Helps identify naming conflicts and potential issues

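Name checks follow the same grouping idea but count distinct files per name. In this sketch, collecting files in a `std::set` means a name repeated inside one file (possible in `Globals`, which has no primary key) is not reported as a cross-file conflict. `names_in_multiple_files` is a hypothetical helper that takes `(name, filepath)` pairs from a single table:

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>
#include <vector>

using NameFilePair = std::pair<std::string, std::string>;  // (name, filepath)

// Returns each name recorded under more than one distinct filepath,
// together with the set of files it appears in.
std::map<std::string, std::set<std::string>> names_in_multiple_files(
    const std::vector<NameFilePair>& rows) {
    std::map<std::string, std::set<std::string>> files_by_name;
    for (const auto& [name, filepath] : rows) {
        files_by_name[name].insert(filepath);  // duplicates within one file collapse here
    }

    std::map<std::string, std::set<std::string>> conflicts;
    for (const auto& [name, files] : files_by_name) {
        if (files.size() > 1) {
            conflicts.emplace(name, files);
        }
    }
    return conflicts;
}
```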
## Usage Examples

### Basic Function Extraction

```bash
./tool file1.cpp file2.cpp -d output.db -m functions
```

### Global Variable Extraction

```bash
./tool globals.h -d output.db -m globals
```

### Batch Processing with File List

```bash
./tool -l filelist.txt -d output.db -m functions
```

### Quality Assurance Check

```bash
./tool -d output.db -m duplicates
```

## Database Queries for Users

### Find Function by Name

```sql
SELECT * FROM Functions WHERE name = 'functionName';
SELECT * FROM Imports WHERE name = 'functionName';
```

### Find All Symbols at Address

```sql
SELECT 'Function' AS type, name, filepath FROM Functions WHERE address = '12345678'
UNION ALL
SELECT 'Import' AS type, name, filepath FROM Imports WHERE address = '12345678'
UNION ALL
SELECT 'Global' AS type, name, filepath FROM Globals WHERE address = '12345678';
```

### List All Functions in File

```sql
SELECT name, address FROM Functions WHERE filepath = 'path/to/file.cpp'
ORDER BY name;
```

### Find Functions Without Addresses

```sql
SELECT name, filepath FROM Functions WHERE address = '' OR address IS NULL;
```

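### Address Range Analysis

Addresses are stored as 8-character hex text, and SQLite's `CAST(address AS INTEGER)` reads only a leading run of decimal digits (so `'ABCDEF00'` becomes 0). Because the strings have a fixed width, comparing them as text gives numeric order once they share one case:

```sql
SELECT name, address, filepath FROM Functions
WHERE upper(address) BETWEEN '10000000' AND '20000000'
ORDER BY upper(address);
```
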
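Client code that reads addresses out of the database can also convert them to integers itself. This sketch uses `std::stoul` with base 16, which accepts either case; `parse_address` and `in_range` are illustrative names:

```cpp
#include <cstdint>
#include <string>

// Parses an 8-character hex address (upper or lower case) into a 32-bit value.
// std::stoul throws std::invalid_argument for non-hex text such as an empty address.
std::uint32_t parse_address(const std::string& hex) {
    return static_cast<std::uint32_t>(std::stoul(hex, nullptr, 16));
}

// Inclusive numeric range check, like SQL's BETWEEN.
bool in_range(const std::string& hex, std::uint32_t lo, std::uint32_t hi) {
    const std::uint32_t value = parse_address(hex);
    return lo <= value && value <= hi;
}
```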
## Integration Considerations

- **Database Format**: Standard SQLite3, readable by most tools and languages
- **File Paths**: Relative to the game source directory (the `game_re` folder in the repository root), so paths include the `gh_auto` and `gh_fix` subfolders
- **Address Format**: Always 8-character hex strings (32-bit addresses); pad with leading zeros if needed
- **Case Sensitivity**: Function and global names are case-sensitive, as in C++
- **Unicode Support**: Handles UTF-8 encoded source files

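When writing addresses back into comments or queries, a fixed-width formatter keeps them in the expected shape. A minimal sketch, where `format_address` is a hypothetical helper and upper-case output is a choice (the parser accepts both cases):

```cpp
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats a 32-bit address as a zero-padded, 8-character upper-case hex string,
// e.g. 0x401000 -> "00401000".
std::string format_address(std::uint32_t address) {
    char buf[9];  // 8 hex digits plus the terminating NUL
    std::snprintf(buf, sizeof buf, "%08" PRIX32, address);
    return std::string(buf);
}
```

Zero padding also keeps plain string comparison in numeric order: `"00000009"` sorts before `"00000010"`, whereas the unpadded `"9"` would sort after `"10"`.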
This database serves as a comprehensive symbol table for reverse engineering, debugging, and code analysis workflows.