Rewrite the README of sctrace

Tate, Hongliang Tian 2025-12-10 22:01:16 +08:00 committed by Tao Su
parent 90dd7451de
commit 4013104c12
1 changed files with 212 additions and 133 deletions


@@ -1,189 +1,268 @@
# Syscall Compatibility Tracer
Syscall Compatibility Tracer (`sctrace`) is a powerful system call compatibility
verification tool that analyzes and validates system calls against user-defined
patterns. Written in
[SCML (System Call Matching Language)](https://asterinas.github.io/book/kernel/linux-compatibility/syscall-flag-coverage/system-call-matching-language.html),
these patterns describe supported functionality of system calls.
`sctrace` supports both real-time monitoring of running programs and post-analysis of
existing trace logs, providing comprehensive insights into system call compatibility
with intuitive pattern matching and visual feedback.
Syscall Compatibility Tracer (`sctrace`) is an oracle
that answers the following questions
for developers and users of a Linux ABI-compatible OS
(such as [Asterinas](https://github.com/asterinas/asterinas)):
**Is a target Linux application supported by the OS?
If not, where are the gaps?**
## Features
## Motivation
- **Pattern-based filtering**: Define system call patterns using SCML syntax
- **Dual mode operation**:
- Online mode: Real-time tracing of running programs
- Offline mode: Analysis of existing strace log files
- **Multi-threaded support**: Automatic handling of multi-threaded program traces with syscall reconstruction.
When tracing multi-threaded programs, strace may split system calls across multiple lines due to thread interleaving.
`sctrace` automatically handles this reconstruction.
- **Multiple SCML files support**: Specify multiple `.scml` files as arguments to load all of them.
Each file maintains its own scope for bitflags and struct definitions, preventing cross-file pollution.
There are tons of Linux ABI-compatible OSes out there:
some of them have been deployed at scale
(e.g., [HongMeng Kernel](https://www.usenix.org/conference/osdi24/presentation/chen-haibo) and [gVisor](https://gvisor.dev/)),
some are in rapid development
(e.g., [Asterinas](https://github.com/asterinas/asterinas)),
some target niche markets
(e.g., [Occlum](https://github.com/occlum/occlum)),
some are research prototypes
(e.g., [Graphene](https://grapheneproject.io/)),
and some are just hobby projects
(e.g., [Maestro](https://github.com/maestro-os/maestro)).
They all provide a subset of Linux system calls and features
so that at least some Linux applications can run on them _unmodified_.
## How to build and install
But here is a pain point for the developers and early adopters
of such a Linux ABI-compatible OS:
**when a Linux application is to be ported to this (imperfectly)
Linux ABI-compatible OS,
how can we know beforehand if the target application
is supposed to be supported or not?
And if not, where are the gaps?**
### Prerequisites
- [**strace**](https://strace.io/) version 5.15 or higher (for online mode)
- Debian/Ubuntu: `sudo apt install strace`
- Fedora/RHEL: `sudo dnf install strace`
- Rust toolchain
### Build instructions
Make sure you have Rust installed, then build the project:
A common practice is to run the target Linux application
with the classic [strace](https://strace.io/) tool,
which traces and prints all system calls invoked by an application.
For example, running a Hello World program with `strace`:
```bash
cargo build --release
strace ./hello_world
```
The binary will be available at `target/release/sctrace`.
would generate output as shown below:
### Installation instructions
To install the binary (for example, to `/usr/local/bin`),
you can use:
```bash
sudo cp target/release/sctrace /usr/local/bin/
```
execve("./hello_world", ["./hello_world"], 0xffffffd3f710 /* 4 vars */) = 0
brk(NULL) = 0xaaaabdc1b000
mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xffff890f4000
openat(AT_FDCWD, "/lib/aarch64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 3
read(3, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0\267\0\1\0\0\0\360\206\2\0\0\0\0\0"..., 832) = 832
fstat(3, {st_mode=S_IFREG|0755, st_size=1722920, ...}) = 0
...
write(1, "Hello, World!\n", 14) = 14
exit_group(0) = ?
```
Or you can install from `crates.io` directly (Recommended):
As `strace` captures all interactions between the application
and the OS kernel,
its log provides sufficient information to assess if there are any
compatibility issues.
But running `strace` on a complex application can produce millions of log lines,
far too many for a human to review.
An ad hoc log-processing script or tool would greatly reduce the manual labor,
but this approach is error-prone and its results would be inaccurate,
because such a script lacks the ground truth about which Linux system calls
the target OS supports (or does not support).
This is where `sctrace` can be a lifesaver.
## Introduction
We introduce Syscall Compatibility Tracer (`sctrace`),
a tool that checks whether all system calls invoked
by a target Linux application are supported
by a target Linux ABI-compatible OS.
To achieve this goal,
it combines the classic `strace` tool
with a mini domain-specific language called
[System Call Matching Language (SCML)](https://asterinas.github.io/book/kernel/linux-compatibility/syscall-flag-coverage/system-call-matching-language.html).
SCML adopts a `strace`-inspired syntax,
with which one can specify all supported system call patterns
in a concise, accurate, and human-readable way.
The `sctrace` tool originates from the [Asterinas](https://github.com/asterinas/asterinas) project
and is released so that it may be useful to the wider OS community.
## Getting Started
### Installation
The `sctrace` tool has two prerequisites:
* [**strace**](https://strace.io/) version 5.15 or higher
* Install on Debian/Ubuntu: `sudo apt install strace`
* Install on Fedora/RHEL: `sudo dnf install strace`
* The Rust toolchain (the version `nightly-2025-12-06` is tested; other versions may be supported as well)
* Install via [Rustup](https://rustup.rs/): `curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh`
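
To confirm that the installed `strace` meets the 5.15 requirement, a small check like the one below may help. This is only a sketch: the helper name `version_ge` is made up for illustration, it relies on GNU `sort -V`, and it assumes that the first line of `strace -V` ends with the version number (for example, `strace -- version 6.1`).

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sctrace): succeeds when version $1 >= version $2.
# Relies on GNU `sort -V` for version-aware ordering.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Assumption: the first line of `strace -V` ends with the version number.
if command -v strace >/dev/null 2>&1; then
    installed=$(strace -V | head -n 1 | awk '{ print $NF }')
    if version_ge "$installed" 5.15; then
        echo "strace $installed is new enough"
    else
        echo "strace $installed is older than 5.15; please upgrade"
    fi
else
    echo "strace is not installed"
fi
```

A plain string comparison would rank `5.9` above `5.15`, which is why the sketch uses version-aware sorting instead.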
To install the `sctrace` tool, execute the following command:
```bash
cargo install sctrace
```
This will automatically download, build, and install the latest version of `sctrace`.
## Usage
### Basic Syntax
### Basic syntax
```bash
sctrace <SCML_FILE1> [SCML_FILE2 ...] [OPTIONS] -- [program] [args...]
sctrace <scml_file>... -- <prog> <arg>...
```
### Options
* `<scml_file>...` gives one or more SCML files
that specify the supported system call patterns of a target OS.
* `<prog>` gives the name or path of the target Linux program.
* `<arg>...` provides zero, one, or more arguments for `<prog>`.
- `--input <FILE>`: Specify input file for offline mode
- `--quiet`: Enable quiet mode (only show unsupported calls)
### A Hello World Example
### Online Mode (Real-time tracing)
As `sctrace` requires SCML files as its input,
let's write a simple one called `open.scml`:
Trace a program in real-time:
```c
// Some (not all) valid flag patterns for the `open` and `openat` syscalls
open_flags = O_RDONLY | O_WRONLY | O_RDWR | O_CLOEXEC;
// Some (not all) valid patterns for the `open` syscall
open(path, flags = <open_flags>);
// Some (not all) valid patterns for the `openat` syscall
openat(dirfd, path, flags = <open_flags>);
```
This file describes some (not all) valid patterns
for Linux's [`open` and `openat`](https://man7.org/linux/man-pages/man2/open.2.html) system calls.
SCML's syntax and semantics resemble those of `strace` and the C language.
For more explanation about SCML syntax,
see its [documentation](https://asterinas.github.io/book/kernel/linux-compatibility/syscall-flag-coverage/system-call-matching-language.html).
To see `sctrace` in action,
we now use it to track the execution of a simple command
that prints out the content of `open.scml`:
```bash
sctrace pattern1.scml pattern2.scml -- ls -la
sctrace file_ops.scml network.scml --quiet -- ./my_program arg1 arg2
sctrace open.scml -- cat open.scml
```
### Offline Mode (Log file analysis)
The output looks like the following:
Analyze an existing strace log file:
```
1045884 execve("/usr/bin/cat", ["cat", "open.scml"], 0x5d08ad413588 /* 25 vars */) = 0 (unsupported)
1045884 brk(NULL) = 0x5a3f59f86000 (unsupported)
1045884 mmap(NULL, 8192, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7d321cf1f000 (unsupported)
1045884 access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory) (unsupported)
1045884 openat(AT_FDCWD</home/sutao>, "/etc/ld.so.cache", O_RDONLY|O_CLOEXEC) = 5</etc/ld.so.cache>
1045884 fstat(5</etc/ld.so.cache>, {st_mode=S_IFREG|0644, st_size=39847, ...}) = 0 (unsupported)
1045884 mmap(NULL, 39847, PROT_READ, MAP_PRIVATE, 5</etc/ld.so.cache>, 0) = 0x7d321cf15000 (unsupported)
1045884 close(5</etc/ld.so.cache>) = 0 (unsupported)
1045884 openat(AT_FDCWD</home/sutao>, "/lib/x86_64-linux-gnu/libc.so.6", O_RDONLY|O_CLOEXEC) = 5</usr/lib/x86_64-linux-gnu/libc.so.6>
1045884 read(5</usr/lib/x86_64-linux-gnu/libc.so.6>, "\177ELF\2\1\1\3\0\0\0\0\0\0\0\0\3\0>\0\1\0\0\0\220\243\2\0\0\0\0\0"..., 832) = 832 (unsupported)
```
The `(unsupported)` tag is appended to almost every system call entry
(except `open` and `openat`)
as `sctrace` only recognizes the valid patterns specified by `open.scml`.
Expanding the pattern rules in the input SCML files would allow `sctrace`
to recognize more system calls, as we will show later.
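
For long traces, it may help to condense such output into a per-syscall tally of the unsupported entries. The sketch below is not part of `sctrace`; `summarize_unsupported` is a hypothetical helper that assumes the output has been saved to a file as plain text (no color codes) in the format shown above, with an optional leading PID and a trailing `(unsupported)` tag.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sctrace): print "<syscall>: <count>" for every
# syscall tagged "(unsupported)" in the file $1, most frequent first.
summarize_unsupported() {
    grep -F '(unsupported)' "$1" \
        | sed -E 's/^[0-9]+ +//; s/\(.*$//' \
        | sort | uniq -c | sort -k1,1nr -k2,2 \
        | awk '{ print $2 ": " $1 }'
}
```

The `sed` step strips the leading PID (if any) and everything from the first `(` onward, which leaves only the syscall name. Given a file holding output like the above, `summarize_unsupported <file>` lists each unsupported syscall once with its number of occurrences.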
## User Guide
### Command-Line Interface (CLI)
The `sctrace` tool supports two modes:
the *online* and *offline* modes.
#### Online Mode (Real-time Tracking)
In the online mode, `sctrace` tracks a running command in real time.
This is the mode we described in the Hello World example.
Its complete CLI syntax is shown below:
```bash
sctrace pattern1.scml pattern2.scml --input trace.log
sctrace file_ops.scml network.scml --input trace.log --quiet
sctrace <scml_file>... [--quiet] -- <prog> <arg>...
```
**Note**: When generating strace logs for offline analysis, use `-yy` and `-f` flags:
If the `--quiet` option is given,
then only the **unsupported** system calls are shown in the output,
making it easier to spot compatibility gaps.
#### Offline Mode (Log Analysis)
The offline mode does not run a user-given command;
instead, it analyzes a user-given `strace` log of a command.
The CLI syntax is shown below:
```bash
strace -yy -f -o trace.log ls -la
sctrace <scml_file>... [--quiet] --input <strace_log>
```
- `-yy`: Print paths associated with file descriptor arguments
- `-f`: Trace child processes created by fork/vfork/clone
## Examples
### Example 1: Basic File Operations
Create `file_ops.scml`:
```scml
openat(dirfd, flags = O_RDONLY | O_WRONLY | O_RDWR, mode);
read(fd, buf, count = <INTEGER>);
write(fd, buf, count = <INTEGER>);
close(fd);
```
Run:
```bash
sctrace file_ops.scml -- cat /etc/passwd
```
### Example 2: Network Operations
Create `network.scml`:
```scml
socket(domain = AF_INET | AF_INET6, type = SOCK_STREAM | SOCK_DGRAM, protocol);
connect(sockfd, addr, addrlen);
send(sockfd, buf, len, flags);
recv(sockfd, buf, len, flags);
```
Run:
```bash
sctrace network.scml -- curl http://example.com
```
### Example 3: Using Asterinas Compatibility Patterns
Use the provided directory [syscall-flag-coverage](../../book/src/kernel/linux-compatibility/syscall-flag-coverage) (work in progress) and
test with various commands:
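The input file `<strace_log>` must be generated
with the following `strace` flags
(`-yy` prints the paths associated with file descriptor arguments;
`-f` also traces child processes created by fork/vfork/clone):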
```bash
# Monitor file system operations
sctrace $(find . -name "*.scml") -- tree .
# Monitor process information calls
sctrace $(find . -name "*.scml") -- top
# Monitor network operations
sctrace $(find . -name "*.scml") -- ping 127.0.0.1
strace -yy -f -o <strace_log> <prog> <arg>...
```
### Example 4: Offline Analysis
The meaning of the `--quiet` option is the same as that in the online mode.
### Using `sctrace` for Asterinas
The syscall coverage of Asterinas has been formally [documented](https://asterinas.github.io/book/kernel/linux-compatibility/syscall-flag-coverage/system-call-matching-language.html) in SCML.
The entire set of SCML files can be found
in the [`syscall-flag-coverage/`](../../book/src/kernel/linux-compatibility/syscall-flag-coverage/) directory of the Asterinas book.
To fetch these SCML files, run the following command:
```bash
# Generate trace log
strace -yy -f -o trace.log ls -la
# Analyze with sctrace
sctrace patterns.scml --input trace.log
git clone --depth 1 https://github.com/asterinas/asterinas
cd asterinas/book/src/kernel/linux-compatibility/syscall-flag-coverage/
```
## Output
You can now leverage these files to
check if a program can be ported to Asterinas:
`sctrace` provides colored output to distinguish between supported and unsupported system calls:
- **Supported calls**: Normal output (or hidden in quiet mode)
- **Unsupported calls**: Highlighted in red with "unsupported" message
### Example Output
```
openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 3
read(3, "root:x:0:0:root:/root:/bin/bash\n"..., 4096) = 1234
close(3) = 0
chmod("/tmp/test", 0755) (unsupported)
```bash
sctrace $(find . -name "*.scml") -- <prog> <arg>...
```
## Dependencies
In the Asterinas development Docker image,
we have pre-installed the `sctrace` tool.
For convenience,
the Docker image sets an environment variable called `ASTER_SCML`,
which is the list of all Asterinas SCML files.
This helps simplify using `sctrace` for Asterinas.
- `clap`: Command-line argument parsing
- `regex`: Regular expression support
- `nom`: Parser combinator library
- `nix`: Unix system interface for process management
```bash
sctrace $ASTER_SCML [--quiet] -- <prog> <arg>...
sctrace $ASTER_SCML [--quiet] --input <strace_log>
```
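
Outside the Docker image, a similar list can be assembled by hand. The sketch below is an assumption about how one might do it, not how the image actually populates `ASTER_SCML` (its exact format is not described here); `collect_scml` is a hypothetical helper, and it assumes that no path contains whitespace.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of sctrace): print every .scml file found under
# directory $1 on a single line, separated by spaces, in a stable order.
# Assumption: no path contains whitespace.
collect_scml() {
    find "$1" -type f -name '*.scml' | LC_ALL=C sort | tr '\n' ' ' | sed 's/ $//'
}
```

From the `syscall-flag-coverage/` directory, `sctrace $(collect_scml .) -- <prog> <arg>...` would then be equivalent to the `find`-based invocation, with the files passed in sorted order.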
## Troubleshooting
### Troubleshooting
### Permission Issues
For online tracing, you may need elevated privileges:
For online tracing, you may need elevated privileges
to attach to the target process using `ptrace`:
```bash
sudo sctrace patterns.scml -- target_program
```
## Developer Guide
The source code of `sctrace` resides within the Asterinas project.
So the first step is to download the Asterinas codebase:
```bash
git clone https://github.com/asterinas/asterinas
```
The `sctrace` tool is located in `tools/sctrace/` of the cloned repository:
```bash
cd asterinas/tools/sctrace
```
The tool is written in Rust,
so use Cargo to build and test it:
```bash
cargo build
cargo test
```