0x00 Preface

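Gadget Inspector, developed by a Netflix engineer, is a tool for discovering gadget chains that exploit Java deserialization vulnerabilities. Two resources share the title Automated Discovery of Deserialization Gadget Chains: a paper and a slide deck. The overall workflow is easy to follow and the entry class is clear; the harder parts are the reverse topological sort and the JVM simulation (local variable table and operand stack). Since I am close to a complete beginner in Java, this post first covers the necessary background and then walks through the Gadget Inspector code in detail.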

Annotated source code: jckling/gadgetinspector

0x01 Background

1.1 Java Bytecode

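The introduction in Meituan's article 字节码增强技术探索 (an exploration of bytecode enhancement techniques) is sufficient; it covers Java bytecode, the ASM framework, the Javassist framework, and the instrument library.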

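Qualified name: a name, followed by a dot (.), followed by an identifier, e.g. demo.servlet.HelloServlet. Inside class files, / replaces the dot (demo/servlet/HelloServlet).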

Simple name: a single identifier, e.g. test

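Fully qualified name: every primitive type, named package, top-level class, and top-level interface has a fully qualified name, which is either a simple name or a qualified name; see 6.7. Fully Qualified Names and Canonical Names in the Java Language Specification.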

Descriptors

Field descriptors

FieldDescriptor:
    FieldType

FieldType:
    BaseType
    ObjectType
    ArrayType
   
BaseType:
    B
    C
    D
    F
    I
    J
    S
    Z
   
ObjectType:
    L ClassName ;

ArrayType:
    [ ComponentType

ComponentType:
    FieldType

Interpretation of field descriptors

FieldType term	Type	Interpretation
B	byte	signed byte
C	char	Unicode character code point in the Basic Multilingual Plane, encoded with UTF-16
D	double	double-precision floating-point value
F	float	single-precision floating-point value
I	int	integer
J	long	long integer
L ClassName ;	reference	an instance of class ClassName
S	short	signed short
Z	boolean	true or false
[	reference	one array dimension

Method descriptors

MethodDescriptor:
    ( ParameterDescriptor* ) ReturnDescriptor

ParameterDescriptor:
    FieldType

ReturnDescriptor:
    FieldType
    VoidDescriptor

VoidDescriptor:
    V
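To make the grammar concrete, here is a minimal sketch that parses and builds method descriptors with the JDK's own java.lang.invoke.MethodType. The class name DescriptorDemo and the sample descriptor are made up for illustration; Gadget Inspector itself works with ASM's types rather than MethodType.

```java
import java.lang.invoke.MethodType;

class DescriptorDemo {
    // Parse a JVM method descriptor with the JDK's built-in parser.
    static MethodType parse(String descriptor) {
        // A null loader means the system class loader resolves the class names.
        return MethodType.fromMethodDescriptorString(descriptor, null);
    }

    public static void main(String[] args) {
        // (int, java.lang.String, long[]) returning void
        MethodType mt = parse("(ILjava/lang/String;[J)V");
        System.out.println(mt.parameterList());
        System.out.println(mt.returnType());

        // The other direction: the descriptor of `String m(Object, int[])`.
        String built = MethodType.methodType(String.class, Object.class, int[].class)
                .toMethodDescriptorString();
        System.out.println(built); // (Ljava/lang/Object;[I)Ljava/lang/String;
    }
}
```

Note how I and J are the BaseType characters for int and long, [ adds one array dimension, and object types are written as L ClassName ; with / separators.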

1.2 JVM

Gadget Inspector's TaintTrackingMethodVisitor simulates the JVM's local variable table and operand stack in order to perform taint analysis.

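A stack frame is the data structure that supports method invocation and execution in the virtual machine. Each method call, from invocation to completion, corresponds to one stack frame being pushed onto and later popped off the virtual machine stack.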

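The local variable table stores the method's parameters and the local variables defined in its body. For instance methods, slot 0 implicitly holds the receiver, this.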

The operand stack is driven by the opcodes, which push and pop its elements; an element can be a value of any Java data type.

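Push: values loaded from the local variable table or from object fields, as well as constants
Pop: the value is written to the local variable table or returned to the caller (the return value is the top of the stack)
A value occupies 0, 1, or 2 slots, where one slot is 32 bits: long and double take 2 slots, every other value takes 1, and 0 only applies to void (no value)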
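The interplay between the two structures can be sketched as follows. This is a simplified model, not Gadget Inspector's actual code: the class FrameSketch and its methods are hypothetical, every value is assumed to occupy one slot (so long and double are ignored), and each slot simply holds the set of argument indices whose data may have flowed into it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class FrameSketch {
    // Each slot holds the argument indices that may have influenced its value.
    final List<Set<Integer>> locals = new ArrayList<>();
    final Deque<Set<Integer>> stack = new ArrayDeque<>();

    // For an instance method, slot 0 is `this`, followed by the declared parameters.
    FrameSketch(int argCount) {
        for (int i = 0; i < argCount; i++) {
            Set<Integer> taint = new HashSet<>();
            taint.add(i);
            locals.add(taint);
        }
    }

    // xLOAD: copy a local variable's taint onto the operand stack.
    void load(int index) {
        stack.push(new HashSet<>(locals.get(index)));
    }

    // xSTORE: pop the top of the stack into a local variable slot.
    void store(int index) {
        Set<Integer> value = stack.pop();
        while (locals.size() <= index) {
            locals.add(new HashSet<>());
        }
        locals.set(index, value);
    }

    // Binary operation such as IADD: pop two operands, push one result tainted by both.
    void binaryOp() {
        Set<Integer> right = stack.pop();
        Set<Integer> left = stack.pop();
        left.addAll(right);
        stack.push(left);
    }

    public static void main(String[] args) {
        // Simulates `int sum = a + b;` inside `int add(int a, int b)`; locals are this, a, b.
        FrameSketch frame = new FrameSketch(3);
        frame.load(1);    // ILOAD 1
        frame.load(2);    // ILOAD 2
        frame.binaryOp(); // IADD
        frame.store(3);   // ISTORE 3
        System.out.println("taint of local 3: " + frame.locals.get(3));
    }
}
```

After the four instructions, local 3 carries the taint of both arguments and the operand stack is empty again, which is the invariant a real simulation has to maintain for every opcode.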

1.3 ASM

Visitor pattern

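The core idea of the visitor pattern is to operate on a complex data structure without modifying it: the operations are factored out of the structure and implemented as callbacks in a visitor, which are invoked during the traversal. Adding a new set of operations only requires adding a new visitor.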
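A minimal sketch of the pattern, independent of ASM, may help here. All names (VisitorSketch, ShapeVisitor, Circle, Rect) are invented for the example; the point is only that a second operation is added by writing a second visitor, without touching the data classes.

```java
class VisitorSketch {
    // One callback per node type; new operations are added by writing new visitors.
    interface ShapeVisitor<R> {
        R visitCircle(Circle circle);
        R visitRect(Rect rect);
    }

    // The data structure only knows how to hand itself to a visitor.
    interface Shape {
        <R> R accept(ShapeVisitor<R> visitor);
    }

    static final class Circle implements Shape {
        final double radius;
        Circle(double radius) { this.radius = radius; }
        @Override
        public <R> R accept(ShapeVisitor<R> visitor) { return visitor.visitCircle(this); }
    }

    static final class Rect implements Shape {
        final double width;
        final double height;
        Rect(double width, double height) { this.width = width; this.height = height; }
        @Override
        public <R> R accept(ShapeVisitor<R> visitor) { return visitor.visitRect(this); }
    }

    // First operation: compute the area.
    static final class AreaVisitor implements ShapeVisitor<Double> {
        @Override public Double visitCircle(Circle c) { return Math.PI * c.radius * c.radius; }
        @Override public Double visitRect(Rect r) { return r.width * r.height; }
    }

    // Second operation, added without changing Circle or Rect.
    static final class NameVisitor implements ShapeVisitor<String> {
        @Override public String visitCircle(Circle c) { return "circle"; }
        @Override public String visitRect(Rect r) { return "rect"; }
    }

    public static void main(String[] args) {
        Shape[] shapes = { new Circle(1.0), new Rect(2.0, 3.0) };
        for (Shape s : shapes) {
            System.out.println(s.accept(new NameVisitor()) + " area=" + s.accept(new AreaVisitor()));
        }
    }
}
```

ASM follows the same shape: the class file plays the role of the data structure, and ClassVisitor, MethodVisitor, and FieldVisitor are the visitors.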

Package layout

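org.objectweb.asm			# Core API, the core package
org.objectweb.asm.commons		# class adapters built on the core and tree packages
org.objectweb.asm.signature		# API for generic type signatures, an extension of the core package
org.objectweb.asm.tree			# Tree API, for complex class transformations
org.objectweb.asm.tree.analysis	        # static bytecode analysis framework built on the tree package
org.objectweb.asm.util			# class visitors and adapters for debugging
org.objectweb.asm.xml			# deprecated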

Workflow

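ClassReader parses the class file (event producer)
- It calls the corresponding visitXxx methods on the ClassVisitor instance passed to its accept method
ClassVisitor delegates every method call to another ClassVisitor instance (event filter)
ClassWriter is a subclass of the abstract class ClassVisitor (event consumer)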

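To "observe" a class file, extend a visitor and override the methods of interest, then call ClassReader.accept to run the visit. accept calls the methods of the supplied ClassVisitor in a fixed order; any method that is not overridden falls back to the default implementation in the ClassVisitor base class. When a method is encountered, the methods of the MethodVisitor are likewise called in order, and those that are not overridden use the defaults.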

/**
 * Constructor.
 * Constructs a new ClassReader object.
 *
 * @param classFile the JVMS ClassFile structure to be read
 */
public ClassReader(byte[] classFile)


/**
 * Makes the given visitor visit the JVMS ClassFile structure passed to the constructor of this ClassReader.
 *
 * @param classVisitor    the visitor
 * @param parsingOptions  the options used to parse this class (SKIP_CODE, SKIP_DEBUG, SKIP_FRAMES, EXPAND_FRAMES)
 */
public void accept(ClassVisitor classVisitor, int parsingOptions)

Visitors

1. ClassVisitor

Method call order (visit order)

visit [ visitSource ] [ visitModule ][ visitNestHost ][ visitOuterClass ] ( visitAnnotation | visitTypeAnnotation | visitAttribute )* ( visitNestMember | [ * visitPermittedSubclass ] | visitInnerClass | visitRecordComponent | visitField | visitMethod )* visitEnd

Methods used by Gadget Inspector

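/**
 * Constructor.
 *
 * @param api the ASM API version implemented by this visitor. Must be one of Opcodes.ASM4, Opcodes.ASM5, Opcodes.ASM6, or Opcodes.ASM7.
 */
public ClassVisitor(int api)

/**
 * Visits the header of the class.
 *
 * @param version    the class version
 * @param access     the class's access flags (see Opcodes)
 * @param name       the internal name of the class (the fully qualified name with / in place of .)
 * @param signature  the signature of the class
 * @param superName  the internal name of the superclass
 * @param interfaces the internal names of the class's interfaces
 */
public void visit(int version, int access, java.lang.String name, java.lang.String signature, java.lang.String superName, java.lang.String[] interfaces)


/**
 * Visits the enclosing class of the class. Called only if the class has an enclosing class.
 *
 * @param owner      the internal name of the enclosing class
 * @param name       the name of the method that contains the class, or null if the class is not enclosed in a method of its enclosing class
 * @param descriptor the descriptor of the method that contains the class, or null if the class is not enclosed in a method of its enclosing class
 */
public void visitOuterClass(java.lang.String owner, java.lang.String name, java.lang.String descriptor)


/**
 * Visits information about an inner class. This inner class is not necessarily a member of the class being visited.
 *
 * @param name      the internal name of the inner class
 * @param outerName the internal name of the class to which the inner class belongs; may be null for non-member classes
 * @param innerName the (simple) name of the inner class inside its enclosing class; may be null for anonymous inner classes
 * @param access    the access flags of the inner class as originally declared in the enclosing class
 */
public void visitInnerClass(java.lang.String name, java.lang.String outerName, java.lang.String innerName, int access)


/**
 * Visits a field of the class.
 *
 * @param access     the field's access flags
 * @param name       the field's name
 * @param descriptor the field's descriptor
 * @param signature  the field's signature
 * @param value      the field's initial value; only used for static fields
 * @return a visitor to visit the field's annotations and attributes, or null if this class visitor is not interested in them
 */
public FieldVisitor visitField(int access, java.lang.String name, java.lang.String descriptor, java.lang.String signature, java.lang.Object value)


/**
 * Visits a method of the class (a method definition).
 * This method must return a new MethodVisitor instance (or null) each time it is called.
 *
 * @param access     the method's access flags (see Opcodes)
 * @param name       the method's name
 * @param descriptor the method's descriptor
 * @param signature  the method's signature; may be null if the parameters, return type, and exceptions do not use generic types
 * @param exceptions the internal names of the method's exception classes; may be null
 * @return an object to visit the bytecode of the method, or null if this class visitor is not interested in the method's code
 */
public MethodVisitor visitMethod(int access, java.lang.String name, java.lang.String descriptor, java.lang.String signature, java.lang.String[] exceptions)


/**
 * Visits the end of the class. This method, which is the last one to be called, informs the visitor that all the fields and methods of the class have been visited.
 */
public void visitEnd()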

2. MethodVisitor

Method call order (visit order); the visitXInsn methods are called in the order of the bytecode instructions.

( visitParameter )* [ visitAnnotationDefault ] ( visitAnnotation | visitAnnotableParameterCount | visitParameterAnnotation visitTypeAnnotation | visitAttribute )* [ visitCode ( visitFrame | visitXInsn | visitLabel | visitInsnAnnotation | visitTryCatchBlock | visitTryCatchAnnotation | visitLocalVariable | visitLocalVariableAnnotation | visitLineNumber )* visitMaxs ] visitEnd

In addition, the visitXInsn and visitLabel methods must be called in the sequential order of the bytecode instructions of the visited code.

Methods used by Gadget Inspector

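/**
 * Starts the visit of the method's code, if any (i.e. for a non-abstract method).
 */
public void visitCode()


/**
 * Visits the current state of the local variables and operand stack elements.
 *
 * @param type     the type of this stack map frame
 * @param numLocal the number of local variables in the visited frame
 * @param local    the local variable types in the visited frame
 * @param numStack the number of operand stack elements in the visited frame
 * @param stack    the operand stack element types in the visited frame
 */
public void visitFrame(int type, int numLocal, java.lang.Object[] local, int numStack, java.lang.Object[] stack)


/**
 * Visits a zero-operand instruction.
 *
 * @param opcode the opcode of the instruction to be visited
 */
public void visitInsn(int opcode)


/**
 * Visits an instruction with a single int operand.
 *
 * @param opcode  the opcode of the instruction to be visited: BIPUSH, SIPUSH, or NEWARRAY
 * @param operand the operand of the instruction to be visited
 */
public void visitIntInsn(int opcode, int operand)


/**
 * Visits a local variable instruction.
 * A local variable instruction loads or stores the value of a local variable.
 *
 * @param opcode the opcode of the instruction to be visited: ILOAD, LLOAD, FLOAD, DLOAD, ALOAD, ISTORE, LSTORE, FSTORE, DSTORE, ASTORE, or RET
 * @param var    the operand of the instruction to be visited (the index of the local variable)
 */
public void visitVarInsn(int opcode, int var)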


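/**
 * Visits a type instruction.
 * A type instruction takes the internal name of a class as its parameter.
 *
 * @param opcode the opcode of the instruction to be visited: NEW, ANEWARRAY, CHECKCAST, or INSTANCEOF
 * @param type   the operand of the instruction to be visited (the internal name of an object or array class)
 */
public void visitTypeInsn(int opcode, java.lang.String type)


/**
 * Visits a field instruction.
 * A field instruction loads or stores the value of a field of an object.
 *
 * @param opcode     the opcode of the instruction to be visited: GETSTATIC, PUTSTATIC, GETFIELD, or PUTFIELD
 * @param owner      the internal name of the field's owner class
 * @param name       the field's name
 * @param descriptor the field's descriptor
 */
public void visitFieldInsn(int opcode, java.lang.String owner, java.lang.String name, java.lang.String descriptor)


/**
 * Visits a method instruction.
 * A method instruction invokes a method.
 *
 * @param opcode      the opcode of the instruction to be visited: INVOKEVIRTUAL, INVOKESPECIAL, INVOKESTATIC, or INVOKEINTERFACE
 * @param owner       the internal name of the method's owner class
 * @param name        the method's name
 * @param descriptor  the method's descriptor
 * @param isInterface whether the method's owner class is an interface
 */
public void visitMethodInsn(int opcode, java.lang.String owner, java.lang.String name, java.lang.String descriptor, boolean isInterface)


/**
 * Visits an invokedynamic instruction.
 *
 * @param name                     the method's name
 * @param descriptor               the method's descriptor
 * @param bootstrapMethodHandle    the bootstrap method
 * @param bootstrapMethodArguments the constant arguments of the bootstrap method
 */
public void visitInvokeDynamicInsn(java.lang.String name, java.lang.String descriptor, Handle bootstrapMethodHandle, java.lang.Object... bootstrapMethodArguments)


/**
 * Visits a jump instruction.
 * A jump instruction may transfer control to another instruction.
 *
 * @param opcode the opcode of the instruction to be visited: IFEQ, IFNE, IFLT, IFGE, IFGT, IFLE, IF_ICMPEQ, IF_ICMPNE, IF_ICMPLT, IF_ICMPGE, IF_ICMPGT, IF_ICMPLE, IF_ACMPEQ, IF_ACMPNE, GOTO, JSR, IFNULL, or IFNONNULL
 * @param label  the operand of the instruction to be visited (a label designating the instruction the jump may go to)
 */
public void visitJumpInsn(int opcode, Label label)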


/**
 * Visits a label.
 * A label designates the instruction that is visited just after it.
 *
 * @param label the label object
 */
public void visitLabel(Label label)


/**
 * Visits an LDC instruction.
 *
 * @param value the constant to be loaded onto the stack
 */
public void visitLdcInsn(java.lang.Object value)


/**
 * Visits an IINC instruction.
 *
 * @param var       the index of the local variable to be incremented
 * @param increment the amount to increment the local variable by
 */
public void visitIincInsn(int var, int increment)


/**
 * Visits a TABLESWITCH instruction.
 *
 * @param min    the minimum key value
 * @param max    the maximum key value
 * @param dflt   the beginning of the default handler block
 * @param labels the beginnings of the handler blocks
 */
public void visitTableSwitchInsn(int min, int max, Label dflt, Label... labels)


/**
 * Visits a LOOKUPSWITCH instruction.
 *
 * @param dflt   the beginning of the default handler block
 * @param keys   the key values
 * @param labels the beginnings of the handler blocks
 */
public void visitLookupSwitchInsn(Label dflt, int[] keys, Label[] labels)


/**
 * Visits a MULTIANEWARRAY instruction.
 *
 * @param descriptor    the descriptor of the array type
 * @param numDimensions the number of dimensions of the array to allocate
 */
public void visitMultiANewArrayInsn(java.lang.String descriptor, int numDimensions)


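/**
 * Visits an annotation on an instruction.
 * This method must be called just after the annotated instruction; it can be called several times for the same instruction.
 *
 * @param typeRef    a reference to the annotated type
 * @param typePath   the path to the annotated type argument, wildcard bound, array element type, or static inner type within typeRef
 * @param descriptor the descriptor of the annotation class
 * @param visible    whether the annotation is visible at runtime
 * @return a visitor to visit the annotation values, or null if this visitor is not interested in the annotation
 */
public AnnotationVisitor visitInsnAnnotation(int typeRef, TypePath typePath, java.lang.String descriptor, boolean visible)


/**
 * Visits a try catch block.
 *
 * @param start   the beginning of the exception handler's scope (inclusive)
 * @param end     the end of the exception handler's scope (exclusive)
 * @param handler the beginning of the exception handler's code
 * @param type    the internal name of the exception type handled by the handler, or null to catch any exception (for finally blocks)
 */
public void visitTryCatchBlock(Label start, Label end, Label handler, java.lang.String type)


/**
 * Visits an annotation on an exception handler type.
 * This method must be called after visitTryCatchBlock for the annotated handler; it can be called several times for the same handler.
 *
 * @param typeRef    a reference to the annotated type
 * @param typePath   the path to the annotated type argument, wildcard bound, array element type, or static inner type within typeRef
 * @param descriptor the descriptor of the annotation class
 * @param visible    whether the annotation is visible at runtime
 * @return a visitor to visit the annotation values, or null if this visitor is not interested in the annotation
 */
public AnnotationVisitor visitTryCatchAnnotation(int typeRef, TypePath typePath, java.lang.String descriptor, boolean visible)


/**
 * Visits the maximum stack size and the maximum number of local variables of the method.
 *
 * @param maxStack  the maximum stack size of the method
 * @param maxLocals the maximum number of local variables of the method
 */
public void visitMaxs(int maxStack, int maxLocals)


/**
 * Visits the end of the method.
 * This method, which is the last one to be called, informs the visitor that all the annotations and attributes of the method have been visited.
 */
public void visitEnd()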

3. FieldVisitor

Method call order (visit order)

( visitAnnotation | visitTypeAnnotation | visitAttribute )* visitEnd

Other

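JSRInlinerAdapter simplifies code analysis by removing JSR instructions and inlining the subroutines they reference. JSR and RET were emitted by older compilers to implement finally blocks; once the subroutines are inlined, the method has plain control flow that is easier to analyze.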

A MethodVisitor that removes JSR instructions and inlines the referenced subroutines

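/**
 * Constructor.
 *
 * @param methodVisitor the method visitor to send the resulting inlined method code to
 * @param access        the method's access flags (see Opcodes)
 * @param name          the method's name
 * @param descriptor    the method's descriptor
 * @param signature     the method's signature
 * @param exceptions    the internal names of the method's exception classes
 */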
public JSRInlinerAdapter(MethodVisitor methodVisitor,
                         int access,
                         java.lang.String name,
                         java.lang.String descriptor,
                         java.lang.String signature,
                         java.lang.String[] exceptions)

0x02 Project Structure

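The project contains three detection implementations, one per directory: javaserial targets native Java serialization, jackson targets Jackson (a JSON library), and xstream targets XStream (an XML library). Each has a matching configuration implementation under the config directory.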

.
├── config      # configuration interface and implementations
│   ├── ConfigRepository.java
│   ├── GIConfig.java
│   ├── JacksonDeserializationConfig.java
│   ├── JavaDeserializationConfig.java
│   └── XstreamDeserializationConfig.java
├── data        # data storage formats and loaders
│   ├── ClassReference.java
│   ├── DataFactory.java
│   ├── DataLoader.java
│   ├── GraphCall.java
│   ├── InheritanceDeriver.java
│   ├── InheritanceMap.java
│   ├── MethodReference.java
│   └── Source.java
├── jackson     # JSON library
│   ├── JacksonImplementationFinder.java
│   ├── JacksonSerializableDecider.java
│   └── JacksonSourceDiscovery.java
├── javaserial  # native serialization
│   ├── SimpleImplementationFinder.java
│   ├── SimpleSerializableDecider.java
│   └── SimpleSourceDiscovery.java
├── xstream     # XML library
│   ├── CustomXstreamSerializableDecider.java
│   └── XstreamSerializableDecider.java
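├── CallGraphDiscovery.java             # how tainted arguments propagate along method calls
├── ClassResourceEnumerator.java        # class enumerator
├── GadgetChainDiscovery.java           # gadget chain search
├── GadgetInspector.java                # main class, program entry point
├── ImplementationFinder.java           # interface: finds subclass implementations of a target method
├── MethodDiscovery.java                # class, method, and inheritance information
├── PassthroughDiscovery.java           # relationship between a method's return value and its arguments
├── SerializableDecider.java            # interface: decides whether a class is serializable
├── SourceDiscovery.java                # taint source information
├── TaintTrackingMethodVisitor.java     # method visitor
└── Util.java                           # utility functions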

gadgetinspector/data

Mostly definitions of the data formats.

1. DataLoader

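Defines how data is read and written, delegating the per-record format to a data factory (DataFactory). loadData returns a List (backed by an ArrayList), and the source calls it in many places to iterate over the stored data.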

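/**
 * Parses a file into objects using the given data factory.
 *
 * @param filePath path of the file to read
 * @param factory  factory that turns one line of fields into an object
 * @param <T>      element type
 * @return the parsed objects, one per line
 * @throws IOException
 */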
public static <T> List<T> loadData(Path filePath, DataFactory<T> factory) throws IOException {
    final List<String> lines = Files.readLines(filePath.toFile(), StandardCharsets.UTF_8);
    final List<T> values = new ArrayList<T>(lines.size());
    for (String line : lines) {
        values.add(factory.parse(line.split("\t", -1)));
    }
    return values;
}

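/**
 * Writes data to a file using the given data factory.
 *
 * @param filePath path of the file to write
 * @param factory  factory that turns an object into its fields
 * @param values   the data to write
 * @param <T>      element type
 * @throws IOException
 */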
public static <T> void saveData(Path filePath, DataFactory<T> factory, Collection<T> values) throws IOException {
    try (BufferedWriter writer = Files.newWriter(filePath.toFile(), StandardCharsets.UTF_8)) {
        for (T value : values) {
            final String[] fields = factory.serialize(value);
            if (fields == null) {
                continue;
            }
            StringBuilder sb = new StringBuilder();
            for (String field : fields) {
                if (field == null) {
                    sb.append("\t");
                } else {
                    sb.append("\t").append(field);
                }
            }
            writer.write(sb.substring(1));
            writer.write("\n");
        }
    }
}
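The detail worth noticing in this pair is the -1 limit passed to split: without it, Java drops trailing empty columns, so a record whose last field was written as null would come back with fewer fields. The sketch below illustrates this with the standard library only (java.nio.file instead of the Guava Files helper used above); the class TsvRoundTrip, its helper methods, and the sample record are hypothetical.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class TsvRoundTrip {
    // Join fields with tabs; a null field becomes an empty column, as in saveData.
    static String join(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            if (fields[i] != null) {
                sb.append(fields[i]);
            }
        }
        return sb.toString();
    }

    // Split with limit -1 so that trailing empty columns are preserved.
    static String[] split(String line) {
        return line.split("\t", -1);
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("classes", ".dat");
        List<String> lines = new ArrayList<>();
        // Hypothetical record: no superclass, no interfaces, no members.
        lines.add(join(new String[]{"java/lang/Object", null, "", "false", null}));
        Files.write(file, lines, StandardCharsets.UTF_8);

        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            System.out.println(split(line).length);      // 5: all columns kept
            System.out.println(line.split("\t").length); // 4: trailing empty column dropped
        }
        Files.delete(file);
    }
}
```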

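These methods are then used to load the class information (classes.dat) and method information (methods.dat). Each loader returns a Map keyed by handle, which the source uses repeatedly for lookups.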

/**
 * Loads class information from classes.dat.
 *
 * @return map from class handle to class information
 */
public static Map<ClassReference.Handle, ClassReference> loadClasses() {
    try {
        Map<ClassReference.Handle, ClassReference> classMap = new HashMap<>();
        for (ClassReference classReference : loadData(Paths.get("classes.dat"), new ClassReference.Factory())) {
            classMap.put(classReference.getHandle(), classReference);
        }
        return classMap;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

/**
 * Loads method information from methods.dat.
 *
 * @return map from method handle to method information
 */
public static Map<MethodReference.Handle, MethodReference> loadMethods() {
    try {
        Map<MethodReference.Handle, MethodReference> methodMap = new HashMap<>();
        for (MethodReference methodReference : loadData(Paths.get("methods.dat"), new MethodReference.Factory())) {
            methodMap.put(methodReference.getHandle(), methodReference);
        }
        return methodMap;
    } catch (IOException e) {
        throw new RuntimeException(e);
    }
}

2. DataFactory

The data factory interface, which defines the storage format of a record.

public interface DataFactory<T> {
    T parse(String[] fields);
    String[] serialize(T obj);
}

3. ClassReference

Describes a class. The information is recorded by an ASM visitor.

public class ClassReference {
    private final String name;          // class name
    private final String superClass;    // superclass name
    private final String[] interfaces;  // implemented interfaces
    private final boolean isInterface;  // whether this class is an interface
    private final Member[] members;     // fields (members)

    public static class Member {
        private final String name;                  // field name
        private final int modifiers;                // access modifiers
        private final ClassReference.Handle type;   // field type

        public Member(String name, int modifiers, Handle type) {
            this.name = name;
            this.modifiers = modifiers;
            this.type = type;
        }

        public String getName() {
            return name;
        }

        public int getModifiers() {
            return modifiers;
        }

        public Handle getType() {
            return type;
        }
    }

    public ClassReference(String name, String superClass, String[] interfaces, boolean isInterface, Member[] members) {
        this.name = name;
        this.superClass = superClass;
        this.interfaces = interfaces;
        this.isInterface = isInterface;
        this.members = members;
    }

    public String getName() {
        return name;
    }

    public String getSuperClass() {
        return superClass;
    }

    public String[] getInterfaces() {
        return interfaces;
    }

    public boolean isInterface() {
        return isInterface;
    }

    public Member[] getMembers() {
        return members;
    }

    public Handle getHandle() {
        return new Handle(name);
    }

    public static class Handle {
        private final String name;  // class name

        public Handle(String name) {
            this.name = name;
        }

        public String getName() {
            return name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;

            Handle handle = (Handle) o;

            return name != null ? name.equals(handle.name) : handle.name == null;
        }

        @Override
        public int hashCode() {
            return name != null ? name.hashCode() : 0;
        }
    }
...
}

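Storage format of a class record, one record per line with tab-separated columns: className<TAB>superClassName<TAB>interfaceA,interfaceB,interfaceC<TAB>isInterface<TAB>field1!field1Modifiers!field1Type!field2!field2Modifiers!field2Type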

public static class Factory implements DataFactory<ClassReference> {
    @Override
    public ClassReference parse(String[] fields) {
        String[] interfaces;
        if (fields[2].equals("")) {
            interfaces = new String[0];
        } else {
            interfaces = fields[2].split(",");
        }
        String[] memberEntries = fields[4].split("!");
        Member[] members = new Member[memberEntries.length / 3];
        for (int i = 0; i < members.length; i++) {
            members[i] = new Member(memberEntries[3 * i], Integer.parseInt(memberEntries[3 * i + 1]),
                    new ClassReference.Handle(memberEntries[3 * i + 2]));
        }
        return new ClassReference(
                fields[0],
                fields[1].equals("") ? null : fields[1],
                interfaces,
                Boolean.parseBoolean(fields[3]),
                members);
    }

    @Override
    public String[] serialize(ClassReference obj) {
        String interfaces;
        if (obj.interfaces.length > 0) {
            StringBuilder interfacesSb = new StringBuilder();
            for (String iface : obj.interfaces) {
                interfacesSb.append(",").append(iface);
            }
            interfaces = interfacesSb.substring(1);
        } else {
            interfaces = "";
        }
        StringBuilder members = new StringBuilder();
        for (Member member : obj.members) {
            members.append("!").append(member.getName())
                    .append("!").append(Integer.toString(member.getModifiers()))
                    .append("!").append(member.getType().getName());
        }
        return new String[]{
                obj.name,
                obj.superClass,
                interfaces,
                Boolean.toString(obj.isInterface),
                members.length() == 0 ? null : members.substring(1)
        };
    }
}
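The member column is easy to misread, so here is a small sketch that decodes it the same way parse does. The class MemberColumnSketch and the sample column are hypothetical; the sample assumes two private fields (modifier value 2 is ACC_PRIVATE) and is not taken from a real classes.dat.

```java
import java.util.ArrayList;
import java.util.List;

class MemberColumnSketch {
    // Decode "name!modifiers!type!name!modifiers!type..." into (name, modifiers, type) triples.
    static List<String[]> decode(String column) {
        String[] entries = column.split("!");
        List<String[]> members = new ArrayList<>();
        // Integer division: an empty column splits into [""] (length 1), which yields 0 members.
        for (int i = 0; i < entries.length / 3; i++) {
            members.add(new String[]{entries[3 * i], entries[3 * i + 1], entries[3 * i + 2]});
        }
        return members;
    }

    public static void main(String[] args) {
        // Hypothetical sample column with two fields.
        List<String[]> members = decode("name!2!java/lang/String!count!2!I");
        for (String[] m : members) {
            System.out.println(m[0] + " : " + m[2] + " (modifiers=" + m[1] + ")");
        }
        System.out.println(decode("").size());
    }
}
```

This also shows why serialize can safely emit null for a class without fields: saveData writes it as an empty column, and the integer division in parse turns that back into an empty member array.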

4. MethodReference

Describes a method, likewise recorded by an ASM visitor.

public class MethodReference {
    private final ClassReference.Handle classReference; // owning class
    private final String name;                          // method name
    private final String desc;                          // descriptor
    private final boolean isStatic;                     // whether the method is static

    public MethodReference(ClassReference.Handle classReference, String name, String desc, boolean isStatic) {
        this.classReference = classReference;
        this.name = name;
        this.desc = desc;
        this.isStatic = isStatic;
    }

    public ClassReference.Handle getClassReference() {
        return classReference;
    }

    public String getName() {
        return name;
    }

    public String getDesc() {
        return desc;
    }

    public boolean isStatic() {
        return isStatic;
    }

    public Handle getHandle() {
        return new Handle(classReference, name, desc);
    }

    public static class Handle {
        private final ClassReference.Handle classReference; // owning class
        private final String name;                          // method name
        private final String desc;                          // descriptor

        public Handle(ClassReference.Handle classReference, String name, String desc) {
            this.classReference = classReference;
            this.name = name;
            this.desc = desc;
        }

        public ClassReference.Handle getClassReference() {
            return classReference;
        }

        public String getName() {
            return name;
        }

        public String getDesc() {
            return desc;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (o == null || getClass() != o.getClass()) return false;

            Handle handle = (Handle) o;

            if (classReference != null ? !classReference.equals(handle.classReference) : handle.classReference != null)
                return false;
            if (name != null ? !name.equals(handle.name) : handle.name != null) return false;
            return desc != null ? desc.equals(handle.desc) : handle.desc == null;
        }

        @Override
        public int hashCode() {
            int result = classReference != null ? classReference.hashCode() : 0;
            result = 31 * result + (name != null ? name.hashCode() : 0);
            result = 31 * result + (desc != null ? desc.hashCode() : 0);
            return result;
        }
    }
...
}

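Storage format of a method record, one record per line with tab-separated columns: className<TAB>methodName<TAB>methodDescriptor<TAB>isStatic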

public static class Factory implements DataFactory<MethodReference> {
    @Override
    public MethodReference parse(String[] fields) {
        return new MethodReference(
                new ClassReference.Handle(fields[0]),
                fields[1],
                fields[2],
                Boolean.parseBoolean(fields[3]));
    }

    @Override
    public String[] serialize(MethodReference obj) {
        return new String[] {
                obj.classReference.getName(),
                obj.name,
                obj.desc,
                Boolean.toString(obj.isStatic),
        };
    }
}

5. InheritanceMap

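Describes inheritance with two Maps, child -> set of parents and parent -> set of children, both derived from the class information. The implementation is in the InheritanceMap class.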

public class InheritanceMap {
    private final Map<ClassReference.Handle, Set<ClassReference.Handle>> inheritanceMap;    // child -> set of parents
    private final Map<ClassReference.Handle, Set<ClassReference.Handle>> subClassMap;       // parent -> set of children

    /**
     * Constructor; derives `parent -> set of children` from `child -> set of parents`.
     *
     * @param inheritanceMap inheritance relation (child -> set of parents)
     */
    public InheritanceMap(Map<ClassReference.Handle, Set<ClassReference.Handle>> inheritanceMap) {
        this.inheritanceMap = inheritanceMap;
        subClassMap = new HashMap<>();
        for (Map.Entry<ClassReference.Handle, Set<ClassReference.Handle>> entry : inheritanceMap.entrySet()) {
            ClassReference.Handle child = entry.getKey();
            for (ClassReference.Handle parent : entry.getValue()) {
                // create the set if the key is absent, then add the child to the returned set
                subClassMap.computeIfAbsent(parent, k -> new HashSet<>()).add(child);
            }
        }
    }

    public Set<Map.Entry<ClassReference.Handle, Set<ClassReference.Handle>>> entrySet() {
        return inheritanceMap.entrySet();
    }

    /**
     * Returns the set of parents.
     *
     * @param clazz the target class
     * @return unmodifiable set of parents, or null if the class is unknown
     */
    public Set<ClassReference.Handle> getSuperClasses(ClassReference.Handle clazz) {
        Set<ClassReference.Handle> parents = inheritanceMap.get(clazz);
        if (parents == null) {
            return null;
        }
        return Collections.unmodifiableSet(parents);
    }

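    /**
     * Checks whether superClass is a parent of clazz.
     *
     * @param clazz      the candidate subclass
     * @param superClass the candidate parent
     * @return true if superClass is in the parent set of clazz
     */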
    public boolean isSubclassOf(ClassReference.Handle clazz, ClassReference.Handle superClass) {
        Set<ClassReference.Handle> parents = inheritanceMap.get(clazz);
        if (parents == null) {
            return false;
        }
        return parents.contains(superClass);
    }

    /**
     * Returns the set of children.
     *
     * @param clazz the target class
     * @return unmodifiable set of children, or null if the class has none
     */
    public Set<ClassReference.Handle> getSubClasses(ClassReference.Handle clazz) {
        Set<ClassReference.Handle> subClasses = subClassMap.get(clazz);
        if (subClasses == null) {
            return null;
        }
        return Collections.unmodifiableSet(subClasses);
    }

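    /**
     * Saves the inheritance relation: child -> set of parents.
     *
     * @throws IOException
     */
    public void save() throws IOException {
        // inheritanceMap.dat format (tab-separated):
        // class name, followed by each of its superclasses and interfaces: parent1 parent2 parent3 ...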
        DataLoader.saveData(Paths.get("inheritanceMap.dat"), new InheritanceMapFactory(), inheritanceMap.entrySet());
    }

    /**
     * Loads the inheritance relation from inheritanceMap.dat.
     *
     * @return the loaded inheritance map
     * @throws IOException
     */
    public static InheritanceMap load() throws IOException {
        Map<ClassReference.Handle, Set<ClassReference.Handle>> inheritanceMap = new HashMap<>();
        for (Map.Entry<ClassReference.Handle, Set<ClassReference.Handle>> entry : DataLoader.loadData(
                Paths.get("inheritanceMap.dat"), new InheritanceMapFactory())) {
            inheritanceMap.put(entry.getKey(), entry.getValue());
        }
        return new InheritanceMap(inheritanceMap);
    }
...
}

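Storage format of an inheritance record (child -> set of parents only), one record per line with tab-separated columns: className<TAB>parent1<TAB>parent2<TAB>..., where each parent is a superclass or an interface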

private static class InheritanceMapFactory implements DataFactory<Map.Entry<ClassReference.Handle, Set<ClassReference.Handle>>> {
    @Override
    public Map.Entry<ClassReference.Handle, Set<ClassReference.Handle>> parse(String[] fields) {
        ClassReference.Handle clazz = new ClassReference.Handle(fields[0]);
        Set<ClassReference.Handle> superClasses = new HashSet<>();
        for (int i = 1; i < fields.length; i++) {
            superClasses.add(new ClassReference.Handle(fields[i]));
        }
        return new AbstractMap.SimpleEntry<>(clazz, superClasses);
    }

    @Override
    public String[] serialize(Map.Entry<ClassReference.Handle, Set<ClassReference.Handle>> obj) {
        final String[] fields = new String[obj.getValue().size()+1];
        fields[0] = obj.getKey().getName();
        int i = 1;
        for (ClassReference.Handle handle : obj.getValue()) {
            fields[i++] = handle.getName();
        }
        return fields;
    }
}
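The inversion done in the InheritanceMap constructor can be tried in isolation. The following is a minimal sketch that uses plain strings in place of ClassReference.Handle; the class InvertInheritance and the sample hierarchy are made up for the example.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class InvertInheritance {
    // Turn `child -> set of parents` into `parent -> set of children`.
    static Map<String, Set<String>> invert(Map<String, Set<String>> childToParents) {
        Map<String, Set<String>> parentToChildren = new HashMap<>();
        for (Map.Entry<String, Set<String>> entry : childToParents.entrySet()) {
            for (String parent : entry.getValue()) {
                // computeIfAbsent creates the set on first use and returns it either way.
                parentToChildren.computeIfAbsent(parent, k -> new HashSet<>()).add(entry.getKey());
            }
        }
        return parentToChildren;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> childToParents = new HashMap<>();
        childToParents.put("java/util/ArrayList",
                new HashSet<>(Arrays.asList("java/util/AbstractList", "java/util/List")));
        childToParents.put("java/util/LinkedList",
                new HashSet<>(Arrays.asList("java/util/List")));

        Map<String, Set<String>> parentToChildren = invert(childToParents);
        System.out.println(parentToChildren.get("java/util/List"));
        System.out.println(parentToChildren.get("java/util/AbstractList"));
    }
}
```

Only the child -> parents direction needs to be stored on disk, because the reverse direction can always be rebuilt this way when the file is loaded.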

6. InheritanceDeriver

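Collects inheritance information and method-override information. When override information is stored, the overriding methods are written indented beneath the method they override; the exact format is given in GadgetChainDiscovery.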

public class InheritanceDeriver {
    private static final Logger LOGGER = LoggerFactory.getLogger(InheritanceDeriver.class);

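    /**
     * Derives the inheritance information: child -> set of parents and parent -> set of children.
     *
     * @param classMap class information
     * @return the derived inheritance map
     */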
    public static InheritanceMap derive(Map<ClassReference.Handle, ClassReference> classMap) {
        LOGGER.debug("Calculating inheritance for " + (classMap.size()) + " classes...");
        Map<ClassReference.Handle, Set<ClassReference.Handle>> implicitInheritance = new HashMap<>();
        // iterate over all classes
        for (ClassReference classReference : classMap.values()) {
            if (implicitInheritance.containsKey(classReference.getHandle())) {
                throw new IllegalStateException("Already derived implicit classes for " + classReference.getName());
            }
            Set<ClassReference.Handle> allParents = new HashSet<>();

            // collect every superclass and interface of classReference, direct or inherited
            getAllParents(classReference, classMap, allParents);

            // cache: class -> all of its superclasses and interfaces
            implicitInheritance.put(classReference.getHandle(), allParents);
        }
        return new InheritanceMap(implicitInheritance);
    }

    /**
     * Collects all superclasses and interfaces of the target class, direct or inherited.
     *
     * @param classReference the target class
     * @param classMap       class information
     * @param allParents     output set of superclasses and interfaces
     */
    private static void getAllParents(ClassReference classReference, Map<ClassReference.Handle, ClassReference> classMap, Set<ClassReference.Handle> allParents) {
        Set<ClassReference.Handle> parents = new HashSet<>();   // direct superclass and interfaces
        // add the direct superclass of classReference to parents
        if (classReference.getSuperClass() != null) {
            parents.add(new ClassReference.Handle(classReference.getSuperClass()));
        }
        // add every interface directly implemented by classReference to parents
        for (String iface : classReference.getInterfaces()) {
            parents.add(new ClassReference.Handle(iface));
        }

        // look up each direct parent in the class information
        for (ClassReference.Handle immediateParent : parents) {
            ClassReference parentClassReference = classMap.get(immediateParent);
            if (parentClassReference == null) {
                LOGGER.debug("No class id for " + immediateParent.getName());
                continue;
            }

            // add it to the allParents set
            allParents.add(parentClassReference.getHandle());
            // recurse until every superclass and interface of classReference is in allParents
            getAllParents(parentClassReference, classMap, allParents);
        }
    }
        }
    }

    /**
     * Collects the overriding methods of every method.
     *
     * @param inheritanceMap inheritance relation
     * @param methodMap      method information
     * @return map from a method to the set of subclass methods that override it
     */
    public static Map<MethodReference.Handle, Set<MethodReference.Handle>> getAllMethodImplementations(
            InheritanceMap inheritanceMap, Map<MethodReference.Handle, MethodReference> methodMap) {
        // methods grouped by class: class -> set of methods
        Map<ClassReference.Handle, Set<MethodReference.Handle>> methodsByClass = new HashMap<>();
        // walk the method information to build class -> set of methods
        for (MethodReference.Handle method : methodMap.keySet()) {
            ClassReference.Handle classReference = method.getClassReference();  // owning class
            if (!methodsByClass.containsKey(classReference)) {  // first method seen for this class
                Set<MethodReference.Handle> methods = new HashSet<>();  // holds the class's methods
                methods.add(method);
                methodsByClass.put(classReference, methods);
            } else {
                methodsByClass.get(classReference).add(method); // add to the existing set
            }
        }

        // inheritance relation: parent -> set of children
        Map<ClassReference.Handle, Set<ClassReference.Handle>> subClassMap = new HashMap<>();
        for (Map.Entry<ClassReference.Handle, Set<ClassReference.Handle>> entry : inheritanceMap.entrySet()) {
            // take each parent out of child -> set of parents
            for (ClassReference.Handle parent : entry.getValue()) {
                if (!subClassMap.containsKey(parent)) { // first child seen for this parent
                    Set<ClassReference.Handle> subClasses = new HashSet<>();    // holds the children
                    subClasses.add(entry.getKey());
                    subClassMap.put(parent, subClasses);
                } else {
                    subClassMap.get(parent).add(entry.getKey());    // add to the existing set
                }
            }
        }

        // Find overriding methods
        Map<MethodReference.Handle, Set<MethodReference.Handle>> methodImplMap = new HashMap<>();
        // Walk every known method
        for (MethodReference method : methodMap.values()) {
            // Static methods cannot be overridden
            if (method.isStatic()) {
                continue;
            }

            // Overriding methods found for this method
            Set<MethodReference.Handle> overridingMethods = new HashSet<>();
            Set<ClassReference.Handle> subClasses = subClassMap.get(method.getClassReference());    // all descendants of the declaring class
            if (subClasses != null) {
                // Walk the subclasses
                for (ClassReference.Handle subClass : subClasses) {
                    // This class extends ours; see if it has a matching method
                    Set<MethodReference.Handle> subClassMethods = methodsByClass.get(subClass); // methods declared in the subclass
                    if (subClassMethods != null) {
                        for (MethodReference.Handle subClassMethod : subClassMethods) {
                            // A match requires the same name and the same descriptor
                            if (subClassMethod.getName().equals(method.getName()) && subClassMethod.getDesc().equals(method.getDesc())) {
                                overridingMethods.add(subClassMethod);
                            }
                        }
                    }
                }
            }

            // Record the method only if at least one override was found
            if (overridingMethods.size() > 0) {
                methodImplMap.put(method.getHandle(), overridingMethods);
            }
        }

        return methodImplMap;
    }
}
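
The override lookup above boils down to three steps: group methods by declaring class, invert the child -> ancestors map, then match on name plus descriptor. A minimal sketch of that idea on plain strings follows. The `owner#name:descriptor` encoding and the class name `OverrideIndexSketch` are assumptions made for illustration, not part of Gadget Inspector, and the static-method check is left out.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class OverrideIndexSketch {
    // A method is written as "owner#name:descriptor"
    // (a hypothetical stand-in for MethodReference.Handle).
    static String owner(String method) {
        return method.substring(0, method.indexOf('#'));
    }

    static String signature(String method) {
        return method.substring(method.indexOf('#') + 1);
    }

    static Map<String, Set<String>> overridesOf(Map<String, Set<String>> ancestorsByClass,
                                                 Collection<String> methods) {
        // Step 1: class -> methods declared in it
        Map<String, Set<String>> methodsByClass = new HashMap<>();
        for (String m : methods) {
            methodsByClass.computeIfAbsent(owner(m), k -> new HashSet<>()).add(m);
        }
        // Step 2: invert child -> ancestors into ancestor -> descendants
        Map<String, Set<String>> descendants = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : ancestorsByClass.entrySet()) {
            for (String ancestor : e.getValue()) {
                descendants.computeIfAbsent(ancestor, k -> new HashSet<>()).add(e.getKey());
            }
        }
        // Step 3: a descendant's method with the same name and descriptor is an override
        Map<String, Set<String>> result = new HashMap<>();
        for (String m : methods) {
            Set<String> overriding = new HashSet<>();
            for (String sub : descendants.getOrDefault(owner(m), Set.of())) {
                for (String candidate : methodsByClass.getOrDefault(sub, Set.of())) {
                    if (signature(candidate).equals(signature(m))) {
                        overriding.add(candidate);
                    }
                }
            }
            if (!overriding.isEmpty()) {
                result.put(m, overriding);
            }
        }
        return result;
    }
}
```

With `B` listed as a descendant of `A`, the method `A#run:()V` maps to `B#run:()V`, while `B#other:()V` is ignored because nothing in `A` has that signature.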

7. GraphCall

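GraphCall describes how taint propagates across a call edge: which argument of the callee is influenced by which argument of the caller. These records are collected by ASM visitors that simulate parts of the JVM; the implementation lives in the CallGraphDiscovery class.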

public class GraphCall {
    private final MethodReference.Handle callerMethod;  // caller (method)
    private final MethodReference.Handle targetMethod;  // callee (method)
    private final int callerArgIndex;   // index of the caller's argument
    private final String callerArgPath; // which field of that argument object is passed on
    private final int targetArgIndex;   // index of the callee's argument

    public GraphCall(MethodReference.Handle callerMethod, MethodReference.Handle targetMethod, int callerArgIndex, String callerArgPath, int targetArgIndex) {
        this.callerMethod = callerMethod;
        this.targetMethod = targetMethod;
        this.callerArgIndex = callerArgIndex;
        this.callerArgPath = callerArgPath;
        this.targetArgIndex = targetArgIndex;
    }

    public MethodReference.Handle getCallerMethod() {
        return callerMethod;
    }

    public MethodReference.Handle getTargetMethod() {
        return targetMethod;
    }

    public int getCallerArgIndex() {
        return callerArgIndex;
    }

    public String getCallerArgPath() {
        return callerArgPath;
    }

    public int getTargetArgIndex() {
        return targetArgIndex;
    }

    @Override
    public boolean equals(Object o) {   // field-by-field equality
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;

        GraphCall graphCall = (GraphCall) o;

        if (callerArgIndex != graphCall.callerArgIndex) return false;
        if (targetArgIndex != graphCall.targetArgIndex) return false;
        if (callerMethod != null ? !callerMethod.equals(graphCall.callerMethod) : graphCall.callerMethod != null)
            return false;
        if (targetMethod != null ? !targetMethod.equals(graphCall.targetMethod) : graphCall.targetMethod != null)
            return false;
        return callerArgPath != null ? callerArgPath.equals(graphCall.callerArgPath) : graphCall.callerArgPath == null;
    }

    @Override
    public int hashCode() { // hash used when the object is stored in hash-based collections
        int result = callerMethod != null ? callerMethod.hashCode() : 0;
        result = 31 * result + (targetMethod != null ? targetMethod.hashCode() : 0);
        result = 31 * result + callerArgIndex;
        result = 31 * result + (callerArgPath != null ? callerArgPath.hashCode() : 0);
        result = 31 * result + targetArgIndex;
        return result;
    }
...
}

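Serialization format (nine fields): caller class, caller method, caller method descriptor, callee class, callee method, callee method descriptor, caller argument index, caller argument path (field name), callee argument index.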

public static class Factory implements DataFactory<GraphCall> {
    @Override
    public GraphCall parse(String[] fields) {
        return new GraphCall(
                new MethodReference.Handle(new ClassReference.Handle(fields[0]), fields[1], fields[2]),
                new MethodReference.Handle(new ClassReference.Handle(fields[3]), fields[4], fields[5]),
                Integer.parseInt(fields[6]),
                fields[7],
                Integer.parseInt(fields[8]));
    }

    @Override
    public String[] serialize(GraphCall obj) {
        return new String[]{
                obj.callerMethod.getClassReference().getName(), obj.callerMethod.getName(), obj.callerMethod.getDesc(),
                obj.targetMethod.getClassReference().getName(), obj.targetMethod.getName(), obj.targetMethod.getDesc(),
                Integer.toString(obj.callerArgIndex),
                obj.callerArgPath,
                Integer.toString(obj.targetArgIndex),
        };
    }
}
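
To make the nine-field layout concrete, here is a small round trip of one record through a single text line. It assumes the fields are joined with tab characters and that a null argument path becomes an empty column; the class `GraphCallLineSketch` and its methods are hypothetical and only illustrate the idea, they are not the project's DataLoader.

```java
final class GraphCallLineSketch {
    /** Joins the fields with tabs; a null field (e.g. no argument path) becomes an empty column. */
    static String toLine(String[] fields) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < fields.length; i++) {
            if (i > 0) {
                sb.append('\t');
            }
            if (fields[i] != null) {
                sb.append(fields[i]);
            }
        }
        return sb.toString();
    }

    /** Splits a line back into columns; limit -1 keeps empty columns, including trailing ones. */
    static String[] fromLine(String line) {
        return line.split("\t", -1);
    }
}
```

One consequence worth noting: under this assumed encoding a null path written out comes back as an empty string, so code that compares paths after a reload should treat the two as equivalent.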

8. Source

Source describes a taint source. Sources are found and recorded by classes that extend the abstract class SourceDiscovery.

public class Source {
    private final MethodReference.Handle sourceMethod;  // method that acts as the source
    private final int taintedArgIndex;                  // index of the tainted argument

    public Source(MethodReference.Handle sourceMethod, int taintedArgIndex) {
        this.sourceMethod = sourceMethod;
        this.taintedArgIndex = taintedArgIndex;
    }

    public MethodReference.Handle getSourceMethod() {
        return sourceMethod;
    }

    public int getTaintedArgIndex() {
        return taintedArgIndex;
    }
...
}

Serialization format for a taint source: class name, method name, method descriptor, argument index.

public static class Factory implements DataFactory<Source> {
    @Override
    public Source parse(String[] fields) {
        return new Source(
                new MethodReference.Handle(new ClassReference.Handle(fields[0]), fields[1], fields[2]),
                Integer.parseInt(fields[3])
        );
    }

    @Override
    public String[] serialize(Source obj) {
        return new String[]{
                obj.sourceMethod.getClassReference().getName(), obj.sourceMethod.getName(), obj.sourceMethod.getDesc(),
                Integer.toString(obj.taintedArgIndex),
        };
    }
}

gadgetinspector

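This package holds the abstract class and interfaces that every detection type has to implement; the remaining classes are covered in section 0x03 (Workflow).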

1. SerializableDecider

The serializable-decider interface: decides whether a class can be (de)serialized.

/**
 * Represents logic to decide if a class is serializable. The simple case (implemented by
 * {@link SimpleSerializableDecider}) just checks if the class implements serializable. Other use-cases may have more
 * complicated logic.
 */
public interface SerializableDecider extends Function<ClassReference.Handle, Boolean> { // serializable decider
}

2. ImplementationFinder

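Interface for finding the implementations of a method that deserialization can actually reach, that is, overriding methods whose declaring class is considered serializable.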

public interface ImplementationFinder {
    Set<MethodReference.Handle> getImplementations(MethodReference.Handle target); // find the serializable overriding methods
}

3. SourceDiscovery

Abstract class that implements storage of the discovered taint sources; subclasses supply the actual search logic.

public abstract class SourceDiscovery {

    // Taint sources found so far
    private final List<Source> discoveredSources = new ArrayList<>();

    /**
     * Records a taint source.
     *
     * @param source the taint source
     */
    protected final void addDiscoveredSource(Source source) {
        discoveredSources.add(source);
    }


    /**
     * Finds the taint sources.
     *
     * @throws IOException
     */
    public void discover() throws IOException {
        // Load class information
        Map<ClassReference.Handle, ClassReference> classMap = DataLoader.loadClasses();
        // Load method information
        Map<MethodReference.Handle, MethodReference> methodMap = DataLoader.loadMethods();
        // Load inheritance information
        InheritanceMap inheritanceMap = InheritanceMap.load();

        // Delegate to the subclass's discover implementation
        discover(classMap, methodMap, inheritanceMap);
    }

    /**
     * Abstract method; the concrete search is provided by subclasses.
     *
     * @param classMap       class information
     * @param methodMap      method information
     * @param inheritanceMap inheritance information
     */
    public abstract void discover(Map<ClassReference.Handle, ClassReference> classMap,
                                  Map<MethodReference.Handle, MethodReference> methodMap,
                                  InheritanceMap inheritanceMap);

    /**
     * Saves the taint sources to sources.dat using the Source factory.
     *
     * @throws IOException
     */
    public void save() throws IOException {
        DataLoader.saveData(Paths.get("sources.dat"), new Source.Factory(), discoveredSources);
    }
}

gadgetinspector/config

Configuration definitions.

1. GIConfig

The configuration interface; every detection type must implement it.

public interface GIConfig {
    // Config name
    String getName();

    // Serializable decider
    SerializableDecider getSerializableDecider(Map<MethodReference.Handle, MethodReference> methodMap, InheritanceMap inheritanceMap);

    // Finder for serializable overriding methods
    ImplementationFinder getImplementationFinder(Map<MethodReference.Handle, MethodReference> methodMap,
                                                 Map<MethodReference.Handle, Set<MethodReference.Handle>> methodImplMap,
                                                 InheritanceMap inheritanceMap);

    // Taint source discovery
    SourceDiscovery getSourceDiscovery();
}

2. ConfigRepository

Holds the list of available configurations and returns one by name.

public class ConfigRepository {
    // List of configurations
    private static final List<GIConfig> ALL_CONFIGS = Collections.unmodifiableList(Arrays.asList(
            new JavaDeserializationConfig(),        // native Java serialization
            new JacksonDeserializationConfig(),     // Jackson (JSON)
            new XstreamDeserializationConfig()));   // XStream (XML)

    /**
     * Returns the configuration with the given name.
     *
     * @param name config name
     * @return the matching config, or null if none matches
     */
    public static GIConfig getConfig(String name) {
        for (GIConfig config : ALL_CONFIGS) {
            if (config.getName().equals(name)) {
                return config;
            }
        }
        return null;
    }
}

3. GIConfig implementations

JavaDeserializationConfig

Config name: jserial
Serializable decider: gadgetinspector/javaserial/SimpleSerializableDecider
Implementation finder (serializable overrides): gadgetinspector/javaserial/SimpleImplementationFinder
Source discovery: gadgetinspector/javaserial/SimpleSourceDiscovery

public class JavaDeserializationConfig implements GIConfig {
    @Override
    public String getName() {
        return "jserial";
    }

    @Override
    public SerializableDecider getSerializableDecider(Map<MethodReference.Handle, MethodReference> methodMap, InheritanceMap inheritanceMap) {
        return new SimpleSerializableDecider(inheritanceMap);
    }

    @Override
    public ImplementationFinder getImplementationFinder(Map<MethodReference.Handle, MethodReference> methodMap,
                                                        Map<MethodReference.Handle, Set<MethodReference.Handle>> methodImplMap,
                                                        InheritanceMap inheritanceMap) {
        return new SimpleImplementationFinder(getSerializableDecider(methodMap, inheritanceMap), methodImplMap);
    }

    @Override
    public SourceDiscovery getSourceDiscovery() {
        return new SimpleSourceDiscovery();
    }
}

JacksonDeserializationConfig

Config name: jackson
Serializable decider: gadgetinspector/jackson/JacksonSerializableDecider
Implementation finder (serializable overrides): gadgetinspector/jackson/JacksonImplementationFinder
Source discovery: gadgetinspector/jackson/JacksonSourceDiscovery

XstreamDeserializationConfig

Config name: xstream
Serializable decider: gadgetinspector/xstream/XstreamSerializableDecider, gadgetinspector/xstream/CustomXstreamSerializableDecider
Implementation finder (serializable overrides): gadgetinspector/javaserial/SimpleImplementationFinder
Source discovery: gadgetinspector/javaserial/SimpleSourceDiscovery

gadgetinspector/javaserial

Gadget chain detection for native Java serialization.

1. SimpleSerializableDecider

Implements the SerializableDecider interface and decides whether a class is serializable.

public class SimpleSerializableDecider implements SerializableDecider {
    private final Map<ClassReference.Handle, Boolean> cache = new HashMap<>();  // cached decisions: class -> serializable or not
    private final InheritanceMap inheritanceMap;    // inheritance information

    public SimpleSerializableDecider(InheritanceMap inheritanceMap) {
        this.inheritanceMap = inheritanceMap;
    }

    /**
     * Decides whether the class is serializable and caches the decision.
     *
     * @param handle the class
     * @return true if the class is considered serializable
     */
    @Override
    public Boolean apply(ClassReference.Handle handle) {
        Boolean cached = cache.get(handle);
        if (cached != null) {
            return cached;
        }

        Boolean result = applyNoCache(handle);

        cache.put(handle, result);
        return result;
    }

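    /**
     * Decides whether the class is serializable, bypassing the cache.
     *
     * @param handle the class
     * @return true if the class is considered serializable
     */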
    private Boolean applyNoCache(ClassReference.Handle handle) {

        // Blacklisted classes are never treated as serializable
        if (isBlacklistedClass(handle)) {
            return false;
        }

        // Serializable if the class implements java/io/Serializable, directly or indirectly
        if (inheritanceMap.isSubclassOf(handle, new ClassReference.Handle("java/io/Serializable"))) {
            return true;
        }

        return false;
    }

    /**
     * Checks whether the class is on the blacklist.
     *
     * @param clazz the class
     * @return true if the class is blacklisted
     */
    private static boolean isBlacklistedClass(ClassReference.Handle clazz) {
        if (clazz.getName().startsWith("com/google/common/collect/")) {
            return true;
        }

        // Serialization of these classes has been disabled since clojure 1.9.0
        // https://github.com/clojure/clojure/commit/271674c9b484d798484d134a5ac40a6df15d3ac3
        if (clazz.getName().equals("clojure/core/proxy$clojure/lang/APersistentMap$ff19274a")
                || clazz.getName().equals("clojure/inspector/proxy$javax/swing/table/AbstractTableModel$ff19274a")) {
            return true;
        }

        return false;
    }
}
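
The decider is essentially a memoized predicate: blacklist first, then an ancestor lookup, with the answer cached per class. A minimal sketch of that shape on plain class names follows; the class `CachingDeciderSketch`, the `ancestorsByClass` map, and the `misses` counter are assumptions introduced only for illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class CachingDeciderSketch implements Function<String, Boolean> {
    private final Map<String, Boolean> cache = new HashMap<>();
    private final Map<String, Set<String>> ancestorsByClass; // class -> all ancestors (hypothetical input)
    int misses = 0;                                          // counts uncached decisions, for demonstration only

    CachingDeciderSketch(Map<String, Set<String>> ancestorsByClass) {
        this.ancestorsByClass = ancestorsByClass;
    }

    @Override
    public Boolean apply(String className) {
        // computeIfAbsent runs decide() only the first time a class is seen
        return cache.computeIfAbsent(className, this::decide);
    }

    private Boolean decide(String className) {
        misses++;
        if (className.startsWith("com/google/common/collect/")) {
            return false; // blacklisted package, mirroring the check above
        }
        return ancestorsByClass.getOrDefault(className, Set.of()).contains("java/io/Serializable");
    }
}
```

Caching matters here because the same declaring class is queried once per method during source discovery and again during chain search.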

2. SimpleImplementationFinder

Implements the ImplementationFinder interface and returns the serializable overriding methods of a target method, plus the target method itself.

public class SimpleImplementationFinder implements ImplementationFinder {
    private final SerializableDecider serializableDecider;  // serializable decider
    private final Map<MethodReference.Handle, Set<MethodReference.Handle>> methodImplMap;   // method -> overriding methods

    public SimpleImplementationFinder(SerializableDecider serializableDecider, Map<MethodReference.Handle, Set<MethodReference.Handle>> methodImplMap) {
        this.serializableDecider = serializableDecider;
        this.methodImplMap = methodImplMap;
    }

    @Override
    public Set<MethodReference.Handle> getImplementations(MethodReference.Handle target) {
        // Collects the target plus its serializable overriding methods
        Set<MethodReference.Handle> allImpls = new HashSet<>();

        // Assume that the target method is always available, even if not serializable; the target may just be a local
        // instance rather than something an attacker can control.
        allImpls.add(target);   // the target itself is always included, serializable or not

        // Walk the overriding methods
        Set<MethodReference.Handle> subClassImpls = methodImplMap.get(target);
        if (subClassImpls != null) {
            for (MethodReference.Handle subClassImpl : subClassImpls) {
                // Keep only overrides whose declaring class is serializable
                if (Boolean.TRUE.equals(serializableDecider.apply(subClassImpl.getClassReference()))) {
                    allImpls.add(subClassImpl);
                }
            }
        }

        return allImpls;
    }
}
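
The filter above can be shown in a few lines on plain strings. This is only a sketch: methods are encoded as `owner#name:descriptor`, the decider is any `Function<String, Boolean>` over class names, and `ImplementationFilterSketch` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

final class ImplementationFilterSketch {
    static Set<String> implementations(String target,
                                       Map<String, Set<String>> overrides,
                                       Function<String, Boolean> ownerIsSerializable) {
        Set<String> all = new HashSet<>();
        all.add(target); // the target is always kept, even if its class is not serializable
        for (String impl : overrides.getOrDefault(target, Set.of())) {
            String owner = impl.substring(0, impl.indexOf('#'));
            // Boolean.TRUE.equals(...) also treats a null answer as "not serializable"
            if (Boolean.TRUE.equals(ownerIsSerializable.apply(owner))) {
                all.add(impl);
            }
        }
        return all;
    }
}
```

For a call to `java/lang/Object#hashCode:()I`, the result contains the target and only those `hashCode` overrides that an attacker could plausibly supply through a serialized object.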

3. SimpleSourceDiscovery

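Extends the abstract class SourceDiscovery and implements the concrete search method discover. It walks the class and method information and applies five rules to find taint sources:

The declaring class is serializable and the method is finalize, with no parameters and a void return type
The declaring class is serializable and the method is readObject, taking an ObjectInputStream and returning void
The class is serializable and implements InvocationHandler (its invoke method becomes the source)
The declaring class is serializable and the method is hashCode (no parameters, returns int) or equals (takes an Object, returns boolean)
The declaring class is serializable and a subclass of groovy Closure, and the method is call or doCall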

public class SimpleSourceDiscovery extends SourceDiscovery {
    @Override
    public void discover(Map<ClassReference.Handle, ClassReference> classMap,
                         Map<MethodReference.Handle, MethodReference> methodMap,
                         InheritanceMap inheritanceMap) {

        // Serializable decider, used to check whether a class is serializable
        final SerializableDecider serializableDecider = new SimpleSerializableDecider(inheritanceMap);

        // Walk the methods
        for (MethodReference.Handle method : methodMap.keySet()) {
            // The declaring class must be serializable
            if (Boolean.TRUE.equals(serializableDecider.apply(method.getClassReference()))) {
                // finalize() is an entry point; its receiver (argument 0) is treated as tainted
                if (method.getName().equals("finalize") && method.getDesc().equals("()V")) {
                    addDiscoveredSource(new Source(method, 0));
                }
            }
        }

        // Second pass over the methods (could be merged with the loop above)
        // If a class implements readObject, the ObjectInputStream passed in is considered tainted
        for (MethodReference.Handle method : methodMap.keySet()) {
            if (Boolean.TRUE.equals(serializableDecider.apply(method.getClassReference()))) {
                // The ObjectInputStream parameter (argument 1) is the tainted one
                if (method.getName().equals("readObject") && method.getDesc().equals("(Ljava/io/ObjectInputStream;)V")) {
                    addDiscoveredSource(new Source(method, 1));
                }
            }
        }

        // Walk the classes
        // Using the proxy trick, anything extending serializable and invocation handler is tainted.
        for (ClassReference.Handle clazz : classMap.keySet()) {
            // The class must be serializable and implement InvocationHandler
            if (Boolean.TRUE.equals(serializableDecider.apply(clazz))
                    && inheritanceMap.isSubclassOf(clazz, new ClassReference.Handle("java/lang/reflect/InvocationHandler"))) {
                // Through a dynamic proxy, invoke() of such a class is reachable, so its receiver is treated as tainted
                MethodReference.Handle method = new MethodReference.Handle(
                        clazz, "invoke", "(Ljava/lang/Object;Ljava/lang/reflect/Method;[Ljava/lang/Object;)Ljava/lang/Object;");

                addDiscoveredSource(new Source(method, 0));
            }
        }

        // Another pass over the methods (could be merged with the loops above)
        // hashCode() or equals() are accessible entry points using standard tricks of putting those objects
        // into a HashMap.
        for (MethodReference.Handle method : methodMap.keySet()) {
            if (Boolean.TRUE.equals(serializableDecider.apply(method.getClassReference()))) {
                // hashCode() is an entry point (note the exact descriptor)
                if (method.getName().equals("hashCode") && method.getDesc().equals("()I")) {
                    addDiscoveredSource(new Source(method, 0));
                }
                // equals(Object) is an entry point; both the receiver and the argument are tainted
                if (method.getName().equals("equals") && method.getDesc().equals("(Ljava/lang/Object;)Z")) {
                    addDiscoveredSource(new Source(method, 0));
                    addDiscoveredSource(new Source(method, 1));
                }
            }
        }

        // Final pass over the methods (could be merged with the loops above)
        // Using a comparator proxy, we can jump into the call() / doCall() method of any groovy Closure and all the
        // args are tainted.
        // https://github.com/frohoff/ysoserial/blob/master/src/main/java/ysoserial/payloads/Groovy1.java
        for (MethodReference.Handle method : methodMap.keySet()) {
            // Serializable groovy Closure subclasses: call()/doCall() are reachable and every argument is tainted
            if (Boolean.TRUE.equals(serializableDecider.apply(method.getClassReference()))
                    && inheritanceMap.isSubclassOf(method.getClassReference(), new ClassReference.Handle("groovy/lang/Closure"))
                    && (method.getName().equals("call") || method.getName().equals("doCall"))) {

                addDiscoveredSource(new Source(method, 0));
                Type[] methodArgs = Type.getArgumentTypes(method.getDesc());
                for (int i = 0; i < methodArgs.length; i++) {
                    addDiscoveredSource(new Source(method, i + 1));
                }
            }
        }
    }

    public static void main(String[] args) throws Exception {
        SourceDiscovery sourceDiscovery = new SimpleSourceDiscovery();
        sourceDiscovery.discover();
        sourceDiscovery.save();
    }
}
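
The name and descriptor checks in the rules above can be condensed into a small lookup, shown here as a sketch that deliberately ignores the serializability and class-hierarchy conditions. The class `SourceRuleSketch` and its methods are hypothetical; `argumentCount` is a hand-written stand-in for what the original code obtains from ASM's `Type.getArgumentTypes(desc).length`.

```java
import java.util.ArrayList;
import java.util.List;

final class SourceRuleSketch {
    /** Counts the parameters in a JVM method descriptor such as "(ILjava/lang/String;[J)V". */
    static int argumentCount(String desc) {
        int count = 0;
        int i = 1;                                   // skip the opening '('
        while (desc.charAt(i) != ')') {
            while (desc.charAt(i) == '[') {
                i++;                                 // array dimensions belong to the same parameter
            }
            if (desc.charAt(i) == 'L') {
                i = desc.indexOf(';', i);            // an object type runs up to the next ';'
            }
            i++;
            count++;
        }
        return count;
    }

    /** Tainted argument indices for the fixed name/descriptor rules; index 0 is the receiver (this). */
    static List<Integer> taintedArgs(String name, String desc) {
        if (name.equals("finalize") && desc.equals("()V")) return List.of(0);
        if (name.equals("readObject") && desc.equals("(Ljava/io/ObjectInputStream;)V")) return List.of(1);
        if (name.equals("hashCode") && desc.equals("()I")) return List.of(0);
        if (name.equals("equals") && desc.equals("(Ljava/lang/Object;)Z")) return List.of(0, 1);
        return List.of();
    }

    /** Closure rule: the receiver and every declared parameter are tainted, i.e. indices 0..n. */
    static List<Integer> closureTaintedArgs(String desc) {
        List<Integer> indices = new ArrayList<>();
        for (int i = 0; i <= argumentCount(desc); i++) {
            indices.add(i);
        }
        return indices;
    }
}
```

Matching on the full descriptor rather than just the name is what keeps an unrelated overload such as `equals(String)` from being treated as an entry point.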

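0x03 Workflow

Preparation
- Configure log4j to write logs to the console
- Set config to its default, jserial (native Java deserialization)
- Accepted options: --resume keeps existing .dat files, --config selects the analysis type
- Read the war/jar paths from the arguments and return a URLClassLoader
- Initialize the class enumerator, ClassResourceEnumerator
MethodDiscovery: class, method, and inheritance information
- classes.dat: class name, superclass name, interface names, whether it is an interface, all fields (members) of the class
- methods.dat: class name, method name, descriptor, whether the method is static
- inheritanceMap.dat: class name, its parents/superclasses/interfaces (direct and indirect)
PassthroughDiscovery: dataflow information, i.e. whether a method's arguments can influence its return value
- When a method passes an argument to a callee, the relationship between the callee's arguments and the callee's return value has to be known first.
- passthrough.dat: class name, method name, method descriptor, tainted argument index
CallGraphDiscovery: method call relationships
- callgraph.dat: caller class name, caller method name, caller method descriptor, callee class name, callee method name, callee method descriptor, caller argument index, field name of the caller argument object, callee argument index
SourceDiscovery: finds the taint sources
- sources.dat: class name, method name, descriptor, argument index
GadgetChainDiscovery: override information and gadget chains
- methodimpl.dat: class name, method name, descriptor
- gadget-chains.txt: className.methodName descriptor (argument index)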

1. Util

Builds a URLClassLoader from the given archive path(s); it is used later to read the corresponding archives (war, jar).

public class Util {
    private static final Logger LOGGER = LoggerFactory.getLogger(Util.class);

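    /**
     * Builds and returns a URLClassLoader for a single war file.
     *
     * @param warPath path to the war file
     * @return class loader over WEB-INF/classes and WEB-INF/lib
     * @throws IOException
     */
    public static ClassLoader getWarClassLoader(Path warPath) throws IOException {
        // Create a temporary directory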
        final Path tmpDir = Files.createTempDirectory("exploded-war");
        // Delete the temp directory at shutdown
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            try {
                deleteDirectory(tmpDir);
            } catch (IOException e) {
                LOGGER.error("Error cleaning up temp directory " + tmpDir.toString(), e);
            }
        }));

        // Extract the war to the temp directory
        try (JarInputStream jarInputStream = new JarInputStream(Files.newInputStream(warPath))) {
            JarEntry jarEntry;
            while ((jarEntry = jarInputStream.getNextJarEntry()) != null) {
                Path fullPath = tmpDir.resolve(jarEntry.getName());
                if (!jarEntry.isDirectory()) {
                    Path dirName = fullPath.getParent();
                    if (dirName == null) {
                        throw new IllegalStateException("Parent of item is outside temp directory.");
                    }
                    if (!Files.exists(dirName)) {
                        Files.createDirectories(dirName);
                    }
                    try (OutputStream outputStream = Files.newOutputStream(fullPath)) {
                        copy(jarInputStream, outputStream);
                    }
                }
            }
        }

        // Classpath URLs: WEB-INF/classes plus every entry under WEB-INF/lib
        final List<URL> classPathUrls = new ArrayList<>();
        classPathUrls.add(tmpDir.resolve("WEB-INF/classes").toUri().toURL());
        Files.list(tmpDir.resolve("WEB-INF/lib")).forEach(p -> {
            try {
                classPathUrls.add(p.toUri().toURL());
            } catch (MalformedURLException e) {
                throw new RuntimeException(e);
            }
        });
        URLClassLoader classLoader = new URLClassLoader(classPathUrls.toArray(new URL[classPathUrls.size()]));
        return classLoader;
    }

    /**
     * Builds and returns a URLClassLoader for one or more jar files.
     *
     * @param jarPaths paths to the jar files
     * @return class loader over all given jars
     * @throws IOException
     */
    public static ClassLoader getJarClassLoader(Path... jarPaths) throws IOException {
        // Classpath URLs
        final List<URL> classPathUrls = new ArrayList<>(jarPaths.length);
        // Walk the jar paths
        for (Path jarPath : jarPaths) {
            if (!Files.exists(jarPath) || Files.isDirectory(jarPath)) { // must be an existing regular file
                throw new IllegalArgumentException("Path \"" + jarPath + "\" is not a path to a file.");
            }
            classPathUrls.add(jarPath.toUri().toURL());
        }
        URLClassLoader classLoader = new URLClassLoader(classPathUrls.toArray(new URL[classPathUrls.size()]));
        return classLoader;
    }

    /**
     * Recursively delete the directory root and all its contents
     *
     * @param root Root directory to be deleted
     * @throws IOException
     */
    public static void deleteDirectory(Path root) throws IOException {
        Files.walkFileTree(root, new SimpleFileVisitor<Path>() {
            @Override
            public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
                Files.delete(file);
                return FileVisitResult.CONTINUE;
            }

            @Override
            public FileVisitResult postVisitDirectory(Path dir, IOException exc) throws IOException {
                Files.delete(dir);
                return FileVisitResult.CONTINUE;
            }
        });
    }

    /**
     * Copy inputStream to outputStream. Neither stream is closed by this method.
     *
     * @param inputStream
     * @param outputStream
     * @throws IOException
     */
    public static void copy(InputStream inputStream, OutputStream outputStream) throws IOException {
        final byte[] buffer = new byte[4096];
        int n;
        while ((n = inputStream.read(buffer)) > 0) {
            outputStream.write(buffer, 0, n);
        }
    }
}
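
One detail in getWarClassLoader is worth a closer look: each entry name is resolved against the temp directory, and the `dirName == null` check does not catch names containing `..`, so a crafted archive could write outside that directory. A common guard is to normalize the resolved path and verify that it still lies under the root. The helper below is a sketch of that guard under my own naming (`SafeExtractSketch.resolveInside`); it is not part of Gadget Inspector.

```java
import java.io.IOException;
import java.nio.file.Path;

final class SafeExtractSketch {
    /**
     * Resolves an archive entry name against the extraction root and rejects
     * any result that ends up outside the root (e.g. "../../evil.txt" or an absolute path).
     */
    static Path resolveInside(Path root, String entryName) throws IOException {
        Path normalizedRoot = root.toAbsolutePath().normalize();
        Path target = normalizedRoot.resolve(entryName).normalize();
        if (!target.startsWith(normalizedRoot)) {
            throw new IOException("Entry escapes extraction directory: " + entryName);
        }
        return target;
    }
}
```

In the extraction loop this would take the place of `tmpDir.resolve(jarEntry.getName())`. Since the tool is typically pointed at archives of unknown origin, the extra check is cheap insurance.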

2. ClassResourceEnumerator

Defines the class resource interface.

public static interface ClassResource {
    public InputStream getInputStream() throws IOException; // opens the class file
    public String getName();    // file name
}

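Concrete class resources:

PathClassResource: reads a class file directly from a path; used for runtime classes found through the JRT file system
ClassLoaderClassResource: reads a class file through an existing ClassLoader

// Reads a class file from a path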
private static class PathClassResource implements ClassResource {
    private final Path path;

    private PathClassResource(Path path) {
        this.path = path;
    }

    @Override
    public InputStream getInputStream() throws IOException {
        return Files.newInputStream(path);
    }

    @Override
    public String getName() {
        return path.toString();
    }
}

// Reads a class file through a ClassLoader
private static class ClassLoaderClassResource implements ClassResource {
    private final ClassLoader classLoader;
    private final String resourceName;

    private ClassLoaderClassResource(ClassLoader classLoader, String resourceName) {
        this.classLoader = classLoader;
        this.resourceName = resourceName;
    }

    @Override
    public InputStream getInputStream() throws IOException {
        return classLoader.getResourceAsStream(resourceName);
    }
    
    @Override
    public String getName() {
        return resourceName;
    }
}

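Returns all runtime classes plus the classes in the specified archives; "runtime classes" here means the classes shipped with the JDK.

/**
 * Returns the Java runtime classes and the classes in the specified archives.
 *
 * @return all class resources
 * @throws IOException
 */
public Collection<ClassResource> getAllClasses() throws IOException {
    // Load the runtime classes first (bootstrap classes)
    Collection<ClassResource> result = new ArrayList<>(getRuntimeClasses());
    // Then add the classes from the user-specified archives via the ClassLoader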
    for (ClassPath.ClassInfo classInfo : ClassPath.from(classLoader).getAllClasses()) {
        result.add(new ClassLoaderClassResource(classLoader, classInfo.getResourceName()));
    }
    return result;
}

/**
 * Returns the runtime classes.
 *
 * @return class resources for the JDK classes
 * @throws IOException
 */
private Collection<ClassResource> getRuntimeClasses() throws IOException {
    // Up to Java 8 the runtime classes can be read from rt.jar
    // A hacky way to get the current JRE's rt.jar. Depending on the class loader, rt.jar may be in the
    // bootstrap classloader so all the JDK classes will be excluded from classpath scanning with this!
    // However, this only works up to Java 8, since after that Java uses some crazy module magic.
    URL stringClassUrl = Object.class.getResource("String.class");
    URLConnection connection = stringClassUrl.openConnection();
    Collection<ClassResource> result = new ArrayList<>();
    if (connection instanceof JarURLConnection) {
        URL runtimeUrl = ((JarURLConnection) connection).getJarFileURL();
        URLClassLoader classLoader = new URLClassLoader(new URL[]{runtimeUrl});
        for (ClassPath.ClassInfo classInfo : ClassPath.from(classLoader).getAllClasses()) {
            result.add(new ClassLoaderClassResource(classLoader, classInfo.getResourceName()));
        }
        return result;
    }
    // From Java 9 on, the runtime classes are read as class files from the JRT file system
    // https://stackoverflow.com/questions/1240387/where-are-the-java-system-packages-stored/53897006#53897006
    // Try finding all the JDK classes using the Java9+ modules method:
    try {
        FileSystem fs = FileSystems.getFileSystem(URI.create("jrt:/"));
        Files.walk(fs.getPath("/")).forEach(p -> {
            if (p.toString().toLowerCase().endsWith(".class")) {
                result.add(new PathClassResource(p));
            }
        });
    } catch (ProviderNotFoundException e) {
        // Do nothing; this is expected on versions below Java9
    }
    return result;
}
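
To see what the Java 9+ branch actually walks, the following sketch lists class files under one directory of the `jrt:/` file system. The class name `JrtListingSketch` and the choice to return an empty list when no JRT file system is available are my own assumptions. Unlike the listing above, it closes the stream returned by `Files.walk`.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystemNotFoundException;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.ProviderNotFoundException;
import java.util.Collections;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class JrtListingSketch {
    /** Lists the .class files below the given jrt path, e.g. "/modules/java.base/java/lang". */
    static List<String> classFilesUnder(String jrtPath) throws IOException {
        final FileSystem fs;
        try {
            fs = FileSystems.getFileSystem(URI.create("jrt:/"));
        } catch (ProviderNotFoundException | FileSystemNotFoundException e) {
            return Collections.emptyList();   // no jrt file system, as on Java 8 and earlier
        }
        // Files.walk holds open directory handles, so close the stream when done
        try (Stream<Path> paths = Files.walk(fs.getPath(jrtPath))) {
            return paths.map(Path::toString)
                        .filter(p -> p.endsWith(".class"))
                        .collect(Collectors.toList());
        }
    }
}
```

On a modular JDK the returned paths look like `/modules/java.base/java/lang/String.class`, which is also what PathClassResource.getName() would report for such an entry.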

3. GadgetInspector

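The program entry point, main, performs some preparation and then mines gadget chains in five steps.

It first checks whether any arguments were given. Gadget Inspector needs at least one archive to analyze, so with no arguments it prints the usage help.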

if (args.length == 0) {
    printUsage();   // print the usage help
    System.exit(1);
}

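Next it configures logging, whether .dat files are kept, and the analysis type.

// Configure log4j for log output
configureLogging();

// Whether to keep existing .dat files
boolean resume = false;

// Analysis type, defaulting to native Java serialization
GIConfig config = ConfigRepository.getConfig("jserial");    // provides SerializableDecider, ImplementationFinder, SourceDiscovery

Then it parses the arguments. The optional ones are:

--resume: keep and reuse existing .dat files; by default they are deleted at startup
--config xxx: select the analysis type; the default is native Java serialization, jserial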

int argIndex = 0;
while (argIndex < args.length) {
    String arg = args[argIndex];
    if (!arg.startsWith("--")) {
        break;
    }
    if (arg.equals("--resume")) {
        // Keep the .dat files
        resume = true;
    } else if (arg.equals("--config")) {
        // Select the analysis type
        config = ConfigRepository.getConfig(args[++argIndex]);
        if (config == null) {
            throw new IllegalArgumentException("Invalid config name: " + args[argIndex]);
        }
    } else {
        throw new IllegalArgumentException("Unexpected argument: " + arg);
    }
    argIndex += 1;
}
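
The option loop above can be isolated into a small, testable parser. The sketch below keeps the same behavior for valid input and adds one thing the original lacks: a bounds check, since `args[++argIndex]` throws ArrayIndexOutOfBoundsException when `--config` is the last argument. `CliArgsSketch` and its fields are hypothetical names, and the config is kept as a plain string instead of being looked up in ConfigRepository.

```java
final class CliArgsSketch {
    final boolean resume;       // true when --resume was given
    final String configName;    // value of --config, "jserial" by default
    final int firstPathIndex;   // index of the first non-option argument (war/jar path)

    private CliArgsSketch(boolean resume, String configName, int firstPathIndex) {
        this.resume = resume;
        this.configName = configName;
        this.firstPathIndex = firstPathIndex;
    }

    static CliArgsSketch parse(String[] args) {
        boolean resume = false;
        String config = "jserial";
        int i = 0;
        // Options come first; the first argument without a "--" prefix ends option parsing
        while (i < args.length && args[i].startsWith("--")) {
            if (args[i].equals("--resume")) {
                resume = true;
            } else if (args[i].equals("--config")) {
                if (i + 1 >= args.length) {
                    throw new IllegalArgumentException("--config requires a value");
                }
                config = args[++i];
            } else {
                throw new IllegalArgumentException("Unexpected argument: " + args[i]);
            }
            i++;
        }
        return new CliArgsSketch(resume, config, i);
    }
}
```

For example, `--resume --config jackson app.jar` yields resume = true, configName = "jackson", and firstPathIndex = 3.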

The remaining arguments name either a single war file or one or more jar files.

// In practice this is a URLClassLoader
final ClassLoader classLoader;

// Load the bytecode either as a single war or as one or more jars, and obtain a URLClassLoader
if (args.length == argIndex + 1 && args[argIndex].toLowerCase().endsWith(".war")) {
    // Build the path to the war file
    Path path = Paths.get(args[argIndex]);
    LOGGER.info("Using WAR classpath: " + path);
    // Backed by a URLClassLoader over the war's WEB-INF/lib and WEB-INF/classes
    classLoader = Util.getWarClassLoader(path);
} else {
    // Build the jar file paths; more than one may be given
    final Path[] jarPaths = new Path[args.length - argIndex];
    for (int i = 0; i < args.length - argIndex; i++) {
        Path path = Paths.get(args[argIndex + i]).toAbsolutePath();
        if (!Files.exists(path)) {
            throw new IllegalArgumentException("Invalid jar path: " + path);
        }
        jarPaths[i] = path;
    }
    LOGGER.info("Using classpath: " + Arrays.toString(jarPaths));
    // Backed by a URLClassLoader over all the given jars
    classLoader = Util.getJarClassLoader(jarPaths);
}

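The ClassLoader obtained above is used to initialize the class enumerator.

final ClassResourceEnumerator classResourceEnumerator = new ClassResourceEnumerator(classLoader);

Whether the .dat files are deleted depends on the resume flag. The discovered gadget chains are stored in gadget-chains.txt.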

if (!resume) {
    // Delete all existing dat files
    LOGGER.info("Deleting stale data...");
    // Discovered chains live in gadget-chains.txt, which is not deleted here
    for (String datFile : Arrays.asList("classes.dat", "methods.dat", "inheritanceMap.dat",
            "passthrough.dat", "callgraph.dat", "sources.dat", "methodimpl.dat")) {
        final Path path = Paths.get(datFile);
        if (Files.exists(path)) {
            Files.delete(path);
        }
    }
}

Each step runs only if its .dat output does not exist yet. The core steps are:

// Perform the various discovery steps
if (!Files.exists(Paths.get("classes.dat")) || !Files.exists(Paths.get("methods.dat"))
        || !Files.exists(Paths.get("inheritanceMap.dat"))) {
    LOGGER.info("Running method discovery...");
    MethodDiscovery methodDiscovery = new MethodDiscovery();
    methodDiscovery.discover(classResourceEnumerator);
    methodDiscovery.save(); // saves class, method, and inheritance information
}

if (!Files.exists(Paths.get("passthrough.dat"))) {
    LOGGER.info("Analyzing methods for passthrough dataflow...");
    PassthroughDiscovery passthroughDiscovery = new PassthroughDiscovery();
    passthroughDiscovery.discover(classResourceEnumerator, config);
    passthroughDiscovery.save();    // saves dataflow information (how arguments relate to the return value)
}

if (!Files.exists(Paths.get("callgraph.dat"))) {
    LOGGER.info("Analyzing methods in order to build a call graph...");
    CallGraphDiscovery callGraphDiscovery = new CallGraphDiscovery();
    callGraphDiscovery.discover(classResourceEnumerator, config);
    callGraphDiscovery.save();  // saves call relationships (argument passing between caller and callee)
}

if (!Files.exists(Paths.get("sources.dat"))) {
    LOGGER.info("Discovering gadget chain source methods...");
    SourceDiscovery sourceDiscovery = config.getSourceDiscovery();
    sourceDiscovery.discover();
    sourceDiscovery.save(); // 保存污点源信息
}

{
    LOGGER.info("Searching call graph for gadget chains...");
    GadgetChainDiscovery gadgetChainDiscovery = new GadgetChainDiscovery(config);
    gadgetChainDiscovery.discover();    // 保存重写信息、利用链信息
}

The core steps are as complex to implement as they are simple to read (well, not quite). The sections below walk through each of them.

4. MethodDiscovery

The discover method reads every class file and uses an asm visitor to record class and method information.

/**
 * 使用访问者记录类信息和方法信息
 *
 * @param classResourceEnumerator 类枚举器
 * @throws Exception
 */
public void discover(final ClassResourceEnumerator classResourceEnumerator) throws Exception {
    // 遍历所有的类
    for (ClassResourceEnumerator.ClassResource classResource : classResourceEnumerator.getAllClasses()) {
        try (InputStream in = classResource.getInputStream()) { // 读取类文件
            ClassReader cr = new ClassReader(in);   // 创建 ClassReader，后续调用 accept 方法解析类文件
            try {
                // 继承 asm 的 ClassVisitor(MethodVisitor) 实现对类文件的观察，记录类信息和方法信息
                // 重写方法的调用顺序（没有重写的调用默认方法）：visit -> visitField -> visitMethod -> visitEnd
                cr.accept(new MethodDiscoveryClassVisitor(), ClassReader.EXPAND_FRAMES);    // 以扩展格式访问堆栈映射帧
            } catch (Exception e) {
                LOGGER.error("Exception analyzing: " + classResource.getName(), e);
            }
        }
    }
}

MethodDiscoveryClassVisitor extends asm's ClassVisitor and overrides four visitor methods.

private String name;            // 类的内部名称
private String superName;       // 父类的内部名称
private String[] interfaces;    // 类接口的内部名称
boolean isInterface;            // 是否为接口
private List<ClassReference.Member> members;    // 类的所有字段
private ClassReference.Handle classHandle;      // 引用

private MethodDiscoveryClassVisitor() throws SQLException {
    super(Opcodes.ASM6);
}

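visit is called when class visiting starts (it is the first visitor method invoked by ClassReader.accept). It records the class name, super class name, interface names and whether the class is an interface, and creates the list that visitField later fills with field information.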

@Override
public void visit(int version, int access, String name, String signature, String superName, String[] interfaces) {  // 类访问开始（调用的第一个方法）
    // 记录类信息
    this.name = name;
    this.superName = superName;
    this.interfaces = interfaces;
    this.isInterface = (access & Opcodes.ACC_INTERFACE) != 0;
    this.members = new ArrayList<>();   // 字段信息（成员）
    this.classHandle = new ClassReference.Handle(name); // 当前类

    // 调用父类方法
    super.visit(version, access, name, signature, superName, interfaces);
}

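visitField records the fields of the class: name, access flags and type. The access flags tell whether a field is static. Static fields are skipped because they are not part of the serialized object state, so an attacker cannot control them and they are not treated as possible taint.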

public FieldVisitor visitField(int access, String name, String desc,    // 访问字段
                               String signature, Object value) {
    if ((access & Opcodes.ACC_STATIC) == 0) { // 跳过静态成员
        Type type = Type.getType(desc); // 类型
        String typeName;
        if (type.getSort() == Type.OBJECT || type.getSort() == Type.ARRAY) {    // 对象或数组
            typeName = type.getInternalName();  // 内部名称
        } else {
            typeName = type.getDescriptor();    // 描述符
        }
        // 记录字段信息，保存到 members
        members.add(new ClassReference.Member(name, access, new ClassReference.Handle(typeName)));
    }

    // 调用父类方法
    return super.visitField(access, name, desc, signature, value);
}

visitMethod records method information: owner class, method name, descriptor and whether the method is static, which is again derived from the access flags.

@Override
public MethodVisitor visitMethod(int access, String name, String desc, String signature, String[] exceptions) { // 访问方法
    boolean isStatic = (access & Opcodes.ACC_STATIC) != 0;  // 是否为静态方法
    
    // 记录方法信息，保存到 discoveredMethods
    discoveredMethods.add(new MethodReference(
            classHandle,    // 所属类
            name,
            desc,
            isStatic));

    // 调用父类方法
    return super.visitMethod(access, name, desc, signature, exceptions);
}
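The access-flag checks in visit, visitField and visitMethod are plain bit tests. Here is a minimal sketch using only the JDK, assuming that the java.lang.reflect.Modifier constants can stand in for ASM's Opcodes.ACC_* constants because they share the class-file values (STATIC = 0x0008, INTERFACE = 0x0200). The class AccessFlagDemo and its method names are made up for illustration.

```java
import java.lang.reflect.Modifier;

class AccessFlagDemo {
    // Same test as (access & Opcodes.ACC_STATIC) != 0 in visitMethod
    static boolean isStatic(int access) {
        return (access & Modifier.STATIC) != 0;
    }

    // Same test as (access & Opcodes.ACC_INTERFACE) != 0 in visit
    static boolean isInterface(int access) {
        return (access & Modifier.INTERFACE) != 0;
    }

    public static void main(String[] args) throws Exception {
        int mainFlags = AccessFlagDemo.class
                .getDeclaredMethod("main", String[].class).getModifiers();
        System.out.println(isStatic(mainFlags));                        // main is static
        System.out.println(isInterface(Runnable.class.getModifiers())); // Runnable is an interface
        System.out.println(isInterface(String.class.getModifiers()));   // String is a class
    }
}
```

Several flags are packed into one int, which is why the code masks with & instead of comparing with ==.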

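visitEnd is called when class visiting ends (it is the last visitor method invoked by ClassReader.accept). By then all fields have been recorded, so the complete class information can be stored: class name, super class name, interface names, whether it is an interface, and the fields.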

@Override
public void visitEnd() {    // 类访问结束（调用的最后一个方法）
    ClassReference classReference = new ClassReference(
            name,
            superName,
            interfaces,
            isInterface,
            members.toArray(new ClassReference.Member[members.size()])); // 把所有找到的字段封装
    
    // 记录类信息，保存到 discoveredClasses
    discoveredClasses.add(classReference);
    
    // 调用父类方法
    super.visitEnd();
}

save stores the collected class and method information, and calls InheritanceDeriver.derive to compute and store the inheritance information.

/**
 * 使用工厂方法存储数据
 *
 * @throws IOException
 */
public void save() throws IOException {
    // classes.dat format:
    // class name, super class name, interfaceA,interfaceB,interfaceC, is interface, field1!field1 access flags!field1 type!field2!field2 access flags!field2 type
    DataLoader.saveData(Paths.get("classes.dat"), new ClassReference.Factory(), discoveredClasses);
    
    // methods.dat 数据格式：
    // 类名 方法名 方法描述符 是否为静态方法
    DataLoader.saveData(Paths.get("methods.dat"), new MethodReference.Factory(), discoveredMethods);
    
    // 形成 类名(ClassReference.Handle)->类(ClassReference) 的映射关系
    Map<ClassReference.Handle, ClassReference> classMap = new HashMap<>();
    for (ClassReference clazz : discoveredClasses) {
        classMap.put(clazz.getHandle(), clazz);
    }
    
    // 对上面的类信息进行递归整合，得到 `子类->父类集合` 的继承信息，保存到 inheritanceMap.dat
    InheritanceDeriver.derive(classMap).save();
}
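The body of InheritanceDeriver.derive is not shown in this section. Below is a rough sketch of what recursively merging the class information into a `child -> all ancestors` map might look like, with plain strings instead of ClassReference.Handle. The class InheritanceSketch, its methods and the input shape (class name -> direct super class and interfaces) are hypothetical and not Gadget Inspector's actual code.

```java
import java.util.*;

class InheritanceSketch {
    // directParents: class name -> direct super class and interfaces
    static Map<String, Set<String>> derive(Map<String, List<String>> directParents) {
        Map<String, Set<String>> result = new HashMap<>();
        for (String clazz : directParents.keySet()) {
            Set<String> all = new LinkedHashSet<>();
            collect(clazz, directParents, all);
            result.put(clazz, all);
        }
        return result;
    }

    // Adds every direct parent, then recurses into the parents of that parent
    private static void collect(String clazz, Map<String, List<String>> directParents,
                                Set<String> out) {
        for (String parent : directParents.getOrDefault(clazz, Collections.emptyList())) {
            if (out.add(parent)) {   // skip parents that were already collected
                collect(parent, directParents, out);
            }
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> direct = new HashMap<>();
        direct.put("C", Arrays.asList("B", "java/io/Serializable"));
        direct.put("B", Arrays.asList("A"));
        direct.put("A", Arrays.asList("java/lang/Object"));
        System.out.println(derive(direct).get("C"));
    }
}
```

Classes that are referenced but were never scanned simply have no entry in the input map, so the recursion stops there.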

5. PassthroughDiscovery

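discover performs three steps: ① collect the method call information, i.e. which methods each method calls; ② sort the call information in reverse topological order, so that callees are analyzed before their callers; ③ analyze the parameters of each method to decide whether they can pass taint on, i.e. whether the return value of the method can be influenced by its parameters.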

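For example, of the two methods below, the return value of foo is controlled by its parameter, while the return value of bar is not. So once the taint (attacker-supplied data) reaches bar, it cannot propagate any further through the return value.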

String foo(String v) {
    return v;
}

String bar(String v) {
    return "test";
}

discover is implemented as follows:

// 方法调用信息：方法->调用的方法集合
private final Map<MethodReference.Handle, Set<MethodReference.Handle>> methodCalls = new HashMap<>();
// 数据流信息：方法->传递污染的参数索引
private Map<MethodReference.Handle, Set<Integer>> passthroughDataflow;

/**
 * 得到每个方法能够传递污染的参数（索引）集合
 *
 * @param classResourceEnumerator 类枚举器
 * @param config                  配置
 * @throws IOException
 */
public void discover(final ClassResourceEnumerator classResourceEnumerator, final GIConfig config) throws IOException {
    // 加载方法信息
    Map<MethodReference.Handle, MethodReference> methodMap = DataLoader.loadMethods();
    // 加载类信息
    Map<ClassReference.Handle, ClassReference> classMap = DataLoader.loadClasses();
    // 加载继承信息（inheritanceMap：子类->父类集合，subClassMap：父类->子类集合）
    InheritanceMap inheritanceMap = InheritanceMap.load();
    
    // 搜索方法的调用关系（methodCalls）并得到 `类名->类资源` 映射集合
    Map<String, ClassResourceEnumerator.ClassResource> classResourceByName = discoverMethodCalls(classResourceEnumerator);
    
    // 对方法的调用关系进行逆拓扑排序
    List<MethodReference.Handle> sortedMethods = topologicallySortMethodCalls();
    
    // 分析每个方法能够传递污染的参数
    // classResourceByName  类资源集合
    // classMap             类信息
    // inheritanceMap       继承信息
    // sortedMethods        方法集合（经逆拓扑排序）
    // SerializableDecider  序列化决策者
    passthroughDataflow = calculatePassthroughDataflow(classResourceByName, classMap, inheritanceMap, sortedMethods,
            config.getSerializableDecider(methodMap, inheritanceMap));
}

discoverMethodCalls uses an asm visitor to record, for every method, the set of methods it calls, and also stores the mapping from class name to class resource.

/**
 * 搜索方法调用信息：方法->被调用方法集合
 * 存储类资源映射信息：类名->类资源
 *
 * @param classResourceEnumerator 类枚举器
 * @return
 * @throws IOException
 */
private Map<String, ClassResourceEnumerator.ClassResource> discoverMethodCalls(final ClassResourceEnumerator classResourceEnumerator) throws IOException {
    // 类名->类资源
    Map<String, ClassResourceEnumerator.ClassResource> classResourcesByName = new HashMap<>();
    
    // 遍历所有的类
    for (ClassResourceEnumerator.ClassResource classResource : classResourceEnumerator.getAllClasses()) {
        try (InputStream in = classResource.getInputStream()) { // 读取类文件
            ClassReader cr = new ClassReader(in);   // 创建 ClassReader，后续调用 accept 方法解析类文件
            try {
                // 继承 asm 的 ClassVisitor(MethodVisitor) 实现对类文件的观察
                MethodCallDiscoveryClassVisitor visitor = new MethodCallDiscoveryClassVisitor(Opcodes.ASM6);
                // 重写方法的调用顺序（没有重写的调用默认方法）：visit -> visitMethod -> visitEnd
                cr.accept(visitor, ClassReader.EXPAND_FRAMES);
                // 存储 `类名(String)->类资源(ClassResource)` 的映射关系
                classResourcesByName.put(visitor.getName(), classResource);
            } catch (Exception e) {
                LOGGER.error("Error analyzing: " + classResource.getName(), e);
            }
        }
    }
    return classResourcesByName;
}

MethodCallDiscoveryClassVisitor extends asm's ClassVisitor, overrides three visitor methods and adds a getter that returns the class name.

private class MethodCallDiscoveryClassVisitor extends ClassVisitor {
    public MethodCallDiscoveryClassVisitor(int api) {   // ASM API version implemented by the visitor; must be one of the Opcodes.ASMx constants
        super(api);
    }

    private String name = null; // 类名

    // 返回类名
    public String getName() {
        return name;
    }
...
}

visit only records the name of the class being visited.

@Override
public void visit(int version, int access, String name, String signature,
                  String superName, String[] interfaces) {
    // 调用父类方法
    super.visit(version, access, name, signature, superName, interfaces);
    
    if (this.name != null) {
        throw new IllegalStateException("ClassVisitor already visited a class!");
    }
    
    // 记录类名
    this.name = name;
}

visitMethod creates a MethodCallDiscoveryMethodVisitor (a subclass of asm's MethodVisitor) to observe the method, and wraps it in a JSRInlinerAdapter to simplify the analysis.

@Override
public MethodVisitor visitMethod(int access, String name, String desc,
                                 String signature, String[] exceptions) {
    MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);
    // 创建 MethodCallDiscoveryMethodVisitor 观察方法
    MethodCallDiscoveryMethodVisitor modelGeneratorMethodVisitor = new MethodCallDiscoveryMethodVisitor(
            api, mv, this.name, name, desc);
    
    // 简化代码分析，删除 JSR 指令并内联引用的子例程
    return new JSRInlinerAdapter(modelGeneratorMethodVisitor, access, name, desc, signature, exceptions);
}

visitEnd simply calls the parent implementation; the override is probably unnecessary here.

@Override
public void visitEnd() {
    super.visitEnd();
}

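MethodCallDiscoveryMethodVisitor extends asm's MethodVisitor and overrides only visitMethodInsn, which is invoked for every method call instruction. The constructor registers the calledMethods set in methodCalls under the method being visited, and visitMethodInsn then adds every called method to that set. The two are easy to mix up if you are not paying attention 😵.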

private class MethodCallDiscoveryMethodVisitor extends MethodVisitor {
    // 方法调用的方法集合
    private final Set<MethodReference.Handle> calledMethods;

    /**
     * 方法访问者构造函数
     *
     * @param api   ASM API 版本
     * @param mv    MethodVisitor 实例
     * @param owner 方法所属类的类名
     * @param name  方法的名称
     * @param desc  方法的描述符
     */
    public MethodCallDiscoveryMethodVisitor(final int api, final MethodVisitor mv,
                                            final String owner, String name, String desc) {
        super(api, mv);
        // 调用的方法集合，初始化
        this.calledMethods = new HashSet<>();
        // 存储到 PassthroughDiscovery 的 methodCalls 中
        methodCalls.put(new MethodReference.Handle(new ClassReference.Handle(owner), name, desc), calledMethods);
    }

    /**
     * 访问方法指令
     * 方法指令是调用方法的指令
     *
     * @param opcode 调用操作码：INVOKEVIRTUAL, INVOKESPECIAL, INVOKESTATIC, INVOKEINTERFACE
     * @param owner  被调用的方法所属类的类名
     * @param name   被调用的方法
     * @param desc   被调用方法的描述符
     * @param itf    被调用的类是否为接口
     */
    @Override
    public void visitMethodInsn(int opcode, String owner, String name, String desc, boolean itf) {
        // 记录调用的方法，存储到 MethodCallDiscoveryMethodVisitor 的 calledMethods 中
        calledMethods.add(new MethodReference.Handle(new ClassReference.Handle(owner), name, desc));
        super.visitMethodInsn(opcode, owner, name, desc, itf);
    }
}

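topologicallySortMethodCalls sorts the method call information in reverse topological order, which the later analysis of the relation between parameters and return values relies on. For example: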

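parentMethod calls Obj.childMethod before returning. The parameter carg of Obj.childMethod affects its return value, and parentMethod returns that value as its own result, so the parameter arg of parentMethod is related to the return value of parentMethod.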

So the relation between the child method's return value and its parameters has to be determined first, and only then the relation for the parent method.

public String parentMethod(String arg){
    String vul = Obj.childMethod(arg);
    return vul;
}

public String childMethod(String carg){
    return carg.toString();
}

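To analyze child methods before parent methods, the methods are sorted in reverse topological order. The sort is a depth-first search: dfsStack holds the methods on the current search path and stops the recursion when calls form a cycle, while visitedNodes holds the methods that are already sorted so that they are not sorted twice. The actual work is done by dfsTsort, and all methods end up in a single list.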

/**
 * Sort the method call graph in reverse topological order (callees come before their callers)
 *
 * @return
 */
private List<MethodReference.Handle> topologicallySortMethodCalls() {
    // 拷贝方法调用的方法集合
    Map<MethodReference.Handle, Set<MethodReference.Handle>> outgoingReferences = new HashMap<>();
    for (Map.Entry<MethodReference.Handle, Set<MethodReference.Handle>> entry : methodCalls.entrySet()) {
        MethodReference.Handle method = entry.getKey(); // 方法
        outgoingReferences.put(method, new HashSet<>(entry.getValue()));    // 调用的方法集合
    }
    
    // Topological sort methods
    LOGGER.debug("Performing topological sort...");
    Set<MethodReference.Handle> dfsStack = new HashSet<>();     // 避免形成环
    Set<MethodReference.Handle> visitedNodes = new HashSet<>(); // 在调用链出现重合时，避免重复排序
    List<MethodReference.Handle> sortedMethods = new ArrayList<>(outgoingReferences.size());    // 方法调用集合
    for (MethodReference.Handle root : outgoingReferences.keySet()) {
        // 遍历集合中的起始方法，进行递归搜索（DFS），经过逆拓扑排序，调用链的最末端排在最前面，
        // 后续进行参数、返回值、调用链之间的污点传递分析
        dfsTsort(outgoingReferences, sortedMethods, visitedNodes, dfsStack, root);
    }
    LOGGER.debug(String.format("Outgoing references %d, sortedMethods %d", outgoingReferences.size(), sortedMethods.size()));
    
    // 逆拓扑排序后的方法调用集合
    return sortedMethods;
}

/**
 * 逆拓扑排序的具体实现
 *
 * @param outgoingReferences 方法调用的方法集合
 * @param sortedMethods      逆拓扑排序后的方法集合
 * @param visitedNodes       已排序的方法
 * @param stack              栈
 * @param node               待排序的起始方法
 */
private static void dfsTsort(Map<MethodReference.Handle, Set<MethodReference.Handle>> outgoingReferences,
                             List<MethodReference.Handle> sortedMethods, Set<MethodReference.Handle> visitedNodes,
                             Set<MethodReference.Handle> stack, MethodReference.Handle node) {
    // 防止在遍历一条调用链中进入循环
    if (stack.contains(node)) {
        return;
    }

    // 防止对某个方法及被调方法重复排序
    if (visitedNodes.contains(node)) {
        return;
    }

    // 根据起始方法，取出被调用的方法集合
    Set<MethodReference.Handle> outgoingRefs = outgoingReferences.get(node);
    if (outgoingRefs == null) {
        return;
    }

    stack.add(node);    // 入栈，避免递归死循环
    for (MethodReference.Handle child : outgoingRefs) { // 对被调用方法递归进行排序
        dfsTsort(outgoingReferences, sortedMethods, visitedNodes, stack, child);
    }
    stack.remove(node); // 出栈，方法排序完毕
    visitedNodes.add(node);     // 记录已访问的方法，在递归遇到重复方法时可以跳过
    sortedMethods.add(node);    // 记录已排序的方法
}
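The same depth-first, post-order idea can be tried in isolation. The sketch below uses plain method names instead of MethodReference.Handle; the class ReverseTopoSortDemo and the sample call graph are made up for illustration, and the structure of dfs mirrors dfsTsort above.

```java
import java.util.*;

class ReverseTopoSortDemo {
    static List<String> sort(Map<String, Set<String>> calls) {
        Set<String> onPath = new HashSet<>();   // methods on the current DFS path
        Set<String> visited = new HashSet<>();  // methods that are already sorted
        List<String> sorted = new ArrayList<>();
        for (String root : calls.keySet()) {
            dfs(calls, sorted, visited, onPath, root);
        }
        return sorted;
    }

    private static void dfs(Map<String, Set<String>> calls, List<String> sorted,
                            Set<String> visited, Set<String> onPath, String node) {
        if (onPath.contains(node) || visited.contains(node)) {
            return;             // cycle, or already handled
        }
        Set<String> callees = calls.get(node);
        if (callees == null) {
            return;             // no recorded body for this method
        }
        onPath.add(node);
        for (String callee : callees) {
            dfs(calls, sorted, visited, onPath, callee);
        }
        onPath.remove(node);
        visited.add(node);
        sorted.add(node);       // post-order: emitted after everything it calls
    }

    public static void main(String[] args) {
        Map<String, Set<String>> calls = new LinkedHashMap<>();
        calls.put("parentMethod", new LinkedHashSet<>(Arrays.asList("childMethod")));
        calls.put("childMethod", new LinkedHashSet<>(Arrays.asList("toString")));
        calls.put("toString", new LinkedHashSet<>());
        System.out.println(sort(calls)); // [toString, childMethod, parentMethod]
    }
}
```

With a cycle such as a -> b -> a there is no true topological order. The search just cuts the back edge, so one of the two methods is analyzed before the result for the other one is known.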

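Finally, calculatePassthroughDataflow determines the relation between each method's return value and its parameters. It skips static initializers, then walks the reverse-topologically sorted methods with an asm visitor and analyzes each of them.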

A static initializer block runs when the class is initialized, only once, and before the main method.

/**
 * 分析方法调用集合，获取数据流信息：方法->传递污染的参数索引
 *
 * @param classResourceByName 类资源集合
 * @param classMap            类信息
 * @param inheritanceMap      继承信息
 * @param sortedMethods       所有方法集合（经过逆拓扑排序）
 * @param serializableDecider 序列化决策者
 * @return
 * @throws IOException
 */
private static Map<MethodReference.Handle, Set<Integer>> calculatePassthroughDataflow(Map<String, ClassResourceEnumerator.ClassResource> classResourceByName,
                                                                                      Map<ClassReference.Handle, ClassReference> classMap,
                                                                                      InheritanceMap inheritanceMap,
                                                                                      List<MethodReference.Handle> sortedMethods,
                                                                                      SerializableDecider serializableDecider) throws IOException {
    // 数据流信息：方法、传递污染的参数索引
    final Map<MethodReference.Handle, Set<Integer>> passthroughDataflow = new HashMap<>();

    // 遍历所有方法
    for (MethodReference.Handle method : sortedMethods) {
        // 跳过 static 静态初始化代码（静态代码块）
        if (method.getName().equals("<clinit>")) {
            continue;
        }

        // 获取方法所属类的类资源
        ClassResourceEnumerator.ClassResource classResource = classResourceByName.get(method.getClassReference().getName());
        try (InputStream inputStream = classResource.getInputStream()) {    // 读取类文件
            ClassReader cr = new ClassReader(inputStream);  // 创建 ClassReader，后续调用 accept 方法解析类文件
            try {
                /**
                 * classMap             类信息
                 * inheritanceMap       继承信息
                 * passthroughDataflow  数据流信息，初始为空
                 * serializableDecider  序列化决策者
                 * Opcodes.ASM6         ASM API 版本
                 * method               待观察的方法
                 */
                // Subclass of asm's ClassVisitor that analyzes the dataflow of the method under observation
                PassthroughDataflowClassVisitor cv = new PassthroughDataflowClassVisitor(classMap, inheritanceMap,
                        passthroughDataflow, serializableDecider, Opcodes.ASM6, method);

                // 重写方法的调用顺序（没有重写的调用默认方法）：visit -> visitMethod
                cr.accept(cv, ClassReader.EXPAND_FRAMES);

                // 缓存方法的哪些参数会影响返回值
                passthroughDataflow.put(method, cv.getReturnTaint());
            } catch (Exception e) {
                LOGGER.error("Exception analyzing " + method.getClassReference().getName(), e);
            }
        } catch (IOException e) {
            LOGGER.error("Unable to analyze " + method.getClassReference().getName(), e);
        }
    }
    return passthroughDataflow;
}

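PassthroughDataflowClassVisitor extends asm's ClassVisitor. The overridden visit records the name of the method's owner class, the overridden visitMethod hands the method under observation to a PassthroughDataflowMethodVisitor that determines the relation between return value and parameters, and getReturnTaint returns the set of parameter indexes that can pass taint on.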

private static class PassthroughDataflowClassVisitor extends ClassVisitor {
    Map<ClassReference.Handle, ClassReference> classMap;    // 类信息
    private final MethodReference.Handle methodToVisit;     // 待观察的方法
    private final InheritanceMap inheritanceMap;            // 继承信息
    private final Map<MethodReference.Handle, Set<Integer>> passthroughDataflow;    // 数据流信息：方法->传递污染的参数索引
    private final SerializableDecider serializableDecider;  // 序列化决策者
    
    private String name;    // 类名
    private PassthroughDataflowMethodVisitor passthroughDataflowMethodVisitor;  // 方法访问者
    
    public PassthroughDataflowClassVisitor(Map<ClassReference.Handle, ClassReference> classMap,
                                           InheritanceMap inheritanceMap, Map<MethodReference.Handle, Set<Integer>> passthroughDataflow,
                                           SerializableDecider serializableDecider, int api, MethodReference.Handle methodToVisit) {
        super(api); // ASM API 版本
        this.classMap = classMap;
        this.inheritanceMap = inheritanceMap;
        this.methodToVisit = methodToVisit;
        this.passthroughDataflow = passthroughDataflow;
        this.serializableDecider = serializableDecider;
    }

    @Override
    public void visit(int version, int access, String name, String signature,
                      String superName, String[] interfaces) {
        super.visit(version, access, name, signature, superName, interfaces);
        this.name = name;   // 记录类名
        
        // 不是待观察方法的所属类
        if (!this.name.equals(methodToVisit.getClassReference().getName())) {
            throw new IllegalStateException("Expecting to visit " + methodToVisit.getClassReference().getName() + " but instead got " + this.name);
        }
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
        // 不是待观察方法
        if (!name.equals(methodToVisit.getName()) || !desc.equals(methodToVisit.getDesc())) {
            return null;
        }
        if (passthroughDataflowMethodVisitor != null) {
            throw new IllegalStateException("Constructing passthroughDataflowMethodVisitor twice!");
        }

        // 调用父类方法，返回新的方法观察者
        // 如果类观察者的 cv 变量为空，则返回 null，否则返回 cv.visitMethod
        MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);

        // 创建方法访问者，判断方法返回值与参数的关系
        // 重写方法的调用顺序（没有重写的调用默认方法）：visitCode -> visitInsn -> visitFieldInsn -> visitMethodInsn
        passthroughDataflowMethodVisitor = new PassthroughDataflowMethodVisitor(
                classMap, inheritanceMap, this.passthroughDataflow, serializableDecider,
                api, mv, this.name, access, name, desc, signature, exceptions);

        // 简化代码分析，删除 JSR 指令并内联引用的子例程
        return new JSRInlinerAdapter(passthroughDataflowMethodVisitor, access, name, desc, signature, exceptions);
    }
    // 返回能够传递污染的参数索引集合
    public Set<Integer> getReturnTaint() {
        if (passthroughDataflowMethodVisitor == null) {
            throw new IllegalStateException("Never constructed the passthroughDataflowmethodVisitor!");
        }
        return passthroughDataflowMethodVisitor.returnTaint;
    }
}

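PassthroughDataflowMethodVisitor extends TaintTrackingMethodVisitor and overrides four of its visitor methods:

visitCode: starts the visit of the method's code and stores all parameters in the local variable table
visitInsn: visits a zero-operand instruction; only the return instructions are analyzed here
visitFieldInsn: visits a field instruction, i.e. an instruction that loads or stores the value of a field
visitMethodInsn: visits a method instruction, i.e. an instruction that invokes a method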

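TaintTrackingMethodVisitor in turn extends asm's MethodVisitor and overrides a large number of methods to simulate the JVM's local variable table and operand stack during method execution. The visitor methods that actually run therefore come from three classes: PassthroughDataflowMethodVisitor, TaintTrackingMethodVisitor and MethodVisitor. The simulation is written by hand from knowledge of the bytecode instructions and the JVM (help). The four overridden methods are covered first.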

The dataflow map passthroughDataflow starts out empty; the set returnTaint records the parameter indexes that pass taint on.

private static class PassthroughDataflowMethodVisitor extends TaintTrackingMethodVisitor<Integer> {
    private final Map<ClassReference.Handle, ClassReference> classMap;              // 类信息
    private final InheritanceMap inheritanceMap;                                    // 继承信息
    private final Map<MethodReference.Handle, Set<Integer>> passthroughDataflow;    // 数据流信息：方法->传递污染的参数索引
    private final SerializableDecider serializableDecider;                          // 序列化决策者
    
    private final int access;               // 访问标志
    private final String desc;              // 描述符
    private final Set<Integer> returnTaint; // 能够传递污染的参数索引集合
    
    public PassthroughDataflowMethodVisitor(Map<ClassReference.Handle, ClassReference> classMap,
                                            InheritanceMap inheritanceMap, Map<MethodReference.Handle,
            Set<Integer>> passthroughDataflow, SerializableDecider serializableDeciderMap, int api, MethodVisitor mv,
                                            String owner, int access, String name, String desc, String signature, String[] exceptions) {
        super(inheritanceMap, passthroughDataflow, api, mv, owner, access, name, desc, signature, exceptions);
        this.classMap = classMap;
        this.inheritanceMap = inheritanceMap;
        this.passthroughDataflow = passthroughDataflow;
        this.serializableDecider = serializableDeciderMap;
        this.access = access;
        this.desc = desc;
        returnTaint = new HashSet<>();
    }
...
}

visitCode records the parameters of the visited method in the local variable table. For a non-static method it first adds the implicit parameter this.

@Override
public void visitCode() {   // 启动对方法代码的访问
    // 调用 TaintTrackingMethodVisitor.visitCode 初始化本地变量表
    super.visitCode();

    // 记录参数到本地变量表 savedVariableState.localVars
    int localIndex = 0;
    int argIndex = 0;
    // 非静态方法，第一个参数（隐式）为对象实例 this
    if ((this.access & Opcodes.ACC_STATIC) == 0) {
        // 调用 TaintTrackingMethodVisitor.setLocalTaint 添加到本地变量表
        setLocalTaint(localIndex, argIndex);
        localIndex += 1;
        argIndex += 1;
    }

    // 遍历参数，根据描述符得出参数类型（占用空间大小）
    for (Type argType : Type.getArgumentTypes(desc)) {
        // 调用 TaintTrackingMethodVisitor.setLocalTaint 添加到本地变量表
        setLocalTaint(localIndex, argIndex);
        localIndex += argType.getSize();
        argIndex += 1;
    }
}
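The reason visitCode keeps localIndex and argIndex apart is that long and double parameters occupy two local variable slots. The sketch below computes the starting slot of every argument from a method descriptor without ASM. The class LocalSlotDemo and its small descriptor parser are made up for illustration; the real code relies on Type.getArgumentTypes and Type.getSize instead.

```java
import java.util.*;

class LocalSlotDemo {
    // For each argument index, returns the local variable slot where it starts.
    static List<Integer> argSlots(boolean isStatic, String desc) {
        List<Integer> slots = new ArrayList<>();
        int local = 0;
        if (!isStatic) {
            slots.add(local);   // implicit this: argument 0 in slot 0
            local += 1;
        }
        int i = desc.indexOf('(') + 1;
        while (desc.charAt(i) != ')') {
            int size = 1;
            char c = desc.charAt(i);
            if (c == 'J' || c == 'D') {
                size = 2;       // long and double take two slots
                i++;
            } else {
                while (desc.charAt(i) == '[') {
                    i++;        // array dimensions; an array is a one-slot reference
                }
                if (desc.charAt(i) == 'L') {
                    i = desc.indexOf(';', i) + 1;   // skip the class name
                } else {
                    i++;        // primitive type or primitive array element
                }
            }
            slots.add(local);
            local += size;
        }
        return slots;
    }

    public static void main(String[] args) {
        System.out.println(argSlots(false, "(IJLjava/lang/String;D)V")); // [0, 1, 2, 4, 5]
        System.out.println(argSlots(true, "(J[JI)V"));                   // [0, 2, 3]
    }
}
```

In the first call the String parameter is argument 3 but lives in slot 4, because the long before it takes slots 2 and 3.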

visitInsn adds the elements of the return value at the top of the stack (the set of parameter indexes that pass taint on, possibly empty) to returnTaint.

@Override
public void visitInsn(int opcode) { // 访问零操作数指令
    // 方法执行完毕后将从栈返回结果给调用者，因此栈顶即返回值
    // 存储可能被污染的返回值到 returnTaint
    switch (opcode) {
        case Opcodes.IRETURN:   // 从当前方法返回 int
        case Opcodes.FRETURN:   // 从当前方法返回 float
        case Opcodes.ARETURN:   // 从当前方法返回对象引用
            // 调用 TaintTrackingMethodVisitor.getStackTaint 读取栈顶，大小为 1（32位）
            returnTaint.addAll(getStackTaint(0));   // index 0 is the top slot of the operand stack
            break;
        case Opcodes.LRETURN:   // 从当前方法返回 long
        case Opcodes.DRETURN:   // 从当前方法返回 double
            // 调用 TaintTrackingMethodVisitor.getStackTaint 读取栈顶，大小为 2（64位）
            returnTaint.addAll(getStackTaint(1));
            break;
        case Opcodes.RETURN:    // 从当前方法返回 void
            break;
        default:
            break;
    }

    // 调用 TaintTrackingMethodVisitor.visitInsn 进行出/入栈操作
    super.visitInsn(opcode);
}
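The index passed to getStackTaint counts from the top of the simulated operand stack. Below is a minimal sketch of that addressing scheme. It assumes that the taint set of a two-slot value is kept in the lower of its two slots, which is what the LRETURN/DRETURN branch reading getStackTaint(1) implies. The class StackTaintDemo is made up and is not the TaintTrackingMethodVisitor implementation.

```java
import java.util.*;

class StackTaintDemo {
    // The end of the list is the top of the stack
    private final List<Set<Integer>> stack = new ArrayList<>();

    void push(Set<Integer> taint) {
        stack.add(taint);
    }

    // index 0 is the top slot, index 1 the slot below it, and so on
    Set<Integer> getStackTaint(int index) {
        return stack.get(stack.size() - 1 - index);
    }

    public static void main(String[] args) {
        StackTaintDemo s = new StackTaintDemo();

        // One-slot value (int, float, reference) tainted by parameter 1
        s.push(new HashSet<>(Arrays.asList(1)));
        System.out.println(s.getStackTaint(0));

        // Two-slot value (long, double) tainted by parameter 2:
        // the taint goes into the lower slot, the upper slot stays empty
        s.push(new HashSet<>(Arrays.asList(2)));
        s.push(new HashSet<>());
        System.out.println(s.getStackTaint(1));
    }
}
```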

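visitFieldInsn is called when an object field is read or written. For GETFIELD it checks whether the field can be serialized. If it can, the value read is treated as tainted by whatever taints the instance that owns the field (the object reference on top of the stack, which is either this of the current method or some other object), and that set of parameter indexes is stored in the taint variable.
The field may belong to the instance of the current method's class or to another object. Reading a field of another object through a method call involves one method calling another, which is analyzed in visitMethodInsn. Finally the top of the stack (the value produced by the field read) is set to taint, which may be an empty HashSet.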

@Override
public void visitFieldInsn(int opcode, String owner, String name, String desc) {    // 访问字段指令，字段指令是加载或存储对象字段值的指令。
    // 方法执行过程中可能访问对象字段，访问前会进行入栈操作
    switch (opcode) {
        case Opcodes.GETSTATIC: // 获取类的静态字段
            break;
        case Opcodes.PUTSTATIC: // 设置类的静态字段
            break;
        case Opcodes.GETFIELD:  // 获取对象字段
            Type type = Type.getType(desc); // 字段类型
            if (type.getSize() == 1) {
                Boolean isTransient = null; // 如果字段被 transient 关键字修饰，则不可序列化

                // 判断读取的字段所属类是否可序列化，即字段是否可以序列化
                // If a field type could not possibly be serialized, it's effectively transient
                if (!couldBeSerialized(serializableDecider, inheritanceMap, new ClassReference.Handle(type.getInternalName()))) {
                    isTransient = Boolean.TRUE;
                } else {
                    // 若读取的字段所属类可序列化
                    ClassReference clazz = classMap.get(new ClassReference.Handle(owner));
                    while (clazz != null) {
                        // 遍历类的所有字段
                        for (ClassReference.Member member : clazz.getMembers()) {
                            // 是否为目标字段
                            if (member.getName().equals(name)) {
                                // 是否被 transient 关键字修饰
                                isTransient = (member.getModifiers() & Opcodes.ACC_TRANSIENT) != 0;
                                break;
                            }
                        }
                        if (isTransient != null) {
                            break;
                        }
                        // 若找不到目标字段，则向上查找（超类）
                        clazz = classMap.get(new ClassReference.Handle(clazz.getSuperClass()));
                    }
                }
 
                // 能够传递污染的参数索引集合
                Set<Integer> taint;
                if (!Boolean.TRUE.equals(isTransient)) {
                    // 若字段没有被 transient 修饰，则调用 TaintTrackingMethodVisitor.getStackTaint 读取栈顶
                    // 取出的是 this 或某实例对象，即字段所属实例
                    taint = getStackTaint(0);
                } else {
                    // 否则为空
                    taint = new HashSet<>();
                }

                // 调用 TaintTrackingMethodVisitor.visitFieldInsn 进行出/入栈操作
                super.visitFieldInsn(opcode, owner, name, desc);

                // 调用 TaintTrackingMethodVisitor.setStackTaint 将栈顶设置为 taint
                setStackTaint(0, taint);
                return;
            }
            break;
        case Opcodes.PUTFIELD:  // 设置对象字段
            break;
        default:
            throw new IllegalStateException("Unsupported opcode: " + opcode);
    }

    // 调用 TaintTrackingMethodVisitor.visitFieldInsn 进行出/入栈操作
    super.visitFieldInsn(opcode, owner, name, desc);
}
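The transient lookup in the GETFIELD branch walks from the owner class up through its super classes until it finds the field. The sketch below shows the same walk with JDK reflection instead of the classMap built from classes.dat, which is a substituted technique for illustration only. The classes TransientLookupDemo, Base and Child are made up.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

class TransientLookupDemo {
    static class Base implements java.io.Serializable {
        transient String cache;
        String name;
    }

    static class Child extends Base {
        int count;
    }

    // Walks up the class hierarchy until a field with the given name is found.
    // Returns null when no class in the chain declares such a field.
    static Boolean isTransient(Class<?> clazz, String fieldName) {
        for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (f.getName().equals(fieldName)) {
                    return Modifier.isTransient(f.getModifiers());
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(isTransient(Child.class, "cache"));   // declared in Base, transient
        System.out.println(isTransient(Child.class, "name"));    // declared in Base, not transient
        System.out.println(isTransient(Child.class, "missing")); // not declared anywhere
    }
}
```

As in the original code, a field that cannot be found yields null rather than true, so it is still treated as able to carry taint.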

visitMethodInsn is called whenever the method being visited calls another method (a bit of a tongue twister 😅).

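First record the argument types of the called method; unless the call is static, prepend the implicit first argument (the instance of the class that owns the called method)
Then record the size of the called method's return type (0 to 2), which is used at the end to store the index set
Read the taint of every argument from the simulated operand stack; if the called method is a constructor, the implicit argument is assumed to pass taint on; if the called method already has an entry in the analyzed dataflow information, take the corresponding parameter index sets and collect them in resultTaint
Call the parent method TaintTrackingMethodVisitor.visitMethodInsn to perform the actual push/pop simulation, which stores its own parameter index set at the top of the stack
Finally, depending on the size of the return type, merge resultTaint into the top of the stack as well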

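In a real JVM, calling a method creates a new stack frame that holds the data the callee works with. At the moment visitMethodInsn is invoked, the arguments of the called method (not those of the current method) sit on top of the caller's simulated operand stack, and that is where they are read from.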

@Override
public void visitMethodInsn(int opcode, String owner, String name, String desc, boolean itf) {  // 访问方法指令，方法指令是调用方法的指令。
    // 根据描述符得出被调用方法的参数类型（占用空间大小）
    Type[] argTypes = Type.getArgumentTypes(desc);

    // 非静态方法的第一个参数是对象本身，即 this
    if (opcode != Opcodes.INVOKESTATIC) {
        Type[] extendedArgTypes = new Type[argTypes.length + 1];
        System.arraycopy(argTypes, 0, extendedArgTypes, 1, argTypes.length);
        extendedArgTypes[0] = Type.getObjectType(owner);    // 对象类型
        argTypes = extendedArgTypes;
    }

    // 根据描述符获取被调用方法的返回值类型大小
    int retSize = Type.getReturnType(desc).getSize();
    // 能够传递污染的参数索引集合
    Set<Integer> resultTaint;
    switch (opcode) {
        case Opcodes.INVOKESTATIC:      // 调用静态方法
        case Opcodes.INVOKEVIRTUAL:     // 调用实例方法
        case Opcodes.INVOKESPECIAL:     // 调用超类构造方法，实例初始化方法，私有方法
        case Opcodes.INVOKEINTERFACE:   // 调用接口方法
            // 模拟操作数栈
            final List<Set<Integer>> argTaint = new ArrayList<Set<Integer>>(argTypes.length);
            // 调用方法前先把操作数入栈
            for (int i = 0; i < argTypes.length; i++) {
                argTaint.add(null);
            }

            // 记录数据起始位置
            int stackIndex = 0;
            for (int i = 0; i < argTypes.length; i++) {
                Type argType = argTypes[i];
                if (argType.getSize() > 0) {
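                    // Call TaintTrackingMethodVisitor.getStackTaint to read the stack slot according to the argument size
                    // The JVM pushes arguments left to right, so the last argument is on top of the stack; copy each taint set into argTaint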
                    argTaint.set(argTypes.length - 1 - i, getStackTaint(stackIndex + argType.getSize() - 1));
                }
                stackIndex += argType.getSize();
            }

            // 如果被调用的是构造方法，则认为被调用方法所属类的实例对象本身可以传递污染
            if (name.equals("<init>")) {
                // Pass result taint through to original taint set; the initialized object is directly tainted by
                // parameters
                resultTaint = argTaint.get(0);  // taint set of the implicit argument (index 0), i.e. the object being initialized
            } else {
                resultTaint = new HashSet<>();  // 否则初始化为空
            }

            // 经过逆拓扑排序，调用链末端的方法先被访问和判断，即被调用方法已经被判断过
            // 例如 A->B，判断 A 时 B 已经有判断结果了，并且此时栈中的数据是这样：B对象 B参数
            Set<Integer> passthrough = passthroughDataflow.get(new MethodReference.Handle(new ClassReference.Handle(owner), name, desc));
            // 如果被调用方法存在能够传递污染的参数
            if (passthrough != null) {
                // 遍历参数索引
                for (Integer passthroughDataflowArg : passthrough) {
                    // 从栈中获取能够传递污染的参数索引集合，全部添加到 resultTaint
                    resultTaint.addAll(argTaint.get(passthroughDataflowArg));
                }
            }
            break;
        default:
            throw new IllegalStateException("Unsupported opcode: " + opcode);
    }

    // 调用 TaintTrackingMethodVisitor.visitMethodInsn 执行出/入栈操作，根据预定义的判断规则分析参数索引集合
    super.visitMethodInsn(opcode, owner, name, desc, itf);

    // 返回值不为空
    // 实例对象本身有可能传递污染，因此不能直接根据返回值判断（即不能最先执行这一块）
    if (retSize > 0) {  // 1 或者 2
        // 调用 TaintTrackingMethodVisitor.getStackTaint 将 resultTaint 中的元素合并到参数索引集合中
        // 这里减 1 是因为在 TaintTrackingMethodVisitor.visitMethodInsn 中已经将第一个单位的值设置为其分析得到的参数索引集合
        getStackTaint(retSize - 1).addAll(resultTaint);
    }
}
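The key step of visitMethodInsn is the lookup that turns the callee's passthrough indexes into taint of the caller. Here is a stripped-down sketch of just that step. The class PassthroughComposeDemo and the sample values are made up; constructor handling and the stack simulation are left out.

```java
import java.util.*;

class PassthroughComposeDemo {
    // argTaint.get(i): caller parameter indexes that reach the callee's i-th argument
    // calleePassthrough: callee argument indexes that flow into the callee's return value,
    //                    or null when the callee has not been analyzed
    static Set<Integer> resultTaint(List<Set<Integer>> argTaint, Set<Integer> calleePassthrough) {
        Set<Integer> result = new HashSet<>();
        if (calleePassthrough != null) {
            for (int calleeArg : calleePassthrough) {
                result.addAll(argTaint.get(calleeArg));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The caller passes its own parameter 1 as argument 1 of an instance method;
        // the receiver (argument 0) carries no taint in this example.
        List<Set<Integer>> argTaint = Arrays.asList(
                new HashSet<Integer>(),
                new HashSet<>(Arrays.asList(1)));

        // Callee like foo: argument 1 flows into the return value
        System.out.println(resultTaint(argTaint, new HashSet<>(Arrays.asList(1))));
        // Callee like bar: nothing flows into the return value
        System.out.println(resultTaint(argTaint, new HashSet<Integer>()));
    }
}
```

This is why the reverse topological order matters: the lookup only finds an entry when the called method has already been analyzed.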

The save and load methods store and load the data through a factory.

/**
 * Store the dataflow information through a factory
 *
 * @throws IOException
 */
public void save() throws IOException {
    if (passthroughDataflow == null) {
        throw new IllegalStateException("Save called before discover()");
    }
    DataLoader.saveData(Paths.get("passthrough.dat"), new PassThroughFactory(), passthroughDataflow.entrySet());
}

/**
 * 从 passthrough.dat 加载数据流信息
 *
 * @return
 * @throws IOException
 */
public static Map<MethodReference.Handle, Set<Integer>> load() throws IOException {
    Map<MethodReference.Handle, Set<Integer>> passthroughDataflow = new HashMap<>();
    for (Map.Entry<MethodReference.Handle, Set<Integer>> entry : DataLoader.loadData(Paths.get("passthrough.dat"), new PassThroughFactory())) {
        passthroughDataflow.put(entry.getKey(), entry.getValue());
    }
    return passthroughDataflow;
}

/**
 * Implementation of the data factory interface
 */
public static class PassThroughFactory implements DataFactory<Map.Entry<MethodReference.Handle, Set<Integer>>> {
    @Override
    public Map.Entry<MethodReference.Handle, Set<Integer>> parse(String[] fields) {
        ClassReference.Handle clazz = new ClassReference.Handle(fields[0]);
        MethodReference.Handle method = new MethodReference.Handle(clazz, fields[1], fields[2]);

        Set<Integer> passthroughArgs = new HashSet<>();
        for (String arg : fields[3].split(",")) {
            if (arg.length() > 0) {
                passthroughArgs.add(Integer.parseInt(arg));
            }
        }
        return new AbstractMap.SimpleEntry<>(method, passthroughArgs);
    }

    @Override
    public String[] serialize(Map.Entry<MethodReference.Handle, Set<Integer>> entry) {
        if (entry.getValue().size() == 0) {
            return null;
        }

        final String[] fields = new String[4];
        fields[0] = entry.getKey().getClassReference().getName();   // name of the class that owns the method
        fields[1] = entry.getKey().getName();   // method name
        fields[2] = entry.getKey().getDesc();   // method descriptor

        StringBuilder sb = new StringBuilder();
        for (Integer arg : entry.getValue()) {
            sb.append(Integer.toString(arg));
            sb.append(",");
        }
        fields[3] = sb.toString();  // parameter indices

        return fields;
    }
}
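
The fourth field is a comma-terminated list of indices, which is why parse skips empty strings. A small round-trip sketch makes this visible. It is an illustration only: PassthroughLineSketch and its methods are hypothetical names, plain strings stand in for MethodReference.Handle, and the indices are sorted here just to make the output deterministic.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.TreeSet;

class PassthroughLineSketch {
    // Mirrors serialize(): returns null when there is nothing to record
    static String[] serialize(String clazz, String method, String desc, Set<Integer> args) {
        if (args.isEmpty()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int arg : new TreeSet<>(args)) {   // sorted only for a stable result
            sb.append(arg).append(',');
        }
        return new String[]{clazz, method, desc, sb.toString()};
    }

    // Mirrors the index-parsing part of parse()
    static Set<Integer> parseArgs(String field) {
        Set<Integer> args = new HashSet<>();
        for (String part : field.split(",")) {
            if (!part.isEmpty()) {              // "".split(",") yields one empty string
                args.add(Integer.parseInt(part));
            }
        }
        return args;
    }
}
```

Serializing the indices {0, 1} gives the field "0,1,", and parsing that field gives back {0, 1}.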

6. TaintTrackingMethodVisitor

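TaintTrackingMethodVisitor extends asm's MethodVisitor and simulates part of the JVM's runtime state, namely the local variable table (localVars) and the operand stack (stackVars). It overrides a large number of visit methods to simulate the pops and pushes each instruction performs, which is what the taint analysis is built on.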

private static class SavedVariableState<T> {
    List<Set<T>> localVars; // local variable table
    List<Set<T>> stackVars; // operand stack

    public SavedVariableState() {
        localVars = new ArrayList<>();
        stackVars = new ArrayList<>();
    }

    public SavedVariableState(SavedVariableState<T> copy) {
        this.localVars = new ArrayList<>(copy.localVars.size());
        this.stackVars = new ArrayList<>(copy.stackVars.size());
        for (Set<T> original : copy.localVars) {
            this.localVars.add(new HashSet<>(original));
        }
        for (Set<T> original : copy.stackVars) {
            this.stackVars.add(new HashSet<>(original));
        }
    }

    public void combine(SavedVariableState<T> copy) {
        for (int i = 0; i < copy.localVars.size(); i++) {
            while (i >= this.localVars.size()) {
                this.localVars.add(new HashSet<T>());
            }
            this.localVars.get(i).addAll(copy.localVars.get(i));
        }
        for (int i = 0; i < copy.stackVars.size(); i++) {
            while (i >= this.stackVars.size()) {
                this.stackVars.add(new HashSet<T>());
            }
            this.stackVars.get(i).addAll(copy.stackVars.get(i));
        }
    }
}
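
The combine method merges another saved state into this one slot by slot, which is presumably what is needed where two control-flow paths meet. A minimal sketch of the same union logic on bare lists follows; CombineSketch is a hypothetical name and the taint labels are just example strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class CombineSketch {
    // Slot-wise union: target grows if the other state has more slots
    static <T> void combine(List<Set<T>> target, List<Set<T>> other) {
        for (int i = 0; i < other.size(); i++) {
            while (i >= target.size()) {
                target.add(new HashSet<T>());
            }
            target.get(i).addAll(other.get(i));
        }
    }

    static List<Set<String>> demo() {
        List<Set<String>> branchA = new ArrayList<>();
        branchA.add(new HashSet<>(Arrays.asList("arg0")));

        List<Set<String>> branchB = new ArrayList<>();
        branchB.add(new HashSet<>(Arrays.asList("arg1")));
        branchB.add(new HashSet<>(Arrays.asList("arg2")));

        combine(branchA, branchB);  // branchA now covers both paths
        return branchA;
    }
}
```

After the merge, slot 0 holds both arg0 and arg1 and slot 1 holds arg2, so a slot is considered tainted if it is tainted on either path.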

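Some dataflow information is predefined: class name, method name, method descriptor, and the indices of the parameters that pass taint through.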

private static final Object[][] PASSTHROUGH_DATAFLOW = new Object[][]{
        {"java/lang/Object", "toString", "()Ljava/lang/String;", 0},

        // Taint from ObjectInputStream. Note that defaultReadObject() is handled differently below
        {"java/io/ObjectInputStream", "readObject", "()Ljava/lang/Object;", 0},
        {"java/io/ObjectInputStream", "readFields", "()Ljava/io/ObjectInputStream$GetField;", 0},
        {"java/io/ObjectInputStream$GetField", "get", "(Ljava/lang/String;Ljava/lang/Object;)Ljava/lang/Object;", 0},

        // Pass taint from class name to returned class
        {"java/lang/Object", "getClass", "()Ljava/lang/Class;", 0},
        {"java/lang/Class", "forName", "(Ljava/lang/String;)Ljava/lang/Class;", 0},

        // Pass taint from class or method name to returned method
        {"java/lang/Class", "getMethod", "(Ljava/lang/String;[Ljava/lang/Class;)Ljava/lang/reflect/Method;", 0, 1},
        // Pass taint from class to methods
        {"java/lang/Class", "getMethods", "()[Ljava/lang/reflect/Method;", 0},

        {"java/lang/StringBuilder", "<init>", "(Ljava/lang/String;)V", 0, 1},
        {"java/lang/StringBuilder", "<init>", "(Ljava/lang/CharSequence;)V", 0, 1},
        {"java/lang/StringBuilder", "append", "(Ljava/lang/Object;)Ljava/lang/StringBuilder;", 0, 1},
        {"java/lang/StringBuilder", "append", "(Ljava/lang/String;)Ljava/lang/StringBuilder;", 0, 1},
        {"java/lang/StringBuilder", "append", "(Ljava/lang/StringBuffer;)Ljava/lang/StringBuilder;", 0, 1},
        {"java/lang/StringBuilder", "append", "(Ljava/lang/CharSequence;)Ljava/lang/StringBuilder;", 0, 1},
        {"java/lang/StringBuilder", "append", "(Ljava/lang/CharSequence;II)Ljava/lang/StringBuilder;", 0, 1},
        {"java/lang/StringBuilder", "toString", "()Ljava/lang/String;", 0},

        {"java/io/ByteArrayInputStream", "<init>", "([B)V", 1},
        {"java/io/ByteArrayInputStream", "<init>", "([BII)V", 1},
        {"java/io/ObjectInputStream", "<init>", "(Ljava/io/InputStream;)V", 1},
        {"java/io/File", "<init>", "(Ljava/lang/String;I)V", 1},
        {"java/io/File", "<init>", "(Ljava/lang/String;Ljava/io/File;)V", 1},
        {"java/io/File", "<init>", "(Ljava/lang/String;)V", 1},
        {"java/io/File", "<init>", "(Ljava/lang/String;Ljava/lang/String;)V", 1},

        {"java/nio/paths/Paths", "get", "(Ljava/lang/String;[Ljava/lang/String;)Ljava/nio/file/Path;", 0},

        {"java/net/URL", "<init>", "(Ljava/lang/String;)V", 1},
};

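Question: the visitMethodInsn implemented here has three more rules than PassthroughDataflowMethodVisitor.visitMethodInsn, and ModelGeneratorMethodVisitor in CallGraphDiscovery (covered later) also overrides this method and calls this parent implementation at the end. Why not factor the logic out?

PassthroughDataflowMethodVisitor stores parameter indices, whereas ModelGeneratorMethodVisitor stores strings of the form arg<parameter index>.<field name>
All pops and pushes are implemented in TaintTrackingMethodVisitor
After this method's simulation, the top of the stack holds the set of parameter indices through which the method can pass taint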

Here is an example to look at the bytecode:

public class Main {
    public void main(String[] args) {
        String cmd = new A().method1(args[0]);
    }
    public void getValue(Integer number) {
        String s = "100";
        Integer n = Integer.parseInt(s);

        n = number;
        String value = n.toString();
    }
}

class A {
    public String method1(String param) {
        return param;
    }
}

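The bytecode of getValue is shown below, with blank lines separating the bytecode of the four statements above. The instructions that appear are:
ldc pushes a constant from the constant pool onto the operand stack; astore pops the top of the stack and stores it in the local variable table; aload loads a value from the local variable table onto the operand stack; invokestatic invokes a class (static) method; invokevirtual invokes an instance method; return returns void from the current method.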

public void getValue(java.lang.Integer);
  descriptor: (Ljava/lang/Integer;)V
  flags: ACC_PUBLIC
  Code:
    stack=1, locals=5, args_size=2

       0: ldc           #5                  // String 100; pushed onto the stack
       2: astore_2                          // popped

       3: aload_2                           // pushed; argument of invokestatic, whose result is pushed after the call
       4: invokestatic  #6                  // Method java/lang/Integer.parseInt:(Ljava/lang/String;)I
       7: invokestatic  #7                  // Method java/lang/Integer.valueOf:(I)Ljava/lang/Integer;
      10: astore_3                          // popped; stores the result

      11: aload_1                           // pushed; the parameter number
      12: astore_3                          // popped

      13: aload_3                           // pushed; the receiver (implicit argument) of invokevirtual
      14: invokevirtual #8                  // Method java/lang/Integer.toString:()Ljava/lang/String; the result is pushed after the call
      17: astore        4                   // popped
      19: return                            // return void
    LineNumberTable:
      line 6: 0
      line 7: 3
      line 9: 11
      line 10: 13
      line 11: 19

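Before a method is invoked, its arguments are pushed onto the operand stack; the call then creates a new stack frame, and once it completes, execution continues with the next instruction. I don't fully understand this part of the simulation. I can read assembly, but Java bytecode isn't quite that low-level. My understanding is that calling a method creates a stack frame, the frame is popped from the JVM stack when the method completes, and the return value is then pushed onto the top of the operand stack of the previous (caller's) frame? I'll add a couple of diagrams once I've figured it out…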
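
To make the pushes and pops concrete, the getValue bytecode above can be replayed by hand with sets of labels in place of real values. This is a minimal sketch, not the tool's implementation: GetValueTaintSketch is a hypothetical name, the labels arg0 and arg1 stand for this and number, and every call is assumed to copy its arguments' labels to its result, whereas the real analysis consults the passthrough data.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class GetValueTaintSketch {
    // Returns, for each callee, the labels that reached its arguments
    static Map<String, Set<String>> simulate() {
        // Local variable table: 0 = this, 1 = number, 2 = s, 3 = n, 4 = value
        List<Set<String>> locals = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            locals.add(new HashSet<String>());
        }
        locals.get(0).add("arg0");
        locals.get(1).add("arg1");

        Deque<Set<String>> stack = new ArrayDeque<>();
        Map<String, Set<String>> seen = new LinkedHashMap<>();

        stack.push(new HashSet<String>());          // ldc "100": a constant carries no label
        locals.set(2, stack.pop());                 // astore_2

        stack.push(new HashSet<>(locals.get(2)));   // aload_2
        invoke("parseInt", 1, stack, seen);         // invokestatic Integer.parseInt
        invoke("valueOf", 1, stack, seen);          // invokestatic Integer.valueOf
        locals.set(3, stack.pop());                 // astore_3

        stack.push(new HashSet<>(locals.get(1)));   // aload_1
        locals.set(3, stack.pop());                 // astore_3

        stack.push(new HashSet<>(locals.get(3)));   // aload_3
        invoke("toString", 1, stack, seen);         // invokevirtual: the receiver counts as one argument
        locals.set(4, stack.pop());                 // astore 4
        return seen;
    }

    // Pops the arguments, records their labels, pushes the result
    private static void invoke(String callee, int argCount,
                               Deque<Set<String>> stack, Map<String, Set<String>> seen) {
        Set<String> merged = new HashSet<>();
        for (int i = 0; i < argCount; i++) {
            merged.addAll(stack.pop());
        }
        seen.put(callee, merged);
        stack.push(new HashSet<>(merged));          // assumption: the result inherits the labels
    }
}
```

In this replay no label reaches parseInt or valueOf, while toString is reached by arg1, i.e. the parameter number.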

7. CallGraphDiscovery

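The discover method uses the class information, method information, inheritance/override information, and dataflow information collected earlier, together with asm visitors, to determine whether a callee's arguments are influenced by the caller's parameters.

Take the getValue method below as an example. It calls parseInt and toString, but the parameter number only influences toString. So if taint (attacker-supplied input) reaches getValue and the parameter number is controllable (i.e. the previous step found that it can pass taint), then only toString needs to be checked further; parseInt does not.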

public void getValue(Integer number) {
    String s = "100";
    Integer n = Integer.parseInt(s);
    
    n = number;
    String value = n.toString();
}

The implementation of discover is as follows:

private static final Logger LOGGER = LoggerFactory.getLogger(CallGraphDiscovery.class);

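// Call graph information: caller class name, caller method name, caller method descriptor, callee class name, callee method name, callee method descriptor, caller parameter index, field name on the caller's parameter object, callee parameter index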
private final Set<GraphCall> discoveredCalls = new HashSet<>();

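/**
 * Analyzes call relationships, i.e. whether a callee's arguments are influenced by the (caller) method's parameters
 *
 * @param classResourceEnumerator class enumerator
 * @param config                  configuration
 * @throws IOException
 */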
public void discover(final ClassResourceEnumerator classResourceEnumerator, GIConfig config) throws IOException {
    // Load method information
    Map<MethodReference.Handle, MethodReference> methodMap = DataLoader.loadMethods();
    // Load class information
    Map<ClassReference.Handle, ClassReference> classMap = DataLoader.loadClasses();
    // Load inheritance information (inheritanceMap: subclass -> set of superclasses; subClassMap: superclass -> set of subclasses)
    InheritanceMap inheritanceMap = InheritanceMap.load();
    // Load dataflow information: method -> indices of parameters that pass taint through
    Map<MethodReference.Handle, Set<Integer>> passthroughDataflow = PassthroughDiscovery.load();

    // Serializable decider
    SerializableDecider serializableDecider = config.getSerializableDecider(methodMap, inheritanceMap);

    // Iterate over all classes
    for (ClassResourceEnumerator.ClassResource classResource : classResourceEnumerator.getAllClasses()) {
        try (InputStream in = classResource.getInputStream()) { // read the class file
            ClassReader cr = new ClassReader(in);   // create a ClassReader; accept is called next to parse the class file
            try {
                // Determine whether the callee's arguments are influenced by the caller's parameters
                // Call order of the overridden methods (non-overridden ones use the default): visit -> visitOuterClass -> visitInnerClass -> visitMethod -> visitEnd
                cr.accept(new ModelGeneratorClassVisitor(classMap, inheritanceMap, passthroughDataflow, serializableDecider, Opcodes.ASM6),
                        ClassReader.EXPAND_FRAMES);
            } catch (Exception e) {
                LOGGER.error("Error analyzing: " + classResource.getName(), e);
            }
        }
    }
}

ModelGeneratorClassVisitor extends asm's ClassVisitor and overrides five visitor methods. The one to focus on is visitMethod, which creates a ModelGeneratorMethodVisitor to analyze each method.

private class ModelGeneratorClassVisitor extends ClassVisitor {
    private final Map<ClassReference.Handle, ClassReference> classMap;              // class information
    private final InheritanceMap inheritanceMap;                                    // inheritance information
    private final Map<MethodReference.Handle, Set<Integer>> passthroughDataflow;    // dataflow information
    private final SerializableDecider serializableDecider;                          // serializable decider
    
    public ModelGeneratorClassVisitor(Map<ClassReference.Handle, ClassReference> classMap,
                                      InheritanceMap inheritanceMap,
                                      Map<MethodReference.Handle, Set<Integer>> passthroughDataflow,
                                      SerializableDecider serializableDecider, int api) {
        super(api); // ASM API version
        this.classMap = classMap;
        this.inheritanceMap = inheritanceMap;
        this.passthroughDataflow = passthroughDataflow;
        this.serializableDecider = serializableDecider;
    }

    private String name;            // class name
    private String signature;       // signature
    private String superName;       // superclass name
    private String[] interfaces;    // interfaces

    @Override
    public void visit(int version, int access, String name, String signature,
                      String superName, String[] interfaces) {
        super.visit(version, access, name, signature, superName, interfaces);
        // Record information about the class
        this.name = name;
        this.signature = signature;
        this.superName = superName;
        this.interfaces = interfaces;
    }

    @Override
    public MethodVisitor visitMethod(int access, String name, String desc,
                                     String signature, String[] exceptions) {
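        // Call the parent method, which returns a new method visitor
        // If the class visitor's cv field is null it returns null, otherwise it returns cv.visitMethod
        MethodVisitor mv = super.visitMethod(access, name, desc, signature, exceptions);

        // Create the method visitor that determines how the method's parameters flow into the arguments of the methods it calls
        // Call order of the overridden methods (non-overridden ones use the default): visitCode first, then visitFieldInsn / visitMethodInsn as the instructions occur
        ModelGeneratorMethodVisitor modelGeneratorMethodVisitor = new ModelGeneratorMethodVisitor(classMap,
                inheritanceMap, passthroughDataflow, serializableDecider, api, mv, this.name, access, name, desc, signature, exceptions);
        // Simplify the analysis: remove JSR instructions and inline the referenced subroutines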
        return new JSRInlinerAdapter(modelGeneratorMethodVisitor, access, name, desc, signature, exceptions);
    }

    @Override
    public void visitOuterClass(String owner, String name, String desc) {   // visits the enclosing class of the class, if any
        // TODO: Write some tests to make sure we can ignore this
        super.visitOuterClass(owner, name, desc);
    }

    @Override
    public void visitInnerClass(String name, String outerName, String innerName, int access) {  // visits an inner class; it is not necessarily a member of the class being visited
        // TODO: Write some tests to make sure we can ignore this
        super.visitInnerClass(name, outerName, innerName, access);
    }

    @Override
    public void visitEnd() {
        super.visitEnd();
    }
}

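ModelGeneratorMethodVisitor also extends TaintTrackingMethodVisitor, but overrides only three of its visitor methods:

visitCode: starts the visit of the method's code and stores all parameters in the local variable table
visitFieldInsn: visits a field instruction, i.e. an instruction that loads or stores the value of an object field
visitMethodInsn: visits a method instruction, i.e. an instruction that invokes a method

visitCode is similar to the implementation in PassthroughDataflowMethodVisitor (which stores the parameter index directly). The difference is that here the string arg is concatenated with the parameter index, and the resulting string is stored in the local variable table.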

@Override
public void visitCode() {   // starts the visit of the method's code
    // Call TaintTrackingMethodVisitor.visitCode to initialize the local variable table
    super.visitCode();

    // Record the parameters in the local variable table savedVariableState.localVars
    int localIndex = 0;
    int argIndex = 0;

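    // For a non-static method the first (implicit) parameter is the instance, this
    if ((this.access & Opcodes.ACC_STATIC) == 0) {
        // Call TaintTrackingMethodVisitor.setLocalTaint to add it to the local variable table
        // The arg prefix marks a method parameter; it is used later to tell which caller parameter reaches a callee argument
        setLocalTaint(localIndex, "arg" + argIndex);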
        localIndex += 1;
        argIndex += 1;
    }

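    // Iterate over the parameters; the descriptor gives each parameter's type (and therefore its size in slots)
    for (Type argType : Type.getArgumentTypes(desc)) {
        // Call TaintTrackingMethodVisitor.setLocalTaint to add it to the local variable table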
        setLocalTaint(localIndex, "arg" + argIndex);
        localIndex += argType.getSize();
        argIndex += 1;
    }
}
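
The reason localIndex and argIndex advance separately is that long and double parameters occupy two local variable slots. The sketch below reproduces the slot assignment without asm by scanning the descriptor by hand; LocalSlotSketch and argSlots are hypothetical names, and the hand-written parser is an assumption standing in for Type.getArgumentTypes and Type.getSize.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class LocalSlotSketch {
    // Maps local variable slot -> "arg<index>" label for a method descriptor
    static Map<Integer, String> argSlots(String desc, boolean isStatic) {
        Map<Integer, String> slots = new LinkedHashMap<>();
        int localIndex = 0;
        int argIndex = 0;
        if (!isStatic) {                    // implicit this occupies slot 0
            slots.put(localIndex, "arg" + argIndex);
            localIndex += 1;
            argIndex += 1;
        }
        int i = 1;                          // skip the opening '('
        while (desc.charAt(i) != ')') {
            int size = 1;
            char c = desc.charAt(i);
            if (c == '[') {                 // arrays are references: one slot
                while (desc.charAt(i) == '[') {
                    i++;
                }
                if (desc.charAt(i) == 'L') {
                    i = desc.indexOf(';', i);
                }
            } else if (c == 'L') {          // object type: skip to ';'
                i = desc.indexOf(';', i);
            } else if (c == 'J' || c == 'D') {
                size = 2;                   // long and double take two slots
            }
            i++;
            slots.put(localIndex, "arg" + argIndex);
            localIndex += size;
            argIndex += 1;
        }
        return slots;
    }
}
```

For an instance method with descriptor (JLjava/lang/String;[I)V the labels land in slots 0, 1, 3 and 4, because the long in arg1 also occupies slot 2.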

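visitFieldInsn is also similar to the implementation in PassthroughDataflowMethodVisitor (which stores parameter indices directly). The difference is that here the field name is appended to the arg<parameter index> string, and the result is stored at the top of the stack.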

@Override
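public void visitFieldInsn(int opcode, String owner, String name, String desc) {    // visits a field instruction, i.e. an instruction that loads or stores the value of an object field
    // A method may access object fields while it executes; the object is pushed onto the stack before the access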
    switch (opcode) {
        case Opcodes.GETSTATIC: // read a static field of a class
            break;
        case Opcodes.PUTSTATIC: // write a static field of a class
            break;
        case Opcodes.GETFIELD:  // read an object field
            Type type = Type.getType(desc); // field type
            if (type.getSize() == 1) {
                Boolean isTransient = null; // a field marked with the transient keyword is not serialized

                // Check whether the field's type could be serialized; if not, the field cannot carry deserialized data
                // If a field type could not possibly be serialized, it's effectively transient
                if (!couldBeSerialized(serializableDecider, inheritanceMap, new ClassReference.Handle(type.getInternalName()))) {
                    isTransient = Boolean.TRUE;
                } else {
                    // The field's type could be serialized, so look the field up in its owner class
                    ClassReference clazz = classMap.get(new ClassReference.Handle(owner));
                    while (clazz != null) {
                        // Iterate over all fields of the class
                        for (ClassReference.Member member : clazz.getMembers()) {
                            // Is this the field being read?
                            if (member.getName().equals(name)) {
                                // Is it marked with the transient keyword?
                                isTransient = (member.getModifiers() & Opcodes.ACC_TRANSIENT) != 0;
                                break;
                            }
                        }
                        if (isTransient != null) {
                            break;
                        }
                        // If the field was not found, continue the search in the superclass
                        clazz = classMap.get(new ClassReference.Handle(clazz.getSuperClass()));
                    }
                }

                // New taint labels for the value that is read
                Set<String> newTaint = new HashSet<>();
                if (!Boolean.TRUE.equals(isTransient)) {
                    for (String s : getStackTaint(0)) {
                        newTaint.add(s + "." + name);   // append the field name
                    }
                }

                // Call TaintTrackingMethodVisitor.visitFieldInsn to perform the pops/pushes
                super.visitFieldInsn(opcode, owner, name, desc);

                // Call TaintTrackingMethodVisitor.setStackTaint to set the top of the stack to newTaint
                setStackTaint(0, newTaint);
                return;
            }
            break;
        case Opcodes.PUTFIELD:  // write an object field
            break;
        default:
            throw new IllegalStateException("Unsupported opcode: " + opcode);
    }

    // Call TaintTrackingMethodVisitor.visitFieldInsn to perform the pops/pushes
    super.visitFieldInsn(opcode, owner, name, desc);
}

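visitMethodInsn inspects the operand stack at the call site. Each stack element is either an empty set or a set of labels naming the parameters that can pass taint. The simulated stack has the same number of elements as the real one, but every element is a set (a simulated value, not the actual runtime value).
At the start, the current method's parameters were stored in the local variable table as arg<parameter index>. When another method is called, values are loaded from the local variable table onto the stack, and if an object field is read it is pushed as arg<parameter index>.<field name>. The names of the stack elements therefore tell us which parameters of the current method (identified by name) influence which arguments of the callee (whose argument count is known).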

@Override
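public void visitMethodInsn(int opcode, String owner, String name, String desc, boolean itf) {  // visits a method instruction, i.e. an instruction that invokes a method
    // Get the callee's argument types; for a non-static method the instance type goes first
    // The descriptor gives each argument's type (and therefore its size in slots)
    Type[] argTypes = Type.getArgumentTypes(desc);

    // For a non-static method the first argument is the object itself, i.e. this
    if (opcode != Opcodes.INVOKESTATIC) {   // the first argument of a non-static method is the instance
        Type[] extendedArgTypes = new Type[argTypes.length + 1];
        System.arraycopy(argTypes, 0, extendedArgTypes, 1, argTypes.length);
        extendedArgTypes[0] = Type.getObjectType(owner);    // object type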
        argTypes = extendedArgTypes;
    }

    switch (opcode) {
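        case Opcodes.INVOKESTATIC:      // invoke a static method
        case Opcodes.INVOKEVIRTUAL:     // invoke an instance method
        case Opcodes.INVOKESPECIAL:     // invoke a superclass constructor, an instance initialization method, or a private method
        case Opcodes.INVOKEINTERFACE:   // invoke an interface method
            int stackIndex = 0;
            // Walk the callee's arguments on the operand stack, starting from the top
            for (int i = 0; i < argTypes.length; i++) {
                // The rightmost argument was pushed last, so it is at the top of the stack
                int argIndex = argTypes.length - 1 - i; // callee argument index
                Type type = argTypes[argIndex]; // argument type

                // Arguments are pushed left to right, so the leftmost argument (the instance, if any) is deepest in the stack
                Set<String> taint = getStackTaint(stackIndex);
                if (taint.size() > 0) { // this argument carries taint labels
                    // Iterate over the labels
                    for (String argSrc : taint) {
                        if (!argSrc.substring(0, 3).equals("arg")) {
                            throw new IllegalStateException("Invalid taint arg: " + argSrc);
                        }
                        // arg<number>.<field name>
                        int dotIndex = argSrc.indexOf('.'); // position of the separator
                        int srcArgIndex;    // which parameter of the caller
                        String srcArgPath;
                        if (dotIndex == -1) {
                            srcArgIndex = Integer.parseInt(argSrc.substring(3));
                            srcArgPath = null;  // no field name
                        } else {
                            srcArgIndex = Integer.parseInt(argSrc.substring(3, dotIndex));
                            srcArgPath = argSrc.substring(dotIndex + 1);  // field name
                        }

                        // Record the flow between parameters
                        // srcArgIndex: parameter index of the caller (the method being visited); argIndex: argument index of the callee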
                        discoveredCalls.add(new GraphCall(
                                new MethodReference.Handle(new ClassReference.Handle(this.owner), this.name, this.desc),
                                new MethodReference.Handle(new ClassReference.Handle(owner), name, desc),
                                srcArgIndex,
                                srcArgPath,
                                argIndex));
                    }
                }
                // Move one argument to the left
                stackIndex += type.getSize();
            }
            break;
        default:
            throw new IllegalStateException("Unsupported opcode: " + opcode);
    }

    // Call TaintTrackingMethodVisitor.visitMethodInsn to perform the pops/pushes
    super.visitMethodInsn(opcode, owner, name, desc, itf);
}
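
Two small pieces of this method are easy to check in isolation: splitting a label such as arg1.name into a parameter index and a field path, and mapping each callee argument to its offset from the top of the stack. The following is a minimal sketch with hypothetical names (TaintLabelSketch, argIndex, argPath, stackOffsets); argument sizes are passed in directly instead of being derived from asm Type objects.

```java
class TaintLabelSketch {
    // "arg1.name" -> 1, "arg0" -> 0
    static int argIndex(String label) {
        if (!label.startsWith("arg")) {
            throw new IllegalStateException("Invalid taint arg: " + label);
        }
        int dot = label.indexOf('.');
        return Integer.parseInt(dot == -1 ? label.substring(3) : label.substring(3, dot));
    }

    // "arg1.name" -> "name", "arg0" -> null
    static String argPath(String label) {
        int dot = label.indexOf('.');
        return dot == -1 ? null : label.substring(dot + 1);
    }

    // argSizes[i] is the slot size of callee argument i (1, or 2 for long/double).
    // Returns the offset from the top of the stack at which each argument starts.
    static int[] stackOffsets(int[] argSizes) {
        int[] offsets = new int[argSizes.length];
        int stackIndex = 0;
        for (int i = 0; i < argSizes.length; i++) {
            int arg = argSizes.length - 1 - i;  // rightmost argument is on top
            offsets[arg] = stackIndex;
            stackIndex += argSizes[arg];        // move one argument to the left
        }
        return offsets;
    }
}
```

For a callee taking (this, long, Object) the sizes are {1, 2, 1}, so the last argument sits at offset 0, the long at offset 1 and the instance at offset 3.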

The save method stores the call graph information produced by the analysis.

/**
 * Saves the call graph information using the factory
 *
 * @throws IOException
 */
public void save() throws IOException {
    DataLoader.saveData(Paths.get("callgraph.dat"), new GraphCall.Factory(), discoveredCalls);
}

8. GadgetChainDiscovery

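The collection of taint sources is implemented differently for each kind of deserialization being analyzed. This post focuses on the sources for native Java serialization; the analysis was given earlier under 0x02 项目结构 - gadgetinspector/javaserial - SimpleSourceDiscovery.

Finding a gadget chain amounts to finding a path from a source to a sink; all the information collected so far is preparation for this search.

Two classes are defined here, representing a gadget chain and a node on the chain (i.e. a method).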

// Gadget chain
private static class GadgetChain {
    private final List<GadgetChainLink> links;

    private GadgetChain(List<GadgetChainLink> links) {
        this.links = links;
    }

    private GadgetChain(GadgetChain gadgetChain, GadgetChainLink link) {
        List<GadgetChainLink> links = new ArrayList<GadgetChainLink>(gadgetChain.links);
        links.add(link);
        this.links = links;
    }
}

// Node (link) of a gadget chain
private static class GadgetChainLink {
    private final MethodReference.Handle method;
    private final int taintedArgIndex;

    private GadgetChainLink(MethodReference.Handle method, int taintedArgIndex) {
        this.method = method;
        this.taintedArgIndex = taintedArgIndex;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        GadgetChainLink that = (GadgetChainLink) o;
        if (taintedArgIndex != that.taintedArgIndex) return false;
        return method != null ? method.equals(that.method) : that.method == null;
    }

    @Override
    public int hashCode() {
        int result = method != null ? method.hashCode() : 0;
        result = 31 * result + taintedArgIndex;
        return result;
    }
}

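The discover method first loads the method and inheritance information and calls InheritanceDeriver.getAllMethodImplementations to obtain the method override information (also analyzed in the 0x02 项目结构 section), which it saves to a file. It then loads the call graph information produced in the previous step.

Next it loads the taint sources, creates a chain with each source method as its initial node, and adds the chain to the set of chains to be analyzed. It then repeatedly takes a chain from this set and analyzes it starting from its last node (method); in the first round that node is the source itself. For each call made by that method, if the caller parameter index recorded for the call equals the node's tainted parameter index, a new node is created for the callee and appended to the end of the chain. If the callee is not a sink, the new chain is added to the set of chains to be analyzed; otherwise it is added to the set of discovered gadget chains. These steps repeat, with the chains awaiting analysis growing longer, until every chain has been taken out and analyzed.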

/**
 * Searches for possible gadget chains and saves them to gadget-chains.txt
 *
 * @throws Exception
 */
public void discover() throws Exception {
    // Load method information
    Map<MethodReference.Handle, MethodReference> methodMap = DataLoader.loadMethods();
    // Load inheritance information (inheritanceMap: subclass -> set of superclasses; subClassMap: superclass -> set of subclasses)
    InheritanceMap inheritanceMap = InheritanceMap.load();
    // Load override information: method -> set of overriding methods
    Map<MethodReference.Handle, Set<MethodReference.Handle>> methodImplMap = InheritanceDeriver.getAllMethodImplementations(
            inheritanceMap, methodMap);

    // Returns the serializable overriding implementations of a target method (including the target method itself)
    final ImplementationFinder implementationFinder = config.getImplementationFinder(
            methodMap, methodImplMap, inheritanceMap);

    // Save the override information to methodimpl.dat: (indent) class name, method name, descriptor
    try (Writer writer = Files.newBufferedWriter(Paths.get("methodimpl.dat"))) {
        for (Map.Entry<MethodReference.Handle, Set<MethodReference.Handle>> entry : methodImplMap.entrySet()) {
            writer.write(entry.getKey().getClassReference().getName());
            writer.write("\t");
            writer.write(entry.getKey().getName());
            writer.write("\t");
            writer.write(entry.getKey().getDesc());
            writer.write("\n");
            for (MethodReference.Handle method : entry.getValue()) {
                writer.write("\t");
                writer.write(method.getClassReference().getName());
                writer.write("\t");
                writer.write(method.getName());
                writer.write("\t");
                writer.write(method.getDesc());
                writer.write("\n");
            }
        }
    }

    // Load the call graph information
    Map<MethodReference.Handle, Set<GraphCall>> graphCallMap = new HashMap<>();
    for (GraphCall graphCall : DataLoader.loadData(Paths.get("callgraph.dat"), new GraphCall.Factory())) {
        MethodReference.Handle caller = graphCall.getCallerMethod();
        if (!graphCallMap.containsKey(caller)) {
            Set<GraphCall> graphCalls = new HashSet<>();
            graphCalls.add(graphCall);
            graphCallMap.put(caller, graphCalls);
        } else {
            graphCallMap.get(caller).add(graphCall);
        }
    }

    // Methods (nodes) that have already been visited
    Set<GadgetChainLink> exploredMethods = new HashSet<>();
    // Chains waiting to be analyzed
    LinkedList<GadgetChain> methodsToExplore = new LinkedList<>();
    // Load all sources and make each one the first node of its own chain
    for (Source source : DataLoader.loadData(Paths.get("sources.dat"), new Source.Factory())) {
        // Create the node
        GadgetChainLink srcLink = new GadgetChainLink(source.getSourceMethod(), source.getTaintedArgIndex());
        if (exploredMethods.contains(srcLink)) {
            continue;
        }
        // Create a chain with a single node
        methodsToExplore.add(new GadgetChain(Arrays.asList(srcLink)));
        // Mark the method as visited
        exploredMethods.add(srcLink);
    }

    // Iteration counter
    long iteration = 0;
    // Gadget chains found so far
    Set<GadgetChain> discoveredGadgets = new HashSet<>();
    // Breadth-first search for chains from a source to a sink
    while (methodsToExplore.size() > 0) {
        if ((iteration % 1000) == 0) {
            LOGGER.info("Iteration " + iteration + ", Search space: " + methodsToExplore.size());
        }
        iteration += 1;

        GadgetChain chain = methodsToExplore.pop(); // take one chain
        GadgetChainLink lastLink = chain.links.get(chain.links.size() - 1); // last node (method) of this chain

        // Get the calls made by the current method
        Set<GraphCall> methodCalls = graphCallMap.get(lastLink.method);
        if (methodCalls != null) {
            for (GraphCall graphCall : methodCalls) {
                // Skip the call unless the caller parameter that flows into the callee is the tainted parameter of the current node
                // For a source, the index identifies the parameter the attacker can control
                if (graphCall.getCallerArgIndex() != lastLink.taintedArgIndex) {
                    continue;
                }

                // Get the serializable implementations (overrides) of the callee
                Set<MethodReference.Handle> allImpls = implementationFinder.getImplementations(graphCall.getTargetMethod());

                // Iterate over the callee's implementations
                for (MethodReference.Handle methodImpl : allImpls) {
                    GadgetChainLink newLink = new GadgetChainLink(methodImpl, graphCall.getTargetArgIndex());
                    // Skip the callee if it has already been visited, to reduce the cost
                    // The downside is that other chains passing through this node are cut off
                    // Without this check, cycles could make the paths grow without bound
                    if (exploredMethods.contains(newLink)) {
                        continue;
                    }

                    // The new node (the callee) appended to the existing chain forms a new chain
                    GadgetChain newChain = new GadgetChain(chain, newLink);
                    // If the callee is a sink, add the new chain to the discovered gadget chains
                    if (isSink(methodImpl, graphCall.getTargetArgIndex(), inheritanceMap)) {
                        discoveredGadgets.add(newChain);
                    } else {    // otherwise queue the new chain for analysis and mark the callee as visited
                        methodsToExplore.add(newChain);
                        exploredMethods.add(newLink);
                    }
                }
            }
        }
    }

    // Save the discovered gadget chains to gadget-chains.txt
    try (OutputStream outputStream = Files.newOutputStream(Paths.get("gadget-chains.txt"));
         Writer writer = new OutputStreamWriter(outputStream, StandardCharsets.UTF_8)) {
        for (GadgetChain chain : discoveredGadgets) {
            printGadgetChain(writer, chain);
        }
    }
    LOGGER.info("Found {} gadget chains.", discoveredGadgets.size());
}
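
Stripped of implementation lookup and parameter indices, the search above is a breadth-first walk with a global visited set. The sketch below is a deliberately reduced model and not the tool's code: ChainSearchSketch is a hypothetical name, nodes are plain method names, and the call graph is a map from caller to callees.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class ChainSearchSketch {
    static List<List<String>> search(Map<String, List<String>> calls,
                                     String source, Set<String> sinks) {
        List<List<String>> found = new ArrayList<>();
        Set<String> explored = new HashSet<>();
        Deque<List<String>> toExplore = new ArrayDeque<>();
        toExplore.add(Collections.singletonList(source));
        explored.add(source);

        while (!toExplore.isEmpty()) {
            List<String> chain = toExplore.poll();          // FIFO, so breadth-first
            String last = chain.get(chain.size() - 1);
            for (String callee : calls.getOrDefault(last, Collections.<String>emptyList())) {
                if (explored.contains(callee)) {
                    continue;                               // each node is extended only once
                }
                List<String> next = new ArrayList<>(chain);
                next.add(callee);
                if (sinks.contains(callee)) {
                    found.add(next);                        // sinks are not marked explored
                } else {
                    toExplore.add(next);
                    explored.add(callee);
                }
            }
        }
        return found;
    }
}
```

With the graph src -> {a, b}, a -> {c}, b -> {c}, c -> {exec} and exec as the only sink, this returns just the chain src, a, c, exec. The chain through b is dropped because c was already explored, which is the trade-off noted in the comments of the real implementation.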

The isSink method determines whether a method (and parameter) hits one of the predefined sinks, mainly JDK methods such as Runtime.exec.

/**
 * Predefined sinks
 * Represents a collection of methods in the JDK that we consider to be "interesting". If a gadget chain can
 * successfully exercise one of these, it could represent anything as mundane as causing the target to make a DNS
 * query to full blown RCE.
 *
 * @param method            method
 * @param argIndex          parameter index
 * @param inheritanceMap    inheritance information
 * @return
 */
// TODO: Parameterize this as a configuration option
private boolean isSink(MethodReference.Handle method, int argIndex, InheritanceMap inheritanceMap) {
    if (method.getClassReference().getName().equals("java/io/FileInputStream")
            && method.getName().equals("<init>")) {
        return true;
    }
    if (method.getClassReference().getName().equals("java/io/FileOutputStream")
            && method.getName().equals("<init>")) {
        return true;
    }
    if (method.getClassReference().getName().equals("java/nio/file/Files")
            && (method.getName().equals("newInputStream")
            || method.getName().equals("newOutputStream")
            || method.getName().equals("newBufferedReader")
            || method.getName().equals("newBufferedWriter"))) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/lang/Runtime")
            && method.getName().equals("exec")) {
        return true;
    }
    /*
    if (method.getClassReference().getName().equals("java/lang/Class")
            && method.getName().equals("forName")) {
        return true;
    }
    if (method.getClassReference().getName().equals("java/lang/Class")
            && method.getName().equals("getMethod")) {
        return true;
    }
    */
    // If we can invoke an arbitrary method, that's probably interesting (though this doesn't assert that we
    // can control its arguments). Conversely, if we can control the arguments to an invocation but not what
    // method is being invoked, we don't mark that as interesting.
    if (method.getClassReference().getName().equals("java/lang/reflect/Method")
            && method.getName().equals("invoke") && argIndex == 0) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/net/URLClassLoader")
            && method.getName().equals("newInstance")) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/lang/System")
            && method.getName().equals("exit")) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/lang/Shutdown")
            && method.getName().equals("exit")) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/lang/Runtime")
            && method.getName().equals("exit")) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/nio/file/Files")
            && method.getName().equals("newOutputStream")) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/lang/ProcessBuilder")
            && method.getName().equals("<init>") && argIndex > 0) {
        return true;
    }

    if (inheritanceMap.isSubclassOf(method.getClassReference(), new ClassReference.Handle("java/lang/ClassLoader"))
            && method.getName().equals("<init>")) {
        return true;
    }

    if (method.getClassReference().getName().equals("java/net/URL") && method.getName().equals("openStream")) {
        return true;
    }

    // Some groovy-specific sinks
    if (method.getClassReference().getName().equals("org/codehaus/groovy/runtime/InvokerHelper")
            && method.getName().equals("invokeMethod") && argIndex == 1) {
        return true;
    }

    if (inheritanceMap.isSubclassOf(method.getClassReference(), new ClassReference.Handle("groovy/lang/MetaClass"))
            && Arrays.asList("invokeMethod", "invokeConstructor", "invokeStaticMethod").contains(method.getName())) {
        return true;
    }

    // This jython-specific sink effectively results in RCE
    if (method.getClassReference().getName().equals("org/python/core/PyCode") && method.getName().equals("call")) {
        return true;
    }

    return false;
}
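The sink checks above mostly share one shape: an owner class, a method name, and sometimes a constraint on the tainted argument index. As a minimal sketch (not the tool's actual code), the same idea can be expressed as data, which makes adding a rule a one-line change. The names `SinkRules`, `Rule`, and `ANY_ARG` are hypothetical, and the sketch deliberately leaves out the subclass checks done through `inheritanceMap` and the `argIndex > 0` style of condition.

```java
import java.util.Arrays;
import java.util.List;

class SinkRules {
    // Hypothetical marker: the rule matches whichever argument is tainted
    static final int ANY_ARG = -1;

    // Hypothetical rule type: owner class (internal name), method name, tainted argument index
    static final class Rule {
        final String owner;
        final String name;
        final int argIndex;

        Rule(String owner, String name, int argIndex) {
            this.owner = owner;
            this.name = name;
            this.argIndex = argIndex;
        }

        boolean matches(String owner, String name, int argIndex) {
            return this.owner.equals(owner)
                    && this.name.equals(name)
                    && (this.argIndex == ANY_ARG || this.argIndex == argIndex);
        }
    }

    // Three rules copied from the checks shown above; extending the list adds new sinks
    static final List<Rule> RULES = Arrays.asList(
            new Rule("java/net/URL", "openStream", ANY_ARG),
            new Rule("java/lang/System", "exit", ANY_ARG),
            new Rule("org/codehaus/groovy/runtime/InvokerHelper", "invokeMethod", 1));

    // Returns true if any rule matches the given method and tainted argument index
    static boolean isSink(String owner, String name, int argIndex) {
        for (Rule rule : RULES) {
            if (rule.matches(owner, name, argIndex)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(isSink("java/net/URL", "openStream", 0));
        System.out.println(isSink("org/codehaus/groovy/runtime/InvokerHelper", "invokeMethod", 0));
    }
}
```

A table like this only covers exact class-name matches; rules that depend on the inheritance hierarchy would still need a predicate that consults the inheritance map.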

The printGadgetChain method writes the gadget chain information to the output.

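/**
 * Writes a gadget chain to the output, one link per line:
 * (indent) class name, method name, method descriptor, index of the argument that carries the taint
 *
 * @param writer output writer
 * @param chain  gadget chain to print
 * @throws IOException if writing to the output fails
 */
private static void printGadgetChain(Writer writer, GadgetChain chain) throws IOException {
    writer.write(String.format("%s.%s%s (%d)%n",    // taint source (first link, not indented)
            chain.links.get(0).method.getClassReference().getName(),    // class name
            chain.links.get(0).method.getName(),    // method name
            chain.links.get(0).method.getDesc(),    // method descriptor
            chain.links.get(0).taintedArgIndex));   // tainted argument index
    for (int i = 1; i < chain.links.size(); i++) {  // remaining links of the chain (indented)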
        writer.write(String.format("  %s.%s%s (%d)%n",
                chain.links.get(i).method.getClassReference().getName(),
                chain.links.get(i).method.getName(),
                chain.links.get(i).method.getDesc(),
                chain.links.get(i).taintedArgIndex));
    }
    writer.write("\n");
}
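To make the output format concrete, here is a small self-contained sketch that reproduces the same format string with simplified stand-in types. `PrintChainDemo`, `Link`, and the `demo/...` class names are hypothetical and exist only for illustration; the chain itself is made up and is not a result produced by the tool.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

class PrintChainDemo {
    // Hypothetical stand-in for one chain link: method identity plus tainted argument index
    static final class Link {
        final String owner;
        final String name;
        final String desc;
        final int taintedArgIndex;

        Link(String owner, String name, String desc, int taintedArgIndex) {
            this.owner = owner;
            this.name = name;
            this.desc = desc;
            this.taintedArgIndex = taintedArgIndex;
        }
    }

    // Same layout as printGadgetChain: first link unindented, later links indented by two spaces
    static void printChain(Writer writer, List<Link> links) throws IOException {
        for (int i = 0; i < links.size(); i++) {
            Link link = links.get(i);
            String indent = (i == 0) ? "" : "  ";
            writer.write(String.format("%s%s.%s%s (%d)%n",
                    indent, link.owner, link.name, link.desc, link.taintedArgIndex));
        }
        writer.write("\n");
    }

    // Convenience wrapper that renders a chain into a String
    static String render(List<Link> links) {
        StringWriter out = new StringWriter();
        try {
            printChain(out, links);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Made-up chain for illustration only
        List<Link> chain = Arrays.asList(
                new Link("demo/Entry", "readObject", "(Ljava/io/ObjectInputStream;)V", 0),
                new Link("demo/Helper", "run", "(Ljava/lang/String;)V", 1),
                new Link("java/lang/Runtime", "exec", "(Ljava/lang/String;)Ljava/lang/Process;", 1));
        System.out.print(render(chain));
    }
}
```

Each line reads as class name, a dot, method name, descriptor, and the tainted argument index in parentheses; a blank line separates one chain from the next.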

0x04 Conclusion

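During testing I found that Gadget Inspector could not analyze a jar built with Java 16. I had heard that Java 8 is better supported, so I rebuilt the jar with Java 8 and the analysis ran normally. I will add examples later.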

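This tool clearly cannot find every gadget chain. To avoid path explosion it visits each method only once during the search; this could be replaced with a maximum-depth limit. Other articles also report that the generated call graph is incomplete, though I have not verified this myself. To extend the tool, a good starting point is adding source/sink points (rules); some people have also extended it to detect issues such as SQL injection in web applications.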
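The trade-off between visiting each method once and limiting depth can be sketched on a toy call graph. This is a minimal illustration under stated assumptions, not the tool's implementation: `DepthLimitedChains`, `findChains`, and the string-keyed graph are hypothetical, and methods are reduced to plain names. Instead of a global visited set, the search only rejects methods already on the current path and stops extending a path once it reaches `maxDepth` methods.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class DepthLimitedChains {
    // Breadth-first search that returns every source-to-sink path with at most maxDepth methods
    static List<List<String>> findChains(Map<String, List<String>> callGraph,
                                         String source, Set<String> sinks, int maxDepth) {
        List<List<String>> found = new ArrayList<>();
        Deque<List<String>> queue = new ArrayDeque<>();
        queue.add(Collections.singletonList(source));
        while (!queue.isEmpty()) {
            List<String> path = queue.poll();
            String last = path.get(path.size() - 1);
            if (sinks.contains(last)) {
                found.add(path);        // reached a sink: record the chain
                continue;
            }
            if (path.size() >= maxDepth) {
                continue;               // depth limit reached: do not extend this path
            }
            for (String callee : callGraph.getOrDefault(last, Collections.<String>emptyList())) {
                if (path.contains(callee)) {
                    continue;           // avoid cycles within this path only
                }
                List<String> next = new ArrayList<>(path);
                next.add(callee);
                queue.add(next);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Toy graph: A calls B and C, both call D, D calls SINK
        Map<String, List<String>> graph = new HashMap<>();
        graph.put("A", Arrays.asList("B", "C"));
        graph.put("B", Arrays.asList("D"));
        graph.put("C", Arrays.asList("D"));
        graph.put("D", Arrays.asList("SINK"));
        System.out.println(findChains(graph, "A", Collections.singleton("SINK"), 4));
    }
}
```

On this graph a visit-once search would report only one chain through D, whereas the depth-limited version reports both the path through B and the path through C. The cost is that the number of explored paths can grow quickly, so `maxDepth` has to stay small on real call graphs.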

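Of course, it is best to first get familiar with how the tool works and test it on simple programs before trying real-world examples such as ysoserial. Tough going 🤯.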
