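Preface

Pyre, open-sourced by Facebook, is a performant, PEP 484-compliant type checker for Python. It can analyze large codebases incrementally and quickly handles codebases with millions of lines of code. Pyre ships with Pysa (short for Python Static Analyzer), a security-focused static analysis tool that tracks and analyzes data flow through Python programs (taint analysis).

There is also SAPP (Static Analysis Post Processor), which provides a command line and a UI for browsing Pysa's results.

For background on Python typing (PEP 484), I recommend reading the mypy cheat sheet and type reference. Below are two functions, one without and one with type annotations. Python has supported optional type annotations since version 3.5, which makes static analysis of Python programs much easier, although few of the open-source tools I have looked at actually add type annotations.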

from typing import List

def unannotated():        # implicitly returns `Any`
    return b"" + ""       # function body is not checked

def annotated() -> List:  # explicit return annotation means we type check `annotated`
    any = unannotated()
    any.attribute         # `Any` has all possible attributes
    return 1              # Error: returning `int` but expecting `List`

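Pysa tracks data flow. The user defines sources (where data originates) and sinks (where data from a source should never end up); an issue is reported when data from a source reaches a sink.

The most common source is user-controlled input.
Sinks are more varied and include all kinds of APIs.

Pysa performs interprocedural analysis, that is, it tracks data flow across function calls (taint analysis), using all the information available in the code, including optional static type information. Pyre, which is itself a static type checker, supplies the type information for the source code.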
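
The source-to-sink idea can be illustrated with a toy graph search. This is only a minimal sketch and not how Pysa is implemented: the edge table, the node names (`get_operator`, `build_expr`, `config_file`, `log`) and the helper `find_flows` are hypothetical, and a real analysis derives such edges from the code itself.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical data-flow edges: data held by the key may flow into each value.
FLOWS: Dict[str, List[str]] = {
    "request.GET": ["get_operator"],
    "get_operator": ["build_expr"],
    "build_expr": ["eval"],
    "config_file": ["log"],
}


def find_flows(sources: Set[str], sinks: Set[str]) -> List[List[str]]:
    """Return one shortest path for every connected (source, sink) pair."""
    found = []
    for source in sorted(sources):
        queue = deque([[source]])
        seen = {source}
        while queue:
            path = queue.popleft()
            node = path[-1]
            if node in sinks:
                # A source has reached a sink: this is what gets reported.
                found.append(path)
                continue
            for nxt in FLOWS.get(node, []):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(path + [nxt])
    return found


if __name__ == "__main__":
    for path in find_flows({"request.GET"}, {"eval"}):
        print(" -> ".join(path))
```

The path crosses several "functions" before it hits the sink, which is the interprocedural part of the idea.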

Limitations

Problem space

Pysa can only track the data flow from admin_operation to delete_user; it cannot verify that the user_is_admin check is performed.

def admin_operation(request: HttpRequest):
  if not user_is_admin():
      return Http404
 
  delete_user(request.GET["user_to_delete"])

Python's dynamic features

Pysa cannot recognize calls to functions in dynamically imported modules.

def secret_eval(request: HttpRequest):
  os = importlib.import_module("os")
 
  # Pysa won't know what 'os' is, and thus won't
  # catch this remote code execution issue
  os.system(request.GET["command"])
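
A small runnable sketch of why this is hard for a static tool: the module name is just a string until the program runs. The helper `load_module` and the variable names are hypothetical and only serve the illustration.

```python
import importlib
import os


def load_module(name: str):
    """Resolve a module from a string that is only known at runtime."""
    return importlib.import_module(name)


# The string could come from a config file or from user input, so a static
# analyzer cannot generally tell which module `mod` refers to.
module_name = "o" + "s"
mod = load_module(module_name)

# At runtime it is exactly the same object as the statically imported `os`.
print(mod is os)
```

Writing `import os` directly keeps the call to `os.system` visible to the analysis.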

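Decorators

A Facebook blog post dated 2020-08-07 notes that decorators are not yet included in the call graph.

Facebook provides a tutorial for Pysa 👉 Pysa Tutorial, which covers the main topics. I ran into a few problems while working through it and managed to solve them along the way.

Experiments

Pysa is invoked with pyre analyze. The experiments involve the following configuration files and tools: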

taint.config

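Defines taint sources and sinks, including implicit sources and sinks
Detection rules: for example, a flow from a given source to a given sink is attack XXX; one rule can list several sources and sinks
Features: additional metadata attached to taint, which can be used to filter out false positives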
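
The relationship between the three parts can be sketched as a small consistency check: every name a rule uses should be declared under sources or sinks. This is a hypothetical helper (`check_taint_config` is not part of Pysa, which performs its own validation), and it assumes all declarations live in a single file. The sample deliberately references an undeclared sink.

```python
import json
from typing import List


def check_taint_config(text: str) -> List[str]:
    """Return a list of problems found in a taint.config-style JSON document."""
    config = json.loads(text)
    sources = {s["name"] for s in config.get("sources", [])}
    sinks = {s["name"] for s in config.get("sinks", [])}
    problems = []
    seen_codes = set()
    for rule in config.get("rules", []):
        for name in rule.get("sources", []):
            if name not in sources:
                problems.append(f"rule {rule['code']}: unknown source {name}")
        for name in rule.get("sinks", []):
            if name not in sinks:
                problems.append(f"rule {rule['code']}: unknown sink {name}")
        if rule["code"] in seen_codes:
            problems.append(f"duplicate rule code {rule['code']}")
        seen_codes.add(rule["code"])
    return problems


# Sample config whose rule uses a sink that was never declared.
EXAMPLE = """
{
  "sources": [{"name": "CustomUserControlled"}],
  "sinks": [{"name": "CodeExecution"}],
  "features": [],
  "rules": [
    {"name": "Possible RCE:", "code": 5001,
     "sources": ["CustomUserControlled"], "sinks": ["ShellExecution"],
     "message_format": "User specified data may reach a code execution sink"}
  ]
}
"""

if __name__ == "__main__":
    for problem in check_taint_config(EXAMPLE):
        print(problem)
```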

.pysa

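Taint models state where the sources and sinks are (by signature). They use fully qualified names, and their format must match the .pyi stub files
- TaintSource[SOURCE_NAME] marks a source
- TaintSink[SINK_NAME] marks a sink
- TaintInTaintOut marks taint-in-taint-out, meaning that taint entering the function propagates to its return value
- PartialSink marks sinks for combined sources and uses the rule name
Sanitizers change how an object is treated: taint that passes through a sanitizer is cleansed and no longer tracked
- A function is declared a sanitizer with a decorator
- The scope can be limited to sources, sinks, or taint-in-taint-out (TITO)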
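
As a rough runtime analogy for these annotations (Pysa itself is static and never runs the code), the sketch below tags strings with a marker class. All names here (`Tainted`, `source`, `tito`, `sanitize`, `sink`) are hypothetical and exist only for this illustration.

```python
class Tainted(str):
    """A string that remembers it came from an untrusted source."""


def source(value: str) -> Tainted:
    # Plays the role of TaintSource: everything returned here is tainted.
    return Tainted(value)


def tito(value: str) -> str:
    # Plays the role of TaintInTaintOut: the result is tainted if the input is.
    result = value.strip().lower()
    return Tainted(result) if isinstance(value, Tainted) else result


def sanitize(value: str) -> str:
    # Plays the role of a sanitizer: the result is a plain, untainted str.
    return "".join(value)


def sink(value: str) -> str:
    # Plays the role of TaintSink: tainted data must never arrive here.
    if isinstance(value, Tainted):
        raise ValueError("tainted data reached a sink")
    return f"executed: {value}"


if __name__ == "__main__":
    data = source("  LS ")
    print(sink(sanitize(tito(data))))
```

Calling `sink(tito(data))` without the sanitizer raises `ValueError`, which mirrors a reported source-to-sink flow.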

.pyre_configuration

Path configuration: source code, stub files, and so on

SAPP

Interactive command line
Web server

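Dynamically generating taint models

pyre-check/tools/generate_taint_models/get_*.py contains a number of predefined generators
They follow the model domain-specific language (DSL)

Environment setup

Test environment: Ubuntu 20.04 Server + Python 3.8.10 + pip 20.0.2

The experiments are run inside a Python virtual environment.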

# Download the source
git clone https://github.com/facebook/pyre-check.git
cd pyre-check

# Create the virtual environment
cd documentation/pysa_tutorial
python -m venv tutorial

# Activate the virtual environment
source tutorial/bin/activate

# Install the dependencies
pip install pyre-check fb-sapp

If you hit an error here, try updating the tooling first:

pip install wheel
python -m pip install --upgrade setuptools

exercise1

views.py contains a remote code execution (RCE) vulnerability. Pysa needs to be told that request.GET holds user-controlled data and that eval can execute code.

from django.http import HttpRequest, HttpResponse

def operate_on_twos(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

taint.config defines the rule that tells Pysa an RCE is possible when data from a CustomUserControlled source reaches a CodeExecution sink.

{
  "sources": [
    {
      "name": "CustomUserControlled",
      "comment": "use to annotate user input"
    }
  ],

  "sinks": [
    {
      "name": "CodeExecution",
      "comment": "use to annotate execution of python code"
    }
  ],

  "features": [],

  "rules": [
    {
      "name": "Possible RCE:",
      "code": 5001,
      "sources": [ "CustomUserControlled" ],
      "sinks": [ "CodeExecution" ],
      "message_format": "User specified data may reach a code execution sink"
    }
  ]
}

sources_sinks.pysa tells Pysa that request.GET is a taint source (TaintSource) of kind CustomUserControlled and that eval is a taint sink (TaintSink) of kind CodeExecution.

django.http.request.HttpRequest.GET: TaintSource[CustomUserControlled] = ...

def eval(__source: TaintSink[CodeExecution], __globals, __locals): ...

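.pyre_configuration configures the search paths.

{
  "source_directories": [ // directories searched for source code
    "."
  ],
  "taint_models_path": [ // directories searched for .pysa and taint.config files
    "."
  ],
  "search_path": [  // directories searched for stub files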
    "../../../stubs/"
  ],
  "exclude": [
    ".*/integration_test/.*"
  ],
  "use_command_v2": true
}
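
The `//` annotations in the listing above are explanatory only; standard JSON has no comment syntax. The sketch below writes the same settings as a Python dict and renders them as comment-free JSON. The helpers `render_config` and `parses_as_json` are hypothetical, and whether Pyre itself tolerates comments in this file is not something the sketch checks.

```python
import json

CONFIG = {
    "source_directories": ["."],         # where to look for source code
    "taint_models_path": ["."],          # where to look for .pysa and taint.config
    "search_path": ["../../../stubs/"],  # where to look for stub files
    "exclude": [".*/integration_test/.*"],
    "use_command_v2": True,
}


def render_config() -> str:
    """Serialize the settings as plain JSON without comments."""
    return json.dumps(CONFIG, indent=2)


def parses_as_json(text: str) -> bool:
    """Report whether Python's standard json module accepts the text."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


if __name__ == "__main__":
    print(render_config())
    print(parses_as_json('{"source_directories": ["."] // comment\n}'))
```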

The run produces the output below; the JSON array at the end lists the issues found.

(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise1$ pyre analyze
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ `object.__eq__` has 82 overrides, this might slow down the analysis considerably.
ƛ `object.__init__` has 754 overrides, this might slow down the analysis considerably.
ƛ `object.__ne__` has 60 overrides, this might slow down the analysis considerably.
ƛ `type.__call__` has 131 overrides, this might slow down the analysis considerably.
ƛ `type.__init__` has 448 overrides, this might slow down the analysis considerably.
ƛ `type.__new__` has 75 overrides, this might slow down the analysis considerably.
[
  {
    "line": 12,
    "column": 18,
    "stop_line": 12,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible RCE:",
    "description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "long_description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "concise_description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "inference": null,
    "define": "views.operate_on_twos"
  }
]
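
For longer result lists it can help to condense the array into one line per issue. A minimal sketch, assuming the JSON array has already been captured as a string separately from the `ƛ` log lines (how to separate them is not shown here); the helper `summarize` and the trimmed `SAMPLE` are hypothetical.

```python
import json
from typing import List


def summarize(issues_json: str) -> List[str]:
    """Turn an issue array into one `path:line [code] callable` line per issue."""
    return [
        f"{issue['path']}:{issue['line']} [{issue['code']}] {issue['define']}"
        for issue in json.loads(issues_json)
    ]


# Trimmed copy of the issue shown above; only the fields used here are kept.
SAMPLE = """
[
  {"line": 12, "column": 18, "path": "views.py", "code": 5001,
   "name": "Possible RCE:", "define": "views.operate_on_twos"}
]
"""

if __name__ == "__main__":
    print("\n".join(summarize(SAMPLE)))
```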

exercise2

All three functions in views.py have remote code execution vulnerabilities: the first two execute Python code and the last one executes shell commands.

import subprocess
from django.http import HttpRequest, HttpResponse

def operate_on_twos(request: HttpRequest) -> HttpResponse:
    operator = request.POST["operator"]
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

def operate_on_threes(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821

def operate_on_fours(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    result = subprocess.getoutput(f"expr 4 {operator} 4")
    return result

taint.config already declares a sink named ShellExecution.

{
  "sources": [
    {
      "name": "CustomUserControlled",
      "comment": "use to annotate user input"
    }
  ],

  "sinks": [
    {
      "name": "CodeExecution",
      "comment": "use to annotate execution of python code"
    },
    {
      "name": "ShellExecution",
      "comment": "use to annotate execution of shell scripts"
    }
  ],

  "features": [],

  "rules": [
    {
      "name": "Possible RCE:",
      "code": 5001,
      "sources": [ "CustomUserControlled" ],
      "sinks": [ "CodeExecution" ],
      "message_format": "User specified data may reach a code execution sink"
    }
  ]
}

Add a rule to taint.config for flows from the CustomUserControlled source to the ShellExecution sink, with code 5002.

"rules": [
  {
    "name": "Possible RCE:",
    "code": 5002,
    "sources": [ "CustomUserControlled" ],
    "sinks": [ "ShellExecution" ],
    "message_format": "User specified data may reach a shell execution sink"
  }
]

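sources_sinks.pysa contains models with taint annotations. These models must match the stubs in the .pyi stub files, or the source code. When converting a .pyi stub or source code into a model, make sure that:

The function name is unchanged
The parameter names are unchanged
Type annotations are removed
The function or attribute is fully qualified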
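
The four conversion rules can be sketched with `inspect.signature`. This is only an illustration: `model_line` and `run_command` are hypothetical, the sketch drops defaults and `*`/`**` markers, and it reads the runtime signature, whereas real models have to match the stub, which may differ.

```python
import inspect
from typing import Callable, Dict, Optional


def model_line(func: Callable, annotations: Optional[Dict[str, str]] = None) -> str:
    """Render a .pysa-style model line for `func`."""
    annotations = annotations or {}
    # Fully qualified name: module plus qualified function name.
    qualified = f"{func.__module__}.{func.__qualname__}"
    params = []
    for name in inspect.signature(func).parameters:
        # Parameter names are kept; type annotations and defaults are dropped.
        taint = annotations.get(name)
        params.append(f"{name}: {taint}" if taint else name)
    return f"def {qualified}({', '.join(params)}): ..."


def run_command(cmd: str, timeout: int = 10) -> str:
    """Hypothetical function standing in for a real sink."""
    return cmd


if __name__ == "__main__":
    print(model_line(run_command, {"cmd": "TaintSink[ShellExecution]"}))
```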

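Pyre's stubs come mainly from typeshed (external type annotations for the Python standard library and builtins, plus third-party packages contributed by people outside those projects). Some additional stubs were written for Pysa and cover third-party libraries such as Django that are not included in typeshed.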

django.http.request.HttpRequest.GET: TaintSource[CustomUserControlled] = ...

def eval(__source: TaintSink[CodeExecution], __globals, __locals): ...

def subprocess.getoutput(cmd: TaintSink[ShellExecution]): ...

Add the remaining models, which mark the attributes and functions that act as sources and sinks.

django.http.request.HttpRequest.POST: TaintSource[CustomUserControlled] = ...

def exec(__source: TaintSink[CodeExecution], __globals, __locals): ...

Run pyre analyze. The result is as follows:

(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise2$ pyre analyze
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ `object.__eq__` has 82 overrides, this might slow down the analysis considerably.
ƛ `object.__init__` has 754 overrides, this might slow down the analysis considerably.
ƛ `object.__ne__` has 60 overrides, this might slow down the analysis considerably.
ƛ `type.__call__` has 131 overrides, this might slow down the analysis considerably.
ƛ `type.__init__` has 448 overrides, this might slow down the analysis considerably.
ƛ `type.__new__` has 75 overrides, this might slow down the analysis considerably.
[
  {
    "line": 30,
    "column": 34,
    "stop_line": 30,
    "stop_column": 56,
    "path": "views.py",
    "code": 5002,
    "name": "Possible RCE:",
    "description":
      "Possible RCE: [5002]: User specified data may reach a shell execution sink",
    "long_description":
      "Possible RCE: [5002]: User specified data may reach a shell execution sink",
    "concise_description":
      "Possible RCE: [5002]: User specified data may reach a shell execution sink",
    "inference": null,
    "define": "views.operate_on_fours"
  },
  {
    "line": 22,
    "column": 9,
    "stop_line": 22,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible RCE:",
    "description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "long_description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "concise_description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "inference": null,
    "define": "views.operate_on_threes"
  },
  {
    "line": 14,
    "column": 18,
    "stop_line": 14,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible RCE:",
    "description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "long_description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "concise_description":
      "Possible RCE: [5001]: User specified data may reach a code execution sink",
    "inference": null,
    "define": "views.operate_on_twos"
  }
]

exercise3

Running pyre analyze directly reports model verification errors (Found 93 model verification errors in exercise3 #441):

ƛ Finding taint models in `/home/jck/pyre-check/stubs/taint, /home/jck/pyre-check/documentation/pysa_tutorial/exercise3`.
ƛ Found 93 model verification errors!
...

Adding the --no-verify flag produces the expected output, which contains false positives.

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri[
  {
    "line": 34,
    "column": 9,
    "stop_line": 34,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible shell injection",
    "description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "long_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "concise_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "inference": null,
    "define": "views.operate_on_threes"
  },
  {
    "line": 24,
    "column": 18,
    "stop_line": 24,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible shell injection",
    "description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "long_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "concise_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "inference": null,
    "define": "views.operate_on_twos"
  }
]

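Pysa ships prewritten sources, sinks, and rules for many Python standard library modules and open-source libraries. The prewritten taint.config and *.pysa files live in the stubs/taint folder.

None of the functions in views.py has a security problem, yet running pyre analyze --no-verify reports false positives.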

from django.http import HttpRequest, HttpResponse

def example_sanitizer():
    ...

def get_operator_safe(request: HttpRequest) -> str:
    operator = request.POST["operator"]
    assert operator in {"+", "-", "*", "/"}
    return operator

def operate_on_twos(request: HttpRequest) -> HttpResponse:
    operator = get_operator_safe(request)
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

def operate_on_threes(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    assert operator in {"+", "-", "*", "/"}
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821

sanitizers.pysa defines sanitizers. They change how Pysa treats the whole callable, not just a return value or a parameter, and are written as decorators.

@Sanitize
def views.example_sanitizer(): ...

In operate_on_twos, the request is filtered by get_operator_safe, so the later call to eval() is safe. Mark get_operator_safe as a sanitizer:

@Sanitize
def views.get_operator_safe(request: TaintSource[UserControlled]): ...
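
Marking a function as a sanitizer is a promise that its check really holds, so the check is worth looking at on its own. The sketch below is a standalone variant of the allowlist idea; `get_operator_checked` and `ALLOWED_OPERATORS` are hypothetical names, and the function takes a plain string instead of a Django request so that it runs without Django.

```python
ALLOWED_OPERATORS = {"+", "-", "*", "/"}


def get_operator_checked(raw: str) -> str:
    """Return `raw` only if it is one of the four allowed operators."""
    # An explicit raise is used instead of `assert`, because assert statements
    # are skipped when Python runs with the -O flag.
    if raw not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {raw!r}")
    return raw


def operate_on_twos(raw_operator: str) -> float:
    operator = get_operator_checked(raw_operator)
    # Only "2 + 2", "2 - 2", "2 * 2" or "2 / 2" can reach eval here.
    return eval(f"2 {operator} 2")


if __name__ == "__main__":
    print(operate_on_twos("*"))
```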

Running pyre analyze --no-verify again leaves only one false positive:

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri[
  {
    "line": 34,
    "column": 9,
    "stop_line": 34,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible shell injection",
    "description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "long_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "concise_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "inference": null,
    "define": "views.operate_on_threes"
  }
]

operate_on_threes filters the request itself. Here we add a call to an identity function that returns its argument unchanged.

def identity(x):
    return x

def operate_on_threes(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    assert operator in {"+", "-", "*", "/"}
    operator = identity(operator)
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821

Mark identity as a sanitizer, so that taint passing through it is no longer treated as tainted and is not tracked further.

@Sanitize
def views.identity(x: TaintSource[UserControlled]): ...

Running pyre analyze --no-verify again reports no false positives.

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri[]

exercise4

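This exercise uses the interactive command line provided by SAPP (Static Analysis Post Processor).

The functions in views.py again have no security problems, yet false positives are reported.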

from django.http import HttpRequest, HttpResponse

def example_feature(argument: str) -> None:
    ...

def assert_numeric(operand: str) -> None:
    assert operand.isnumeric()

def do_and(request: HttpRequest) -> HttpResponse:
    left = bool(request.GET["left"])
    right = bool(request.GET["right"])
    result = eval(f"{left} and {right}")  # noqa: P204
    return result

def do_or(request: HttpRequest) -> HttpResponse:
    left = request.GET["left"]
    right = request.GET["right"]
    assert_numeric(left)
    assert_numeric(right)
    result = eval(f"{left} or {right}")  # noqa: P204
    return result

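In taint.config, a feature is additional metadata attached to a taint flow. It can be used to filter out false positives and does not affect the analysis.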

{
  "sources": [],
  "sinks": [],
  "features": [
    {
      "name": "example",
      "comment": "Copy this feature and write your own. Don't forget that JSON lists are comma seperated!"
    }
  ],
  "rules": []
}

features.pysa uses the feature named example.

def views.example_feature(argument: AddFeatureToArgument[Via[example]]): ...

Run Pysa and open the results in SAPP:

pyre analyze --no-verify --save-results-to .
sapp analyze taint-output.json
sapp explore

Use the SAPP interactive command line to inspect the issues:

...
[ run 1 ]
>>> issues # returns 2 issues
Issue 1
            Code: 5001
         Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
        Callable: views.do_or
         Sources: UserControlled
           Sinks: RemoteCodeExecution
        Features: first-index:right
                  always-via:format-string
                  has:first-index
                  first-index:left
        Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
--------------------------------------------------------------------------------
Issue 2
            Code: 5001
         Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
        Callable: views.do_and
         Sources: UserControlled
           Sinks: RemoteCodeExecution
        Features: first-index:right
                  always-type:bool
                  always-type:scalar
                  always-via:obscure
                  always-via:format-string
                  first-index:left
                  has:first-index
        Location: views.py:21|18|39
Min Trace Length: Source (0) | Sink (0)
Found 2 issues with run_id 1.

[ run 1 ]
>>> issues(exclude_features=["always-type:bool"]) # filters out do_and
Issue 1
            Code: 5001
         Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
        Callable: views.do_or
         Sources: UserControlled
           Sinks: RemoteCodeExecution
        Features: first-index:right
                  always-via:format-string
                  has:first-index
                  first-index:left
        Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
Found 1 issues with run_id 1.

[ run 1 ]
>>> exit # quit
(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise4$

Add a feature named assert_numeric to taint.config.

{
  "sources": [],
  "sinks": [],
  "features": [
    {
      "name": "example",
      "comment": "Copy this feature and write your own. Don't forget that JSON lists are comma seperated!"
    },
    {
      "name": "assert_numeric",
      "comment": "via assert_numeric"
    }
  ],
  "rules": []
}

Use the feature in features.pysa to state that the operand parameter of views.assert_numeric carries it.

def views.assert_numeric(operand: AddFeatureToArgument[Via[assert_numeric]]): ...

Rerun Pysa and open the results in SAPP:

pyre analyze --no-verify --save-results-to .
sapp analyze taint-output.json
sapp explore

The new feature is now visible. Filtering on both features returns 0 issues.

[ run 2 ]
>>> issues
Issue 3
            Code: 5001
         Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
        Callable: views.do_or
         Sources: UserControlled
           Sinks: RemoteCodeExecution
        Features: always-via:format-string
                  first-index:left
                  always-via:assert_numeric  # new feature
                  first-index:right
                  has:first-index
        Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
--------------------------------------------------------------------------------
Issue 4
            Code: 5001
         Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
        Callable: views.do_and
         Sources: UserControlled
           Sinks: RemoteCodeExecution
        Features: always-via:format-string
                  always-via:obscure
                  first-index:left
                  always-type:scalar
                  first-index:right
                  has:first-index
                  always-type:bool
        Location: views.py:21|18|39
Min Trace Length: Source (0) | Sink (0)
Found 2 issues with run_id 2.

[ run 2 ]
>>> issues(exclude_features=["always-type:bool"]) # filters out do_and
Issue 3
            Code: 5001
         Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
        Callable: views.do_or
         Sources: UserControlled
           Sinks: RemoteCodeExecution
        Features: always-via:format-string
                  first-index:left
                  always-via:assert_numeric
                  first-index:right
                  has:first-index
        Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
Found 1 issues with run_id 2.

[ run 2 ]
>>> issues(exclude_features=["always-type:bool", "always-via:assert_numeric"]) # filters out do_and and do_or

Found 0 issues with run_id 2.

[ run 2 ]
>>> exit
(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise4$
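
The filtering seen in this session can be mimicked in a few lines: an issue is dropped as soon as it carries any excluded feature. This is a sketch of that behaviour, not SAPP's implementation; `remaining` is a hypothetical helper, and `ISSUES` holds a subset of the features from the run 2 listing above.

```python
from typing import Dict, Iterable, List, Set

# Features per issue, copied (in part) from the run 2 listing above.
ISSUES: Dict[str, Set[str]] = {
    "views.do_or": {
        "always-via:format-string",
        "always-via:assert_numeric",
        "has:first-index",
    },
    "views.do_and": {
        "always-via:format-string",
        "always-via:obscure",
        "always-type:bool",
        "always-type:scalar",
    },
}


def remaining(issues: Dict[str, Set[str]], exclude_features: Iterable[str]) -> List[str]:
    """Return the callables whose issues carry none of the excluded features."""
    excluded = set(exclude_features)
    return sorted(name for name, feats in issues.items() if not feats & excluded)


if __name__ == "__main__":
    print(remaining(ISSUES, ["always-type:bool"]))
    print(remaining(ISSUES, ["always-type:bool", "always-via:assert_numeric"]))
```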

exercise5

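Dynamic model generators run before Pysa and produce .pysa files. The official repository ships generators in pyre-check/tools/generate_taint_models/get_*.py; see Dynamically Generating Models for the documentation.

Running pyre analyze --no-verify directly finds no security issues.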

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri[]

views.py and urls.py mimic how Django handles requests. Both functions in views.py have RCE vulnerabilities, but Pysa misses them (false negatives).

from typing import Callable

def api_wrapper(func: Callable):
    def inner(request):
        func(request, **request.GET)
    return inner

def operate_on_twos(request, operator: str):
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

@api_wrapper
def operate_on_threes(request, operator: str):
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821

urls.py

from dataclasses import dataclass
from views import operate_on_twos

@dataclass
class UrlPattern:
    path: str
    callback: str

urlpatterns = [UrlPattern(r"^operate_on_twos/(.*)", operate_on_twos)]

generate_models.py can dynamically generate taint annotations for views.py.

import importlib
import sys
from pathlib import Path
from urls import UrlPattern

# Make sure we're able to import dependencies in 'pyre-check' repo, since they
# are not currently in the PyPI package for pyre-check
current_file = Path(__file__).absolute()
sys.path.append(str(current_file.parents[4]))

# Work around '-' in the name of 'pyre-check'
generate_taint_models = importlib.import_module(
    "pyre-check.tools.generate_taint_models"
)
view_generator = importlib.import_module(
    "pyre-check.tools.generate_taint_models.view_generator"
)
generator_specifications = importlib.import_module(
    "pyre-check.tools.generate_taint_models.generator_specifications"
)

class Ignore:
    pass

def main() -> None:
    # Here, specify all the generators that you might want to call.
    generators = {
        "django_path_params": generate_taint_models.RESTApiSourceGenerator(
            django_urls=view_generator.DjangoUrls(
                urls_module="urls",
                url_pattern_type=UrlPattern,
                url_resolver_type=Ignore,
            )
        ),
        # "decorator_extracted_params": generate_taint_models.<GENERATOR_NAME>(
        #     root=".",
        #     annotation_specifications=[
        #         generate_taint_models.DecoratorAnnotationSpecification(
        #             decorator=<DECORATOR_NAME_INCLUDING_PRECEEDING_@>,
        #             annotations=generator_specifications.default_entrypoint_taint,
        #         )
        #     ],
        # ),
    }
    # The `run_generators` function will take care of parsing command-line arguments, as
    # well as executing the generators specified in `default_modes` unless you pass in a
    # specific set from the command line.
    generate_taint_models.run_generators(
        generators,
        default_modes=[
            "django_path_params",
            # "decorator_extracted_params"
        ],
    )

if __name__ == "__main__":
    main()

Use generate_models.py to generate the .pysa file:

python generate_models.py --output-directory .

This fails because the graphql3 module cannot be found. Change graphql3 to graphql in the file:

vim /home/jck/pyre-check/tools/generate_taint_models/get_dynamic_graphql_sources.py
# from graphql import GraphQLSchema

The rerun succeeds with the output below and generates the generated_django_path_params.pysa file.

(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise5$ python generate_models.py --output-directory .
2021-07-07 03:24:56 INFO Computing models for `django_path_params`
2021-07-07 03:24:56 INFO Getting all URLs from `urls`
2021-07-07 03:24:56 INFO Computed models for `django_path_params` in 0.000 seconds.
{"number of generated models": 1}

generated_django_path_params.pysa marks the parameters of the view function as taint sources and its return value as a taint sink.

def views.operate_on_twos(request: TaintSource[UserControlled], operator: TaintSource[UserControlled]) -> TaintSink[ReturnedToUser]: ...

Running pyre analyze --no-verify again reports 1 issue; the other one is still missed.

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri[
  {
    "line": 17,
    "column": 18,
    "stop_line": 17,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible shell injection",
    "description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "long_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "concise_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "inference": null,
    "define": "views.operate_on_twos"
  }
]

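Extend generate_models.py to recognize the decorator by enabling the commented-out section. Only two placeholders need to be filled in (the generator can be found under Example Model Generators):

<GENERATOR_NAME>: AnnotatedFreeFunctionWithDecoratorGenerator
<DECORATOR_NAME_INCLUDING_PRECEEDING_@>: "@api_wrapper"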

def main() -> None:
    # Here, specify all the generators that you might want to call.
    generators = {
        "django_path_params": generate_taint_models.RESTApiSourceGenerator(
            django_urls=view_generator.DjangoUrls(
                urls_module="urls",
                url_pattern_type=UrlPattern,
                url_resolver_type=Ignore,
            )
        ),
        "decorator_extracted_params": generate_taint_models.AnnotatedFreeFunctionWithDecoratorGenerator(
            root=".",
            annotation_specifications=[
                generate_taint_models.DecoratorAnnotationSpecification(
                    decorator="@api_wrapper",
                    annotations=generator_specifications.default_entrypoint_taint,
                )
            ],
        ),
    }
    # The `run_generators` function will take care of parsing command-line arguments, as
    # well as executing the generators specified in `default_modes` unless you pass in a
    # specific set from the command line.
    generate_taint_models.run_generators(
        generators,
        default_modes=[
            "django_path_params",
            "decorator_extracted_params"
        ],
    )
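
Conceptually, a decorator-based generator has to find the free functions that carry a given decorator and emit a model for each. The sketch below does this with the standard `ast` module. It is not the code of AnnotatedFreeFunctionWithDecoratorGenerator: `models_for_decorated` is a hypothetical helper, and the emitted annotation format is an assumption borrowed from generated_django_path_params.pysa shown earlier (the real output may differ, for example by annotating the return value).

```python
import ast
from typing import List


def models_for_decorated(source: str, module: str, decorator: str) -> List[str]:
    """Emit a model line for each top-level function decorated with `decorator`."""
    lines = []
    for node in ast.parse(source).body:
        if not isinstance(node, ast.FunctionDef):
            continue
        # Only plain `@name` decorators are recognized in this sketch.
        names = {d.id for d in node.decorator_list if isinstance(d, ast.Name)}
        if decorator not in names:
            continue
        params = ", ".join(
            f"{arg.arg}: TaintSource[UserControlled]" for arg in node.args.args
        )
        lines.append(f"def {module}.{node.name}({params}): ...")
    return lines


# The source is only parsed, never executed, so `api_wrapper` need not exist.
VIEWS = '''
def operate_on_twos(request, operator: str):
    return eval(f"2 {operator} 2")

@api_wrapper
def operate_on_threes(request, operator: str):
    exec(f"result = 3 {operator} 3")
'''

if __name__ == "__main__":
    for line in models_for_decorated(VIEWS, "views", "api_wrapper"):
        print(line)
```

Only operate_on_threes is picked up, which is the function the URL-based generator could not see.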

Regenerate the .pysa files and run the analysis:

python generate_models.py --output-directory .
pyre analyze --no-verify

2 issues are detected:

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri[
  {
    "line": 25,
    "column": 9,
    "stop_line": 25,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible shell injection",
    "description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "long_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "concise_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "inference": null,
    "define": "views.operate_on_threes"
  },
  {
    "line": 17,
    "column": 18,
    "stop_line": 17,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible shell injection",
    "description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "long_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "concise_description":
      "Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
    "inference": null,
    "define": "views.operate_on_twos"
  }
]