Preface

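Pyre, open-sourced by Facebook, is a performant, PEP 484-compliant type checker for Python. It analyzes large codebases incrementally and can quickly handle codebases with millions of lines of code. Pyre ships with Pysa (short for Python Static Analyzer), a security-focused static analysis tool that tracks and analyzes data flows in Python programs (taint analysis).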

In addition, SAPP (Static Analysis Post Processor) provides a command-line interface and a web UI for browsing Pysa's results.

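For background on Python types (PEP 484), I recommend the mypy type hints cheat sheet. Below are two functions, one without and one with type annotations. Python has supported optional type annotations since version 3.5, which makes static analysis of Python programs much easier, although few of the open-source tools I have looked at actually add type annotations.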

from typing import List

def unannotated():  # implicitly returns `Any`
    return b"" + ""  # function body is not checked

def annotated() -> List:  # explicit return annotation means we type check `annotated`
    any = unannotated()
    any.attribute  # `Any` has all possible attributes
    return 1  # Error: returning `int` but expecting `List`

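Pysa tracks data flows. The user defines sources (where data originates) and sinks (where data from a source should never end up); whenever data from a source reaches a sink, Pysa reports an issue.

  • The most common source is user-controlled input
  • Sinks are more varied and include all kinds of APIs

Pysa performs interprocedural analysis: it follows data flows across function calls (taint analysis) and uses all the information available in the code, including optional static type information. Pyre, which is itself a static type checker, supplies the type information for the source code.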
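The source-to-sink idea can be pictured as reachability in a graph of data flows. The following is a toy sketch, not Pysa's actual algorithm; the node names and the `tainted_sinks` helper are made up for illustration.

```python
from collections import deque
from typing import Dict, List, Set


def tainted_sinks(
    flows: Dict[str, List[str]], sources: Set[str], sinks: Set[str]
) -> Set[str]:
    """Return every sink reachable from a source by following flow edges."""
    seen = set(sources)
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for successor in flows.get(node, []):
            if successor not in seen:
                seen.add(successor)
                queue.append(successor)
    return seen & sinks


# Hypothetical flow graph: request.GET -> operator -> format string -> eval
flows = {
    "request.GET": ["operator"],
    "operator": ["format_string"],
    "format_string": ["eval"],
    "constant": ["print"],
}

found = tainted_sinks(flows, sources={"request.GET"}, sinks={"eval", "os.system"})
print(found)
```

Only `eval` is reported here, because nothing in the graph leads from the source to `os.system`.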

Limitations

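  1. Problem space

    Pysa only finds problems that can be modeled as data flows. Here it can track data flowing from admin_operation into delete_user, but it cannot check whether user_is_admin was verified first.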

    def admin_operation(request: HttpRequest):
        if not user_is_admin():
            return Http404

        delete_user(request.GET["user_to_delete"])
  2. Python's dynamic features

    Pysa cannot resolve function calls on dynamically imported modules.

    def secret_eval(request: HttpRequest):
        os = importlib.import_module("os")

        # Pysa won't know what 'os' is, and thus won't
        # catch this remote code execution issue
        os.system(request.GET["command"])
  3. Decorators

    A Facebook blog post dated 2020-08-07 notes that decorators are not yet included in the call graph.
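The dynamic-import limitation is easy to reproduce in plain Python. In this small sketch (the `load` helper is hypothetical), the module name only exists as a runtime string, so a tool that reads the source text has nothing to resolve, even though the interpreter hands back the very same `os` module.

```python
import importlib
import os


def load(name: str):
    """Import a module whose name is only known at runtime."""
    return importlib.import_module(name)


# The name is assembled at runtime; nothing in the source text says "import os".
module = load("o" + "s")

# At runtime this is exactly the module a static `import os` would bind.
print(module is os)
print(hasattr(module, "system"))
```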

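Facebook provides a tutorial for Pysa 👉 Pysa Tutorial, which covers the main features. I ran into a few problems while working through it and managed to solve them all after some stumbling.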

Experiments

Pysa is invoked with pyre analyze. The exercises involve the following configuration files and tools:

taint.config

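  • Declares taint sources and sinks, including implicit sources and sinks
  • Detection rules: for example, data flowing from a given source to a given sink is an XXX attack; one rule can list several sources and sinks
  • Features: additional metadata attached to taint, which can be used to filter out false positives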

.pysa

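  • Taint models state where the sources and sinks are (by signature). They use fully qualified names, and their format must match the .pyi stub files
    • TaintSource[SOURCE_NAME] marks a source
    • TaintSink[SINK_NAME] marks a sink
    • TaintInTaintOut marks taint-in-taint-out: taint entering the function propagates to its return value
    • PartialSink is used for combined-source rules and refers to the name given in the rule
  • Sanitizers change how taint is treated: taint that passes through a sanitizer is considered cleansed and is no longer tracked
    • A function is declared a sanitizer with a decorator
    • The scope can be restricted to sources, sinks, or taint-in-taint-out (TITO)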

.pyre_configuration

  • Path configuration: source code, stub files, and so on

SAPP

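  • Interactive command line
  • Web server

Dynamic generation of taint models

Environment setup

Test environment: Ubuntu 20.04 Server + Python 3.8.10 + pip 20.0.2

The exercises are run inside a Python virtual environment.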

# Download the source code
git clone https://github.com/facebook/pyre-check.git
cd pyre-check

# Create the virtual environment
cd documentation/pysa_tutorial
python -m venv tutorial

# Activate the virtual environment
source tutorial/bin/activate

# Install the dependencies
pip install pyre-check fb-sapp

If this step fails with an error, try upgrading the packaging tools:

pip install wheel
python -m pip install --upgrade setuptools

exercise1

views.py contains a remote code execution (RCE) vulnerability. Pysa needs to be told that request.GET holds user-controlled data and that eval can execute code.

from django.http import HttpRequest, HttpResponse

def operate_on_twos(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

taint.config contains the rule that tells Pysa that data flowing from a CustomUserControlled source to a CodeExecution sink can lead to RCE.

{
  "sources": [
    {
      "name": "CustomUserControlled",
      "comment": "use to annotate user input"
    }
  ],

  "sinks": [
    {
      "name": "CodeExecution",
      "comment": "use to annotate execution of python code"
    }
  ],

  "features": [],

  "rules": [
    {
      "name": "Possible RCE:",
      "code": 5001,
      "sources": [ "CustomUserControlled" ],
      "sinks": [ "CodeExecution" ],
      "message_format": "User specified data may reach a code execution sink"
    }
  ]
}
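Since every rule refers to sources and sinks by name, a typo in taint.config is easy to make. The following is a minimal sketch of a consistency check written with the standard json module; the `undeclared_names` helper is my own and not part of Pysa, and the embedded config is the exercise1 file in compact form.

```python
import json
from typing import List

EXERCISE1_CONFIG = """
{
  "sources": [{"name": "CustomUserControlled", "comment": "use to annotate user input"}],
  "sinks": [{"name": "CodeExecution", "comment": "use to annotate execution of python code"}],
  "features": [],
  "rules": [
    {
      "name": "Possible RCE:",
      "code": 5001,
      "sources": ["CustomUserControlled"],
      "sinks": ["CodeExecution"],
      "message_format": "User specified data may reach a code execution sink"
    }
  ]
}
"""


def undeclared_names(config_text: str) -> List[str]:
    """List the sources and sinks that a rule uses but the config never declares."""
    config = json.loads(config_text)
    declared_sources = {entry["name"] for entry in config.get("sources", [])}
    declared_sinks = {entry["name"] for entry in config.get("sinks", [])}
    problems = []
    for rule in config.get("rules", []):
        for name in rule.get("sources", []):
            if name not in declared_sources:
                problems.append(f"rule {rule['code']}: undeclared source {name}")
        for name in rule.get("sinks", []):
            if name not in declared_sinks:
                problems.append(f"rule {rule['code']}: undeclared sink {name}")
    return problems


print(undeclared_names(EXERCISE1_CONFIG))
```

An empty list means that every name used in a rule has a matching declaration.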

sources_sinks.pysa tells Pysa that request.GET is a taint source (TaintSource) of kind CustomUserControlled, and that eval is a taint sink (TaintSink) of kind CodeExecution.

django.http.request.HttpRequest.GET: TaintSource[CustomUserControlled] = ...

def eval(__source: TaintSink[CodeExecution], __globals, __locals): ...

.pyre_configuration sets the search paths.

{
  "source_directories": [ // directories to search for source code
    "."
  ],
  "taint_models_path": [ // directories to search for .pysa and taint.config files
    "."
  ],
  "search_path": [ // where to look for stub files
    "../../../stubs/"
  ],
  "exclude": [
    ".*/integration_test/.*"
  ],
  "use_command_v2": true
}
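Relative entries such as `../../../stubs/` are presumably resolved against the directory that holds `.pyre_configuration`. That is an assumption on my part, though it agrees with the `/home/jck/pyre-check/stubs/taint` path that appears in the exercise3 logs. The sketch below only does the path arithmetic; the `resolve` helper is hypothetical. Note also that the `//` comments in the listing are annotations for the reader, since standard JSON does not allow comments.

```python
import posixpath


def resolve(config_dir: str, relative: str) -> str:
    """Resolve a config entry against the directory containing the config file."""
    return posixpath.normpath(posixpath.join(config_dir, relative))


exercise_dir = "/home/jck/pyre-check/documentation/pysa_tutorial/exercise1"
stubs_dir = resolve(exercise_dir, "../../../stubs/")
print(stubs_dir)
```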

The run produces the following output; the JSON array at the end lists the issues found.

(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise1$ pyre analyze
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ `object.__eq__` has 82 overrides, this might slow down the analysis considerably.
ƛ `object.__init__` has 754 overrides, this might slow down the analysis considerably.
ƛ `object.__ne__` has 60 overrides, this might slow down the analysis considerably.
ƛ `type.__call__` has 131 overrides, this might slow down the analysis considerably.
ƛ `type.__init__` has 448 overrides, this might slow down the analysis considerably.
ƛ `type.__new__` has 75 overrides, this might slow down the analysis considerably.
[
{
"line": 12,
"column": 18,
"stop_line": 12,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible RCE:",
"description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"long_description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"concise_description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"inference": null,
"define": "views.operate_on_twos"
}
]
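The issue list is plain JSON, so it is easy to post-process. Here is a small sketch that turns it into one line per issue, assuming the JSON array has already been separated from the `ƛ` log lines; the `summarize` helper is my own, and the sample keeps only the fields it needs.

```python
import json
from collections import Counter
from typing import List

ISSUES_JSON = """
[
  {
    "line": 12,
    "column": 18,
    "stop_line": 12,
    "stop_column": 35,
    "path": "views.py",
    "code": 5001,
    "name": "Possible RCE:",
    "define": "views.operate_on_twos"
  }
]
"""


def summarize(issues_json: str) -> List[str]:
    """Format each issue as path:line:column [code] callable."""
    issues = json.loads(issues_json)
    return [
        f'{issue["path"]}:{issue["line"]}:{issue["column"]} '
        f'[{issue["code"]}] {issue["define"]}'
        for issue in issues
    ]


summary = summarize(ISSUES_JSON)
per_code = Counter(issue["code"] for issue in json.loads(ISSUES_JSON))
print("\n".join(summary))
print(dict(per_code))
```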

exercise2

All three functions in views.py have a remote code execution vulnerability: the first two execute Python code and the last one executes shell code.

import subprocess
from django.http import HttpRequest, HttpResponse

def operate_on_twos(request: HttpRequest) -> HttpResponse:
    operator = request.POST["operator"]
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

def operate_on_threes(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821

def operate_on_fours(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    result = subprocess.getoutput(f"expr 4 {operator} 4")
    return result

taint.config already declares a sink named ShellExecution.

{
  "sources": [
    {
      "name": "CustomUserControlled",
      "comment": "use to annotate user input"
    }
  ],

  "sinks": [
    {
      "name": "CodeExecution",
      "comment": "use to annotate execution of python code"
    },
    {
      "name": "ShellExecution",
      "comment": "use to annotate execution of shell scripts"
    }
  ],

  "features": [],

  "rules": [
    {
      "name": "Possible RCE:",
      "code": 5001,
      "sources": [ "CustomUserControlled" ],
      "sinks": [ "CodeExecution" ],
      "message_format": "User specified data may reach a code execution sink"
    }
  ]
}

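Add a rule to taint.config for flows from the CustomUserControlled source to the ShellExecution sink, with code set to 5002. The existing rule 5001 stays in place; only the new rule is shown here.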

"rules": [
  {
    "name": "Possible RCE:",
    "code": 5002,
    "sources": [ "CustomUserControlled" ],
    "sinks": [ "ShellExecution" ],
    "message_format": "User specified data may reach a shell execution sink"
  }
]

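sources_sinks.pysa contains the models with taint annotations. Each model must match a stub in a .pyi file or a definition in the source code. When turning a .pyi stub or a source definition into a model, make sure that:

  • the function name is unchanged
  • the parameter names are unchanged
  • the type annotations are removed
  • the function or attribute name is fully qualified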
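The four conversion rules above can be sketched with the standard ast module. This is an illustration only: the `stub_to_model` helper is hypothetical, the stub line is a simplified stand-in for the real typeshed entry, only plain positional parameters are handled, and the taint annotations still have to be added by hand.

```python
import ast
from typing import List


def stub_to_model(stub_source: str, module: str) -> List[str]:
    """Turn top-level stub functions into unannotated, fully qualified skeletons."""
    models = []
    for node in ast.parse(stub_source).body:
        if isinstance(node, ast.FunctionDef):
            # Keep the parameter names, drop their annotations and the return type.
            parameters = ", ".join(argument.arg for argument in node.args.args)
            models.append(f"def {module}.{node.name}({parameters}): ...")
    return models


# Simplified stub, assumed for illustration.
stub = "def getoutput(cmd: str) -> str: ..."
skeletons = stub_to_model(stub, "subprocess")
print(skeletons[0])
```

The skeleton keeps the function and parameter names, so annotating `cmd` with a TaintSink is all that remains.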

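Most of Pyre's stubs come from typeshed, which contains external type annotations for the Python standard library and builtins, as well as for third-party packages, contributed by people outside those projects. The rest are stubs written for Pysa that cover third-party libraries such as Django and are not part of typeshed.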

django.http.request.HttpRequest.GET: TaintSource[CustomUserControlled] = ...

def eval(__source: TaintSink[CodeExecution], __globals, __locals): ...

def subprocess.getoutput(cmd: TaintSink[ShellExecution]): ...

Add models that mark request.POST as a source and exec as a sink.

django.http.request.HttpRequest.POST: TaintSource[CustomUserControlled] = ...

def exec(__source: TaintSink[CodeExecution], __globals, __locals): ...

Run pyre analyze; the result is as follows:

(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise2$ pyre analyze
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ `object.__eq__` has 82 overrides, this might slow down the analysis considerably.
ƛ `object.__init__` has 754 overrides, this might slow down the analysis considerably.
ƛ `object.__ne__` has 60 overrides, this might slow down the analysis considerably.
ƛ `type.__call__` has 131 overrides, this might slow down the analysis considerably.
ƛ `type.__init__` has 448 overrides, this might slow down the analysis considerably.
ƛ `type.__new__` has 75 overrides, this might slow down the analysis considerably.
[
{
"line": 30,
"column": 34,
"stop_line": 30,
"stop_column": 56,
"path": "views.py",
"code": 5002,
"name": "Possible RCE:",
"description":
"Possible RCE: [5002]: User specified data may reach a shell execution sink",
"long_description":
"Possible RCE: [5002]: User specified data may reach a shell execution sink",
"concise_description":
"Possible RCE: [5002]: User specified data may reach a shell execution sink",
"inference": null,
"define": "views.operate_on_fours"
},
{
"line": 22,
"column": 9,
"stop_line": 22,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible RCE:",
"description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"long_description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"concise_description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"inference": null,
"define": "views.operate_on_threes"
},
{
"line": 14,
"column": 18,
"stop_line": 14,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible RCE:",
"description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"long_description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"concise_description":
"Possible RCE: [5001]: User specified data may reach a code execution sink",
"inference": null,
"define": "views.operate_on_twos"
}
]

exercise3

Running pyre analyze directly fails with model verification errors: Found 93 model verification errors in exercise3 #441

ƛ Finding taint models in `/home/jck/pyre-check/stubs/taint, /home/jck/pyre-check/documentation/pysa_tutorial/exercise3`.
ƛ Found 93 model verification errors!
...

With the --no-verify flag the run gives the expected output, which contains false positives.

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri
[
{
"line": 34,
"column": 9,
"stop_line": 34,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible shell injection",
"description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"long_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"concise_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"inference": null,
"define": "views.operate_on_threes"
},
{
"line": 24,
"column": 18,
"stop_line": 24,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible shell injection",
"description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"long_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"concise_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"inference": null,
"define": "views.operate_on_twos"
}
]

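Pysa ships with pre-written sources, sinks, and rules for many Python standard-library and open-source libraries. The pre-written taint.config and *.pysa files live in the stubs/taint folder.

None of the functions in views.py has a security problem, yet pyre analyze --no-verify reports false positives.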

from django.http import HttpRequest, HttpResponse

def example_sanitizer():
    ...

def get_operator_safe(request: HttpRequest) -> str:
    operator = request.POST["operator"]
    assert operator in {"+", "-", "*", "/"}
    return operator

def operate_on_twos(request: HttpRequest) -> HttpResponse:
    operator = get_operator_safe(request)
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

def operate_on_threes(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    assert operator in {"+", "-", "*", "/"}
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821

sanitizers.pysa defines the sanitizers. A sanitizer changes how Pysa treats an entire callable, not just its return value or parameters, and is written as a decorator on the model.

@Sanitize
def views.example_sanitizer(): ...

operate_on_twos calls get_operator_safe to validate the request, so the later call to eval() is safe. Mark get_operator_safe as a sanitizer.

@Sanitize
def views.get_operator_safe(request: TaintSource[UserControlled]): ...

Running pyre analyze --no-verify again leaves only one false positive:

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri
[
{
"line": 34,
"column": 9,
"stop_line": 34,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible shell injection",
"description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"long_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"concise_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"inference": null,
"define": "views.operate_on_threes"
}
]

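operate_on_threes validates the request inline, so there is no helper function to mark as a sanitizer. Here we add a call to an identity function that returns its argument unchanged.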

def identity(x):
    return x

def operate_on_threes(request: HttpRequest) -> HttpResponse:
    operator = request.GET["operator"]
    assert operator in {"+", "-", "*", "/"}
    operator = identity(operator)
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821

Mark identity as a sanitizer: taint that passes through it is no longer treated as tainted and is not tracked further.

@Sanitize
def views.identity(x: TaintSource[UserControlled]): ...

Running pyre analyze --no-verify once more reports no false positives.

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri
[]
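A sanitizer model is only as trustworthy as the function behind it. The tutorial code validates with assert, and Python skips assert statements when it runs with the -O flag, so an explicit check may be the safer choice outside a tutorial. Below is a minimal sketch of such an allow-list helper; the name `checked_operator` is hypothetical, and it would be marked with @Sanitize in the same way as get_operator_safe.

```python
ALLOWED_OPERATORS = frozenset({"+", "-", "*", "/"})


def checked_operator(value: str) -> str:
    """Return the operator unchanged, or raise if it is not on the allow list."""
    if value not in ALLOWED_OPERATORS:
        raise ValueError(f"unsupported operator: {value!r}")
    return value


print(checked_operator("+"))
```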

exercise4

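This exercise uses the interactive command line provided by SAPP (Static Analysis Post Processor).

The functions in views.py are again free of security problems, but Pysa still reports false positives.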

from django.http import HttpRequest, HttpResponse

def example_feature(argument: str) -> None:
    ...

def assert_numeric(operand: str) -> None:
    assert operand.isnumeric()

def do_and(request: HttpRequest) -> HttpResponse:
    left = bool(request.GET["left"])
    right = bool(request.GET["right"])
    result = eval(f"{left} and {right}")  # noqa: P204
    return result

def do_or(request: HttpRequest) -> HttpResponse:
    left = request.GET["left"]
    right = request.GET["right"]
    assert_numeric(left)
    assert_numeric(right)
    result = eval(f"{left} or {right}")  # noqa: P204
    return result

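In taint.config, features are additional metadata attached to taint flows. They can be used to filter out false positives and do not affect the analysis itself.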

{
  "sources": [],
  "sinks": [],
  "features": [
    {
      "name": "example",
      "comment": "Copy this feature and write your own. Don't forget that JSON lists are comma separated!"
    }
  ],
  "rules": []
}

features.pysa uses the feature named example.

def views.example_feature(argument: AddFeatureToArgument[Via[example]]): ...

Run Pysa and open the results in SAPP:

pyre analyze --no-verify --save-results-to .
sapp analyze taint-output.json
sapp explore

Inspect the issues in the SAPP interactive command line:

...
[ run 1 ]
>>> issues # returns 2 issues
Issue 1
Code: 5001
Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
Callable: views.do_or
Sources: UserControlled
Sinks: RemoteCodeExecution
Features: first-index:right
always-via:format-string
has:first-index
first-index:left
Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
--------------------------------------------------------------------------------
Issue 2
Code: 5001
Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
Callable: views.do_and
Sources: UserControlled
Sinks: RemoteCodeExecution
Features: first-index:right
always-type:bool
always-type:scalar
always-via:obscure
always-via:format-string
first-index:left
has:first-index
Location: views.py:21|18|39
Min Trace Length: Source (0) | Sink (0)
Found 2 issues with run_id 1.

[ run 1 ]
>>> issues(exclude_features=["always-type:bool"]) # filters out do_and
Issue 1
Code: 5001
Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
Callable: views.do_or
Sources: UserControlled
Sinks: RemoteCodeExecution
Features: first-index:right
always-via:format-string
has:first-index
first-index:left
Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
Found 1 issues with run_id 1.

[ run 1 ]
>>> exit # quit
(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise4$

Add a feature named assert_numeric to taint.config.

{
  "sources": [],
  "sinks": [],
  "features": [
    {
      "name": "example",
      "comment": "Copy this feature and write your own. Don't forget that JSON lists are comma separated!"
    },
    {
      "name": "assert_numeric",
      "comment": "via assert_numeric"
    }
  ],
  "rules": []
}

Use the feature in features.pysa to state that the operand parameter of views.assert_numeric carries it.

def views.assert_numeric(operand: AddFeatureToArgument[Via[assert_numeric]]): ...

Run Pysa again and open the results in SAPP:

pyre analyze --no-verify --save-results-to .
sapp analyze taint-output.json
sapp explore

The new feature now shows up. Filtering on both features returns 0 issues.

[ run 2 ]
>>> issues
Issue 3
Code: 5001
Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
Callable: views.do_or
Sources: UserControlled
Sinks: RemoteCodeExecution
Features: always-via:format-string
first-index:left
always-via:assert_numeric # new feature
first-index:right
has:first-index
Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
--------------------------------------------------------------------------------
Issue 4
Code: 5001
Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
Callable: views.do_and
Sources: UserControlled
Sinks: RemoteCodeExecution
Features: always-via:format-string
always-via:obscure
first-index:left
always-type:scalar
first-index:right
has:first-index
always-type:bool
Location: views.py:21|18|39
Min Trace Length: Source (0) | Sink (0)
Found 2 issues with run_id 2.

[ run 2 ]
>>> issues(exclude_features=["always-type:bool"]) # filters out do_and
Issue 3
Code: 5001
Message: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)
Callable: views.do_or
Sources: UserControlled
Sinks: RemoteCodeExecution
Features: always-via:format-string
first-index:left
always-via:assert_numeric
first-index:right
has:first-index
Location: views.py:33|18|38
Min Trace Length: Source (0) | Sink (0)
Found 1 issues with run_id 2.

[ run 2 ]
>>> issues(exclude_features=["always-type:bool", "always-via:assert_numeric"]) # filters out do_and and do_or

Found 0 issues with run_id 2.

[ run 2 ]
>>> exit
(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise4$
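The filtering seen in the session can be mimicked in a few lines. This is not SAPP's implementation, just a sketch of the behaviour the transcript shows: an issue is dropped as soon as it carries any of the excluded features. The helper name and the dictionary layout are assumptions, and each feature list is shortened.

```python
from typing import Dict, Iterable, List

ISSUES = [
    {
        "callable": "views.do_or",
        "features": ["always-via:format-string", "always-via:assert_numeric"],
    },
    {
        "callable": "views.do_and",
        "features": ["always-via:format-string", "always-type:bool"],
    },
]


def without_features(issues: List[Dict], excluded: Iterable[str]) -> List[Dict]:
    """Keep only the issues that carry none of the excluded features."""
    excluded_set = set(excluded)
    return [issue for issue in issues if not excluded_set & set(issue["features"])]


remaining = without_features(ISSUES, ["always-type:bool"])
print([issue["callable"] for issue in remaining])
print(len(without_features(ISSUES, ["always-type:bool", "always-via:assert_numeric"])))
```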

exercise5

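Dynamic model generators run before Pysa and produce .pysa files. The generators live in pyre-check/tools/generate_taint_models/get_*.py in the official repository; they are documented in Dynamically Generating Models.

Running pyre analyze --no-verify directly detects no security issues.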

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri
[]

views.py and urls.py imitate the way Django handles requests. Both functions in views.py have an RCE vulnerability, but Pysa misses them (false negatives).

from typing import Callable

def api_wrapper(func: Callable):
    def inner(request):
        func(request, **request.GET)
    return inner

def operate_on_twos(request, operator: str):
    result = eval(f"2 {operator} 2")  # noqa: P204
    return result

@api_wrapper
def operate_on_threes(request, operator: str):
    exec(f"result = 3 {operator} 3")
    return result  # noqa: F821
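It helps to see what the decorator does at runtime. In the sketch below, api_wrapper is copied from views.py, while `FakeRequest`, the `received` list, and the simplified function body are stand-ins of my own. After decoration the module-level name is bound to `inner`, and user-controlled values from request.GET arrive as keyword arguments, which nothing in the function's own signature reveals.

```python
from typing import Callable, List

received: List[str] = []


def api_wrapper(func: Callable):
    def inner(request):
        func(request, **request.GET)
    return inner


@api_wrapper
def operate_on_threes(request, operator: str):
    # Record the value instead of executing it, to keep the sketch harmless.
    received.append(operator)


class FakeRequest:
    """Stand-in for a Django request; GET is a plain dict here."""
    GET = {"operator": "+"}


result = operate_on_threes(FakeRequest())
print(operate_on_threes.__name__)
print(received)
```

The call returns None because `inner` does not pass the wrapped function's result back to its caller.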

urls.py

from dataclasses import dataclass
from views import operate_on_twos

@dataclass
class UrlPattern:
    path: str
    callback: str

urlpatterns = [UrlPattern(r"^operate_on_twos/(.*)", operate_on_twos)]

generate_models.py can dynamically generate taint annotations for views.py.

import importlib
import sys
from pathlib import Path
from urls import UrlPattern

# Make sure we're able to import dependencies in 'pyre-check' repo, since they
# are not currently in the PyPI package for pyre-check
current_file = Path(__file__).absolute()
sys.path.append(str(current_file.parents[4]))

# Work around '-' in the name of 'pyre-check'
generate_taint_models = importlib.import_module(
    "pyre-check.tools.generate_taint_models"
)
view_generator = importlib.import_module(
    "pyre-check.tools.generate_taint_models.view_generator"
)
generator_specifications = importlib.import_module(
    "pyre-check.tools.generate_taint_models.generator_specifications"
)

class Ignore:
    pass

def main() -> None:
    # Here, specify all the generators that you might want to call.
    generators = {
        "django_path_params": generate_taint_models.RESTApiSourceGenerator(
            django_urls=view_generator.DjangoUrls(
                urls_module="urls",
                url_pattern_type=UrlPattern,
                url_resolver_type=Ignore,
            )
        ),
        # "decorator_extracted_params": generate_taint_models.<GENERATOR_NAME>(
        #     root=".",
        #     annotation_specifications=[
        #         generate_taint_models.DecoratorAnnotationSpecification(
        #             decorator=<DECORATOR_NAME_INCLUDING_PRECEEDING_@>,
        #             annotations=generator_specifications.default_entrypoint_taint,
        #         )
        #     ],
        # ),
    }
    # The `run_generators` function will take care of parsing command-line arguments, as
    # well as executing the generators specified in `default_modes` unless you pass in a
    # specific set from the command line.
    generate_taint_models.run_generators(
        generators,
        default_modes=[
            "django_path_params",
            # "decorator_extracted_params"
        ],
    )

if __name__ == "__main__":
    main()

Use generate_models.py to generate the .pysa file:

python generate_models.py --output-directory .

This fails because the graphql3 module cannot be found. Change graphql3 to graphql in the following file:

vim /home/jck/pyre-check/tools/generate_taint_models/get_dynamic_graphql_sources.py
# from graphql import GraphQLSchema

The rerun succeeds with the output below and produces the generated_django_path_params.pysa model file.

(tutorial) jck@analysis:~/pyre-check/documentation/pysa_tutorial/exercise5$ python generate_models.py --output-directory .
2021-07-07 03:24:56 INFO Computing models for `django_path_params`
2021-07-07 03:24:56 INFO Getting all URLs from `urls`
2021-07-07 03:24:56 INFO Computed models for `django_path_params` in 0.000 seconds.
{"number of generated models": 1}

generated_django_path_params.pysa marks the parameters of the view function as sources and its return value as a sink.

def views.operate_on_twos(request: TaintSource[UserControlled], operator: TaintSource[UserControlled]) -> TaintSink[ReturnedToUser]: ...

Running pyre analyze --no-verify again finds 1 issue; the other one is still missed.

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri
[
{
"line": 17,
"column": 18,
"stop_line": 17,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible shell injection",
"description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"long_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"concise_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"inference": null,
"define": "views.operate_on_twos"
}
]

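Extend generate_models.py so that it recognizes decorators. The commented-out code already does most of the work; only two placeholders need to be filled in (the generators are listed under Example Model Generators):

  • <GENERATOR_NAME>: AnnotatedFreeFunctionWithDecoratorGenerator
  • <DECORATOR_NAME_INCLUDING_PRECEEDING_@>: "@api_wrapper"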

def main() -> None:
    # Here, specify all the generators that you might want to call.
    generators = {
        "django_path_params": generate_taint_models.RESTApiSourceGenerator(
            django_urls=view_generator.DjangoUrls(
                urls_module="urls",
                url_pattern_type=UrlPattern,
                url_resolver_type=Ignore,
            )
        ),
        "decorator_extracted_params": generate_taint_models.AnnotatedFreeFunctionWithDecoratorGenerator(
            root=".",
            annotation_specifications=[
                generate_taint_models.DecoratorAnnotationSpecification(
                    decorator="@api_wrapper",
                    annotations=generator_specifications.default_entrypoint_taint,
                )
            ],
        ),
    }
    # The `run_generators` function will take care of parsing command-line arguments, as
    # well as executing the generators specified in `default_modes` unless you pass in a
    # specific set from the command line.
    generate_taint_models.run_generators(
        generators,
        default_modes=[
            "django_path_params",
            "decorator_extracted_params"
        ],
    )
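Conceptually, a decorator-based generator has to find every function that carries a given decorator. The sketch below does that with the standard ast module. It is not the code of AnnotatedFreeFunctionWithDecoratorGenerator, whose internals I have not checked; the `functions_with_decorator` helper is hypothetical and only handles decorators written as a bare name.

```python
import ast
from typing import List

VIEWS_SOURCE = '''
def api_wrapper(func):
    def inner(request):
        func(request, **request.GET)
    return inner

def operate_on_twos(request, operator: str):
    return eval(f"2 {operator} 2")

@api_wrapper
def operate_on_threes(request, operator: str):
    exec(f"result = 3 {operator} 3")
'''


def functions_with_decorator(source: str, decorator: str) -> List[str]:
    """Return 'name(params)' for each function decorated with the given name."""
    wanted = decorator.lstrip("@")
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            for entry in node.decorator_list:
                if isinstance(entry, ast.Name) and entry.id == wanted:
                    parameters = ", ".join(arg.arg for arg in node.args.args)
                    found.append(f"{node.name}({parameters})")
    return found


matches = functions_with_decorator(VIEWS_SOURCE, "@api_wrapper")
print(matches)
```

The source is only parsed, never executed, so the eval and exec calls inside the string are harmless here.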

Regenerate the .pysa files and run the analysis:

python generate_models.py --output-directory .
pyre analyze --no-verify

Both issues are now detected:

...
ƛ No cached overrides loaded, computing overrides...
ƛ `google.protobuf.message.Message.ClearField` has 57 overrides, this might slow down the analysis considerably.
ƛ `google.protobuf.message.Message.__init__` has 58 overrides, this might slow down the analysis considerably.
ƛ Iteration #2. 4 Callables [zipfile.ZipFile::__init__ (override), str::format (override), shelve.Shelf::__init__ (overri
[
{
"line": 25,
"column": 9,
"stop_line": 25,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible shell injection",
"description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"long_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"concise_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"inference": null,
"define": "views.operate_on_threes"
},
{
"line": 17,
"column": 18,
"stop_line": 17,
"stop_column": 35,
"path": "views.py",
"code": 5001,
"name": "Possible shell injection",
"description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"long_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"concise_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"inference": null,
"define": "views.operate_on_twos"
}
]

See also