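Machine Learning: Decision Tree Visualization
Sep 09, 2019
Chapter 3 of "Machine Learning in Action" (机器学习实战) introduces a way to visualize decision trees stored in a JSON-like format, but the figures it produces are rather crude. So I redrew them with Python's pygraphviz library, and the result is easier on the eyes.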
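pygraphviz depends on GraphViz, so GraphViz has to be downloaded and installed before pygraphviz can be used.
On macOS it can be installed directly with Homebrew (brew install graphviz).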
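Before running the script it can help to confirm that both pieces are actually in place. The sketch below is my own addition rather than part of the original script: it uses only the standard library, and `check_environment` is a hypothetical helper name. It reports whether the Graphviz `dot` executable is on PATH and whether the `pygraphviz` module can be found.

```python
import importlib.util
import shutil


def check_environment():
    """Report whether Graphviz's `dot` binary and the pygraphviz module can be found."""
    return {
        # shutil.which returns None when the executable is not on PATH.
        "dot_on_path": shutil.which("dot") is not None,
        # find_spec returns None when a top-level module is not installed.
        "pygraphviz_importable": importlib.util.find_spec("pygraphviz") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

If either entry shows up as missing, installing GraphViz first and pygraphviz second is the usual order, since pygraphviz builds against the GraphViz libraries.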
Suppose the decision tree algorithm has finished and produced trees like the following two, written in JSON format:
- ml.json (有胡子 = "has a beard", 长头发 = "long hair", 女 = "female", 男 = "male")
{
    "有胡子": {
        "0": {
            "长头发": {
                "0": "女",
                "1": "女"
            }
        },
        "1": "男"
    }
}
- lens.json
{
    "tearRate": {
        "reduced": "no lenses",
        "normal": {
            "astigmatic": {
                "yes": {
                    "prescript": {
                        "hyper": {
                            "age": {
                                "pre": "no lenses",
                                "young": "hard",
                                "presbyopic": "no lenses"
                            }
                        },
                        "myope": "hard"
                    }
                },
                "no": {
                    "age": {
                        "pre": "soft",
                        "young": "soft",
                        "presbyopic": {
                            "prescript": {
                                "hyper": "soft",
                                "myope": "no lenses"
                            }
                        }
                    }
                }
            }
        }
    }
}
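The drawing code further down works on a Python dict, so a tree stored as JSON text has to be parsed first. Here is a minimal sketch of that step, assuming the tree is available as a JSON string; the sample string is a simplified, illustrative tree and `count_leaves` is a hypothetical helper, neither comes from the original script. One thing to keep in mind: JSON object keys are always strings, so feature values such as 0 and 1 come back from the parser as "0" and "1".

```python
import json

# Illustrative JSON text, a simplified tree with the same shape as the ones above.
TREE_JSON = (
    '{"tearRate": {"reduced": "no lenses", '
    '"normal": {"astigmatic": {"yes": "hard", "no": "soft"}}}}'
)


def count_leaves(tree):
    """Count the class labels (leaves) in a nested-dict decision tree."""
    if not isinstance(tree, dict):
        return 1  # anything that is not a dict is a class label
    return sum(count_leaves(child) for child in tree.values())


tree = json.loads(TREE_JSON)  # plain nested dicts and strings
print(count_leaves(tree))
```

A quick leaf count like this is also a cheap sanity check that the parsed structure matches what the algorithm was expected to produce.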
The decision tree figures generated with the method in this post look like this:
The Python code, treeGraph.py, is as follows:
#!/usr/bin/env python3
import pygraphviz as pgv
import uuid


def buildTreeGraphOpt(myTree, parent, treeLevel, label, theGraph):
    """
    Draw the tree with the pygraphviz library (built on Graphviz); optimized version.

    :param myTree: the decision tree (or subtree) built from the training data, a dict
    :param parent: ID of the Graphviz node that the current myTree hangs from (None for the root)
    :param treeLevel: depth of the level being processed. Even levels hold feature names, each
        drawn as a node; odd levels hold feature values, which are passed down to the next
        level and become the label of the edge between the current node and the next one
    :param label: at an even level, the feature value from the level above, drawn as the label
        of the edge that links the parent node to the current node
    :param theGraph: the pygraphviz AGraph object to draw into
    :return: None
    """
    currentGraph = theGraph
    currTreeLevel = treeLevel
    nextTreeLevel = treeLevel + 1
    currentLabel = label
    # print(f"\nlevel={treeLevel},parent={parent}, label={label},myTree={myTree} ")
    if not isinstance(myTree, dict):  # leaf node, reached by the recursive call from an odd level
        leaf = myTree
        # A UUID suffix keeps node IDs unique even when the same label appears several times.
        keyNodeId = f"{leaf}_{currTreeLevel}_{uuid.uuid1()}"
        currentGraph.add_node(keyNodeId, label=str(leaf))
        if parent is not None and currentLabel is not None and currentLabel != "":
            # str() keeps the label a string even for feature values such as 0 and 1.
            currentGraph.add_edge(parent, keyNodeId, label=str(currentLabel))
        return
    for k, v in myTree.items():
        if currTreeLevel % 2 == 0:
            # Even level: k is a feature name, so draw it as a node.
            keyNodeId = f"{k}_{currTreeLevel}_{uuid.uuid1()}"
            currentGraph.add_node(keyNodeId, label=str(k))
            if parent is not None and currentLabel is not None and currentLabel != "":
                currentGraph.add_edge(parent, keyNodeId, label=str(currentLabel))
            if isinstance(v, dict):
                buildTreeGraphOpt(v, keyNodeId, nextTreeLevel, None, currentGraph)
            else:
                return  # already at a leaf
        else:
            # Odd level: k is a feature value; pass it down as the edge label.
            currentLabel = k
            buildTreeGraphOpt(v, parent, nextTreeLevel, currentLabel, currentGraph)


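def generatePicForTreeOpt(picFileName, theTree):
    """
    Render a decision tree (a dict) and save the image to the file named by picFileName.

    :param picFileName: path of the output image, e.g. "lens.png"
    :param theTree: the decision tree as a nested dict
    :return: None
    """
    # rankdir accepts TB, BT, LR or RL; TB lays the tree out from top to bottom.
    G = pgv.AGraph(directed=True, rankdir='TB')
    G.graph_attr['epsilon'] = '0.001'
    buildTreeGraphOpt(theTree, None, 0, None, G)
    G.layout('dot')
    G.draw(picFileName)

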
def main():
    print(__file__)
    tree1 = {'tearRate': {'reduced': 'no lenses', 'normal': {'astigmatic': {'yes': {
        'prescript': {'hyper': {'age': {'pre': 'no lenses', 'young': 'hard', 'presbyopic': 'no lenses'}},
                      'myope': 'hard'}}, 'no': {'age': {'pre': 'soft', 'young': 'soft', 'presbyopic': {
        'prescript': {'hyper': 'soft', 'myope': 'no lenses'}}}}}}}}
    tree2 = {'有胡子': {0: {'长头发': {0: '女', 1: '女'}}, 1: '男'}}
    picFilename1 = "lens.png"
    picFilename2 = "mf.png"
    generatePicForTreeOpt(picFilename1, tree1)
    generatePicForTreeOpt(picFilename2, tree2)


if __name__ == '__main__':
    main()
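To look at the traversal without pygraphviz in the way, the same idea can be sketched as a function that emits Graphviz DOT text directly. This is an alternative illustration under my own assumptions, not the code above: `tree_to_dot` and the sequential `n0`, `n1`, ... node IDs are hypothetical, the sample tree at the bottom is made up for the demo, and every dict is assumed to hold exactly one feature name mapped to a dict of branches. Feature names and class labels become nodes, feature values become edge labels.

```python
import itertools


def tree_to_dot(tree):
    """Return DOT source for a nested-dict decision tree."""
    counter = itertools.count()
    lines = ["digraph tree {", "    rankdir=TB;"]

    def quote(text):
        # Escape backslashes and double quotes so any label is a valid DOT string.
        return '"' + str(text).replace("\\", "\\\\").replace('"', '\\"') + '"'

    def visit(subtree, parent_id, edge_label):
        # A dict is {feature_name: {feature_value: child, ...}}; anything else is a leaf.
        if isinstance(subtree, dict):
            name, branches = next(iter(subtree.items()))
        else:
            name, branches = subtree, {}
        node_id = f"n{next(counter)}"
        lines.append(f"    {node_id} [label={quote(name)}];")
        if parent_id is not None:
            lines.append(f"    {parent_id} -> {node_id} [label={quote(edge_label)}];")
        for value, child in branches.items():
            visit(child, node_id, value)

    visit(tree, None, None)
    lines.append("}")
    return "\n".join(lines)


# Illustrative tree only; any nested dict of the same shape works.
print(tree_to_dot({"tearRate": {"reduced": "no lenses", "normal": "soft"}}))
```

The printed text can be saved to a file such as tree.dot and rendered with the Graphviz command line, for example `dot -Tpng tree.dot -o tree.png`. Sequential IDs play the same role as the UUID suffix in treeGraph.py: two leaves with the same label still get separate nodes.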
Permalink: https://www.jack-yin.com/coding/essay/3124.html | 边城网事