Superset 1.1.0 Beginner's Guide

pip install apache-superset==1.1.0 on CentOS 7.6

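Superset is an excellent open-source BI tool (38.5k GitHub stars). Version 1.1.0 is a thorough overhaul with a new UI and reworked interactions, and it is well worth recommending.

What follows is the hard-won installation procedure.

Preface

Deploying with docker-compose failed because the containers could not reach the external network. Deploying via the Running on Kubernetes method worked until I added a database, which failed with: An error occurred while creating databases: (configuration_method) Missing data for required field.

In the end, installing directly on the host with pip install apache-superset==1.1.0 succeeded. It was a bumpy road.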

Prerequisites

  • CentOS 7.6 x86-64
  • Python >= 3.7.4

1. Prepare the installation environment

1.1 Install Python 3.7.4

apache-superset==1.1.0 declares Requires: Python ~=3.7, so the first step is to replace the Python on CentOS 7 with 3.7.4. Otherwise pip can only install apache-superset 0.38.0.

# wget https://www.python.org/ftp/python/3.7.4/Python-3.7.4.tgz
# tar zxf Python-3.7.4.tgz
# cd Python-3.7.4
# ./configure
# make && make install
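Before going further, it may help to confirm that the freshly built interpreter is new enough. The sketch below is a small check I would run with the new python3.7; the helper name python_ok is my own, and the bounds are an assumption that combines the ~=3.7 requirement above with the Python >= 3.7.1 requirement of pandas described in the FAQ.

```python
import sys


def python_ok(version=None):
    """Return True if the version satisfies >=3.7.1 and <4 (assumed bounds)."""
    # Default to the running interpreter's (major, minor, micro) triple.
    v = tuple(version or sys.version_info[:3])
    return (3, 7, 1) <= v < (4, 0, 0)


if __name__ == "__main__":
    print(sys.version.split()[0], "ok" if python_ok() else "too old")
```

Passing only says that pip's metadata check should not reject the interpreter. It does not guarantee that every dependency builds.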

1.2 Install system dependencies

# yum install gcc gcc-c++ libffi-devel python3-devel python3-pip python3-wheel openssl-devel cyrus-sasl-devel openldap-devel

1.3 Install Superset

# pip install apache-superset==1.1.0
  • Initialize the database
# superset db upgrade
  • Create an admin user
# export FLASK_APP=superset
# superset fab create-admin
  • Load some data to play with
# superset load_examples
  • Create default roles and permissions
# superset init
  • Start the web server

Superset is built on Flask (not Django), so the service is started through its Flask-based CLI as follows.

# superset run -h 10.0.0.30 -p 8088 --with-threads

2. Add the database instances to query

The Install Database Drivers page describes how to install the driver each database needs.

The ones I needed are covered here: MySQL, Hive, Impala, and Druid.

  • MySQL
# pip install mysqlclient

Connection string: mysql://{username}:{password}@{hostname}:{port}/{database}

[Screenshot: superset_db_mysq]

  • Hive
# pip install 'pyhive[hive]'

Connection string: hive://{hostname}:{port, default 10000}/{database}

[Screenshot: superset_db_hive]

  • Impala
# pip install impyla

Connection string: impala://{hostname}:{port, default 21050}/{database}

[Screenshot: superset_db_impala]

  • Druid
# pip install pydruid

Connection string: druid://<User>:<password>@<Host>:<Port-default-9088>/druid/v2/sql

[Screenshot: superset_db_druid]
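The connection strings above are easy to get wrong by hand, especially when a password contains characters such as @ or /. Below is a minimal sketch of a helper that assembles them. The function build_uri, the host 10.0.0.31, and the credentials are all hypothetical. Only the Hive and Impala default ports come from the templates above.

```python
from urllib.parse import quote_plus

# Default ports taken from the connection-string templates in this section.
DEFAULT_PORTS = {"hive": 10000, "impala": 21050}


def build_uri(scheme, host, database="", port=None, username=None, password=None):
    """Assemble scheme://[user[:password]@]host[:port]/database."""
    auth = ""
    if username:
        # Percent-encode credentials so '@', ':' and '/' cannot break the URI.
        auth = quote_plus(username)
        if password:
            auth += ":" + quote_plus(password)
        auth += "@"
    if port is None:
        port = DEFAULT_PORTS.get(scheme)
    netloc = f"{auth}{host}" + (f":{port}" if port is not None else "")
    return f"{scheme}://{netloc}/{database}"


if __name__ == "__main__":
    print(build_uri("mysql", "10.0.0.31", "superset", port=3306,
                    username="bi", password="p@ss/word"))
    print(build_uri("hive", "10.0.0.31", "default"))
```

The resulting string is what gets pasted into the SQLAlchemy URI field when adding a database. For Druid, the path druid/v2/sql would be passed as the database argument.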

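3. Build charts

3.1 Add datasets

[Screenshot: add_dataset]

3.2 Create a chart

[Screenshot: chart_uk]

46 chart types are built in.

[Screenshot: chart_uk_type]

3.3 Create a dashboard

[Screenshot: dashboard_uk]

3.4 SQL Lab

[Screenshot: sqllab]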

FAQ

Installing Superset with pip was fairly painful. The pitfalls I hit are listed below.

error: command 'gcc' failed with exit status 1

This happens when the system dependencies from section 1.2 have not been installed.

src/geohash.cpp:538:20: fatal error: Python.h: No such file or directory

error: command 'gcc' failed with exit status 1

ERROR: No matching distribution found for pandas<1.3,>=1.2.2

This is because pandas==1.2.2 declares Requires: Python >=3.7.1, so Python 3.7.0 is not enough.

ERROR: Could not find a version that satisfies the requirement pandas<1.3,>=1.2.2 (from apache-superset) (from versions: 0.1, 0.2b0, 0.2b1, 0.2, 0.3.0b0, 0.3.0b2, 0.3.0, 0.4.0, 0.4.1, 0.4.2, 0.4.3, 0.5.0, 0.6.0, 0.6.1, 0.7.0rc1, 0.7.0, 0.7.1, 0.7.2, 0.7.3, 0.8.0rc1, 0.8.0rc2, 0.8.0, 0.8.1, 0.9.0, 0.9.1, 0.10.0, 0.10.1, 0.11.0, 0.12.0, 0.13.0, 0.13.1, 0.14.0, 0.14.1, 0.15.0, 0.15.1, 0.15.2, 0.16.0, 0.16.1, 0.16.2, 0.17.0, 0.17.1, 0.18.0, 0.18.1, 0.19.0rc1, 0.19.0, 0.19.1, 0.19.2, 0.20.0rc1, 0.20.0, 0.20.1, 0.20.2, 0.20.3, 0.21.0rc1, 0.21.0, 0.21.1, 0.22.0, 0.23.0rc2, 0.23.0, 0.23.1, 0.23.2, 0.23.3, 0.23.4, 0.24.0rc1, 0.24.0, 0.24.1, 0.24.2, 0.25.0rc0, 0.25.0, 0.25.1, 0.25.2, 0.25.3, 1.0.0rc0, 1.0.0, 1.0.1, 1.0.2, 1.0.3, 1.0.4, 1.0.5, 1.1.0rc0, 1.1.0, 1.1.1, 1.1.2, 1.1.3, 1.1.4, 1.1.5)
ERROR: No matching distribution found for pandas<1.3,>=1.2.2

ERROR: No matching distribution found for mysqlclient==2.0.3

Installing mysqlclient fails because neither mysql_config nor mariadb_config can be found. Installing mariadb-devel with yum fixes it.

# pip install mysqlclient==2.0.3
Looking in indexes: http://mirrors.tencentyun.com/pypi/simple
Collecting mysqlclient==2.0.3
  Downloading http://mirrors.tencentyun.com/pypi/packages/3c/df/59cd2fa5e48d0804d213bdcb1acb4d08c403b61c7ff7ed4dd4a6a2deb3f7/mysqlclient-2.0.3.tar.gz (88 kB)
     |████████████████████████████████| 88 kB 9.1 MB/s
    ERROR: Command errored out with exit status 1:
     command: /usr/local/bin/python3.7 -c 'import io, os, sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup.py'"'"'; __file__='"'"'/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup.py'"'"';f = getattr(tokenize, '"'"'open'"'"', open)(__file__) if os.path.exists(__file__) else io.StringIO('"'"'from setuptools import setup; setup()'"'"');code = f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base /tmp/pip-pip-egg-info-bwo0e6yy
         cwd: /tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/
    Complete output (15 lines):
    /bin/sh: mysql_config: command not found
    /bin/sh: mariadb_config: command not found
    /bin/sh: mysql_config: command not found
    mysql_config --version
    mariadb_config --version
    mysql_config --libs
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup.py", line 15, in <module>
        metadata, options = get_config()
      File "/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup_posix.py", line 70, in get_config
        libs = mysql_config("libs")
      File "/tmp/pip-install-bs1gch7r/mysqlclient_a8ecbcb585d4474c8886784ffc1ba4cf/setup_posix.py", line 31, in mysql_config
        raise OSError("{} not found".format(_mysql_config_path))
    OSError: mysql_config not found
    ----------------------------------------
WARNING: Discarding http://mirrors.tencentyun.com/pypi/packages/3c/df/59cd2fa5e48d0804d213bdcb1acb4d08c403b61c7ff7ed4dd4a6a2deb3f7/mysqlclient-2.0.3.tar.gz#sha256=f6ebea7c008f155baeefe16c56cd3ee6239f7a5a9ae42396c2f1860f08a7c432 (from http://mirrors.tencentyun.com/pypi/simple/mysqlclient/) (requires-python:>=3.5). Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
ERROR: Could not find a version that satisfies the requirement mysqlclient==2.0.3 (from versions: 1.3.0, 1.3.1, 1.3.2, 1.3.3, 1.3.4, 1.3.5, 1.3.6, 1.3.7, 1.3.8, 1.3.9, 1.3.10, 1.3.11rc1, 1.3.11, 1.3.12, 1.3.13, 1.3.14, 1.4.0rc1, 1.4.0rc2, 1.4.0rc3, 1.4.0, 1.4.1, 1.4.2, 1.4.2.post1, 1.4.3, 1.4.4, 1.4.5, 1.4.6, 2.0.0, 2.0.1, 2.0.2, 2.0.3)
ERROR: No matching distribution found for mysqlclient==2.0.3
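To see in advance whether the mysqlclient build will find its helper, one can look for the two executables named in the log on PATH. This is only a sketch. The function name find_mysql_config is my own, and it does not reproduce the exact lookup that mysqlclient's setup script performs.

```python
import shutil


def find_mysql_config(candidates=("mysql_config", "mariadb_config")):
    """Return the path of the first candidate found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    found = find_mysql_config()
    print(found or "not found: try `yum install mariadb-devel` first")
```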

ModuleNotFoundError: No module named '_sqlite3'

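CentOS can install Python 3.6 through yum, and the two versions are close enough that copying its extension module over works as a workaround. The cleaner fix is usually to install sqlite-devel (and bzip2-devel for _bz2) and then rebuild Python 3.7.4, because these modules are skipped when their headers are missing at build time.

# cp /usr/lib64/python3.6/lib-dynload/_sqlite3.cpython-36m-x86_64-linux-gnu.so /usr/local/lib/python3.7/lib-dynload/_sqlite3.cpython-37m-x86_64-linux-gnu.so

ModuleNotFoundError: No module named '_bz2'

Same fix as above.

# cp /usr/lib64/python3.6/lib-dynload/_bz2.cpython-36m-x86_64-linux-gnu.so /usr/local/lib/python3.7/lib-dynload/_bz2.cpython-37m-x86_64-linux-gnu.so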
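Rather than discovering these missing modules one traceback at a time, it may be quicker to probe for them right after building Python. The sketch below tries to import a list of extension modules and reports the ones that fail. The list beyond _sqlite3 and _bz2 is my own guess at other modules that a source build commonly skips.

```python
import importlib


def missing_modules(names):
    """Return the subset of module names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # ModuleNotFoundError is a subclass of ImportError.
            missing.append(name)
    return missing


if __name__ == "__main__":
    # _sqlite3 and _bz2 are from the errors above; the rest are assumptions.
    print(missing_modules(["_sqlite3", "_bz2", "_ssl", "_ctypes", "_lzma"]))
```

Run it with the python3.7 that pip uses. An empty list means none of the probed modules is missing.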

Issue 1002 - The database returned an unexpected error.

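The Hive connection string did not specify a username, so the query ran as root, which has no write permission on the HDFS /user directory. Specifying a user that has the required permission in the connection string avoids this.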

DB engine Error
hive error: ('Query error', 'Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.check(FSPermissionChecker.java:400)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:256)\n\tat org.apache.hadoop.hdfs.server.namenode.FSPermissionChecker.checkPermission(FSPermissionChecker.java:194)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1855)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkPermission(FSDirectory.java:1839)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirectory.checkAncestorAccess(FSDirectory.java:1798)\n\tat org.apache.hadoop.hdfs.server.namenode.FSDirMkdirOp.mkdirs(FSDirMkdirOp.java:61)\n\tat org.apache.hadoop.hdfs.server.namenode.FSNamesystem.mkdirs(FSNamesystem.java:3101)\n\tat org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.mkdirs(NameNodeRpcServer.java:1123)\n\tat org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.mkdirs(ClientNamenodeProtocolServerSideTranslatorPB.java:696)\n\tat org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)\n\tat org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:523)\n\tat org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:869)\n\tat org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:815)\n\tat java.security.AccessController.doPrivileged(Native Method)\n\tat javax.security.auth.Subject.doAs(Subject.java:422)\n\tat org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1875)\n\tat org.apache.hadoop.ipc.Server$Handler.run(Server.java:2675)\n')

This may be triggered by:
Issue 1002 - The database returned an unexpected error. 
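As an illustration of the fix, the sketch below inserts a username into an existing Hive connection string. The helper with_username, the host 10.0.0.31, and the account bi_user are hypothetical. The account is assumed to have the HDFS permissions that root lacks.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def with_username(uri, username):
    """Return the URI with its user part set to the given username."""
    parts = urlsplit(uri)
    # Rebuild the network location as user@host[:port].
    netloc = f"{quote(username, safe='')}@{parts.hostname or ''}"
    if parts.port is not None:
        netloc += f":{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    print(with_username("hive://10.0.0.31:10000/default", "bi_user"))
```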

hive: Error while fetching schema list

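When querying Hive tables in SQL Lab, the following error sometimes appears. I have not found the cause yet. It may be a version compatibility problem in the pyhive library. I am recording it here for now and will update the article once the cause is found.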

INFO: - - [16/May/2021 16:06:07] "GET /api/v1/database/3/schemas/?q=(force:!f) HTTP/1.1" 500 -
Failed to fetch database function names with error: type object 'TCLIService' has no attribute 'Client'
ERROR:superset.models.core:Failed to fetch database function names with error: type object 'TCLIService' has no attribute 'Client'
ERROR:root:type object 'TCLIService' has no attribute 'Client'

Reference