Android on ARM/X86 Servers: The Market for Virtualization and Containers

0. Background

As the push for domestically produced hardware advances, a considerable number of applications can already run on domestic (ARM/X86) servers. This article explores running a high-performance Android desktop on ARM/X86 servers using two technologies: containers and virtualization.

After surveying the available implementations, the most cost-effective approach in the industry is still dedicated ARM boards, although the initial research cost is higher; if you are committed, you can start with off-the-shelf boards from Shenzhen. For vendors just getting started, however, running containers on ARM servers is the more suitable choice, since the workload is Android after all.

1. X86 Servers

1.1. Virtualization

1.2. Containers

2. ARM Servers

2.1. Virtualization

2.2. Containers

Docker-Android

Anbox (LXC), see the install sketch below

Xdroid
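
Of the container options above, Anbox is the easiest one to try on an ARM server. A minimal sketch, assuming an Ubuntu host with snapd and the ashmem/binder kernel modules available (the commands follow the upstream Anbox documentation and may change):

sudo apt install -y snapd adb   # the adb package may be named android-tools-adb on older releases
sudo snap install --devmode --beta anbox
# Once the Anbox session manager is running, Android apps are installed through adb:
adb install my-app.apk          # my-app.apk is a placeholder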

3. Desktop Protocols

Because of inherent architectural limitations, the VGA device emulated by QEMU on ARM does not work properly, so for now a virtio-vga (virtio-gpu) device is required to get a display (https://www.linux-kvm.org/images/0/09/Qemu-gfx-2016.pdf).
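
As a hedged illustration (not a full Android bring-up), the relevant QEMU options look roughly like the following; the machine options, image path, and sizing are placeholders:

# Use a virtio GPU instead of the emulated VGA; on machine types that provide a VGA slot,
# -device virtio-vga can be used instead of virtio-gpu-pci.
qemu-system-aarch64 \
  -M virt -enable-kvm -cpu host -smp 4 -m 4G \
  -device virtio-gpu-pci \
  -display gtk,gl=on \
  -drive file=android.img,format=raw,if=virtio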

3.1. In-Band Protocols

3.2. Out-of-Band Protocols

TensorFlow in NetLogo, Make Your Agent More Intelligent

0. Background

NetLogo is a very useful tool for agent-based modeling (ABM), and Python is a handy language for building proofs of concept.

In this post I will show you how to call Python from NetLogo. For more information, see the documentation of the NetLogo Python extension.

1. NetLogo version

breed [data-points data-point]
breed [centroids centroid]

globals [
  any-centroids-moved?
]

to setup
  clear-all
  set-default-shape data-points "circle"
  set-default-shape centroids "x"
  generate-clusters
  reset-centroids
end

to generate-clusters
  let cluster-std-dev 20 - num-clusters
  let cluster-size num-data-points / num-clusters
  repeat num-clusters [
    let center-x random-xcor / 1.5
    let center-y random-ycor / 1.5
    create-data-points cluster-size [
      setxy center-x center-y
      set heading random 360
      fd abs random-normal 0 (cluster-std-dev / 2) ;; Divide by two because abs doubles the width
    ]
  ]
end

to reset-centroids
  set any-centroids-moved? true
  ask data-points [ set color grey ]

  let colors base-colors
  ask centroids [die]
  create-centroids num-centroids [
    move-to one-of data-points
    set size 5
    set color last colors + 1
    set colors butlast colors
  ]
  clear-all-plots
  reset-ticks
end

to go
  if not any-centroids-moved? [stop]
  set any-centroids-moved? false
  assign-clusters
  update-clusters
  tick
end

to assign-clusters
  ask data-points [set color [color] of closest-centroid - 2]
end

to update-clusters
  let movement-threshold 0.1
  ask centroids [
    let my-points data-points with [ shade-of? color [ color ] of myself ]
    if any? my-points [
      let new-xcor mean [ xcor ] of my-points
      let new-ycor mean [ ycor ] of my-points
      if distancexy new-xcor new-ycor > movement-threshold [
        set any-centroids-moved? true
      ]
      setxy new-xcor new-ycor
    ]
  ]
  update-plots
end

to-report closest-centroid
  report min-one-of centroids [ distance myself ]
end

to-report square-deviation
  report sum [ (distance myself) ^ 2 ] of data-points with [ closest-centroid = myself ]
end


; Copyright 2014 Uri Wilensky.
; See Info tab for full copyright and license.

 

2. TensorFlow version

TensorFlow version: 1.14

import numpy as np
import tensorflow as tf

num_points = 100
dimensions = 2
points = np.random.uniform(0, 1000, [num_points, dimensions])

def input_fn():
  return tf.compat.v1.train.limit_epochs(
      tf.convert_to_tensor(points, dtype=tf.float32), num_epochs=1)

num_clusters = 5
kmeans = tf.contrib.factorization.KMeansClustering(
    num_clusters=num_clusters, use_mini_batch=False)

# train
num_iterations = 10
previous_centers = None
for _ in range(num_iterations):
  kmeans.train(input_fn)
  cluster_centers = kmeans.cluster_centers()
  if previous_centers is not None:
    print('delta:', cluster_centers - previous_centers)
  previous_centers = cluster_centers
  print('score:', kmeans.score(input_fn))
print('cluster centers:', cluster_centers)

# map the input points to their clusters
cluster_indices = list(kmeans.predict_cluster_index(input_fn))
for i, point in enumerate(points):
  cluster_index = cluster_indices[i]
  center = cluster_centers[cluster_index]
  print('point:', point, 'is in cluster', cluster_index, 'centered at', center)

 

3. NetLogo with Python Extension version

Here’s the snapshot.

And here’s the code.

extensions [ py ]

breed [data-points data-point]
breed [centroids centroid]

data-points-own [
  cluster-id
]

centroids-own [
  cluster-id
  centx
  centy
]

globals [
  testoutput
  centroid-list
]

to setup
  clear-all
  py:setup py:python
  (py:run
    "import tensorflow as tf"
    "import numpy as np"
  )
  set testoutput py:runresult "1"
  py:set "testoutput" testoutput
  set-default-shape data-points "circle"
  set-default-shape centroids "x"
  generate-clusters
  ; For python
  py:set "num_points" num-clusters
  py:set "points" [list xcor ycor] of data-points
  py:set "num_clusters" num-clusters
  py:set "num_round" num-round
  if debug = True
  [
    py:run "print('Points Cordinates:', points)" ;for debug
  ]
  ;reset-centroids
end

to generate-clusters
  set testoutput py:runresult "testoutput + 1"
  let cluster-std-dev cluster-range
  let cluster-size num-data-points / num-clusters
  repeat num-clusters [
    let center-x random-xcor / 1.5
    let center-y random-ycor / 1.5
    create-data-points cluster-size [
      setxy center-x center-y
      set heading random 360
      fd abs random-normal 0 (cluster-std-dev / 2)
    ]
  ]
end

to train
  ; Cluster center
  (py:run
    "points = np.asarray(points)"
    "def input_fn():"
    "    return tf.compat.v1.train.limit_epochs(tf.convert_to_tensor(points, dtype=tf.float32), num_epochs=1)"
    "kmeans = tf.contrib.factorization.KMeansClustering(num_clusters=num_clusters, use_mini_batch=False)"
    "num_iterations = num_round"
    "previous_centers = None"
    "for _ in range(num_iterations):"
    "    kmeans.train(input_fn)"
    "    cluster_centers = kmeans.cluster_centers()"
    "    if previous_centers is not None:"
    "        print(('delta:', cluster_centers - previous_centers))"
    "    previous_centers = cluster_centers"
    "    print(('score:', kmeans.score(input_fn)))"
    "print(('cluster centers:', cluster_centers))"
    "# map the input points to their clusters"
    "cluster_indices = list(kmeans.predict_cluster_index(input_fn))"
    "print('cluster indices: ', cluster_indices)"
    "for i, point in enumerate(points):"
    "    cluster_index = cluster_indices[i]"
    "    center = cluster_centers[cluster_index]"
    "    print(('point:', point, 'is in cluster', cluster_index, 'centered at', center))"
  )
end

to show-shape
  set centroid-list py:runresult "cluster_centers"
  foreach centroid-list [
    x -> create-centroids 1 [
      set xcor ( item 0 x )
      set ycor ( item 1 x )
      set size 3
      set color white
    ]
  ]
end

 

Ref:

[1] https://www.altoros.com/blog/using-k-means-clustering-in-tensorflow/

[2] https://www.tensorflow.org/api_docs/python/tf/contrib/factorization/KMeansClustering

Building an ARM-Based HPC Cluster

Background

Nowadays ARM-based servers are being applied in many scenarios, so we are going to port some workloads to a minimal ARM SoC with an NVIDIA GPU to evaluate the feasibility of production use.

Preparation

Hardware: Legacy ARM SoC boards, e.g. Raspberry Pi, Beagle Board, NVIDIA Jetson(GPU) and standard ARM-based servers.

MPI: OpenMPI and MPICH.

Shared Storage: ARM-based GlusterFS with pNFS.

Application: ANY

Build/Installation
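
As a first step, a minimal MPI smoke test across the ARM nodes is useful before porting any real workload. A sketch, assuming OpenMPI from the distribution packages and placeholder hostnames (the MPICH equivalents are analogous):

sudo apt install -y openmpi-bin libopenmpi-dev
cat > hosts <<EOF
arm-node1 slots=4
arm-node2 slots=4
EOF
# Every rank prints its hostname; if all nodes answer, MPI and passwordless SSH between nodes are working.
mpirun --hostfile hosts -np 8 hostname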

Ref:

[1] https://developer.arm.com/solutions/hpc

Using QEMU with PetaLinux/Pynq (A Path Toward Bare-Metal Exploration)

Background

Before stepping into virtio acceleration, we will build and run Linux (PetaLinux) on an FPGA.

Despite the poor performance of simulation on X86, it is very convenient for creating a POC.

We choose the MPSoC zcu106 (PS) and the Zynq zc702 (PS) to test ARM, and the Kintex UltraScale kcu705 (PL) to test the MicroBlaze soft core.

0. Essential tools

OS: Ubuntu 16.04 Desktop

Vivado: 2019.1

Xilinx SDK: 2019.1

PetaLinux SDK: PetaLinux 2019.1

BSP: Evaluation Board BSP or generated from Vivado/Xilinx SDK

1. Build QEMU

QEMU ships in the PetaLinux SDK directory; it is not necessary to rebuild it unless you want to modify something.

git clone git://github.com/Xilinx/qemu.git
cd qemu
apt install libglib2.0-dev libgcrypt20-dev zlib1g-dev autoconf automake libtool bison flex libpixman-1-dev
git submodule update --init dtc
./configure --target-list="aarch64-softmmu,microblazeel-softmmu" --enable-fdt --disable-kvm --disable-xen
make -j4

 

2. PetaLinux Project

2.1. Create Project

petalinux-create -t project -n zcu106_arm_a53 --template zynqMP -s ../bsp/xilinx-zcu106-v2019.1-final.bsp
petalinux-config # Decide what to build

 

After the project is created, the pre-built images are in the pre-built directory. You can test them via petalinux-boot:

petalinux-boot --qemu --prebuilt 3 # Start ZCU106 virtual machine with prebuilt kernel

 

To speed up the sstate mirror check, you need to download the sstate cache from the Xilinx download page (see the references below).

If you are running PetaLinux as root, you need to modify sanity.conf, commenting out the INHERIT += "sanity" line as shown below:

cat /opt/petalinux/v2019.1/components/yocto/source/aarch64/layers/core/meta/conf/sanity.conf

# Sanity checks for common user misconfigurations
#
# See sanity.bbclass
#
# Expert users can confirm their sanity with "touch conf/sanity.conf"
BB_MIN_VERSION = "1.39.1"

SANITY_ABIFILE = "${TMPDIR}/abi_version"

SANITY_VERSION ?= "1"
LOCALCONF_VERSION  ?= "1"
LAYER_CONF_VERSION ?= "7"
SITE_CONF_VERSION  ?= "1"

#INHERIT += "sanity"

2.2. Build Kernel/Rootfs
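
A rough sketch of the usual build flow (see UG1144 for the full options; the exact targets depend on your project configuration):

petalinux-config -c kernel      # optionally adjust the kernel configuration
petalinux-config -c rootfs      # optionally select rootfs packages
petalinux-build                 # build the kernel, device tree and rootfs into images/linux/
petalinux-boot --qemu --kernel  # boot the freshly built images in QEMU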

 

3. Boot from QEMU

3.1. Debug

petalinux-boot --qemu --prebuilt 3

 

3.2. Run with self-compiled QEMU

#!/bin/bash
/root/Desktop/qemu_zynq_devices/qemu/aarch64-softmmu/qemu-system-aarch64 \
-M arm-generic-fdt-7series -machine linux=on -smp 2 -m 1G \
-serial /dev/null -serial mon:stdio -display none \
-kernel images/linux/uImage -dtb images/linux/system.dtb

Now we’ll add sdcard and try to boot Ubuntu Core OS.

qemu-img create qemu_sd.img 2G
# Attach the SD image when launching QEMU; the SD slot index may differ per board.
/root/Desktop/qemu_zynq_devices/qemu/aarch64-softmmu/qemu-system-aarch64 \
-M arm-generic-fdt-7series -machine linux=on -smp 2 -m 1G \
-serial /dev/null -serial mon:stdio -display none \
-kernel images/linux/uImage -dtb images/linux/system.dtb \
-drive file=qemu_sd.img,if=sd,format=raw,index=0

 

4. Co-Sim

 

5. Boot from Evaluation Board SDCard

 

Ref:

[1] Build QEMU: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842060/QEMU

[2] Zynq UltraScale+ QEMU: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18841606/QEMU+-+Zynq+UltraScalePlus

[3] CoSim: https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842109/QEMU+SystemC+and+TLM+CoSimulation

[4] Xilinx Evaluation Board BSP files and Yocto local mirror: https://www.xilinx.com/support/download/index.html/content/xilinx/en/downloadNav/embedded-design-tools.html

P4 Data-Plane Programming on Xilinx FPGAs

0. Background

As we all know, P4 is positioned to be the future of OpenFlow 2.0, aiming at a fully software-defined network with a programmable control plane and data plane.

1. Basic Knowledge

Since we’ve got P4 working on X86 server

In SDN controller ONOS.

 

2. Software/Hardware Plan

2.1. The pure way: P4 bitstream

2.2. The easy way: P4 on PetaLinux

3. Further Exploration

 

FPGA-Based Fault-Tolerance (FT) Design for Physical or Virtual Machines

1. Background

Since MicroBlaze provides a lockstep feature, we can finally make a POC.

2. MicroBlaze FT Test

3. Server FT Test via MicroBlaze FT controller

4. QEMU with COLO

Ref:

[1] Triple Modular Redundancy: https://www.xilinx.com/support/documentation/ip_documentation/tmr/v1_0/pg268-tmr.pdf

[2] Fault Tolerance Technique for Dynamically Reconfigurable Processor: https://pdfs.semanticscholar.org/2e98/b34ee8736eba7747b223c333de5739a6e601.pdf

[3] Xilinx Reduces Risk and Increases Efficiency for IEC61508 and ISO26262 Certified Safety Applications: https://www.xilinx.com/support/documentation/white_papers/wp461-functional-safety.pdf

[4] Spartan-6 FPGA Dual-Lockstep MicroBlaze Processor with Isolation Design Flow: https://www.xilinx.com/support/documentation/application_notes/xapp584-dual-lockstep-microblaze-IDF.pdf

[5] MicroBlaze Processor Reference Guide: https://www.xilinx.com/support/documentation/sw_manuals/xilinx2019_2/ug984-vivado-microblaze-ref.pdf

A Guide to Software-Defined Radio (SDR) Devices, Software, and Applications

Note: the content of this article is intended solely for laboratory security-testing purposes; using it for any commercial activity or any activity that violates local laws and regulations is prohibited.

Whether you own the pricier Ettus hardware, an entry-level HackRF, or the most basic RTL-SDR dongle, most of the content in this tutorial applies.

GRCon presentations

https://www.gnuradio.org/grcon/grcon17/presentations/

https://www.gnuradio.org/grcon/grcon18/presentations/

https://www.gnuradio.org/grcon/grcon19/presentations/

https://github.com/mossmann/hackrf/wiki

https://www.hackrf.net/hackrf%E4%B8%8Egnuradio%E5%85%A5%E9%97%A8%E6%8C%87%E5%8D%97/

http://www.hackrf.net/faq/

https://wiki.myriadrf.org/LimeSDR

https://myriadrf.org/news/limesdr-made-simple-part-1/

Slides (PPT) by 雪碧0xroot

0. Environment Setup

0.1 Hardware

| | HackRF One | Ettus B200 | Ettus B210 | BladeRF x40 | LimeSDR | LimeSDR mini |
|---|---|---|---|---|---|---|
| Frequency Range | 1MHz-6GHz | 70MHz-6GHz | 70MHz-6GHz | 300MHz-3.8GHz | 100kHz-3.8GHz | 100kHz-3.5GHz |
| RF Bandwidth | 20MHz | 61.44MHz | 61.44MHz | 40MHz | 61.44MHz | 30.72MHz |
| Sample Depth | 8 bits | 12 bits | 12 bits | 12 bits | 12 bits | 12 bits |
| Sample Rate | 20MSPS | 61.44MSPS | 61.44MSPS | 40MSPS | 3.2MSPS | 61.44MSPS |
| Transmitter Channels | 1 | 1 | 2 | 1 | 2 | 1 |
| Receivers | 1 | 1 | 2 | 1 | 2 | 1 |
| Duplex | Half | Full | Full | Full | Full | Full |
| Interface | USB 2.0 | USB 3.0 | USB 3.0 | USB 3.0 | USB 3.0 | USB 3.0 |
| Programmable Logic Gates | 64 macrocell CPLD | 75k | 100k | 40k (115k avail) | 40k | 40k |
| Chipset | MAX5864, MAX2837, RFFC5072 | AD9364 | AD9361 | LMS6002M | LMS7002M | LMS7002M |
| Open Source | Full | Schematic, Firmware | Schematic, Firmware | Schematic, Firmware | Full | Full |
| Oscillator Precision | +/-20ppm | +/-2ppm | +/-2ppm | +/-1ppm | +/-1ppm initial, +/-4ppm stable | +/-1ppm initial, +/-4ppm stable |
| Transmit Power | -10dBm+ (15dBm @ 2.4GHz) | 10dBm+ | 10dBm+ | 6dBm | 0 to 10dBm | 0 to 10dBm |
| Price | 249€ (VAT excl.) | 991€ (VAT excl.) | 1658€ (VAT excl.) | 625€ (VAT excl.) | 332€ (VAT excl.) | 190€ (VAT excl.) |

0.2 Drivers

0.3 Software

https://unicorn.360.com/blog/2017/04/12/LimeSDR-Getting-Started-Quickly/

https://oneguyoneblog.com/2016/09/15/sdrsharp-sdr-installing-windows-10/

After downloading SDR#, reboot and press F7 to boot into the "Disable Driver Signature Enforcement" mode, then run the bundled install-rtlsdr.bat and replace driver 0.

 

1. Receiving Signals

For receiving signals, gqrx (macOS, Linux) is recommended; SDR# (Windows) also works.

https://www.rtl-sdr.com/big-list-rtl-sdr-supported-software/

$ port info gqrx

$ sudo port install gqrx

Once you can receive signals there is quite a lot you can do; here are some of the more interesting examples.

1.1. Listening to Radio / Watching TV

http://dalvikplanet.blogspot.com/2017/03/how-to-get-working-rtl2832u-r820t2-on.html
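
For a quick start with an RTL-SDR dongle on the command line, here is a minimal sketch of receiving a broadcast FM station (the frequency is a placeholder; rtl_fm ships with the rtl-sdr tools):

# Demodulate wideband FM at 99.5 MHz and play it through ALSA.
rtl_fm -f 99.5M -M wbfm -s 200000 -r 48000 - | aplay -r 48000 -f S16_LE -t raw -c 1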

1.2. Receiving Weather Satellite Images

SDR software

A virtual sound card

WXtoimg

gpredict/orbitron

https://www.rtl-sdr.com/rtl-sdr-tutorial-receiving-noaa-weather-satellite-images/

https://wischu.com/archives/528.html

 

GSM sniffing

https://www.cnblogs.com/k1two2/p/7000942.html

1.3. 接收GPS信息

https://swling.com/blog/2016/04/guest-post-using-the-hackrf-one-for-dgps-beacon-reception/

http://sdrgps.blogspot.com/2016/12/rtl-sdr-to-orbit-with-limesdr.html

1.4. Direction Finding and Passive Radar

https://www.rtl-sdr.com/ksdr/

1.5. Receiving whatever you want, LEGALLY

zigbee https://github.com/bastibl/gr-ieee802-15-4

https://github.com/BastilleResearch/scapy-radio/tree/master/gnuradio/gr-zigbee

2. Transmitting Signals

2.1. Transmitting GPS Signals

https://gist.github.com/gyaresu/343ae51ecbb70486e270

https://www.cnblogs.com/k1two2/p/5477291.html#4245780

https://gorgias.me/2017/07/30/HackRF-GPS-%E6%AC%BA%E9%AA%97/

https://github.com/osqzss/LimeGPS
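
As a reference for the links above, the commonly cited gps-sdr-sim plus HackRF workflow looks roughly like this (the ephemeris file name and coordinates are placeholders taken from the project README; transmit only into a dummy load or shielded enclosure, and only where it is legal):

# Generate an 8-bit I/Q baseband file for a static position from a RINEX ephemeris file.
./gps-sdr-sim -e brdc3540.14n -l 30.286502,120.032669,100 -b 8
# Replay it on GPS L1 (1575.42 MHz) with the HackRF.
hackrf_transfer -t gpssim.bin -f 1575420000 -s 2600000 -a 1 -x 0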

2.2. Transmitting Text / Audio / Video

Windows software: SDRangel

http://gareth.codes/hackrf-transmit/

https://github.com/fsphil/hacktv

http://www.irrational.net/2014/03/02/digital-atv/

http://www.hackrf.net/2014/06/hackrf_nbfm_tx_n_ctcss_squelch/

http://www.xn--hrdin-gra.se/blog/wp-content/uploads/2015/08/nbfm-tx.grc

https://gist.github.com/gyaresu/343ae51ecbb70486e270

https://nuclearrambo.com/wordpress/transferring-a-text-file-over-the-air-with-limesdr-mini/

https://github.com/martinmarinov/TempestSDR

3. Transmitting and Receiving

GSM

https://www.evilsocket.net/2016/03/31/how-to-build-your-own-rogue-gsm-bts-for-fun-and-profit/

https://yatebts.com/open_source/

https://cn0xroot.com/2017/01/10/iot-mode-fuzzing-with-openbt/

LTE

https://yq.aliyun.com/articles/310348

https://www.cnblogs.com/k1two2/p/5666667.html

https://cn0xroot.com/2017/04/12/limesdr-getting-started-quickly/

 

OpenBTS+LimeSDR

Prepare:

Ubuntu Desktop 16.04 & LimeSDR 1.4s with LimeSuite 17.12 (otherwise OpenUSRP will fail).

Install build-essential packages

#packages for soapysdr available at myriadrf PPA
sudo add-apt-repository -y ppa:myriadrf/drivers
sudo apt-get update

#install core library and build dependencies
sudo apt-get install -y git g++ cmake libsqlite3-dev

#install hardware support dependencies
sudo apt-get install -y libsoapysdr-dev libi2c-dev libusb-1.0-0-dev

#install graphics dependencies
sudo apt-get install -y libwxgtk3.0-dev freeglut3-dev

# Install for building uhd
sudo apt-get install libboost-all-dev libusb-1.0-0-dev python-mako doxygen python-docutils cmake build-essential

 

Switch to the UHD driver via UHD itself (building OpenUSRP into the UHD source tree)

$ cd ~                 # build and install limesuite
$ git clone https://github.com/myriadrf/LimeSuite.git
$ cd LimeSuite
$ mkdir builddir && cd builddir
$ cmake ../
$ make -j4
$ sudo make install
$ sudo ldconfig

$ cd ~ # build uhd, install, enable lime, rebuild
$ git clone https://github.com/EttusResearch/uhd.git
$ cd uhd/host/
$ mkdir build && cd build
$ cmake ../
$ make -j4
$ sudo make install
$ git clone https://github.com/jocover/OpenUSRP.git ../lib/usrp/OpenUSRP # stay inside uhd/host/build
$ echo "INCLUDE_SUBDIRECTORY(OpenUSRP)">>../lib/usrp/CMakeLists.txt
$ cmake ../
$ make -j4
$ sudo make install

 

Or, switch to the UHD driver via SoapySDR

$ git clone https://github.com/pothosware/SoapySDR
$ cd SoapySDR
$ mkdir builddir;cd builddir; cmake ../
$ make -j4
$ sudo make install


$ git clone https://github.com/myriadrf/LimeSuite
$ cd LimeSuite

Build OpenBTS

4. SDR

Software-defined radio is about being able to define the signal-processing chain flexibly, for example streaming output to TCP/UDP or decoding text, audio, and video. Well-known frameworks include GNU Radio, SoapySDR, and Pothos (an IDE); GNU Radio is used as the example here. Installing on Linux is recommended, but you can also install on macOS (via MacPorts) or Windows, and PyBombs is another option.

Installing on macOS requires MacPorts and XQuartz; the MacPorts packages to install are listed below.

$ port info gnuradio
$ sudo port install gnuradio+wxgui gr-osmosdr sox
$ port content gnuradio
$ sudo port install hackrf
$ sudo port install rtl-sdr
$ sudo port search gr- # if you wanna more modules in gnuradio, don't be shy

Then open XQuartz and add /opt/local/bin/gnuradio-companion to the X custom Applications menu (it is recommended to change the default X terminal command to xterm -e "source ~/.bash_profile;/bin/bash").

https://greatscottgadgets.com/sdr/

https://gist.github.com/machinaut/addf3438ef0c1a9cad38

https://osmocom.org/projects/gr-osmosdr/wiki/GrOsmoSDR#RTL-SDRSource

https://pypi.org/project/pyrtlsdr/#description

Accelerating Virtualization with Xilinx/Intel FPGAs

UPDATE: Lao Wang's notes also summarize some devconf-related content.

This article is just a collection of ideas and posts.

Recently I was doing some performance tuning of QEMU/KVM using various pure-software approaches. However, it always ended up limited by the load of the QEMU/KVM process itself.

I just remembered that I used to do some co-sim work with National Instruments LabView about ten years ago…(during college life…fk…I’m still young…)

Trial No.1

TBD: Start QEMU instance(s) with passed-through PCI (SR-IOV) devices, such as an Ethernet controller or an NVMe controller implemented in a PCIe FPGA, to offload the emulation work.
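
A minimal sketch of the device-assignment side of this trial, assuming the FPGA design enumerates as an ordinary PCIe function and the host IOMMU is enabled; the PCI address 0000:03:00.0 and the vendor/device IDs are placeholders:

# Bind the FPGA function to vfio-pci and hand it to the guest.
modprobe vfio-pci
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo 10ee 7021 > /sys/bus/pci/drivers/vfio-pci/new_id
qemu-system-x86_64 -enable-kvm -smp 4 -m 4G \
  -device vfio-pci,host=0000:03:00.0 \
  -drive file=guest.qcow2,if=virtio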

Trial No.2

TBD: Start a QEMU instance with as many devices as possible ported to a passed-through PCIe FPGA, e.g. a USB controller, an Ethernet controller, etc. (like a dock carrying all kinds of devices…)

But I think it will be replaced by Trial No.1.

Trial No.3

TBD: Co-Sim.

Trial No.4

SoC FPGA, as DOM0.

Trial No.5

ASIC offload with slight host management.

References

[1] Running Xilinx in QEMU, https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842060/QEMU

[2] Building Xen Hypervisor with Petalinux 2019.1, https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/99188792/Building+Xen+Hypervisor+with+Petalinux+2019.1

[3] QEMU SystemC and TLM CoSimulation, https://xilinx-wiki.atlassian.net/wiki/spaces/A/pages/18842109/QEMU+SystemC+and+TLM+CoSimulation

Porting QEMU-KVM to the NVIDIA Jetson Nano

If the Jetson is to act as an edge device, we need to further explore the possibilities of virtualization on it, which would give fault tolerance (FT) an easier path; and since its chip exposes PCIe, passthrough should be possible in principle.

Reference: https://elinux.org/Jetson/Nano/Upstream

We will skip things like building a rootfs or U-Boot; that is later embedded work. Building a kernel on the existing system is enough.

1. Prepare the Environment

Visit https://developer.nvidia.com/embedded/downloads and download the source packages, including both the Jetson-specific and L4T sources, or download them directly from the links below.

https://developer.nvidia.com/embedded/dlc/l4t-sources-32-1-jetson-nano

Extract the kernel part of it.

https://developer.nvidia.com/embedded/dlc/l4t-jetson-driver-package-32-1-jetson-nano

After downloading and extracting, you get the following file layout.

2. Prepare the Kernel

Host:
sudo su
sudo apt install nfs-kernel-server
sudo echo "/home/lofyer/Downloads *(rw,no_root_squash,no_subtree_check)" >> /etc/exports
sudo exportfs -avf

Jetson Nano:
sudo su
apt install libncurses-dev
mount -t nfs 192.168.0.59:/home/lofyer/Downloads /mnt
cd /mnt/
cp /proc/config.gz .
gunzip config.gz
mv config .config
make menuconfig # find and enable kvm, tegra hypervisor
make -j4; make -j4 modules_install
make -j4 Image
cp arch/arm64/boot/Image /boot/Image-kvm

Then edit the boot entries so that the new kernel is booted by default.

# vi /boot/extlinux/extlinux.conf

TIMEOUT 10
DEFAULT secondary

MENU TITLE p3450-porg eMMC boot options

LABEL primary
      MENU LABEL primary kernel
      LINUX /boot/Image
      INITRD /boot/initrd
      APPEND ${cbootargs} rootfstype=ext4 root=/dev/mmcblk0p1 rw rootwait

LABEL secondary
      MENU LABEL kernel with kvm
      LINUX /boot/Image-kvm
      INITRD /boot/initrd
      APPEND ${cbootargs} rootfstype=ext4 root=/dev/mmcblk0p1 rw rootwait

3. Try qemu-kvm

The packaged version:

apt install qemu-kvm
kvm --help

A self-built version:

git clone https://github.com/qemu/qemu
cd qemu
./configure --enable-kvm
make -j4

KVM acceleration can only be used with an ARM machine type (e.g. -M virt).
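
A hedged sketch of booting a KVM-accelerated guest with the self-built QEMU; the guest kernel and rootfs paths are placeholders:

./aarch64-softmmu/qemu-system-aarch64 \
  -M virt -enable-kvm -cpu host -smp 2 -m 1024 \
  -nographic \
  -kernel Image -append "console=ttyAMA0 root=/dev/vda rw" \
  -drive file=rootfs.img,format=raw,if=virtio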

4. A Look at FT

Never mind, not now; let's leave it for the second half of the year.

NVIDIA Jetson User Guide

This article uses NVIDIA Jetson hardware to show you what NVIDIA can do; you can treat it as a getting-started reference manual (tutorial) for the Jetson Nano.

1. Getting Started

The getting-started part consists of two chapters, based on the material included in the NVIDIA Jetson box.

1.1. Prepare the Environment

Ref.1: https://developer.nvidia.com/embedded/learn/get-started-jetson-nano-devkit

Ref.2: https://developer.nvidia.com/embedded/downloads

I am using a Jetson Nano developer kit here. At first I thought any USB cable would be enough to power it, but I was expecting too much. With an 18W fast-charge adapter it boots normally, but as soon as it runs something like a WebGL test page it simply shuts down, so I gave it the power adapter from a Xiaomi speaker (5V 2A), and that actually kept it running.

For the content that follows, though, I bought a 5V 4A adapter with a 5.5mm OD barrel plug; otherwise, with my current setup, it would be hard to get through the environment-preparation stage below.

To use this board, you need to download the SDK's SD card image in advance, as well as the SDK Manager required on the host OS (a PC running Ubuntu 18.04).

After the download finishes, grab the image-flashing tool linked from Ref.1 (or any other flashing tool) and write the SD card image to a microSD card (I used a 64GB A2 card). Once flashing is done, insert the card into the slot underneath the module (it is a bit hard to find); see the picture below.

Then connect the board to a monitor via HDMI/DP (the boot-stage resolution needs tweaking, otherwise smaller screens cannot show the boot logo; that is not the focus here, more on it later), plug in a Bluetooth/USB keyboard and mouse, connect USB power, and you will see the glorious NVIDIA logo.

Wait for initialization to finish; it includes expanding the root filesystem, unpacking assorted packages, and so on. Once the Ubuntu Desktop setup screen appears you can start.

The first thing to do after reaching the desktop is to open Chromium, visit the WebGL samples site, and run any demo to see whether the board shuts down. If it does, congratulations, you get to find a proper USB power supply (Ref.1 contains purchase links for Adafruit USB and DC power adapters).

When your DC power supply arrives, do not plug it in right away, because a jumper needs to be set first, as shown in the figure.

See the Power Jack/USB jumper? Since I did not have a jumper cap on hand, I simply shorted the pins, as shown in the figure (admire my telekinetic soldering).

Then plug in the newly acquired 5V 4A DC supply, which frees the USB port to be connected to the host PC.

1.2. Prepare the SDK

You can skip this section for now and come back after finishing the sections below, because it does not affect the following steps.

Preparing the SDK mainly involves downloading and installing CUDA, OpenCV, and the like for the host PC and for the board; it requires the board's USB port to be connected to the host PC as a data link (I have not tried using that USB port for power and data at the same time).

I will not go into much detail here; just follow the guided steps.

1.3. Hello AI World

This example spares you the complicated installation of various dependency libraries and a great deal of jargon, which makes it quite beginner-friendly; I will still provide it as links in the references of the final chapter.

Step one: starting from Hello World, you still need the most basic tools.

$ sudo apt install git make cmake
$ git clone https://github.com/dusty-nv/jetson-inference
$ cd jetson-inference
$ git submodule update --init
$ mkdir build
$ cd build
$ cmake ../
$ make -j4
$ sudo make install
$ cd aarch64/bin

After all that, you will have the full output of Hello AI World. But, what the hell am I doing?

For readers not familiar with Linux: just know that cmake and make here are the commands that build the source code; cmake generates the Makefile, and make invokes gcc according to the actions defined in the Makefile.

Now let's look at the first example, which uses ImageNet imagery to let our "robot" recognize all kinds of objects. You will see two files starting with imagenet in the current directory; let's start with imagenet-console.

Open a terminal in the graphical session and run the following commands.

$ ./imagenet-console orange_0.jpg output_0.jpg
$ nautilus .

After the robot's inference you get one orange image, and another image that is still an orange, with the top-left corner of the new image showing how likely the robot thinks it is an orange.

Getting a feel for it? OK, let's continue.

Since it can look at images, it can of course also look at video or a camera feed, so next let's have it look at an orange through the camera.

Here I attach a USB camera to the Jetson Nano; once connected, you can type cheese in a terminal to open the camera app and check that it works.

Then run the following command in the terminal.

./imagenet-camera

Oops, a segmentation fault. As the documentation says, the default camera is the onboard CSI camera, so we need to modify the code to use the USB camera we just plugged in.

$ vi ../../../imagenet-camera/imagenet-camera.cpp

...
#include "imageNet.h"
#define DEFAULT_CAMERA 0 // -1 for onboard camera, or change to index of /dev/video V4L2 camera (>=0)
bool signal_recieved = false;
...

After changing DEFAULT_CAMERA to 0, the camera at /dev/video0 will be used; then rebuild.

$ cd ../../ # build
$ cmake ../
$ make -j4

Then go into the bin directory and run imagenet-camera again.

PS: not every camera is a Logitech C920. A camera's native encoding may cause the program above to show a black screen, in which case some further changes are needed. I am skipping that for now and will revisit it once fixed. For the compatibility list, see the eLinux link.

1.4. Write a Small Program

If you are following the GitHub tutorial, it should be time to write some code yourself; just paste the following.

// include imageNet header for image recognition
#include <jetson-inference/imageNet.h>
// include loadImage header for loading images
#include <jetson-utils/loadImage.h>

int main( int argc, char** argv )
{
  // a command line argument containing the image filename is expected,
  // so make sure we have at least 2 args (the first arg is the program)
  if( argc < 2 )
  {
    printf("my-recognition:  expected image filename as argument\n");
    printf("example usage:   ./my-recognition my_image.jpg\n");
    return 0;
  }

// retrieve the image filename from the array of command line args
  const char* imgFilename = argv[1];
  float* imgCPU    = NULL;    // CPU pointer to floating-point RGBA image data
  float* imgCUDA   = NULL;    // GPU pointer to floating-point RGBA image data
  int    imgWidth  = 0;       // width of the image (in pixels)
  int    imgHeight = 0;       // height of the image (in pixels)

// load the image from disk as float4 RGBA (32 bits per channel, 128 bits per pixel)
  if( !loadImageRGBA(imgFilename, (float4**)&imgCPU, (float4**)&imgCUDA, &imgWidth, &imgHeight) )
  {
    printf("failed to load image '%s'\n", imgFilename);
    return 0;
  }
  imageNet* net = imageNet::Create(imageNet::GOOGLENET);

  if( !net )
  {
    printf("failed to load image recognition network\n");
    return 0;
  }
  float confidence = 0.0;
  const int classIndex = net->Classify(imgCUDA, imgWidth, imgHeight, &confidence);
  if( classIndex >= 0 )
  {
    // retrieve the name/description of the object class index
    const char* classDescription = net->GetClassDesc(classIndex);
    // print out the classification results
    printf("image is recognized as '%s' (class #%i) with %f%% confidence\n",
    classDescription, classIndex, confidence * 100.0f);
  }
  else
  {
    // if Classify() returned < 0, an error occurred
    printf("failed to classify image\n");
  }
  delete net;
  // this is the end of the example!
  return 0;
}
The accompanying CMakeLists.txt:

# require CMake 2.8 or greater
cmake_minimum_required(VERSION 2.8)

# declare my-recognition project
project(my-recognition)

# import jetson-inference and jetson-utils packages.
# note that if you didn't do "sudo make install"
# while building jetson-inference, this will error.
find_package(jetson-utils)
find_package(jetson-inference)

# CUDA and Qt4 are required
find_package(CUDA)
find_package(Qt4)

# setup Qt4 for build
include(${QT_USE_FILE})
add_definitions(${QT_DEFINITIONS})

# compile the my-recognition program
cuda_add_executable(my-recognition my-recognition.cpp)

# link my-recognition to jetson-inference library
target_link_libraries(my-recognition jetson-inference)

Then build and run.

$ cmake .
$ make
$ ./my-recognition polarbear.jpg

The output is as follows.

[cuda]  cudaAllocMapped 5089520 bytes, CPU 0x100c30000 GPU 0x100c30000

imageNet -- loading classification network model from:
         -- prototxt     networks/googlenet.prototxt
         -- model        networks/bvlc_googlenet.caffemodel
         -- class_labels networks/ilsvrc12_synset_words.txt
         -- input_blob   'data'
         -- output_blob  'prob'
         -- batch_size   2

[TRT]  TensorRT version 5.0.6
[TRT]  detected model format - caffe  (extension '.caffemodel')
[TRT]  desired precision specified for GPU: FASTEST
[TRT]  requested fasted precision for device GPU without providing valid calibrator, disabling INT8
[TRT]  native precisions detected for GPU:  FP32, FP16
[TRT]  selecting fastest native precision for GPU:  FP16
[TRT]  attempting to open engine cache file /usr/local/bin/networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  loading network profile from engine cache... /usr/local/bin/networks/bvlc_googlenet.caffemodel.2.1.GPU.FP16.engine
[TRT]  device GPU, /usr/local/bin/networks/bvlc_googlenet.caffemodel loaded
[TRT]  device GPU, CUDA engine context initialized with 2 bindings
[TRT]  binding -- index   0
               -- name    'data'
               -- type    FP32
               -- in/out  INPUT
               -- # dims  3
               -- dim #0  3 (CHANNEL)
               -- dim #1  224 (SPATIAL)
               -- dim #2  224 (SPATIAL)
[TRT]  binding -- index   1
               -- name    'prob'
               -- type    FP32
               -- in/out  OUTPUT
               -- # dims  3
               -- dim #0  1000 (CHANNEL)
               -- dim #1  1 (SPATIAL)
               -- dim #2  1 (SPATIAL)
[TRT]  binding to input 0 data  binding index:  0
[TRT]  binding to input 0 data  dims (b=2 c=3 h=224 w=224) size=1204224
[cuda]  cudaAllocMapped 1204224 bytes, CPU 0x101310000 GPU 0x101310000
[TRT]  binding to output 0 prob  binding index:  1
[TRT]  binding to output 0 prob  dims (b=2 c=1000 h=1 w=1) size=8000
[cuda]  cudaAllocMapped 8000 bytes, CPU 0x101440000 GPU 0x101440000
device GPU, /usr/local/bin/networks/bvlc_googlenet.caffemodel initialized.
[TRT]  networks/bvlc_googlenet.caffemodel loaded
imageNet -- loaded 1000 class info entries
networks/bvlc_googlenet.caffemodel initialized.
class 0296 - 1.000000  (ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus)
image is recognized as 'ice bear, polar bear, Ursus Maritimus, Thalarctos maritimus' (class #296) with 100.000000% confidence

In short, the program runs inference on your image with a GoogLeNet model trained on ImageNet and reports how likely it thinks the image shows a polar bear.

Since the goal of this article is to get you started, i.e. to bring you in and let you pick your own direction, the related mathematics is omitted here. If you can write impressive theory papers and also pull off impressive engineering, then please keep going, big shot, and leave a comment so we can be friends and I can admire you. You can also follow the links at the end for further reading on what GoogLeNet is and what a CNN is, and then work through machine learning, deep learning, reinforcement learning, and so on, possibly all the way back to signals and systems and calculus. If you need books, I still have some for sale, and my study notes are available in DataNote.

2. Intermediate Topics

2.1. Retrain the Model

2.2. Use as an Inference Node

2.3. Deep Learning Experiments

2.4. TensorFlow Experiments

Install TensorFlow

$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev
$ sudo apt-get install python3-pip
$ sudo pip3 install -U pip
$ sudo pip3 install -U numpy grpcio absl-py py-cpuinfo psutil portpicker six mock requests gast h5py astor termcolor protobuf keras-applications keras-preprocessing wrapt google-pasta
$ sudo pip3 install --pre --extra-index-url https://developer.download.nvidia.com/compute/redist/jp/v42 tensorflow-gpu==1.14.0+nv19.9
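
A quick sanity check that the GPU build is actually being used (a minimal sketch; tf.test.is_gpu_available() is the TF 1.x API matching the version installed above):

$ python3 -c "import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())"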

 

https://devtalk.nvidia.com/default/topic/1048776/official-tensorflow-for-jetson-nano-/

https://www.tensorflow.org/tutorials

with jupyter

2.5. TensorRT Experiments

3. Industry Solutions

There is nothing new under the sun. I realized that with a bit of tidying my old notes could become a new series under a new directory, so here I will simply jot down keywords; who knows what new things may pop up later.

Before continuing, keep one idea in mind: any repeatable ability that humans can acquire through observation, imitation, and learning can, in theory, be reproduced by machine/deep/reinforcement learning, even creativity.

Then we can talk about solutions: how much capability this little box has and what it can do.

3.1. Edge Computing

Industrial field sites

Computation offloading

CDN

Military head-mounted devices

3.2. Cloud Gaming

4K/8K/HDR/60FPS home consoles and the server side

Hostless (standalone) headsets

3.3. Teaching Equipment

Cheap or expensive, anything with a GPU can serve as teaching equipment; just look at how many graduate theses involve CUDA.

3.4. Assistive Technology

Sign language translation (https://github.com/EvilPort2/Sign-Language), which can be trained on the China Disabled Persons' Federation sign language (http://www.cdpf.org.cn/special/zgsy/node_305701.htm).

apt install python3-keras

Assisted vision (object/face recognition converted to speech)

References:

Literature

[1] NVIDIA AI Two Days Demo: https://developer.nvidia.com/embedded/twodaystoademo

[2] CNN Architecture: https://medium.com/@sidereal/cnns-architectures-lenet-alexnet-vgg-googlenet-resnet-and-more-666091488df5

[3] eLinux: https://www.elinux.org/Jetson

Terminology

[1] ImageNet: http://image-net.org/download

[2] TensorRT: https://developer.nvidia.com/tensorrt

[3] DeepStream: https://developer.nvidia.com/deepstream-sdk

[4] cuDNN

[5] PyTorch

[6] NVDLA