Docker Compose In Practice

Continue to advance to Kubernetes, the last article Docker Swarm In Action, After understanding Swarm, it is necessary to get familiar with Docker Compose. Docker Swarm forms a cluster of Docker hosts. As long as the Manager node is notified when the service is deployed, it will automatically find the corresponding node to run containers. Compose is another concept entirely. It organizes multiple associated containers into a whole for deployment, such as a load balance container, multiple web containers, and a Redis cache container to make up a whole bundle.

Revealed by its name, Compose, the concept of container orchestration was officially established. Later, Kubernetes, which we will learn, it's a tool for higher-level organization, operation and management of containers. Because Compose organizes the containers, it can start multiple associated containers with one command, instead of starting one container separately.

Regarding the installation of Docker Compose, Docker Desktop under Mac OS X comes with docker-compose; since Docker Compose is written in Python, we can use  pip install docker-compose  to install it, and use its commands after installation docker-compose. 阅读全文 >>

Docker Swarm In Action

Before officially entering into Kubernetes, I hope to know something about the earlier Docker Swarm, although it is basically no long used currently. Swarm provides a cluster of Docker hosts. There is at least one Manager (or more) in the cluster, plus 0 or more Workers. The main idea is that a Docker service (container) is executed by the Swarm cluster manager, and it will find the corresponding host in the cluster to execute the container. Both Manager and Worker can to run Docker containers, but only Manager can execute management commands.

The following two pictures shows the difference between running Docker containers on a host and a Swarm cluster host.

Run all containers in one host   => Run container together

阅读全文 >>

Introduce Vagrant and common usages

Many of my demos about Kafka, Docker, Python, Kubernates, and etc. are made with Vagrant virtual machines. It is time to write a blog for some frequently used Vagrant commands. Vagrant is a member of the HashiCorp family. Others of HashiCorp's famous tools include  Terraform , Consul ,  Vault , Boundary , Packer , Nomad and Waypoint . 

Speaking of Vagrant, I have to mention the similar Docker, in fact, they are quite different, while they both give people the external feeling that they are command line control Linux. Vagrant is essentially a virtual machine shell, allowing us to use Vagrant commands to interact with the virtual machine more conveniently, instead of switching back and forth between the host machine and the virtual machine, it is more convenient to manage multiple virtual machines in one single terminal; while Docker is a container, The essence of a container is a process on the host machine, but it is isolated from the file system, process, network, etc. of the process with a namespace, making the container process look like a virtual OS.

Vagrant is a tool for the development environment, and Docker is a tool for the deployment environment; Vagrant operates a standard Linux or Windows operating system, and the Docker is very critical on image size. Docker image is usually a trimmed system, just has necessary command to run our service. Since Vagrant corresponds to a virtual machine, the operation status with Vagrant and the installed software will be retained after Vagrant virtual machine shutdown, while the operations status in Docker are all for the current container (copy-on-write), which does not affect the corresponding image , Unless it is committed as a new image with docker commit.

Understand that Vagrant is just the shell of a virtual machine, so it requires different virtual machine implementations, such as VirtualBox, Hyper-V, VMware, etc., and we can use Vagrant to interact with Docker as well. Vagrant can support multiple Operation Systems. 阅读全文 >>

AWS Session Manager connect to EC2 instance

The most common way to manage a remote machine is SSH (Unix/Linux, Mac) or PowerShell/RDP (Windows), which requires the remote machine to open the corresponding access port and firewall, credentials or SSH Key. When selecting an EC2 instance on AWS console, we can click the "Connect" button, which provides three connection options:

  1. EC2 Instance Connect: Requires EC2 to be configured with SSH Key, sshd is started, ssh inbound port allowed by Security Group, ec2-instance-connect installed(sudo yum install ec2-instance-connect)
  2. Session Manager: This is what we are going to talk about next. sshd is not required(SSH key is not needed of cause). Security Group only requires outbound port 443. 
  3. SSH client: Client SSH to EC2 instance, start sshd, allow inbound ssh port 22 by Security Group, use SSH Key or username and password in AMI, or configure to login with domain account after joining the domain.

AWS Session Manager provides access to EC2 instances through a browser or AWS CLI, and even machines or virtual machines in the local datacenter (requires advanced-instances tier support) , and no longer depends on SSH.

Session Manager Overview

Session Manager determines who can or cannot be accessed by the IAM access policy. It can be forwarded through the local port, the operation log in the session can be recorded as an audit, and can configure to send a message to Amazon EventBridge or SQS when session open or closed. The session log encrypted by a KMS key. 阅读全文 >>

为 Java 注册 classpath: 协议用 URL 读取文件

 本文为 Java 注册 classpath 协议读取文件的目的就是要让下面的代码能工作起来

假设在 classpath 下有个文件 db.properties, 比如在 Maven 项目的 src/main/resources 目录中,或是在某个 jar 包的根位置。如果我们直接执行上面的代码将会得到异常

Exception in thread "main" java.net.MalformedURLException: unknown protocol: classpath
    at java.net.URL.<init>(URL.java:617)
    at java.net.URL.<init>(URL.java:507)

说是不认识的 classpath 协议。

前面代码是有实际用途的,比如说我们使用 XML 时就能支持远程协议

阅读全文 >>

Java 10 ~ 16 一路向前冲(新特性一箩筐)

Java 一路突突突, 版本 16 在 2021-03-16 都发布了, 而我们一直碍于 Java 9 的大改还在 Java 8 上原地踏步, 以往每当有新版本 JDK 发布后都是很快就验证,立马升级。Java SE versions history) 列出了所有 Java 的历史版本的发布日期。在今天(2021-05-04) 网站 Java SE Downloads 上直接提供下载的 Java SE 版本有以下三

  1. Java SE 16.0.1
  2. Java SE 11.0.11(LTS)
  3. Java SE 8u291

两个 LTS(长期支持) 版 8 和  11,外加一个目前最新的非  LTS 版本 16, 其他的版本都被归档到了 Java Archive. 查看一下 Java 支持的 roadmap, 几个 LTS 版本的服务支持年限

版本本      发布日       原定支持至      延期支持至
Java 8      2014/3       2022/3             2030/12
Java 11     2018/9      2023/9              2026/9
Java 17     2021/9      2026/9              2029/9

注意到 Java 8 将比 Java 11 和将来的 Java 17 生命力还顽强,一下就觉得这么久坚守在 Java 8 的阵地上不应觉得有什么好害羞的。眼看着下一个 LTS 版本的 Java 17 就要在今年 9 月份发布了,Java 11 看来是要错过了,等准备好和 Java 8 告别时要直接跳到 Java 17 了。 阅读全文 >>

探索 Apache, Tomcat, SpringBoot 对请求数据的解压缩

通常我们都会配置 Web 服务端对响应数据进行压缩,如用 Apache 的 mod_deflate 模块,或配置 Tomcat connector 启用压缩,又或者是在 Java Web 项目中加 Web Filter 来压缩特定的响应数据。这样客户端发送 HTTP 请求时在头中声明如 Accept-Encoding: gzip,服务端就可能会对响应数据进行压缩,同时带上 Content-Encoding: gzip 响应头。

有时候 HTTP Post 的数据太大同样会要求客户端在传输数据之前对请求数据进行压缩,本文主要关注服务端如何自动解压客户端发出的压缩数据。

先以 Apache2 为例,以 Ubuntu 20.04 为例,用命令 apt-get install apache2 安装 Apache 2.4.41, 它自动启用了 mod_deflate 模块。mod_deflate 模块的配置文件 /etc/apache2/modes-enabled/deflate.conf 内容如下

它表示只对以上特定的响应数据类型进行压缩,下面来测试下对 html 内容的压缩 阅读全文 >>

Python 转换 Apache Avro 数据为 Parquet 格式

前面尝试过用 Java 转换 Apache Avro 数据为 Parquet 格式,本文用 Python 来做同样的事情,并且加入 logicalType: date 类型的支持。本测试中的 Avro 数据也是由 Python 代码生成的。

重复一句 Avro 与 Parquet 的最粗略的区别:Avro 广泛的应用于数据的序列化,如 Kafka,它是基于行的格式,可被流式处理,而 Parquet 是列式存储格式的,适合于基于列的查询。

第一步,生成 Avro 数据文件 user.avro, 须先安装 fastavro

pip install fastavro

生成 user.avro 的代码

阅读全文 >>

Vagrant 简介与常用操作及配置

前方许多有关于 Kafka, Docker, Python 和 Kubernates 的文章都是在 Vagrant 虚拟机中做的 Demo,经常用到的一些 Vagrant 命令是时候有必要写篇日志记录下来。Vagrant 是 HashiCorp 家族中的一员,HashiCorp 旗下著名的工具还有  Terraform, ConsulVault, Boundary, Packer, NomadWaypoint

说起 Vagrant,不得不提起与之仿佛类似的 Docker,其实它们相差还是比较大的,只因它们给人的外在感觉都是命令行控制 Linux。Vagrant 实质是一个虚拟机的外挂,让我们更方便的用 Vagrant 命令与虚拟机交互,而不用在宿主机与虚拟机间来回切换,管理多个虚拟机就更得心应手了; 而 Docker 是一个容器,容器的本质是宿主机上的一个进程,只是用命名空间与该进程的文件系统,进程,网络等进行了隔离,使得该容器进程看似一个虚拟  OS。

Vagrant 是开发环境的部署工具, 而 Docker 是运行环境的部署工具; Vagrant 操作的是一个标准的 Linux 或  Windows 操作系统,而 Docker 的镜像考虑到发布服务的个头,通常是一个裁剪的系统,去除了服务器非必要的命令。既然 Vagrant 对应的是虚拟机,那么在 Vagrant 中的操作,安装的软件在 Vagrant 退出后都会保留下来,而 Docker 中操作的都是当前容器(copy-on-write),并不影响所对应的镜像, 除非用 docker commit 固化为新的镜像.

明白了 Vagrant 只是一个虚拟机的皮,那他在不同的硬件平台或操作系统下需要与不同的 Provider, 如 VirtualBox, Hyper-V, VMware 等配合工作,还能用 Vagrant 来操作 Docker。

有了 Vagrant, 从此不再需要下载不同操作系统的 ISO 安装镜像文件,耗时的逐步安装操作系统,也不用手工的下载别人安装好并导出的虚拟机文件,一切有点类似 Docker 一样从远程公共仓库中选择系统即可。

阅读全文 >>

使用 Java 转换 Apache Avro 为 Parquet 数据格式(依赖更新)

在上篇 使用 Java 转换 Apache Avro 为 Parquet 数据格式 实现把 Avro 数据转换为 Parquet 文件或内存字节数组,并支持 LogicalType。其中使用到了 hadoop-core 依赖,注意到它传递的依赖都非常老旧,到官方 Maven 仓库一看才发现还不是一般的老

长时间无人问津的项目,那一定有它的替代品。对啦,据说 hadoop-core 在 2009 年 7 月份更名为 hadoop-common 了,没找到官方说明,只看到 StackOverflow 的 Differences between Hadoop-coomon, Hadoop-core and Hadoop-client? 是这么说的。 应该是这么个说法,不然为何 hadoop-core 一直停留在  1.2.1 的版本,而且原来 hadoop-core 中的类在 hadoop-common 中可以找到,如类 org.apache.hadoop.fs.Path。不过在 hadoop-core-1.2.1 中的 fs/s3 包不见,这么重要的 s3 文件系统没了。 阅读全文 >>