Concurrency Issue: A Silent Killer of Your Program


Our team is building an on-demand food delivery service in Thailand, and we run into concurrency issues all the time. Most of the junior developers I talked to didn't seem aware of the problem, so we trained them. One day, our team decided to craft a new interview exercise, a page view counter, which touches on the concurrency issue to some extent. We found that very few engineers can write it correctly. I'm writing about it here to raise some awareness in this area.

I'll demonstrate the issue in Go for the sake of simplicity. Here are the requirements of the page view counter exercise:

  • You build a very simple HTTP server, and you can use any web framework.
  • There are only 2 endpoints: the home page and the stats page.
  • You can store the stats in memory, so they reset to zero when the server restarts.
  • There are many users visiting your home page.

I'll use Gin as the framework. Here is code that satisfies the first 3 requirements.

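package main

import (
	"fmt"

	"github.com/gin-gonic/gin"
)

func main() {
	// In-memory page view counter; it resets when the server restarts.
	counter := 0

	r := gin.Default()

	// Home page: count the visit, then respond.
	r.GET("/", func(c *gin.Context) {
		counter++
		c.String(200, "Hello world")
	})

	// Stats page: report the current count.
	r.GET("/stats", func(c *gin.Context) {
		c.String(200, fmt.Sprintf("Number of page view: %d", counter))
	})

	r.Run()
}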

Start the server with the command below, then open http://localhost:8080/ and http://localhost:8080/stats

go run main.go

Nothing seems wrong with this code, but that is because you are the only one using it while testing. In a production system, many people use it at the same time.

Let's simulate that by sending some load to the server. Install Apache Bench (ab) on your machine and send load with this command:

ab -n 1000 -c 20 http://localhost:8080/

The command above sends a total of 1,000 requests to the home page at a concurrency level of 20 (which means up to 20 users are opening your home page at the same time).

Run it and check the stats. Do you get 1,000 page views? What number do you get? Less than 1,000? It will most likely be less than 1,000, and the exact number varies from run to run.

Under the hood

It turns out that counter++ is neither atomic nor synchronized, so multiple CPU cores can read and write the same variable in main memory at the same time.

Basically, counter++ performs 3 major steps:

  1. Load the value of counter from main memory into a CPU register
  2. Increment the value by 1 on the CPU
  3. Copy the value from the CPU register back to counter in main memory

Suppose 2 threads are processing requests on different CPUs. The timeline of the instructions each CPU executes looks like this:

                      |
                      |
       LOAD "counter" |
                #AX=0 |
                      | LOAD "counter"
                      | #AX=0
                      |
               INC AX |
                #AX=1 |
                      | INC AX
                      | #AX=1
                      |
  MOV AX to "counter" |
           #counter=1 |
                      | MOV AX to "counter"
                      | #counter=1
                      |
                      |
                    \ | /
                     \|/
                      -
                CPU 1   CPU 2

In the end, both CPUs set counter to 1 instead of 2, which makes the stats counter lower than the real hit count. The higher the concurrency, the higher the chance of miscounting.
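To make the lost update concrete, here is a minimal sketch that replays the same interleaving by hand in a single goroutine, so the outcome is deterministic. The names lostUpdate, ax1 and ax2 are made up for this illustration; ax1 and ax2 stand in for the AX register on each CPU.

```go
package main

import "fmt"

// lostUpdate replays the timeline step by step: both CPUs load the old
// value before either one stores its result.
func lostUpdate() int {
	counter := 0

	ax1 := counter // CPU 1: LOAD "counter"  -> AX=0
	ax2 := counter // CPU 2: LOAD "counter"  -> AX=0

	ax1++ // CPU 1: INC AX -> AX=1
	ax2++ // CPU 2: INC AX -> AX=1

	counter = ax1 // CPU 1: MOV AX to "counter" -> counter=1
	counter = ax2 // CPU 2: MOV AX to "counter" -> counter=1 (overwrites, not 2)

	return counter
}

func main() {
	fmt.Println(lostUpdate()) // prints 1, although two increments ran
}
```

Two increments ran, but only one of them is visible in the final value.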

Well, how can we fix that?

Solution

Most programming languages come with synchronization primitives and atomic data types. In Go, these live in the sync and sync/atomic packages. In this case, sync/atomic gives you better performance, but I will show you both solutions.

sync/atomic

atomic provides functions to read and write primitive variables atomically, which prevents concurrent requests from overwriting each other's updates in main memory.

You need to change the counter variable's type to int32, which sync/atomic supports, and then use AddInt32(addr *int32, delta int32) and LoadInt32(addr *int32) to modify and read the variable respectively.

package main

import (
	"fmt"
	"sync/atomic"

	"github.com/gin-gonic/gin"
)

func main() {
	var counter int32

	r := gin.Default()

	r.GET("/", func(c *gin.Context) {
		atomic.AddInt32(&counter, 1)
		c.String(200, "Hello world")
	})

	r.GET("/stats", func(c *gin.Context) {
		c.String(200, fmt.Sprintf("Number of page view: %d", atomic.LoadInt32(&counter)))
	})

	r.Run()
}
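If you want to check the atomic version without starting a server, a rough sketch like the following can help. It assumes 20 goroutines stand in for 20 concurrent users; countAtomically and its parameters are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countAtomically starts `workers` goroutines that each increment a shared
// counter `perWorker` times with atomic.AddInt32, then returns the total.
func countAtomically(workers, perWorker int) int32 {
	var counter int32
	var wg sync.WaitGroup

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				atomic.AddInt32(&counter, 1)
			}
		}()
	}

	wg.Wait() // wait until every goroutine has finished
	return atomic.LoadInt32(&counter)
}

func main() {
	fmt.Println(countAtomically(20, 50)) // prints 1000
}
```

Because every increment is a single atomic read-modify-write, no update is lost and the total always matches the number of increments.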

Using Mutex from the sync package

You declare a mutex (its zero value is unlocked) and call m.Lock() before touching the variable. Make sure you don't forget to release the lock, for example with defer m.Unlock(), so that once the current request has been processed, the next one can proceed. Reading the counter needs the lock too; otherwise the stats handler races with the increment.

package main

import (
	"fmt"
	"sync"

	"github.com/gin-gonic/gin"
)

func main() {
	counter := 0
	var m sync.Mutex // guards counter

	r := gin.Default()

	r.GET("/", func(c *gin.Context) {
		m.Lock()
		defer m.Unlock()
		counter++
		c.String(200, "Hello world")
	})

	r.GET("/stats", func(c *gin.Context) {
		// Take the same lock for reading so this does not race with counter++.
		m.Lock()
		defer m.Unlock()
		c.String(200, fmt.Sprintf("Number of page view: %d", counter))
	})

	r.Run()
}
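One way to make the lock harder to forget is to hide the counter and its mutex behind a small type, so callers can only reach the value through methods that lock. This is a sketch, not part of the original exercise; Counter, Inc, Value and countWithMutex are hypothetical names.

```go
package main

import (
	"fmt"
	"sync"
)

// Counter keeps the mutex next to the value it guards.
type Counter struct {
	mu sync.Mutex
	n  int
}

// Inc adds one to the counter under the lock.
func (c *Counter) Inc() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.n++
}

// Value reads the counter under the same lock.
func (c *Counter) Value() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.n
}

// countWithMutex increments one Counter from several goroutines.
func countWithMutex(workers, perWorker int) int {
	var c Counter
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < perWorker; j++ {
				c.Inc()
			}
		}()
	}
	wg.Wait()
	return c.Value()
}

func main() {
	fmt.Println(countWithMutex(20, 50)) // prints 1000
}
```

In the handlers, the home page would call Inc and the stats page would call Value, which also keeps the lock from being held while the response is written.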

The page view counter is a simple example to give you a sense of the issue. Any variable or data structure shared by multiple threads can run into it, for example when you update a map, modify a slice or array, or update data in a database. A thread-safe data type can be used to guarantee that there is no concurrency issue.
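The map case deserves a quick illustration, because Go's built-in map is not safe for concurrent writes and the runtime may abort the program when it detects them. Below is a minimal sketch of a thread-safe wrapper that counts views per path. PageViews, Hit, Count and simulateHits are hypothetical names, and sync.RWMutex is just one possible choice here.

```go
package main

import (
	"fmt"
	"sync"
)

// PageViews is a map of path -> view count guarded by a read/write lock.
type PageViews struct {
	mu    sync.RWMutex
	views map[string]int
}

func NewPageViews() *PageViews {
	return &PageViews{views: make(map[string]int)}
}

// Hit records one view for path; writers take the exclusive lock.
func (p *PageViews) Hit(path string) {
	p.mu.Lock()
	defer p.mu.Unlock()
	p.views[path]++
}

// Count returns the views for path; readers share the lock with each other.
func (p *PageViews) Count(path string) int {
	p.mu.RLock()
	defer p.mu.RUnlock()
	return p.views[path]
}

// simulateHits has 20 goroutines each visit "/" 50 times and "/stats" once.
func simulateHits() (home, stats int) {
	pv := NewPageViews()
	var wg sync.WaitGroup
	for i := 0; i < 20; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < 50; j++ {
				pv.Hit("/")
			}
			pv.Hit("/stats")
		}()
	}
	wg.Wait()
	return pv.Count("/"), pv.Count("/stats")
}

func main() {
	fmt.Println(simulateHits()) // prints 1000 20
}
```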

Concurrency issues are quite hard to debug on your local machine, because reproducing them requires sending load to the server and involves some luck. If you think about the problem early in development, you can prevent it and focus on making an impact on the business instead.
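One way to take some of the luck out of it is to generate the load from code. The sketch below drives a plain net/http handler (not the Gin app from above) through httptest recorders from several goroutines, roughly mimicking ab -n 1000 -c 20 without opening a port. hammer and loadTest are names invented for this example. Go also ships a race detector, enabled with the -race flag of go run and go test, which can report unsynchronized access when such a load is run against an unsafe handler.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"sync"
	"sync/atomic"
)

// hammer calls handler `total` times from `concurrency` goroutines.
// Requests go straight to ServeHTTP, so no network is involved.
func hammer(handler http.Handler, total, concurrency int) {
	jobs := make(chan struct{}, total)
	for i := 0; i < total; i++ {
		jobs <- struct{}{}
	}
	close(jobs)

	var wg sync.WaitGroup
	for w := 0; w < concurrency; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range jobs {
				req := httptest.NewRequest(http.MethodGet, "/", nil)
				handler.ServeHTTP(httptest.NewRecorder(), req)
			}
		}()
	}
	wg.Wait()
}

// loadTest counts page views with an atomic counter under concurrent load.
func loadTest(total, concurrency int) int32 {
	var views int32
	home := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		atomic.AddInt32(&views, 1)
		fmt.Fprint(w, "Hello world")
	})
	hammer(home, total, concurrency)
	return atomic.LoadInt32(&views)
}

func main() {
	fmt.Println(loadTest(1000, 20)) // prints 1000
}
```

Swapping the atomic counter for a plain int in the handler turns this into a small reproduction of the original bug.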

I hope this gives you some ideas. Keep coding!

Translated from: https://medium.com/@tanapoln/concurrency-issue-a-silence-killer-of-your-program-45e4f97ae7d7

