Our team is building an on-demand food delivery service in Thailand, and we face concurrency issues all the time. Most junior developers I talked to didn't seem aware of the problem, so we trained them. One day, our team decided to create a new interview exercise, a page view counter, which touches on the concurrency issue. We found that very few engineers could write it correctly. I'm writing about it here to raise some awareness of this area.
I’ll demonstrate the issue using Go as an example for the sake of simplicity. So, here are the requirements of the page view counter exercise:
- You build a very simple HTTP server, using any web framework you like.
- There are only 2 endpoints: the home page and the stats page.
- You can store the stats in memory, so it's fine if they reset to zero when the server restarts.
- Many users visit your home page at the same time.
I'll use Gin as the framework. Here's code that satisfies the first three requirements:
```go
package main

import (
	"fmt"

	"github.com/gin-gonic/gin"
)

func main() {
	counter := 0 // shared by every request

	r := gin.Default()
	r.GET("/", func(c *gin.Context) {
		counter++ // count one page view
		c.String(200, "Hello world")
	})
	r.GET("/stats", func(c *gin.Context) {
		c.String(200, fmt.Sprintf("Number of page views: %d", counter))
	})
	r.Run() // listens on :8080 by default
}
```
Start the server with the command below, then open http://localhost:8080/ and http://localhost:8080/stats in your browser.
```shell
go run main.go
```
Nothing seems wrong with this code, but that's because you're the only one using it while testing. In production, many people use the system at the same time.
Let's simulate that by sending some load to the server. Install Apache Bench (`ab`) on your machine and send load with this command:
```shell
ab -n 1000 -c 20 http://localhost:8080/
```
This command sends a total of 1,000 requests to the home page from 20 concurrent users (that is, 20 users opening your home page at the same time).
Run it and check the stats. Do you get 1,000 page views? What number do you get? Fewer than 1,000? Most likely it is fewer than 1,000, and the exact number varies from run to run.
Under the hood
It turns out that `counter++` is neither atomic nor synchronized, so multiple CPU cores can read and write the same value to and from main memory at the same time.
Basically, `counter++` performs three steps:
- Load the value of `counter` from main memory into a CPU register.
- Increment the value by 1 in the CPU.
- Copy the value from the CPU back to `counter` in main memory.
Suppose two threads are processing requests on different CPUs. The instructions each CPU executes might follow a timeline like this:
```text
time    CPU 1                          CPU 2
  |     LOAD "counter"      #AX=0
  |                                    LOAD "counter"      #AX=0
  |     INC AX              #AX=1
  |                                    INC AX              #AX=1
  |     MOV AX to "counter" #counter=1
  |                                    MOV AX to "counter" #counter=1
  v
```
In the end, both CPUs set `counter` to 1 instead of 2. One page view is lost, so the stats counter ends up lower than the real hit count. The more concurrent requests there are, the higher the chance of miscounting.
Well, how can we fix that?
Solution
Most programming languages come with synchronization primitives and atomic data types. In Go, these are provided by the `sync` and `sync/atomic` packages. For a simple counter like this one, `sync/atomic` generally gives better performance, but I'll show you both solutions.
sync/atomic
The `atomic` package provides functions that read and modify primitive variables atomically. The load, increment, and store happen as one indivisible operation, which prevents multiple goroutines from overwriting each other's updates in main memory.
Change the type of `counter` to `int32`, which `sync/atomic` supports. Then use `atomic.AddInt32(addr *int32, delta int32)` to modify the variable and `atomic.LoadInt32(addr *int32)` to read it.
```go
package main

import (
	"fmt"
	"sync/atomic"

	"github.com/gin-gonic/gin"
)

func main() {
	var counter int32 // accessed only through sync/atomic

	r := gin.Default()
	r.GET("/", func(c *gin.Context) {
		atomic.AddInt32(&counter, 1) // atomic increment
		c.String(200, "Hello world")
	})
	r.GET("/stats", func(c *gin.Context) {
		// Atomic read, so it never races with concurrent increments.
		c.String(200, fmt.Sprintf("Number of page views: %d", atomic.LoadInt32(&counter)))
	})
	r.Run()
}
```
Using Mutex from the sync package
Declare a `sync.Mutex` (its zero value is an unlocked mutex) and call `Lock()` before modifying the variable. Don't forget to release the lock. Calling `Unlock()` with `defer` ensures the lock is released when the handler returns, so the next request waiting for it can proceed. Reads of the shared variable need the lock as well. Otherwise, the reader still races with the writers.
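```go
package main

import (
	"fmt"
	"sync"

	"github.com/gin-gonic/gin"
)

func main() {
	counter := 0
	var m sync.Mutex // guards counter; the zero value is unlocked

	r := gin.Default()
	r.GET("/", func(c *gin.Context) {
		m.Lock()
		defer m.Unlock() // released when the handler returns
		counter++
		c.String(200, "Hello world")
	})
	r.GET("/stats", func(c *gin.Context) {
		// Reading also needs the lock; an unlocked read would still race.
		m.Lock()
		n := counter
		m.Unlock()
		c.String(200, fmt.Sprintf("Number of page views: %d", n))
	})
	r.Run()
}
```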
The page view counter is a simple example to give you a sense of this issue. Any variable or data structure shared by multiple threads can run into it. Examples include updating a map, modifying a slice or array, and updating data in a database. Thread-safe data types, or explicit synchronization, help prevent these problems.
Concurrency issues are hard to debug on your local machine because they only show up under concurrent load, and even then only by chance. But if you think about the problem early in development, you can prevent it and focus on making an impact on the business instead.
I hope this gives you some ideas. Keep coding!
Originally published at: https://medium.com/@tanapoln/concurrency-issue-a-silence-killer-of-your-program-45e4f97ae7d7