Multi-threaded server programming model: how to use mutex and condition variable correctly

This article offers a detailed look at common programming models for multi-threaded servers. The multi-threaded server discussed here is a network application running on the Linux operating system. It covers the typical single-threaded server programming model, the typical threading model of a multi-threaded server, inter-process communication, and inter-thread communication.

It summarizes one or two commonly used threading models along with the best practices of inter-process communication and thread synchronization, in order to develop multi-threaded programs in a simple and standardized way.

The "multi-threaded server" in this article refers to a proprietary web application running on a Linux operating system. The hardware platform is Intel x64 series multi-core CPU, single or dual SMP server (each machine has a total of four cores or eight cores, more than ten GB of memory), and the machines are connected by 100M or Gigabit Ethernet. This is probably the mainstream configuration of the current civilian PC server.

This article does not cover Windows systems, human-computer interfaces (whether command line or graphical), file reading and writing (except writing logs to disk), database operations, or web applications; it does not consider low-end single-core hosts or embedded systems, handheld devices, special network equipment, or high-end Unix hosts with 32 or more cores; it considers only TCP, not UDP, and no means of data transmission other than the LAN (such as serial and parallel ports, USB, data acquisition boards, real-time control, etc.).

With all the restrictions above, the basic function of the "network application" I will discuss can be summarized as "receive data, compute, send it out." In such a simplified model, multithreading seems unnecessary; a single thread should do the job well. The question "why write a multi-threaded program at all" easily triggers flame wars, so I leave it for another post. Allow me to simply assume the background of "multi-threaded programming" here.

The term "server" is sometimes used to refer to a program, sometimes to a process, sometimes to hardware (whether virtual or real), and to be distinguished by context. In addition, this article does not consider the virtualization scenario. When I say "two processes are not on the same machine", it means that they are not logically running in the same operating system, although they may be physically located in the same machine. Virtual machine".

This article assumes that the reader already has the knowledge and experience of multithreaded programming. This is not an introductory tutorial.

1 Processes and threads

"Process / process" is one of the two most important concepts in the operation (the other is a file). Roughly speaking, a process is "a program that is running in memory." The process in this article refers to the thing that the Linux operating system generates through the fork() system call, or the product of CreateProcess() under Windows, not the lightweight process in Erlang.

Each process has its own independent address space (whether two pieces of code are "in the same process" or "not in the same process" is an important decision point when partitioning system functionality). The Erlang book likens a "process" to a "person," which I find excellent; it gives us a framework for thinking.

Each person has his own memory, and people communicate through conversation (messages). A conversation can be face to face (same server) or over the phone (different servers, network communication). The difference is that in a face-to-face conversation you immediately know whether the other person has died (crash, SIGCHLD), whereas over the phone you can only judge whether the other party is still alive through periodic heartbeats.

With these metaphors, you can use "role play" when designing a distributed system. Several people on the team each play a process, with each person's role determined by the process's code (managing connections, distributing data, doing computation, and so on). Each person has his own memories but does not know the memories of others; to learn what others think, one can only converse. (I am not considering shared-memory-style IPC here.) Then you can think through fault tolerance (in case someone suddenly dies), scaling out (newcomers join), load balancing (moving a's job to b), and retirement (a has a bug to fix, so give him no new jobs, wait until he finishes what is on hand, then restart him). It is all very convenient.

The concept of "threading" was probably only popularized after 1993. It is only a decade or more, and it is not a Unix operating system with a 40-year history. The emergence of threads has added a lot of confusion to Unix. Many C library functions (strtok(), ctime()) are not thread-safe and need to be redefined; the semantics of signal are also greatly complicated. As far as I know, the first (civilian) operating systems that support multithreaded programming are Solaris 2.2 and Windows NT 3.1, both of which were released in 1993. Then in 1995, the POSIX threads standard was established.

Threads are characterized by a shared address space, which allows data to be shared efficiently. Multiple processes on a single machine can efficiently share code segments (the operating system can map them to the same physical memory) but not data. If multiple processes share a large amount of memory, that is effectively writing a multi-process program as a multi-threaded one.

The value of "multithreading", I think, is to better exploit the performance of symmetric multiprocessing (SMP). Multithreading didn't have much value before SMP. Alan Cox said that A computer is a state machine. Threads are for people who can't program state machines. (The computer is a state machine. Threads are prepared for those who cannot write state machine programs.) If there is only one execution Unit, a CPU, then as Alan Cox said, it is most efficient to write programs according to the state machine, which is exactly the programming model shown in the next section.

2 Typical single-threaded server programming model

UNP3e sums this up well (Chapter 6: I/O Models, Chapter 30: Client/Server Design Paradigms), so it is not repeated here. As far as I know, among high-performance network programs the most widely used model is probably "non-blocking IO + IO multiplexing," i.e. the Reactor pattern. Examples I know of include:

- lighttpd, a single-threaded server (nginx is probably similar; to be verified)
- libevent / libev
- ACE, Poco C++ Libraries (Qt to be verified)
- Java NIO (Selector/SelectableChannel), Apache Mina, Netty (Java)
- POE (Perl)
- Twisted (Python)

In contrast, boost::asio and Windows I/O Completion Ports implement the Proactor pattern, which seems to have a narrower range of applications. Of course, ACE also implements the Proactor pattern, but it is not listed here.

In the "non-blocking IO + IO multiplexing" model, the basic structure of the program is an event loop: (the code is only for the indication, not fully considered)

while (!done)
{
  int timeout_ms = max(1000, getNextTimedCallback());
  int retval = ::poll(fds, nfds, timeout_ms);
  if (retval < 0) {
    // handle errors
  } else {
    // handle expired timers
    if (retval > 0) {
      // handle IO events
    }
  }
}

Of course, select(2)/poll(2) have plenty of shortcomings. Under Linux they can be replaced with epoll; other operating systems have corresponding high-performance alternatives (search for the c10k problem). A rough sketch with epoll follows.
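
For concreteness, here is a rough sketch of the same loop written with epoll (my sketch, not code from the original article; listenfd, done, and timeout_ms are assumed to be defined elsewhere):

#include <sys/epoll.h>

int epfd = ::epoll_create1(0);            // create the epoll instance

struct epoll_event ev;
ev.events = EPOLLIN;                      // interested in readable events
ev.data.fd = listenfd;                    // listenfd: assumed set up elsewhere
::epoll_ctl(epfd, EPOLL_CTL_ADD, listenfd, &ev);

struct epoll_event events[64];
while (!done)
{
  int n = ::epoll_wait(epfd, events, 64, timeout_ms);
  for (int i = 0; i < n; ++i)
  {
    // dispatch events[i].data.fd to its read/write handler
  }
  // handle expired timers here, as in the poll version
}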

The advantages of the Reactor model are obvious: programming is simple and efficiency is good. Not only can it be used for network reads and writes; connection establishment (connect/accept) and even DNS resolution can be done in a non-blocking way to improve concurrency and throughput. It is a good choice for IO-intensive applications. Lighttpd is one example: its internal fdevent structure is exquisite and worth studying. (Blocking IO, a suboptimal alternative, is not considered here.)

Of course, implementing a good Reactor is not that easy. Since I have not used the open-source libraries, I make no recommendation here.

3 Thread model of a typical multi-threaded server

I have found few documents on this topic. There are roughly these approaches:

1. Create one thread per request, using blocking IO operations. This was the recommended practice for Java network programming before Java 1.4 introduced NIO. Unfortunately, it does not scale well.

2. Use a thread pool, also with blocking IO operations. Compared with 1, this is a performance improvement.

3. Use non-blocking IO + IO multiplexing, i.e. the way of Java NIO.

4. Advanced patterns such as Leader/Follower.

By default I use the third: non-blocking IO + one loop per thread.

http://pod.tst.eu/http://cvs.schmorp.de/libev/ev.pod#THREADS_AND_COROUTINES

One loop per thread

In this model, each IO thread in the program has one event loop (i.e. a Reactor) that handles reads, writes, and timed events (periodic or one-shot). The code framework is the same as in Section 2.

The benefits of this approach are:

- The number of threads is essentially fixed; it can be set at program startup, and threads are not created and destroyed frequently.
- It is convenient to balance the load among the threads.

The event loop represents the thread's main loop. To make a particular thread do the work, register the timer or IO channel (e.g. a TCP connection) into that thread's loop. A connection that needs low latency can use a dedicated thread; a connection with a large data volume can monopolize a thread and distribute data-processing tasks to several other computing threads; other minor auxiliary connections can share one thread.

For non-trivial server programs, non-blocking IO + IO multiplexing is generally used. Each connection/acceptor is registered with a Reactor; there are multiple Reactors in the program, and each thread has at most one Reactor.

Multi-threaded programs place a higher demand on the Reactor: thread safety. To allow one thread to put things into another thread's loop, the loop must be thread-safe; a sketch of one way to do this follows.
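
As an illustration (the EventLoop/queueInLoop names are mine; this is a sketch under assumptions, not a canonical implementation), the loop can be made thread-safe by queueing callbacks under a mutex and waking up the poll through a wakeup pipe, using the MutexLock/MutexLockGuard classes defined later in this article:

#include <vector>
#include <unistd.h>
#include <boost/function.hpp>

class EventLoop
{
 public:
  typedef boost::function<void()> Functor;

  // may be called from any thread
  void queueInLoop(const Functor& cb)
  {
    {
      MutexLockGuard lock(mutex_);
      pendingFunctors_.push_back(cb);   // hand the task to the loop thread
    }
    char one = 1;
    ::write(wakeupFd_, &one, 1);        // wake up poll() so the task runs promptly
  }

 private:
  // called in the loop thread after poll() returns
  void doPendingFunctors()
  {
    std::vector<Functor> functors;
    {
      MutexLockGuard lock(mutex_);
      functors.swap(pendingFunctors_);  // swap to keep the critical section short
    }
    for (size_t i = 0; i < functors.size(); ++i)
      functors[i]();                    // run outside the lock
  }

  MutexLock mutex_;
  std::vector<Functor> pendingFunctors_;  // guarded by mutex_
  int wakeupFd_;  // write end of a pipe registered in this loop's poller
};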

Thread Pool

However, for threads with pure computing tasks and no IO, using an event loop is somewhat wasteful. For these I use a complementary solution: a task queue (TaskQueue) implemented with a blocking queue:
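
(The code figure from the original post is lost here. Below is a minimal sketch of what it plausibly showed; BlockingQueue, running, and create_thread are assumed to be defined elsewhere.)

typedef boost::function<void()> Functor;
BlockingQueue<Functor> taskQueue;   // thread-safe blocking queue, assumed defined

void workerThread()
{
  while (running)                    // running: a global flag, assumed defined
  {
    Functor task = taskQueue.take(); // blocks when the queue is empty
    task();                          // run the task in this worker thread
  }
}

// start a fixed-size pool of N computing threads:
int N = num_of_computing_threads;
for (int i = 0; i < N; ++i)
{
  create_thread(&workerThread);      // pthread_create() or boost::thread
}

// usage: post a computing task to the pool
Foo foo;                             // Foo has a calc() member function
boost::function<void()> task = boost::bind(&Foo::calc, &foo);
taskQueue.post(task);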

The dozen or so lines of code above implement a simple thread pool with a fixed number of threads, roughly equivalent to one particular "configuration" of Java 5's ThreadPoolExecutor. Of course, in a real project this code should be packaged into a class rather than using global objects. Also note one point: the lifetime of the Foo object.

In addition to the task queue, you can also use a BlockingQueue<T> to implement a producer-consumer queue of data: T is the data type rather than a function object, and the queue's consumer(s) take the data out for processing. This is more application-specific than a task queue.

BlockingQueue<T> is a powerful tool for multi-threaded programming. Its implementation can borrow from (Array|Linked)BlockingQueue in Java 5's util.concurrent. The Java 5 code is very readable, and its basic structure matches the textbook (1 mutex, 2 condition variables), which makes it far more robust; a sketch follows. If you do not want to implement it yourself, it is better to use a ready-made library. (I have not used the free libraries, so I make no recommendation; interested readers can try concurrent_queue<T> in Intel Threading Building Blocks.)
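
For concreteness, here is a minimal sketch of that textbook structure (mine, not from the original text) for a bounded queue, built on the MutexLock/MutexLockGuard/Condition classes presented later in this article:

#include <deque>
#include <boost/noncopyable.hpp>

template<typename T>
class BoundedBlockingQueue : boost::noncopyable
{
 public:
  explicit BoundedBlockingQueue(size_t maxSize)
    : mutex_(), notEmpty_(mutex_), notFull_(mutex_), maxSize_(maxSize)
  { }

  void put(const T& x)
  {
    MutexLockGuard lock(mutex_);
    while (queue_.size() >= maxSize_)   // full: wait for room
      notFull_.wait();
    queue_.push_back(x);
    notEmpty_.notify();                 // wake one waiting consumer
  }

  T take()
  {
    MutexLockGuard lock(mutex_);
    while (queue_.empty())              // empty: wait for data
      notEmpty_.wait();
    T front(queue_.front());
    queue_.pop_front();
    notFull_.notify();                  // wake one waiting producer
    return front;
  }

 private:
  MutexLock mutex_;                     // the 1 mutex
  Condition notEmpty_;                  // the 2 condition variables
  Condition notFull_;
  const size_t maxSize_;
  std::deque<T> queue_;                 // guarded by mutex_
};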

Summary

To sum up, my recommended multi-threaded server programming model is: one (event) loop per thread + thread pool.

- The event loop is used for non-blocking IO and timers.
- The thread pool is used for computation; specifically, it can be a task queue or a producer-consumer queue.

Writing server programs this way requires a good Reactor-based network library for support. I have only used in-house products, so I cannot compare and recommend the common C++ network libraries on the market. Sorry.

How many loops to use and the size of the thread pool must be set according to the application. The basic principle is "impedance matching," so that both CPU and IO operate efficiently; the specific considerations will be discussed later.

Thread exit is not discussed here either; I leave it to a later post, "anti-patterns of multi-threaded programming."

In addition, there may be individual threads performing special tasks, such as logging. These are basically invisible to the application, but they should be included when budgeting resources (CPU and IO) so as not to overestimate the system's capacity.

4 Inter-process communication and inter-thread communication

There are countless ways to communicate between processes (IPC) under Linux. UNPv2 alone lists pipes, FIFOs, POSIX message queues, shared memory, signals, and so on, not to mention sockets. There are also many synchronization primitives: mutexes, condition variables, reader-writer locks, record locking, semaphores, and so on.

How to choose? My personal experience is that quality beats quantity: carefully pick three or four things that fully meet your work needs, learn to use them well, and you will not easily make mistakes.

5 Inter-process communication

For inter-process communication I prefer sockets (mainly TCP; I have not used UDP or Unix domain sockets). Their biggest advantage is that they can cross hosts and thus scale. Since the design is multi-process anyway, if one machine's processing power is insufficient, the work can naturally be spread across several machines: distribute the processes to multiple machines on the same LAN, change the host:port configuration, and the program keeps working. Conversely, none of the other IPCs listed above can cross machines (shared memory, say, is the most efficient, but however efficient it is, it cannot share memory between two machines), which limits scalability.

In programming terms, TCP sockets and pipes are both file descriptors used to send and receive byte streams, and both support read/write/fcntl/select/poll. The differences: TCP is bidirectional while a pipe is unidirectional (on Linux), so bidirectional communication between processes needs two file descriptors, which is inconvenient; and processes must have a parent-child relationship to use a pipe, which limits its use. Under a byte-stream communication model there is no IPC more natural than sockets/TCP. Of course, pipes do have a classic application: asynchronously waking up a select (or equivalent poll/epoll) call when writing a Reactor/Selector (the Sun JVM does this on Linux).

A TCP port is owned exclusively by one process, and the operating system reclaims it automatically (listening ports and established TCP sockets are file descriptors, and the operating system closes all of a process's file descriptors when it ends). This means that even if a program exits unexpectedly, it leaves no garbage behind for the system, and after a restart it can recover relatively easily without rebooting the operating system (a risk that does exist with cross-process mutexes). There is another benefit: since the port is exclusive, it prevents the program from being started twice (the later process cannot acquire the port and so cannot work), avoiding unexpected results.

When two processes communicate via TCP and one crashes, the operating system closes the connection, so the other process notices almost immediately and can fail over quickly. Of course, an application-level heartbeat is still indispensable; I will discuss heartbeat protocol design when I cover server-side date and time handling.

A natural benefit of TCP over other IPCs is that it is "recordable and reproducible": tcpdump and Wireshark are good helpers for settling protocol and state disputes between two processes.

Moreover, if the network library has a "connection retry" feature, the processes in the system need not be started in any particular order, and any process can be restarted independently, which matters greatly for developing a robust distributed system.

Using byte-stream communication such as TCP incurs marshal/unmarshal overhead, which requires us to choose an appropriate message format, specifically the wire format. This will be the subject of my next post; for now I recommend Google Protocol Buffers.

Some might say to analyze each case on its merits: if the two processes are on the same machine use shared memory, otherwise use TCP; MS SQL Server, for example, supports both modes. I would ask: is such a performance gain worth the added code complexity? TCP is a byte-stream protocol that can only be read sequentially and has write buffers; shared memory is a message-based protocol where process a fills a block of memory for process b to read, basically a "stop-and-wait" scheme. To put both methods into one program, you need an abstraction layer that encapsulates the two IPCs, which introduces opacity and increases testing complexity; and if one side crashes mid-communication, reconciling state will be more troublesome than with sockets. Not for me. Besides, are you willing to let a SQL Server bought for tens of thousands share machine resources with your program? Database servers in production are usually dedicated high-spec machines and generally do not run other resource-hungry programs at the same time.

TCP itself is a stream protocol. Besides using it directly for communication, you can also build upper-layer protocols such as RPC/REST/SOAP on top of it. In addition to point-to-point communication, application-level broadcast protocols are also very useful for building scalable, manageable distributed systems.

This article does not specifically cover network programming in the Reactor style. There is actually much worth noting there, such as retrying connections with back-off and organizing timers with a priority queue; those are left for later analysis.

6 Thread synchronization

The four principles of thread synchronization, in order of importance:

1. The first principle is to share objects as little as possible and reduce the need for synchronization. If an object can avoid being exposed to other threads, do not expose it; if it must be exposed, prefer immutable objects; as a last resort, expose a modifiable object and protect it fully with synchronization measures.

2. Next, prefer high-level concurrent programming components, such as TaskQueue, producer-consumer queue, CountDownLatch, and so on.

3. Finally, when low-level synchronization primitives are unavoidable, use only non-recursive mutexes and condition variables, with the occasional read-write lock.

4. Do not write lock-free code yourself, and do not guess "which approach performs better," e.g. spin lock vs. mutex.

The first two are easy to understand; here we focus on item 3, the use of the low-level synchronization primitives.

Mutex

The mutex is probably the most heavily used synchronization primitive. Roughly speaking, it protects a critical section so that at most one thread is active in the critical section at any time. (Note that I am talking about the mutex in pthreads, not the heavyweight cross-process Mutex of Windows.) When using a mutex alone, the main goal is to protect shared data. My personal principles are:

- Use RAII to encapsulate the four operations of creating, destroying, locking, and unlocking a mutex.
- Use only non-recursive (i.e. non-reentrant) mutexes.
- Never call lock() and unlock() by hand; leave everything to the constructor and destructor of a stack-allocated Guard object, whose lifetime is exactly the critical section. (Analyzing when an object is destructed is a basic skill of C++ programmers.) This guarantees locking and unlocking in the same function, avoiding locking in foo() and then running off to bar() to unlock.
- Each time a Guard object is constructed, think about the locks already held along the way (on the call stack), to prevent deadlocks caused by different locking orders. Since the Guard is a stack object, looking at the function call stack makes lock analysis very convenient.

The secondary principles are:

- Do not use cross-process mutexes; for inter-process communication use only TCP sockets.
- Lock and unlock in the same thread; thread a cannot unlock a mutex that thread b has locked. (RAII guarantees this automatically.)
- Do not forget to unlock. (RAII guarantees this automatically.)
- Do not unlock twice. (RAII guarantees this automatically.)
- When necessary, consider using PTHREAD_MUTEX_ERRORCHECK to troubleshoot, as in the sketch below.
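
A minimal sketch of that last point (my illustration; initErrorCheckMutex is a made-up helper name): with PTHREAD_MUTEX_ERRORCHECK, relocking from the same thread returns EDEADLK instead of silently deadlocking, and unlocking a mutex you do not hold returns EPERM.

#include <pthread.h>

void initErrorCheckMutex(pthread_mutex_t* mutex)
{
  pthread_mutexattr_t attr;
  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_ERRORCHECK);
  pthread_mutex_init(mutex, &attr);   // create an error-checking mutex
  pthread_mutexattr_destroy(&attr);
}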

Encapsulating these operations with RAII is common practice and almost standard in C++; I give concrete code later. I believe everyone has written or used similar code: Java's synchronized statement and C#'s using statement have a similar effect, guaranteeing that the lock covers one scope and is not left locked because of an exception.

The mutex is probably the simplest synchronization primitive, and with the few principles above it is almost impossible to use it incorrectly. I have never violated these principles myself, and when problems arose in coding they were quickly found and fixed.

Digression: non-recursive mutexes

Let me explain my personal reasons for insisting on non-recursive mutexes.

Mutexes divide into recursive and non-recursive; these are the POSIX terms, also known as reentrant and non-reentrant. There is no difference between the two as inter-thread synchronization tools; the only distinction is that the same thread may repeatedly lock a recursive mutex, but not a non-recursive one.

I prefer the non-recursive mutex definitely not for performance but for design intent. The performance gap between non-recursive and recursive is actually small: the former uses one fewer counter and is only slightly faster. Locking a non-recursive mutex multiple times from the same thread leads immediately to deadlock, which I consider its advantage: it helps us think through our code's expectations about locking and find problems early, in the coding phase.

Undoubtedly recursive mutexes are more convenient, since you need not worry about a thread deadlocking itself; I guess that is why Java and Windows provide recursive mutexes by default. (The Java language's intrinsic lock is reentrant, its concurrent library provides ReentrantLock, and Windows's CRITICAL_SECTION is reentrant. None of them seems to offer a lightweight non-recursive mutex.)

Precisely because of that convenience, a recursive mutex can hide problems in the code: you think that having taken the lock you may modify the object, not expecting that code further out on the call chain has already taken the lock and is modifying (or reading) the same object. A concrete example:

std::vector<Foo> foos;
MutexLock mutex;

void post(const Foo& f)
{
  MutexLockGuard lock(mutex);
  foos.push_back(f);
}

void traverse()
{
  MutexLockGuard lock(mutex);
  for (auto it = foos.begin(); it != foos.end(); ++it)  // C++0x-style auto
  {
    it->doit();
  }
}

post() takes the lock and then modifies the foos object; traverse() takes the lock and then iterates over the foos array. If one day Foo::doit() indirectly calls post() (which is a logic error), things get dramatic:

1. If the mutex is non-recursive, the program deadlocks.

2. If the mutex is recursive, push_back may (though not always) invalidate the vector's iterators, so the program occasionally crashes.

Here the superiority of the non-recursive mutex shows: it exposes the program's logic error. A deadlock is easier to debug: print each thread's call stack ((gdb) thread apply all bt) and, as long as functions are not overly long, it is easy to see how the program died. (Conversely, this is another reason not to write overly long functions.) Alternatively, PTHREAD_MUTEX_ERRORCHECK can catch the error on the spot (provided MutexLock has a debug option).

Since the program is going to die either way, it is better to die meaningfully and make the coroner's life easier.

If a function may be called both with the lock held and without it, split it into two functions:

1. One with the same name as the original; it takes the lock and then calls the second function.

2. One with the suffix WithLockHold added to the name; it does the work of the original function body without locking.

Like this:

void post(const Foo& f)
{
  MutexLockGuard lock(mutex);
  postWithLockHold(f);  // don't worry about the overhead: the compiler will inline it
}

// this function exists to express the author's intent, even though push_back could usually be inlined manually
void postWithLockHold(const Foo& f)
{
  foos.push_back(f);
}

Two problems may arise (thanks to Shuimu netizen ilovecpp): a) the locked version is misused, causing deadlock; b) the unlocked version is misused, corrupting data.

For a), the call-stack method above makes troubleshooting fairly easy. For b), if pthreads provided isLocked(), one could write:

void postWithLockHold(const Foo& f)
{
  assert(mutex.isLocked());  // for now, just a wish
  // ...
}
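
pthreads offers no such call, but the wish can be approximated. Here is a sketch (mine, not from the original text; the holder_ field and isLockedByThisThread() are illustrative additions, and it assumes Linux for gettid): record the owner's thread id inside MutexLock and assert on it.

#include <pthread.h>
#include <sys/syscall.h>
#include <unistd.h>

class MutexLock  // copy operations should be disabled; omitted for brevity
{
 public:
  MutexLock() : holder_(0)
  { pthread_mutex_init(&mutex_, NULL); }

  ~MutexLock()
  { pthread_mutex_destroy(&mutex_); }

  bool isLockedByThisThread() const
  { return holder_ == ::syscall(SYS_gettid); }

  void lock()
  {
    pthread_mutex_lock(&mutex_);
    holder_ = ::syscall(SYS_gettid);  // record the owner after acquiring
  }

  void unlock()
  {
    holder_ = 0;                      // clear the owner before releasing
    pthread_mutex_unlock(&mutex_);
  }

 private:
  pid_t holder_;                      // tid of the thread holding the lock
  pthread_mutex_t mutex_;
};

// postWithLockHold() can then assert(mutex.isLockedByThisThread());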

In addition, the conspicuous WithLockHold suffix makes misuse in the program easy to spot.

C++ has no annotations; methods and fields cannot be marked @GuardedBy as in Java, so programmers must take extra care. Although the approach here cannot solve all multi-threading errors once and for all, it does help.

I have never encountered a genuine need for a recursive mutex. I expect that if I do in the future, a wrapper can convert the design to a non-recursive mutex, and the code will only become clearer.

This article covers only the correct use of the mutex itself. Multi-threaded C++ programming runs into many other race conditions; see "When destructors meet multithreading: thread-safe object callbacks in C++" at http://blog.csdn.net/Solstice/archive/2010/01/22/5238671.aspx . Note that the class names there differ from this article's; I now think MutexLock and MutexLockGuard are the better names.

A performance footnote: Linux pthreads mutexes are implemented with futexes, so locking and unlocking need not trap into the kernel every time, which makes them very efficient. Windows's CRITICAL_SECTION is similar.

Condition variable

With a condition variable, as the name suggests, one or more threads wait for a boolean expression to become true, that is, they wait for another thread to "wake" them. Its academic name is the monitor. The Java Object built-ins wait(), notify(), and notifyAll() are a condition variable (and easy to use). There is only one correct way to use a condition variable. For the wait() side:

1. Always use it together with a mutex; reads and writes of the boolean expression are protected by that mutex.

2. Call wait() only while the mutex is locked.

3. Put the check of the boolean condition and the wait() in a while loop.

Write the code as:

MutexLock mutex;
Condition cond(mutex);
std::deque<int> queue;

int dequeue()
{
  MutexLockGuard lock(mutex);
  while (queue.empty())  // must use a loop; must re-check the condition after wait()
  {
    cond.wait();         // atomically unlocks mutex and blocks; will not deadlock with enqueue
  }
  assert(!queue.empty());
  int top = queue.front();
  queue.pop_front();
  return top;
}

For the signal/broadcast side:

1. It is not strictly necessary (in theory) to hold the mutex when calling signal.

2. Generally, modify the boolean expression before signaling.

3. Modifying the boolean expression usually happens under the mutex (at least as a full memory barrier).

Write the code as:

void enqueue(int x)
{
  MutexLockGuard lock(mutex);
  queue.push_back(x);
  cond.notify();
}

The dequeue/enqueue above in fact implement a simple unbounded BlockingQueue.

Condition variables are very low-level synchronization primitives and are rarely used directly. They generally serve to implement higher-level synchronization tools such as BlockingQueue or CountDownLatch.

Read-write locks and others

The reader-writer lock is an excellent abstraction that clearly distinguishes read from write behavior. Note that the reader lock is reentrant while the writer lock is not (and a held reader lock cannot be upgraded to a writer lock); this is the main reason I call it "excellent."

When I run into concurrent reads and writes, if conditions allow, I use the approach of "shared_ptr to implement thread-safe copy-on-write" ( http://blog.csdn.net/Solstice/archive/2008/11/22/3351751.aspx ) instead of a read-write lock, as sketched below. Of course this is not absolute.
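
A brief sketch of that copy-on-write idea (adapted by me from the linked article, not a drop-in implementation; Foo, MutexLock, and MutexLockGuard are as defined elsewhere in this article): readers copy a shared_ptr under the mutex and then read without it, while writers copy the data first if any reader still holds the old copy.

#include <vector>
#include <boost/shared_ptr.hpp>

typedef std::vector<Foo> FooList;
typedef boost::shared_ptr<FooList> FooListPtr;
FooListPtr g_foos;  // assumed initialized elsewhere: g_foos.reset(new FooList);
MutexLock mutex;

void traverse()
{
  FooListPtr foos;
  {
    MutexLockGuard lock(mutex);
    foos = g_foos;                      // O(1) critical section: just bump the refcount
  }
  // read outside the lock; writers never modify the list we hold
  for (FooList::iterator it = foos->begin(); it != foos->end(); ++it)
  {
    it->doit();
  }
}

void post(const Foo& f)
{
  MutexLockGuard lock(mutex);
  if (!g_foos.unique())
  {
    g_foos.reset(new FooList(*g_foos)); // copy on write: readers keep the old list
  }
  g_foos->push_back(f);                 // modify the now-exclusive copy
}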

As for semaphores, I have never encountered a need for them, so I have no personal experience to share.

To make one sweeping remark: if a program has to solve complex IPC problems like "dining philosophers," I think the design itself should be examined first. Why is there such complicated resource contention among threads (one thread must grab two resources at once, and one resource may be contended by two threads)? Could the "want to eat" requests be handed to a dedicated thread that assigns tableware to the philosophers, with each philosopher simply waiting on a condition variable until notified that he may eat? Philosophically, the textbook solution is egalitarian: every philosopher gets his own thread and grabs the chopsticks himself. I would rather centralize: use one thread to manage tableware distribution, and let the philosopher threads take a number and wait at the cafeteria entrance. This loses little efficiency and makes the program much simpler. Although Windows's WaitForMultipleObjects makes this problem trivial, properly emulating WaitForMultipleObjects under Linux is not something an ordinary programmer should attempt.

Encapsulate MutexLock, MutexLockGuard, and Condition

This section presents the code of the MutexLock, MutexLockGuard, and Condition classes used above. The first two classes are not difficult; the last one is somewhat interesting.

MutexLock encapsulates a critical section; it is a simple resource class that wraps the creation and destruction of a mutex using the RAII idiom [CCS:13]. The critical section is CRITICAL_SECTION on Windows, which is reentrant; under Linux it is pthread_mutex_t, which by default is not reentrant. A MutexLock is generally a data member of another class.

MutexLockGuard encapsulates entering and leaving the critical section, i.e. locking and unlocking. A MutexLockGuard is generally a stack object whose scope exactly equals the critical section.

One should be able to write both classes from memory on paper; there is not much to explain:

#include <pthread.h>
#include <boost/noncopyable.hpp>

class MutexLock : boost::noncopyable
{
 public:
  MutexLock()   // single-line bodies are kept on one line to save space
  { pthread_mutex_init(&mutex_, NULL); }

  ~MutexLock()
  { pthread_mutex_destroy(&mutex_); }

  void lock()   // programs generally should not call this directly
  { pthread_mutex_lock(&mutex_); }

  void unlock()   // programs generally should not call this directly
  { pthread_mutex_unlock(&mutex_); }

  pthread_mutex_t* getPthreadMutex()   // called only by Condition; never call it yourself
  { return &mutex_; }

 private:
  pthread_mutex_t mutex_;
};

class MutexLockGuard : boost::noncopyable
{
 public:
  explicit MutexLockGuard(MutexLock& mutex) : mutex_(mutex)
  { mutex_.lock(); }

  ~MutexLockGuard()
  { mutex_.unlock(); }

 private:
  MutexLock& mutex_;
};

#define MutexLockGuard(x) static_assert(false, "missing mutex guard var name")

Note that the last line defines a macro that prevents the following error from appearing in the program:

void doit()
{
  MutexLockGuard(mutex);  // no variable name: a temporary is created and destroyed at once, so the critical section is never locked
  // the correct way: MutexLockGuard lock(mutex);
  // critical section
}

MutexLock here provides no trylock() function, simply because I have never used it; I cannot think of a time when a program needs to "try to lock a lock." Perhaps the code I write is too simple.

I have seen people write MutexLockGuard as a template. I did not, because its template parameter could only ever be MutexLock; there is no need to add flexibility gratuitously, so I instantiated the "template" by hand. A more radical approach is to make lock()/unlock() private and declare the Guard a friend of MutexLock; I think telling programmers in a comment suffices, and misuse is easy to catch in pre-check-in code review (grep getPthreadMutex).

This code does not reach industrial strength: a) the mutex is created with the PTHREAD_MUTEX_DEFAULT type rather than the PTHREAD_MUTEX_NORMAL type we intend (the two are in fact likely equivalent); the rigorous way is to specify the mutex type explicitly with a mutexattr. b) Return values are not checked. You cannot use assert to check return values here, because assert is an empty statement in release builds. The point of checking return values is to catch resource exhaustion such as ENOMEM, which generally only appears in heavily loaded production programs; when such an error occurs, the program must clean up and exit voluntarily at once, otherwise it will crash inexplicably and be hard to investigate afterwards. What we need is a non-debug assert; perhaps google-glog's CHECK() is a good idea, along the lines of the sketch below.
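
For illustration, here is one possible shape of such a non-debug assertion (the MCHECK name is my own; a sketch, not a canonical implementation): it checks the return value even in release builds and aborts with a location message on failure.

#include <stdio.h>
#include <stdlib.h>

// abort even in release builds if a pthreads call fails
#define MCHECK(ret) do { int errnum = (ret);                    \
    if (errnum != 0) {                                          \
      fprintf(stderr, "pthreads error %d at %s:%d\n",           \
              errnum, __FILE__, __LINE__);                      \
      abort();                                                  \
    }                                                           \
  } while (0)

// usage: MCHECK(pthread_mutex_init(&mutex_, NULL));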

These two improvements are left as exercises.

The implementation of the Condition class is a bit interesting.

The pthreads condition variable lets you specify the mutex at wait() time, but I cannot think of any reason for one condition variable to be paired with different mutexes. Java's intrinsic condition and its Condition class do not allow this either, so I think this flexibility can be dropped in favor of an honest one-to-one pairing. By contrast, boost::thread's condition_variable specifies the mutex at wait; behold the baroque design of its synchronization primitives:

- Concepts come in four kinds: Lockable, TimedLockable, SharedLockable, UpgradeLockable.
- Locks come in five or six kinds: lock_guard, unique_lock, shared_lock, upgrade_lock, upgrade_to_unique_lock, scoped_try_lock.
- Mutexes come in seven kinds: mutex, try_mutex, timed_mutex, recursive_mutex, recursive_try_mutex, recursive_timed_mutex, shared_mutex.

Forgive my dullness, but when I see a library as "flexible" as a Rube Goldberg machine, like boost::thread, I can only bow three times and take a detour around it. The class names are also baffling: why not use a plain, honest name like reader_writer_lock? Inventing new names only adds mental burden. I am unwilling to pay the price for such flexibility; I would rather build a few simple, obvious classes of my own. Reinventing such a simple few-line wheel does no harm. Providing flexibility is certainly a skill, but hard-coding where flexibility is not needed takes greater wisdom.

The Condition class below simply wraps the pthreads condition variable and is easy to use; see the examples earlier in this section. I use notify/notifyAll as function names because signal has other meanings (signal/slot in C++, signal handlers in C), so let's not overload the term.

class Condition : boost::noncopyable
{
 public:
  explicit Condition(MutexLock& mutex) : mutex_(mutex)
  { pthread_cond_init(&pcond_, NULL); }

  ~Condition()
  { pthread_cond_destroy(&pcond_); }

  void wait()
  { pthread_cond_wait(&pcond_, mutex_.getPthreadMutex()); }

  void notify()
  { pthread_cond_signal(&pcond_); }

  void notifyAll()
  { pthread_cond_broadcast(&pcond_); }

 private:
  MutexLock& mutex_;
  pthread_cond_t pcond_;
};

If a class contains both a MutexLock and a Condition, note their declaration order and initialization order: mutex_ must be constructed before condition_ and passed as the latter's constructor argument:

class CountDownLatch
{
 public:
  explicit CountDownLatch(int count)
    : count_(count),
      mutex_(),
      condition_(mutex_)   // mutex_ is constructed first, then passed to condition_
  { }

  void wait()   // added here for completeness
  {
    MutexLockGuard lock(mutex_);
    while (count_ > 0)
      condition_.wait();
  }

  void countDown()   // added here for completeness
  {
    MutexLockGuard lock(mutex_);
    if (--count_ == 0)
      condition_.notifyAll();
  }

 private:
  int count_;
  MutexLock mutex_;      // declaration order matters: mutex_ before condition_
  Condition condition_;
};

Let me emphasize once more: although this section spent many words on how to use mutexes and condition variables correctly, that does not mean I encourage using them everywhere. Both are very low-level synchronization primitives, mainly used to build higher-level concurrency tools; a multi-threaded program that synchronizes heavily with raw mutexes and condition variables is, as Meng Yan put it, not much different from sawing down a big tree with a pencil sharpener.

Thread-safe Singleton implementation

If you study the history of thread-safe Singleton implementations, you will find many interesting things. For a while, people believed double-checked locking (DCL) was the way to go, balancing efficiency and correctness. Later, experts pointed out that because of out-of-order execution, DCL was unreliable. (This reminds me of SQL injection: ten years ago, building SQL statements by string concatenation was standard practice in web development, until one day someone exploited the vulnerability to gain access and modify site data, and everyone scrambled to patch.) Java developers are lucky: they can rely on the loading of a nested static class. C++ is worse off: either lock on every access, or eagerly initialize, or bring in heavy artillery like memory barriers ( http://Papers/DDJ_Jul_Aug_2004_revised.pdf ). Java 5 then revised the memory model and strengthened the semantics of volatile, so DCL (with volatile) is safe again. But the C++ memory model is still being revised, and C++'s volatile cannot currently (and may never) guarantee the correctness of DCL (it works only on VS2005+).

Actually it is not that troublesome; in practice, pthread_once does the job:

#include <pthread.h>
#include <boost/noncopyable.hpp>

template<typename T>
class Singleton : boost::noncopyable
{
 public:
  static T& instance()
  {
    pthread_once(&ponce_, &Singleton::init);
    return *value_;
  }

  static void init()
  {
    value_ = new T();
  }

 private:
  static pthread_once_t ponce_;
  static T* value_;
};

template<typename T>
pthread_once_t Singleton<T>::ponce_ = PTHREAD_ONCE_INIT;

template<typename T>
T* Singleton<T>::value_ = NULL;

This Singleton has no fancy tricks: it uses pthread_once_t to guarantee thread-safe lazy initialization. Usage is simple:

Foo& foo = Singleton<Foo>::instance();

Of course, this Singleton does not consider object destruction. In a server program this is not an issue, because all resources are naturally released when the program exits (provided the program uses no resources that the operating system cannot reclaim automatically, such as cross-process Mutexes). Also, this Singleton can only call the default constructor; if the user wants to control how T is constructed, we can provide a customization point via template specialization, which requires another level of indirection. A sketch follows.
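
One possible shape of that customization point (my sketch; SingletonCreator is a made-up name): factor the construction into a policy template that users may specialize for their own types.

template<typename T>
struct SingletonCreator   // default creation policy
{
  static T* create() { return new T(); }
};

// a user type without a default constructor:
class Bar
{
 public:
  explicit Bar(int x) : x_(x) { }
 private:
  int x_;
};

// the user specializes the policy to control construction:
template<>
struct SingletonCreator<Bar>
{
  static Bar* create() { return new Bar(42); }
};

// Singleton<T>::init() would then call SingletonCreator<T>::create()
// instead of new T().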

Summary

- Prefer TCP sockets for inter-process communication.
- The four principles of thread synchronization.
- The idioms for using mutexes and condition variables; the key is RAII.

Use these few things well and you can handle most scenarios in multi-threaded server-side development. Some may feel this does not push performance to the limit; I believe that making the program correct first and optimizing afterwards still holds under multithreading. Making a correct program fast is far easier than making a fast program correct.

7 Conclusion

Against the background of modern multi-core computing, threads are unavoidable. Multi-threaded programming is an important personal skill and should not be instinctively rejected just because it is hard; software development today is already many times harder than it was 10 or 20 years ago. Only by mastering multi-threaded programming can you choose rationally between using threads or not, because you can estimate the difficulty and payoff of a multi-threaded implementation and make the right choice from the start. Note that converting a single-threaded program to a multi-threaded one is often harder than implementing a multi-threaded program from scratch.

Mastering the synchronization primitives and the scenarios they suit is the foundation of multi-threaded programming. In my experience, proficient use of the primitives mentioned in this article makes it fairly easy to write thread-safe programs. This article does not consider the impact of signals on multi-threaded programming; the behavior of Unix signals under multiple threads is complicated and is generally masked by the underlying network library (such as a Reactor) to avoid disturbing the development of upper-layer applications.

Throughout, "efficiency" has not been my main concern: a) TCP is not the most efficient IPC; b) I advocate locking correctly rather than writing your own lock-free algorithms (atomic operations excepted). Strike a balance between program complexity and performance, and consider the possibility of scaling out over the next two or three years (whether CPUs get faster, core counts grow, machine counts increase, or networks are upgraded). The next article, "anti-patterns of multi-threaded programming," will examine common scalability mistakes; I believe that in distributed systems, scalability deserves more investment than single-machine performance tuning.

This article records my current understanding of multi-threaded programming. With the techniques introduced here, I can solve all the multi-threaded programming tasks I face. If the views here do not suit you, for example if you use techniques I do not recommend (shared memory, semaphores, and so on), go right ahead, as long as your reasons are sound.

This article originally had two more sections, "anti-patterns of multi-threaded programming" and "application scenarios for multithreading," but since the word count has already passed ten thousand, they will have to wait for the next installment :-)

Preview of the sequel: the sleep anti-pattern

I believe sleep should appear only in test code, for example when writing unit tests. (Unit tests involving time are not easy to write: for short durations of a second or two you can sleep; for long ones like an hour or a day you need another approach, such as extracting the algorithm and injecting the time.) In production code, a thread's waiting falls into two kinds: waiting with nothing to do (blocked on select/poll/epoll or on a condition variable; waiting on a BlockingQueue or CountDownLatch also belongs here), and waiting to enter a critical section (blocked on a mutex) in order to continue processing. During a program's normal execution, if you need to wait for a period of time, you should register a timer in the event loop and continue the work in the timer's callback, because threads are a precious shared resource and must not be wasted lightly. If the correctness or efficiency of a multi-threaded program depends on code actively calling sleep, the design has gone wrong. The correct way to wait for an event is select, a condition variable, or, better, a higher-level synchronization tool. Of course, GUI programming does have the practice of voluntarily yielding the CPU, such as calling sleep(0) as a yield.
