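Why Multithreading Can Run Slower Than a Single Thread: Analysis and Verification
"Several people finish a job faster than one person, so multiple threads running in parallel must be faster than a single thread." That was my assumption for a long time while learning to program. In real development, however, it does not always hold. Consider the program below (download link at the end of the post):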

ThreadTester is the base class of all testers. Every tester does the same job: it increments counter by 1 at a time until it exceeds MAX_COUNTER_NUMBER (100,000,000).
    // Namespaces used by the listings in this post.
    using System;
    using System.Diagnostics;
    using System.Threading;

    public abstract class ThreadTester
    {
        public const long MAX_COUNTER_NUMBER = 100000000;

        private long _counter = 0;

        // Returns the current count.
        public virtual long GetCounter()
        {
            return this._counter;
        }

        // Increments the counter by one.
        protected virtual void IncreaseCounter()
        {
            this._counter += 1;
        }

        // Starts the test.
        public abstract void Start();

        // Returns the time elapsed since the counter started increasing, in milliseconds.
        public abstract long GetElapsedMillisecondsOfIncreaseCounter();

        // Returns whether the test is still running.
        public abstract bool IsTesterRunning();
    }
SingleThreadTester counts on a single thread.
    class SingleThreadTester : ThreadTester
    {
        private Stopwatch _aStopWatch = new Stopwatch();

        public override void Start()
        {
            _aStopWatch.Start();

            // A single worker thread does all the counting.
            Thread aThread = new Thread(() => WorkInThread());
            aThread.Start();
        }

        public override long GetElapsedMillisecondsOfIncreaseCounter()
        {
            return this._aStopWatch.ElapsedMilliseconds;
        }

        public override bool IsTesterRunning()
        {
            return _aStopWatch.IsRunning;
        }

        private void WorkInThread()
        {
            while (true)
            {
                // Stop once the counter has passed the limit.
                if (this.GetCounter() > ThreadTester.MAX_COUNTER_NUMBER)
                {
                    _aStopWatch.Stop();
                    break;
                }

                this.IncreaseCounter();
            }
        }
    }
TwoThreadSwitchTester counts with two threads that take turns.
    class TwoThreadSwitchTester : ThreadTester
    {
        private Stopwatch _aStopWatch = new Stopwatch();

        // Both threads share this one event, so a thread can occasionally
        // consume its own signal; the alternation is not strictly one-for-one.
        private AutoResetEvent _autoResetEvent = new AutoResetEvent(false);

        public override void Start()
        {
            _aStopWatch.Start();

            // When one thread finishes, the other can be left blocked in WaitOne().
            // Background threads do not keep the process alive in that case.
            Thread aThread1 = new Thread(() => Work1InThread());
            aThread1.IsBackground = true;
            aThread1.Start();

            Thread aThread2 = new Thread(() => Work2InThread());
            aThread2.IsBackground = true;
            aThread2.Start();
        }

        public override long GetElapsedMillisecondsOfIncreaseCounter()
        {
            return this._aStopWatch.ElapsedMilliseconds;
        }

        public override bool IsTesterRunning()
        {
            return _aStopWatch.IsRunning;
        }

        private void Work1InThread()
        {
            while (true)
            {
                // Wait for a signal before counting.
                _autoResetEvent.WaitOne();

                this.IncreaseCounter();

                if (this.GetCounter() > ThreadTester.MAX_COUNTER_NUMBER)
                {
                    _aStopWatch.Stop();
                    break;
                }

                // Hand the turn over.
                _autoResetEvent.Set();
            }
        }

        private void Work2InThread()
        {
            while (true)
            {
                // Signal first, then wait for the next turn.
                _autoResetEvent.Set();
                _autoResetEvent.WaitOne();
                this.IncreaseCounter();

                if (this.GetCounter() > ThreadTester.MAX_COUNTER_NUMBER)
                {
                    _aStopWatch.Stop();
                    break;
                }
            }
        }
    }
MultiThreadTester takes a configurable number of threads, all of which compete to increment the counter.
    class MultiThreadTester : ThreadTester
    {
        private Stopwatch _aStopWatch = new Stopwatch();
        private readonly int _threadCount = 0;
        private readonly object _counterLock = new object();

        public MultiThreadTester(int threadCount)
        {
            this._threadCount = threadCount;
        }

        public override void Start()
        {
            _aStopWatch.Start();

            // Start all worker threads; each runs the same loop.
            for (int i = 0; i < _threadCount; i++)
            {
                Thread aThread = new Thread(() => WorkInThread());
                aThread.Start();
            }
        }

        public override long GetElapsedMillisecondsOfIncreaseCounter()
        {
            return this._aStopWatch.ElapsedMilliseconds;
        }

        public override bool IsTesterRunning()
        {
            return _aStopWatch.IsRunning;
        }

        private void WorkInThread()
        {
            while (true)
            {
                // Check and increment under the lock so only one thread
                // touches the counter at a time.
                lock (_counterLock)
                {
                    if (this.GetCounter() > ThreadTester.MAX_COUNTER_NUMBER)
                    {
                        _aStopWatch.Stop();
                        break;
                    }

                    this.IncreaseCounter();
                }
            }
        }
    }
The Main function of Program decides which tester to run based on the user's choice.
    class Program
    {
        static void Main(string[] args)
        {
            string inputText = GetUserChoice();

            while (!"4".Equals(inputText))
            {
                ThreadTester tester = CreateThreadTesterByInputText(inputText);
                tester.Start();

                // Print the status every 100 ms until the tester stops.
                while (true)
                {
                    Console.WriteLine(GetStatusOfThreadTester(tester));
                    if (!tester.IsTesterRunning())
                    {
                        break;
                    }
                    Thread.Sleep(100);
                }

                inputText = GetUserChoice();
            }

            Console.Write("Press Enter to exit...");
            Console.ReadLine();
        }

        private static string GetStatusOfThreadTester(ThreadTester tester)
        {
            // "耗时" means "elapsed time"; the literal is kept so the output
            // matches the results shown in this post.
            return string.Format("[耗时{0}ms] counter = {1}, {2}",
                tester.GetElapsedMillisecondsOfIncreaseCounter(), tester.GetCounter(),
                tester.IsTesterRunning() ? "running" : "stopped");
        }

        private static ThreadTester CreateThreadTesterByInputText(string inputText)
        {
            switch (inputText)
            {
                case "1":
                    return new SingleThreadTester();
                case "2":
                    return new TwoThreadSwitchTester();
                default:
                    return new MultiThreadTester(100);
            }
        }

        private static string GetUserChoice()
        {
            Console.WriteLine(@"==Please select the option in the following list:==
     1. SingleThreadTester
     2. TwoThreadSwitchTester
     3. MultiThreadTester
     4. Exit");

            string inputText = Console.ReadLine();

            return inputText;
        }
    }
Running the three test classes gives the following results:
Single Thread:
[耗时407ms] counter = 100000001, stopped
[耗时453ms] counter = 100000001, stopped
[耗时412ms] counter = 100000001, stopped
Two Thread Switch:
[耗时161503ms] counter = 100000001, stopped
[耗时164508ms] counter = 100000001, stopped
[耗时164201ms] counter = 100000001, stopped
Multi Threads - 100 Threads:
[耗时3659ms] counter = 100000001, stopped
[耗时3950ms] counter = 100000001, stopped
[耗时3720ms] counter = 100000001, stopped
Multi Threads - 2 Threads:
[耗时3078ms] counter = 100000001, stopped
[耗时3160ms] counter = 100000001, stopped
[耗时3106ms] counter = 100000001, stopped
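To put these timings in perspective, it helps to divide each one by the number of increments. The sketch below does that for the first run of each tester; the class and method names are mine, not part of the sample program, and the increment count assumes the counter always stops at 100000001 as shown above.

```csharp
using System;

static class PerIncrementCost
{
    // Average cost of one increment in nanoseconds for a run of the given length.
    public static double NanosecondsPerIncrement(long elapsedMilliseconds, long increments)
    {
        return elapsedMilliseconds * 1000000.0 / increments;
    }

    static void Main()
    {
        // The counter stops at 100000001, so that is the number of increments.
        const long increments = 100000001;

        // First measured run of each tester, copied from the results above.
        string[] names =
        {
            "Single Thread",
            "Two Thread Switch",
            "Multi Threads - 100 Threads",
            "Multi Threads - 2 Threads"
        };
        long[] elapsedMs = { 407, 161503, 3659, 3078 };

        for (int i = 0; i < names.Length; i++)
        {
            Console.WriteLine("{0}: {1:F1} ns per increment",
                names[i], NanosecondsPerIncrement(elapsedMs[i], increments));
        }
    }
}
```

By this measure Single Thread spends roughly 4 ns per increment, the two lock-based runs roughly 30 to 40 ns, and Two Thread Switch about 1.6 µs.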
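What is a thread context switch
For a precise definition of a context switch, see http://www.linfo.org/context_switch.html. A multitasking system usually has to run several jobs at once, and there are usually more jobs than CPUs, yet one CPU can execute only one task at a time. To make the tasks appear to run simultaneously, operating system designers use time slicing: the CPU serves each task for a short period, saves that task's state, loads the state of the next task, and continues with it. Saving one task's state and loading another's is called a context switch. Time slicing makes it possible for many tasks to share one CPU, but it also introduces the direct cost of saving and restoring state. (Note: more precisely, a context switch affects program performance through both direct and indirect costs. Direct costs: CPU registers must be saved and restored, the scheduler's code must run, TLB entries must be reloaded, and the CPU pipeline must be flushed. Indirect costs: data has to be shared between the caches of different cores, and how much this hurts depends on the size of the data the thread works on.)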
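One rough way to get a feel for this cost is to make two threads hand control back and forth and time the round trips. The following is a minimal sketch under my own assumptions, not part of the sample program: PingPong and Run are hypothetical names, and two AutoResetEvent objects are used so that the threads alternate strictly. Each handoff blocks one thread and wakes the other, so the measured time also includes the event signaling itself, not only the context switch.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class PingPong
{
    // Hands control back and forth between two threads `roundTrips` times
    // and returns the total elapsed time in milliseconds.
    public static long Run(int roundTrips)
    {
        // Each thread waits on one event and signals the other,
        // which forces strict alternation.
        using (var ping = new AutoResetEvent(false))
        using (var pong = new AutoResetEvent(false))
        {
            var worker = new Thread(() =>
            {
                for (int i = 0; i < roundTrips; i++)
                {
                    ping.WaitOne();   // block until the main thread hands over
                    pong.Set();       // hand control back
                }
            });
            worker.IsBackground = true;
            worker.Start();

            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < roundTrips; i++)
            {
                ping.Set();       // wake the worker
                pong.WaitOne();   // block until it answers
            }
            stopwatch.Stop();

            worker.Join();        // the worker has finished its loop by now
            return stopwatch.ElapsedMilliseconds;
        }
    }

    static void Main()
    {
        const int roundTrips = 100000;
        long elapsedMs = Run(roundTrips);
        Console.WriteLine("{0} round trips took {1} ms", roundTrips, elapsedMs);
    }
}
```

Dividing the elapsed time by twice the number of round trips gives an approximate cost per handoff on the machine at hand.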

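Based on this definition of a context switch, we make the following hypotheses:
- TwoThreadSwitchTester is the slowest because it performs the most thread context switches, and most of its time goes into switching: the two threads take turns counting, so each increment requires a thread switch.
- "Multi Threads - 100 Threads" starts more threads than "Multi Threads - 2 Threads", so it switches threads more often and takes longer to run.
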
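Windows has no tool quite like Linux's vmstat, so here we use Process Explorer to look at the number of thread context switches while the program runs.
Single Thread:
During counting, the threads switched 580 - 548 = 32 times in total (548 is the initial value right after the program started).
Two Thread Switch:
During counting, the threads switched 33673295 - 124 = 33673171 times in total (124 is the initial value right after the program started).
Multi Threads - 100 Threads:
During counting, the threads switched 846 - 329 = 517 times in total (329 is the initial value right after the program started).
Multi Threads - 2 Threads:
During counting, the threads switched 295 - 201 = 94 times in total (201 is the initial value right after the program started).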
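The collected data broadly matches our hypotheses. Context switches do not explain everything, though: "Multi Threads - 2 Threads" recorded only 94 switches yet ran several times slower than Single Thread, which suggests that most of its extra time goes to acquiring and releasing the contended lock rather than to switching.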
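The Two Thread Switch numbers can be turned into a rough per-switch estimate. The sketch below uses helper names of my own and assumes the switch count and the 161503 ms timing describe comparable runs, which the post does not state. It also treats the whole run time as switching cost, so the result is an upper bound.

```csharp
using System;

static class SwitchCost
{
    // Average number of context switches per increment.
    public static double SwitchesPerIncrement(long switches, long increments)
    {
        return (double)switches / increments;
    }

    // Upper bound on the average cost of one switch in microseconds,
    // obtained by charging the entire run time to switching.
    public static double MicrosecondsPerSwitch(long elapsedMilliseconds, long switches)
    {
        return elapsedMilliseconds * 1000.0 / switches;
    }

    static void Main()
    {
        const long increments = 100000001;  // final counter value
        const long switches = 33673171;     // Two Thread Switch, from Process Explorer
        const long elapsedMs = 161503;      // first Two Thread Switch run

        Console.WriteLine("{0:F2} switches per increment",
            SwitchesPerIncrement(switches, increments));
        Console.WriteLine("at most {0:F1} us per switch",
            MicrosecondsPerSwitch(elapsedMs, switches));
    }
}
```

This works out to about one switch for every three increments and at most roughly 4.8 µs per switch. One possible reason the ratio is below one switch per increment is that both threads share a single AutoResetEvent, so a thread can sometimes consume its own signal and count again without handing over.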
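The CPU does the work, not the thread
Looking back at what I had learned, believing that more threads always means faster work amounted to handing my computer architecture course back to the teacher. What actually does the work is the CPU, not the thread. More threads do not necessarily get the work done faster.
So under high concurrency, when does a single thread fit, and when do multiple threads fit?
Where a single thread fits: each unit of work is simple and very fast, such as reading a value from memory or looking up a value by key in a hash table. Redis and Node.js both run their core work on a single thread and suit this kind of simple, fast workload.
Where multiple threads fit: each unit of work is complex, spends a long time waiting, or consumes a lot of computing resources, such as fetching data from several remote services and computing on it, or image processing.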
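For contrast with the testers above, here is a sketch of the same counting job arranged so that the threads do not compete: each thread counts its own share in a private variable and publishes the result once. PartitionedCounter and Count are hypothetical names of mine, and the approach only applies when the work can be split into independent parts, which the original testers deliberately do not do.

```csharp
using System;
using System.Threading;

static class PartitionedCounter
{
    // Counts to `total` with `threadCount` threads. Each thread counts a
    // private share and adds it to the shared counter once at the end.
    public static long Count(long total, int threadCount)
    {
        long counter = 0;
        var threads = new Thread[threadCount];

        for (int i = 0; i < threadCount; i++)
        {
            // Spread the remainder over the first (total % threadCount) threads.
            long share = total / threadCount + (i < total % threadCount ? 1 : 0);

            threads[i] = new Thread(() =>
            {
                long local = 0;
                for (long n = 0; n < share; n++)
                {
                    local += 1;   // no lock needed: `local` is private to this thread
                }
                // One synchronized update per thread instead of one per increment.
                Interlocked.Add(ref counter, local);
            });
            threads[i].Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }
        return counter;
    }

    static void Main()
    {
        long result = Count(100000000, Environment.ProcessorCount);
        Console.WriteLine("counter = {0}", result);   // prints "counter = 100000000"
    }
}
```

Because the threads share nothing until the final Interlocked.Add, there are no handoffs or lock contention inside the loop, so each core can work on its share independently.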
Sample program: http://pan.baidu.com/s/1c05WrGO
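References:
- Context Switch – Wikipedia
- The Cost of Multithreading (多线程的代价)
- Threading in C#
- Why Would I Use Node.js? A Case-by-Case Introduction (为什么我要用 Node.js? 案例逐一介绍)
- Zhihu: Redis is a single-threaded program, so why is it so fast? (知乎——redis是个单线程的程序,为什么会这么快呢?每秒10000?这个有点不解,具体是快在哪里呢?EPOLL?内存?多线程)
- Understanding System Architecture from a Java Perspective (1): CPU Context Switching (从Java视角理解系统结构(一)CPU上下文切换)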