We currently deploy Allura in a single-threaded mod_wsgi configuration. In the past, using multiple threads gave very poor performance. The goal here is to figure out what doesn't work well with multiple threads.
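For context, a process/thread configuration of this shape looks roughly like the following. This is a hypothetical excerpt, not our actual directives (those live in wsgi-daemon.inc, in the config history linked in a later comment); the daemon name and paths are illustrative only.

~~~~
# Hypothetical excerpt; names and paths are illustrative only.
WSGIDaemonProcess allura processes=2 threads=1
WSGIProcessGroup allura
WSGIScriptAlias / /var/local/allura/app.wsgi
~~~~

The `processes=`/`threads=` options on WSGIDaemonProcess are what the test matrix below varies.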
Basic ab testing with 500 requests on a development machine using mod_wsgi shows reasonable behavior when adding threads. Note that the timings below vary between re-runs, so only significant changes should be considered. ab was run with concurrency matching mod_wsgi's proc*thread count, and a few manual requests were made first to make sure the server was warmed up.
baseline: 2 proc, 1 thread
~~~~
Concurrency Level: 2
Time taken for tests: 41.928930 seconds
Requests per second: 11.92 [#/sec] (mean)
Time per request: 167.716 [ms] (mean)
Time per request: 83.858 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%    156
  66%    158
  75%    159
  80%    161
  90%    168
  95%    308
  98%    317
  99%    321
 100%    345 (longest request)
~~~~
switch it around: 1 proc 2 threads
Roughly 2x slower in total time. That makes sense: with only 1 process, the GIL means only one request can actually be executing Python at a time.
~~~~
Concurrency Level: 2
Time taken for tests: 105.186368 seconds
Requests per second: 4.75 [#/sec] (mean)
Time per request: 420.745 [ms] (mean)
Time per request: 210.373 [ms] (mean, across all concurrent requests)
Transfer rate: 95.28 [Kbytes/sec] received
Percentage of the requests served within a certain time (ms)
  50%    367
  66%    377
  75%    393
  80%    427
  90%    508
  95%    602
  98%   1234
  99%   1508
 100%   2312 (longest request)
~~~~
even more threads: 1 proc 5 threads, 5 concur
Roughly the same total time to serve 500 requests, but most individual requests were slower.
~~~~
Concurrency Level: 5
Time taken for tests: 111.214292 seconds
Requests per second: 4.50 [#/sec] (mean)
Time per request: 1112.143 [ms] (mean)
Time per request: 222.429 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%   1035
  66%   1091
  75%   1130
  80%   1145
  90%   1247
  95%   1368
  98%   2741
  99%   3589
 100%   4104 (longest request)
~~~~
2 proc 2 threads
Compared to the first baseline, this just adds a thread to each process. Results are very comparable in all respects.
~~~~
Concurrency Level: 4
Time taken for tests: 46.208258 seconds
Requests per second: 10.82 [#/sec] (mean)
Time per request: 369.666 [ms] (mean)
Time per request: 92.417 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%    348
  66%    358
  75%    365
  80%    376
  90%    475
  95%    494
  98%    516
  99%    552
 100%    849 (longest request)
~~~~
I simulated a workload where requests block on I/O, so threading should improve overall throughput, by adding a sleep(3) call to the controller being requested.
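The change amounts to something like the following. This is a hypothetical stand-in controller, not Allura's actual code; only the added sleep matters for the benchmark.

```python
import time


class RootController:
    # Hypothetical controller used only to illustrate the benchmark
    # change; the real Allura controller is more involved.
    def index(self):
        time.sleep(3)  # simulate 3 seconds of blocking I/O per request
        return "ok"
```

While a thread sleeps here it holds no CPU and releases the GIL, so other threads in the same process can serve requests, which is exactly the case where threads should help.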
baseline: 2 procs, 1 thread
~~~~
Concurrency Level: 2
Time taken for tests: 159.146704 seconds
Requests per second: 0.63 [#/sec] (mean)
Time per request: 3182.934 [ms] (mean)
Time per request: 1591.467 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%   3111
  66%   3112
  75%   3113
  80%   3118
  90%   3353
  95%   3750
  98%   4065
  99%   4272
 100%   4272 (longest request)
~~~~
add threads: 2 procs, 2 thread
This completed in about half the time, as expected. The slowest requests were somewhat slower, though.
~~~~
Concurrency Level: 4
Time taken for tests: 85.386232 seconds
Requests per second: 1.17 [#/sec] (mean)
Time per request: 3415.449 [ms] (mean)
Time per request: 853.862 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
  50%   3245
  66%   3267
  75%   3354
  80%   3382
  90%   4305
  95%   4640
  98%   4986
  99%   5074
 100%   5074 (longest request)
~~~~
2 procs, 1 thread
Just to confirm that the above 2x net speedup was not due merely to running ab with concurrency 4, I ran the 2 proc 1 thread configuration again, keeping ab at concurrency 4. As you can see, total time was in the same range as the first test, and individual requests took 2x as long (~6 sec) because ab was making twice as many simultaneous requests as mod_wsgi could service, so they had to wait.
~~~~
Concurrency Level: 4
Time taken for tests: 158.275591 seconds
Requests per second: 0.63 [#/sec] (mean)
Time per request: 6331.024 [ms] (mean)
Time per request: 1582.756 [ms] (mean, across all concurrent requests)
Percentage of the requests served within a certain time (ms)
50% 6219
66% 6221
75% 6232
80% 6327
90% 6887
95% 7063
98% 7772
99% 7925
100% 7925 (longest request)
~~~~
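The ~3s and ~6s per-request figures above follow from simple queueing arithmetic when every request takes the same fixed time. A sketch (assuming fixed service time and no other overhead):

```python
import math


def expected_latency(concurrency, workers, service_time_s):
    # Each request occupies a worker for service_time_s, so with more
    # requests in flight than workers, a request waits through
    # ceil(concurrency / workers) full service slots end to end.
    return math.ceil(concurrency / workers) * service_time_s


print(expected_latency(2, 2, 3.0))  # 2 workers, 2 in flight: 3.0 s each
print(expected_latency(4, 2, 3.0))  # 2 workers, 4 in flight: 6.0 s each
```

That matches the measured means: ~3.2 s when concurrency equals worker count, ~6.3 s when it is double.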
The history of our configuration changes is at https://control.siteops.geek.net/git/?p=mastertree.git;a=history;f=host/sfn-web/etc/httpd/conf.d/wsgi-daemon.inc;h=f59577826ca85f3d6b61d1a31e44b4ca047685d9;hb=HEAD Graphite data doesn't go back to the 2012-10-08 to 2012-10-11 range, but stats.log files may still be on the hosts, so we can look at those to learn from that attempt.
My recollection is that performance was very erratic. I don't recall errors, but we should check the logs.
TimerMiddleware was made thread-safe on 2012-11-05, so performance data from the 2012-10 attempt is not trustworthy.
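The kind of bug that fix addresses: middleware that stores per-request state on the shared instance gets clobbered when multiple threads run requests concurrently. A minimal sketch of the thread-safe shape, assuming the middleware times each request (this is illustrative, not Allura's actual TimerMiddleware code):

```python
import threading
import time


class TimerMiddleware:
    """Hypothetical sketch: per-request timing state lives in a
    threading.local, so concurrent worker threads in the same process
    each see their own timer instead of overwriting a shared one."""

    def __init__(self, app):
        self.app = app
        self._local = threading.local()  # one namespace per thread

    def __call__(self, environ, start_response):
        self._local.start = time.time()
        try:
            return self.app(environ, start_response)
        finally:
            # Each thread reads only its own _local.start, so timings
            # stay correct even with many requests in flight.
            self._local.elapsed = time.time() - self._local.start
```

With instance attributes instead of `threading.local`, two overlapping requests would share one `start`, producing exactly the kind of untrustworthy timing data noted above.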
Seems to work fine - can reopen if issues arise.