diff --git a/configure.ac b/configure.ac index 0262bab96..e016db0b9 100644 --- a/configure.ac +++ b/configure.ac @@ -253,6 +253,7 @@ AC_CONFIG_FILES([ dep/src/sockets/Makefile dep/src/zlib/Makefile dep/Makefile + dep/tbb/Makefile doc/Doxyfile doc/Makefile Makefile diff --git a/dep/Makefile.am b/dep/Makefile.am index 2bd06d3a2..f2a9d7c95 100644 --- a/dep/Makefile.am +++ b/dep/Makefile.am @@ -23,5 +23,8 @@ if MANGOS_BUILD_ACE SUBDIRS += ACE_wrappers endif +# Intel's TBB +SUBDIRS += tbb + ## Additional files to include when running 'make dist' # Nothing yet. diff --git a/dep/tbb/CHANGES b/dep/tbb/CHANGES new file mode 100644 index 000000000..e6c71a98f --- /dev/null +++ b/dep/tbb/CHANGES @@ -0,0 +1,678 @@ +TBB 2.2 Update 1 commercial-aligned release + +Changes (w.r.t. TBB 2.2 commercial-aligned release): + +- Incorporates all changes from open-source releases below. +- Documentation was updated. +- TBB scheduler auto-initialization now covers all possible use cases. +- concurrent_queue: made argument types of sizeof used in paddings + consistent with those actually used. +- Memory allocator was improved: supported corner case of user's malloc + calling scalable_malloc (non-Windows), corrected processing of + memory allocation requests during tbb memory allocator startup + (Linux). +- Windows malloc replacement has got better support for static objects. +- In pipeline setups that do not allow actual parallelism, execution + by a single thread is guaranteed, idle spinning eliminated, and + performance improved. +- RML refactoring and clean-up. +- New constructor for concurrent_hash_map allows reserving space for + a number of items. +- Operator delete() added to the TBB exception classes. +- Lambda support was improved in parallel_reduce. +- gcc 4.3 warnings were fixed for concurrent_queue. +- Fixed possible initialization deadlock in modules using TBB entities + during construction of global static objects. +- Copy constructor in concurrent_hash_map was fixed. +- Fixed a couple of rare crashes in the scheduler possible before + in very specific use cases. +- Fixed a rare crash in the TBB allocator running out of memory. +- New tests were implemented, including test_lambda.cpp that checks + support for lambda expressions. +- A few other small changes in code, tests, and documentation. + +------------------------------------------------------------------------ +20090809 open-source release + +Changes (w.r.t. TBB 2.2 commercial-aligned release): + +- Fixed known exception safety issues in concurrent_vector. +- Better concurrency of simultaneous grow requests in concurrent_vector. +- TBB allocator further improves performance of large object allocation. +- Problem with source of text relocations was fixed on Linux +- Fixed bugs related to malloc replacement under Windows +- A few other small changes in code and documentation. + +------------------------------------------------------------------------ +TBB 2.2 commercial-aligned release + +Changes (w.r.t. TBB 2.1 U4 commercial-aligned release): + +- Incorporates all changes from open-source releases below. +- Architecture folders renamed from em64t to intel64 and from itanium + to ia64. +- Major Interface version changed from 3 to 4. Deprecated interfaces + might be removed in future releases. +- Parallel algorithms that use partitioners have switched to use + the auto_partitioner by default. +- Improved memory allocator performance for allocations bigger than 8K. +- Added new thread-bound filters functionality for pipeline. 
+- New implementation of concurrent_hash_map that improves performance + significantly. +- A few other small changes in code and documentation. + +------------------------------------------------------------------------ +20090511 open-source release + +Changes (w.r.t. previous open-source release): + +- Basic support for MinGW32 development kit. +- Added tbb::zero_allocator class that initializes memory with zeros. + It can be used as an adaptor to any STL-compatible allocator class. +- Added tbb::parallel_for_each template function as alias to parallel_do. +- Added more overloads for tbb::parallel_for. +- Added support for exact exception propagation (can only be used with + compilers that support C++0x std::exception_ptr). +- tbb::atomic template class can be used with enumerations. +- mutex, recursive_mutex, spin_mutex, spin_rw_mutex classes extended + with explicit lock/unlock methods. +- Fixed size() and grow_to_at_least() methods of tbb::concurrent_vector + to provide space allocation guarantees. More methods added for + compatibility with std::vector, including some from C++0x. +- Preview of a lambda-friendly interface for low-level use of tasks. +- scalable_msize function added to the scalable allocator (Windows only). +- Rationalized internal auxiliary functions for spin-waiting and backoff. +- Several tests undergo decent refactoring. + +Changes affecting backward compatibility: + +- Improvements in concurrent_queue, including limited API changes. + The previous version is deprecated; its functionality is accessible + via methods of the new tbb::concurrent_bounded_queue class. +- grow* and push_back methods of concurrent_vector changed to return + iterators; old semantics is deprecated. + +------------------------------------------------------------------------ +TBB 2.1 Update 4 commercial-aligned release + +Changes (w.r.t. TBB 2.1 U3 commercial-aligned release): + +- Added tests for aligned memory allocations and malloc replacement. +- Several improvements for better bundling with Intel(R) C++ Compiler. +- A few other small changes in code and documentaion. + +Bugs fixed: + +- 150 - request to build TBB examples with debug info in release mode. +- backward compatibility issue with concurrent_queue on Windows. +- dependency on VS 2005 SP1 runtime libraries removed. +- compilation of GUI examples under XCode* 3.1 (1577). +- On Windows, TBB allocator classes can be instantiated with const types + for compatibility with MS implementation of STL containers (1566). + +------------------------------------------------------------------------ +20090313 open-source release + +Changes (w.r.t. 20081109 open-source release): + +- Includes all changes introduced in TBB 2.1 Update 2 & Update 3 + commercial-aligned releases (see below for details). +- Added tbb::parallel_invoke template function. It runs up to 10 + user-defined functions in parallel and waits for them to complete. +- Added a special library providing ability to replace the standard + memory allocation routines in Microsoft* C/C++ RTL (malloc/free, + global new/delete, etc.) with the TBB memory allocator. + Usage details are described in include/tbb/tbbmalloc_proxy.h file. +- Task scheduler switched to use new implementation of its core + functionality (deque based task pool, new structure of arena slots). +- Preview of Microsoft* Visual Studio* 2005 project files for + building the library is available in build/vsproject folder. +- Added tests for aligned memory allocations and malloc replacement. 
+- Added parallel_for/game_of_life.net example (for Windows only) + showing TBB usage in a .NET application. +- A number of other fixes and improvements to code, tests, makefiles, + examples and documents. + +Bugs fixed: + +- The same list as in TBB 2.1 Update 4 right above. + +------------------------------------------------------------------------ +TBB 2.1 Update 3 commercial-aligned release + +Changes (w.r.t. TBB 2.1 U2 commercial-aligned release): + +- Added support for aligned allocations to the TBB memory allocator. +- Added a special library to use with LD_PRELOAD on Linux* in order to + replace the standard memory allocation routines in C/C++ with the + TBB memory allocator. +- Added null_mutex and null_rw_mutex: no-op classes interface-compliant + to other TBB mutexes. +- Improved performance of parallel_sort, to close most of the serial gap + with std::sort, and beat it on 2 and more cores. +- A few other small changes. + +Bugs fixed: + +- the problem where parallel_for hanged after exception throw + if affinity_partitioner was used (1556). +- get rid of VS warnings about mbstowcs deprecation (1560), + as well as some other warnings. +- operator== for concurrent_vector::iterator fixed to work correctly + with different vector instances. + +------------------------------------------------------------------------ +TBB 2.1 Update 2 commercial-aligned release + +Changes (w.r.t. TBB 2.1 U1 commercial-aligned release): + +- Incorporates all open-source-release changes down to TBB 2.1 U1, + except for: + - 20081019 addition of enumerable_thread_specific; +- Warning level for Microsoft* Visual C++* compiler raised to /W4 /Wp64; + warnings found on this level were cleaned or suppressed. +- Added TBB_runtime_interface_version API function. +- Added new example: pipeline/square. +- Added exception handling and cancellation support + for parallel_do and pipeline. +- Added copy constructor and [begin,end) constructor to concurrent_queue. +- Added some support for beta version of Intel(R) Parallel Amplifier. +- Added scripts to set environment for cross-compilation of 32-bit + applications on 64-bit Linux with Intel(R) C++ Compiler. +- Fixed semantics of concurrent_vector::clear() to not deallocate + internal arrays. Fixed compact() to perform such deallocation later. +- Fixed the issue with atomic when T is incomplete type. +- Improved support for PowerPC* Macintosh*, including the fix + for a bug in masked compare-and-swap reported by a customer. +- As usual, a number of other improvements everywhere. + +------------------------------------------------------------------------ +20081109 open-source release + +Changes (w.r.t. previous open-source release): + +- Added new serial out of order filter for tbb::pipeline. +- Fixed the issue with atomic::operator= reported at the forum. +- Fixed the issue with using tbb::task::self() in task destructor + reported at the forum. +- A number of other improvements to code, tests, makefiles, examples + and documents. + +Open-source contributions integrated: +- Changes in the memory allocator were partially integrated. + +------------------------------------------------------------------------ +20081019 open-source release + +Changes (w.r.t. previous open-source release): + +- Introduced enumerable_thread_specific. This new class provides a + wrapper around native thread local storage as well as iterators and + ranges for accessing the thread local copies (1533). +- Improved support for Intel(R) Threading Analysis Tools + on Intel(R) 64 architecture. 
+- Dependency from Microsoft* CRT was integrated to the libraries using + manifests, to avoid issues if called from code that uses different + version of Visual C++* runtime than the library. +- Introduced new defines TBB_USE_ASSERT, TBB_USE_DEBUG, + TBB_USE_PERFORMANCE_WARNINGS, TBB_USE_THREADING_TOOLS. +- A number of other improvements to code, tests, makefiles, examples + and documents. + +Open-source contributions integrated: + +- linker optimization: /incremental:no . + +------------------------------------------------------------------------ +20080925 open-source release + +Changes (w.r.t. previous open-source release): + +- Same fix for a memory leak in the memory allocator as in TBB 2.1 U1. +- Improved support for lambda functions. +- Fixed more concurrent_queue issues reported at the forum. +- A number of other improvements to code, tests, makefiles, examples + and documents. + +------------------------------------------------------------------------ +TBB 2.1 Update 1 commercial-aligned release + +Changes (w.r.t. TBB 2.1 Gold commercial-aligned release): + +- Fixed small memory leak in the memory allocator. +- Incorporates all open-source-release changes down to TBB 2.1 GOLD, + except for: + - 20080825 changes for parallel_do; + +------------------------------------------------------------------------ +20080825 open-source release + +Changes (w.r.t. previous open-source release): + +- Added exception handling and cancellation support for parallel_do. +- Added default HashCompare template argument for concurrent_hash_map. +- Fixed concurrent_queue.clear() issues due to incorrect assumption + about clear() being private method. +- Added the possibility to use TBB in applications that change + default calling conventions (Windows* only). +- Many improvements to code, tests, examples, makefiles and documents. + +Bugs fixed: + +- 120, 130 - memset declaration missed in concurrent_hash_map.h + +------------------------------------------------------------------------ +20080724 open-source release + +Changes (w.r.t. previous open-source release): + +- Inline assembly for atomic operations improved for gcc 4.3 +- A few more improvements to the code. + +------------------------------------------------------------------------ +20080709 open-source release + +Changes (w.r.t. previous open-source release): + +- operator=() was added to the tbb_thread class according to + the current working draft for std::thread. +- Recognizing SPARC* in makefiles for Linux* and Sun Solaris*. + +Bugs fixed: + +- 127 - concurrent_hash_map::range fixed to split correctly. + +Open-source contributions integrated: + +- fix_set_midpoint.diff by jyasskin +- SPARC* support in makefiles by Raf Schietekat + +------------------------------------------------------------------------ +20080622 open-source release + +Changes (w.r.t. previous open-source release): + +- Fixed a hang that rarely happened on Linux + during deinitialization of the TBB scheduler. +- Improved support for Intel(R) Thread Checker. +- A few more improvements to the code. + +------------------------------------------------------------------------ +TBB 2.1 GOLD commercial-aligned release + +Changes (w.r.t. TBB 2.0 U3 commercial-aligned release): + +- All open-source-release changes down to, and including, TBB 2.0 GOLD + below, were incorporated into this release. + +------------------------------------------------------------------------ +20080605 open-source release + +Changes (w.r.t. 
previous open-source release): + +- Explicit control of exported symbols by version scripts added on Linux. +- Interfaces polished for exception handling & algorithm cancellation. +- Cache behavior improvements in the scalable allocator. +- Improvements in text_filter, polygon_overlay, and other examples. +- A lot of other stability improvements in code, tests, and makefiles. +- First release where binary packages include headers/docs/examples, so + binary packages are now self-sufficient for using TBB. + +Open-source contributions integrated: + +- atomics patch (partially). +- tick_count warning patch. + +Bugs fixed: + +- 118 - fix for boost compatibility. +- 123 - fix for tbb_machine.h. + +------------------------------------------------------------------------ +20080512 open-source release + +Changes (w.r.t. previous open-source release): + +- Fixed a problem with backward binary compatibility + of debug Linux builds. +- Sun* Studio* support added. +- soname support added on Linux via linker script. To restore backward + binary compatibility, *.so -> *.so.2 softlinks should be created. +- concurrent_hash_map improvements - added few new forms of insert() + method and fixed precondition and guarantees of erase() methods. + Added runtime warning reporting about bad hash function used for + the container. Various improvements for performance and concurrency. +- Cancellation mechanism reworked so that it does not hurt scalability. +- Algorithm parallel_do reworked. Requirement for Body::argument_type + definition removed, and work item argument type can be arbitrarily + cv-qualified. +- polygon_overlay example added. +- A few more improvements to code, tests, examples and Makefiles. + +Open-source contributions integrated: + +- Soname support patch for Bugzilla #112. + +Bugs fixed: + +- 112 - fix for soname support. + +------------------------------------------------------------------------ +TBB 2.0 U3 commercial-aligned release (package 017, April 20, 2008) + +Corresponds to commercial 019 (for Linux*, 020; for Mac OS* X, 018) +packages. + +Changes (w.r.t. TBB 2.0 U2 commercial-aligned release): + +- Does not contain open-source-release changes below; this release is + only a minor update of TBB 2.0 U2. +- Removed spin-waiting in pipeline and concurrent_queue. +- A few more small bug fixes from open-source releases below. + +------------------------------------------------------------------------ +20080408 open-source release + +Changes (w.r.t. previous open-source release): + +- count_strings example reworked: new word generator implemented, hash + function replaced, and tbb_allocator is used with std::string class. +- Static methods of spin_rw_mutex were replaced by normal member + functions, and the class name was versioned. +- tacheon example was renamed to tachyon. +- Improved support for Intel(R) Thread Checker. +- A few more minor improvements. + +Open-source contributions integrated: + +- Two sets of Sun patches for IA Solaris support. + +------------------------------------------------------------------------ +20080402 open-source release + +Changes (w.r.t. previous open-source release): + +- Exception handling and cancellation support for tasks and algorithms + fully enabled. +- Exception safety guaranties defined and fixed for all concurrent + containers. +- User-defined memory allocator support added to all concurrent + containers. +- Performance improvement of concurrent_hash_map, spin_rw_mutex. 
+- Critical fix for a rare race condition during scheduler + initialization/de-initialization. +- New methods added for concurrent containers to be closer to STL, + as well as automatic filters removal from pipeline + and __TBB_AtomicAND function. +- The volatile keyword dropped from where it is not really needed. +- A few more minor improvements. + +------------------------------------------------------------------------ +20080319 open-source release + +Changes (w.r.t. previous open-source release): + +- Support for gcc version 4.3 was added. +- tbb_thread class, near compatible with std::thread expected in C++0x, + was added. + +Bugs fixed: + +- 116 - fix for compilation issues with gcc version 4.2.1. +- 120 - fix for compilation issues with gcc version 4.3. + +------------------------------------------------------------------------ +20080311 open-source release + +Changes (w.r.t. previous open-source release): + +- An enumerator added for pipeline filter types (serial vs. parallel). +- New task_scheduler_observer class introduced, to observe when + threads start and finish interacting with the TBB task scheduler. +- task_scheduler_init reverted to not use internal versioned class; + binary compatibility guaranteed with stable releases only. +- Various improvements to code, tests, examples and Makefiles. + +------------------------------------------------------------------------ +20080304 open-source release + +Changes (w.r.t. previous open-source release): + +- Task-to-thread affinity support, previously kept under a macro, + now fully legalized. +- Work-in-progress on cache_aligned_allocator improvements. +- Pipeline really supports parallel input stage; it's no more serialized. +- Various improvements to code, tests, examples and Makefiles. + +Bugs fixed: + +- 119 - fix for scalable_malloc sometimes failing to return a big block. +- TR575 - fixed a deadlock occurring on Windows in startup/shutdown + under some conditions. + +------------------------------------------------------------------------ +20080226 open-source release + +Changes (w.r.t. previous open-source release): + +- Introduced tbb_allocator to select between standard allocator and + tbb::scalable_allocator when available. +- Removed spin-waiting in pipeline and concurrent_queue. +- Improved performance of concurrent_hash_map by using tbb_allocator. +- Improved support for Intel(R) Thread Checker. +- Various improvements to code, tests, examples and Makefiles. + +------------------------------------------------------------------------ +TBB 2.0 U2 commercial-aligned release (package 017, February 14, 2008) + +Corresponds to commercial 017 (for Linux*, 018; for Mac OS* X, 016) +packages. + +Changes (w.r.t. TBB 2.0 U1 commercial-aligned release): + +- Does not contain open-source-release changes below; this release is + only a minor update of TBB 2.0 U1. +- Add support for Microsoft* Visual Studio* 2008, including binary + libraries and VS2008 projects for examples. +- Use SwitchToThread() not Sleep() to yield threads on Windows*. +- Enhancements to Doxygen-readable comments in source code. +- A few more small bug fixes from open-source releases below. + +Bugs fixed: + +- TR569 - Memory leak in concurrent_queue. + +------------------------------------------------------------------------ +20080207 open-source release + +Changes (w.r.t. previous open-source release): + +- Improvements and minor fixes in VS2008 projects for examples. 
+- Improvements in code for gating worker threads that wait for work, + previously consolidated under #if IMPROVED_GATING, now legalized. +- Cosmetic changes in code, examples, tests. + +Bugs fixed: + +- 113 - Iterators and ranges should be convertible to their const + counterparts. +- TR569 - Memory leak in concurrent_queue. + +------------------------------------------------------------------------ +20080122 open-source release + +Changes (w.r.t. previous open-source release): + +- Updated examples/parallel_for/seismic to improve the visuals and to + use the affinity_partitioner (20071127 and forward) for better + performance. +- Minor improvements to unittests and performance tests. + +------------------------------------------------------------------------ +20080115 open-source release + +Changes (w.r.t. previous open-source release): + +- Cleanup, simplifications and enhancements to the Makefiles for + building the libraries (see build/index.html for high-level + changes) and the examples. +- Use SwitchToThread() not Sleep() to yield threads on Windows*. +- Engineering work-in-progress on exception safety/support. +- Engineering work-in-progress on affinity_partitioner for + parallel_reduce. +- Engineering work-in-progress on improved gating for worker threads + (idle workers now block in the OS instead of spinning). +- Enhancements to Doxygen-readable comments in source code. + +Bugs fixed: + +- 102 - Support for parallel build with gmake -j +- 114 - /Wp64 build warning on Windows*. + +------------------------------------------------------------------------ +20071218 open-source release + +Changes (w.r.t. previous open-source release): + +- Full support for Microsoft* Visual Studio* 2008 in open-source. + Binaries for vc9/ will be available in future stable releases. +- New recursive_mutex class. +- Full support for 32-bit PowerMac including export files for builds. +- Improvements to parallel_do. + +------------------------------------------------------------------------ +20071206 open-source release + +Changes (w.r.t. previous open-source release): + +- Support for Microsoft* Visual Studio* 2008 in building libraries + from source as well as in vc9/ projects for examples. +- Small fixes to the affinity_partitioner first introduced in 20071127. +- Small fixes to the thread-stack size hook first introduced in 20071127. +- Engineering work in progress on concurrent_vector. +- Engineering work in progress on exception behavior. +- Unittest improvements. + +------------------------------------------------------------------------ +20071127 open-source release + +Changes (w.r.t. previous open-source release): + +- Task-to-thread affinity support (affinity partitioner) first appears. +- More work on concurrent_vector. +- New parallel_do algorithm (function-style version of parallel while) + and parallel_do/parallel_preorder example. +- New task_scheduler_init() hooks for getting default_num_threads() and + for setting thread stack size. +- Support for weak memory consistency models in the code base. +- Futex usage in the task scheduler (Linux). +- Started adding 32-bit PowerMac support. +- Intel(R) 9.1 compilers are now the base supported Intel(R) compiler + version. +- TBB libraries added to link line automatically on Microsoft Windows* + systems via #pragma comment linker directives. + +Open-source contributions integrated: + +- FreeBSD platform support patches. +- AIX weak memory model patch. + +Bugs fixed: + +- 108 - Removed broken affinity.h reference. 
+- 101 - Does not build on Debian Lenny (replaced arch with uname -m). + +------------------------------------------------------------------------ +20071030 open-source release + +Changes (w.r.t. previous open-source release): + +- More work on concurrent_vector. +- Better support for building with -Wall -Werror (or not) as desired. +- A few fixes to eliminate extraneous warnings. +- Begin introduction of versioning hooks so that the internal/API + version is tracked via TBB_INTERFACE_VERSION. The newest binary + libraries should always work with previously-compiled code when- + ever possible. +- Engineering work in progress on using futex inside the mutexes (Linux). +- Engineering work in progress on exception behavior. +- Engineering work in progress on a new parallel_do algorithm. +- Unittest improvements. + +------------------------------------------------------------------------ +20070927 open-source release + +Changes: + +- Minor update to TBB 2.0 U1 below. +- Begin introduction of new concurrent_vector interfaces not released + with TBB 2.0 U1. + +------------------------------------------------------------------------ +TBB 2.0 U1 commercial-aligned release (package 014, October 1, 2007) + +Corresponds to commercial 014 (for Linux*, 016) packages. + +Changes (w.r.t. previous commercial-aligned release): + +- All open-source-release changes down to, and including, TBB 2.0 GOLD + below, were incorporated into this release. +- Made a number of changes to the officially supported OS list: + Added Linux* OSs: + Asianux* 3, Debian* 4.0, Fedora Core* 6, Fedora* 7, + Turbo Linux* 11, Ubuntu* 7.04; + Dropped Linux* OSs: + Asianux* 2, Fedora Core* 4, Haansoft* Linux 2006 Server, + Mandriva/Mandrake* 10.1, Miracle Linux* 4.0, + Red Flag* DC Server 5.0; + Only Mac OS* X 10.4.9 (and forward) and Xcode* tool suite 2.4.1 (and + forward) are now supported. +- Commercial installers on Linux* fixed to recommend the correct + binaries to use in more cases, with less unnecessary warnings. +- Changes to eliminate spurious build warnings. + +Open-source contributions integrated: + +- Two small header guard macro patches; it also fixed bug #94. +- New blocked_range3d class. + +Bugs fixed: + +- 93 - Removed misleading comments in task.h. +- 94 - See above. + +------------------------------------------------------------------------ +20070815 open-source release + +Changes: + +- Changes to eliminate spurious build warnings. +- Engineering work in progress on concurrent_vector allocator behavior. +- Added hooks to use the Intel(R) compiler code coverage tools. + +Open-source contributions integrated: + +- Mac OS* X build warning patch. + +Bugs fixed: + +- 88 - Fixed TBB compilation errors if both VS2005 and Windows SDK are + installed. + +------------------------------------------------------------------------ +20070719 open-source release + +Changes: + +- Minor update to TBB 2.0 GOLD below. +- Changes to eliminate spurious build warnings. + +------------------------------------------------------------------------ +TBB 2.0 GOLD commercial-aligned release (package 010, July 19, 2007) + +Corresponds to commercial 010 (for Linux*, 012) packages. + +- TBB open-source debut release. + +------------------------------------------------------------------------ +* Other names and brands may be claimed as the property of others. 
diff --git a/dep/tbb/COPYING b/dep/tbb/COPYING new file mode 100644 index 000000000..5af6ed874 --- /dev/null +++ b/dep/tbb/COPYING @@ -0,0 +1,353 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. 
The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. 
+ +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. 
Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. 
If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+                            NO WARRANTY
+
+  11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+  12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software; you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation; either version 2 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License along
+    with this program; if not, write to the Free Software Foundation, Inc.,
+    51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+    Gnomovision version 69, Copyright (C) year name of author
+    Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+  Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+  `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+  <signature of Ty Coon>, 1 April 1989
+  Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
+---------------- END OF Gnu General Public License ----------------
+
+The source code of Threading Building Blocks is distributed under version 2
+of the GNU General Public License, with the so-called "runtime exception,"
+as follows (or see any header or implementation file):
+
+  As a special exception, you may use this file as part of a free software
+  library without restriction. Specifically, if other files instantiate
+  templates or use macros or inline functions from this file, or you compile
+  this file and link it with other files to produce an executable, this
+  file does not by itself cause the resulting executable to be covered by
+  the GNU General Public License. This exception does not however
+  invalidate any other reasons why the executable file might be covered by
+  the GNU General Public License.
diff --git a/dep/tbb/Makefile.am b/dep/tbb/Makefile.am
new file mode 100644
index 000000000..98027104a
--- /dev/null
+++ b/dep/tbb/Makefile.am
@@ -0,0 +1,58 @@
+# Copyright 2005-2009 Intel Corporation. All Rights Reserved.
+#
+# This file is part of Threading Building Blocks.
+#
+# Threading Building Blocks is free software; you can redistribute it
+# and/or modify it under the terms of the GNU General Public License
+# version 2 as published by the Free Software Foundation.
+#
+# Threading Building Blocks is distributed in the hope that it will be
+# useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with Threading Building Blocks; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+# As a special exception, you may use this file as part of a free software
+# library without restriction. Specifically, if other files instantiate
+# templates or use macros or inline functions from this file, or you compile
+# this file and link it with other files to produce an executable, this
+# file does not by itself cause the resulting executable to be covered by
+# the GNU General Public License. This exception does not however
+# invalidate any other reasons why the executable file might be covered by
+# the GNU General Public License.
+
+tbb_root = $(srcdir)
+
+include $(tbb_root)/build/common.inc
+
+# Override TBB build variables so the sub-makes build in the Automake build directory
+override work_dir = $(CWD)
+export work_dir
+override tbb_root = $(srcdir)
+export tbb_root
+
+.PHONY: all tbb tbbmalloc
+
+# Workaround: tbb and tbbmalloc do not depend on each other, but both sub-makes
+# regenerate version_string.tmp, so they must not run in parallel under 'make -j'.
+.NOTPARALLEL: tbb tbbmalloc
+
+all: tbb tbbmalloc
+
+tbb:
+	$(MAKE) -r -f $(tbb_root)/build/Makefile.tbb cfg=release tbb_root=$(tbb_root)
+
+tbbmalloc:
+	$(MAKE) -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=release malloc tbb_root=$(tbb_root)
+
+install-exec-local:
+	$(INSTALL) $(work_dir)/lib*.so* $(libdir)
+
+clean-local:
+	-rm -f *.d *.o
+	-rm -f lib*.so*
+	-rm -f *.def *.tmp tbbvars.*
+
diff --git a/dep/tbb/README b/dep/tbb/README
new file mode 100644
index 000000000..67ab8ad2e
--- /dev/null
+++ b/dep/tbb/README
@@ -0,0 +1,11 @@
+Threading Building Blocks - README
+
+See index.html for directions and documentation.
+
+If source is present (./Makefile and src/ directories),
+type 'gmake' in this directory to build and test.
+
+See examples/index.html for runnable examples and directions.
+
+See http://threadingbuildingblocks.org for full documentation
+and software information.
diff --git a/dep/tbb/build/FreeBSD.gcc.inc b/dep/tbb/build/FreeBSD.gcc.inc
new file mode 100644
index 000000000..300453525
--- /dev/null
+++ b/dep/tbb/build/FreeBSD.gcc.inc
@@ -0,0 +1,93 @@
+# Copyright 2005-2009 Intel Corporation. All Rights Reserved.
+#
+# This file is part of Threading Building Blocks.
+#
+# Threading Building Blocks is free software; you can redistribute it
+# and/or modify it under the terms of the GNU General Public License
+# version 2 as published by the Free Software Foundation.
+#
+# Threading Building Blocks is distributed in the hope that it will be
+# useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with Threading Building Blocks; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+#
+# As a special exception, you may use this file as part of a free software
+# library without restriction. Specifically, if other files instantiate
+# templates or use macros or inline functions from this file, or you compile
+# this file and link it with other files to produce an executable, this
+# file does not by itself cause the resulting executable to be covered by
+# the GNU General Public License. This exception does not however
+# invalidate any other reasons why the executable file might be covered by
+# the GNU General Public License.
+ +COMPILE_ONLY = -c -MMD +PREPROC_ONLY = -E -x c +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = -fPIC +WARNING_AS_ERROR_KEY = -Werror +WARNING_KEY = -Wall +DYLIB_KEY = -shared + +TBB_NOSTRICT = 1 + +CPLUS = g++ +CONLY = gcc +LIB_LINK_FLAGS = -shared +LIBS = -lpthread +C_FLAGS = $(CPLUS_FLAGS) + +ifeq ($(cfg), release) + CPLUS_FLAGS = -O2 -DUSE_PTHREAD +endif +ifeq ($(cfg), debug) + CPLUS_FLAGS = -DTBB_USE_DEBUG -g -O0 -DUSE_PTHREAD +endif + +ASM= +ASM_FLAGS= + +TBB_ASM.OBJ= + +ifeq (ia64,$(arch)) +# Position-independent code (PIC) is a must on IA-64, even for regular (not shared) executables + CPLUS_FLAGS += $(PIC_KEY) +endif + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += -m64 + LIB_LINK_FLAGS += -m64 +endif + +ifeq (ia32,$(arch)) + CPLUS_FLAGS += -m32 + LIB_LINK_FLAGS += -m32 +endif + +#------------------------------------------------------------------------------ +# Setting assembler data. +#------------------------------------------------------------------------------ +ASSEMBLY_SOURCE=$(arch)-gas +ifeq (ia64,$(arch)) + ASM=as + TBB_ASM.OBJ = atomic_support.o lock_byte.o log2.o pause.o +endif +#------------------------------------------------------------------------------ +# End of setting assembler data. +#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. +#------------------------------------------------------------------------------ + +M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions -fno-schedule-insns2 + +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/FreeBSD.inc b/dep/tbb/build/FreeBSD.inc new file mode 100644 index 000000000..82b3daa14 --- /dev/null +++ b/dep/tbb/build/FreeBSD.inc @@ -0,0 +1,81 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. 
+ +ifndef arch + ifeq ($(shell uname -m),i386) + export arch:=ia32 + endif + ifeq ($(shell uname -m),ia64) + export arch:=ia64 + endif + ifeq ($(shell uname -m),amd64) + export arch:=intel64 + endif +endif + +ifndef runtime + gcc_version:=$(shell gcc -v 2>&1 | grep 'gcc version' | sed -e 's/^gcc version //' | sed -e 's/ .*$$//') + os_version:=$(shell uname -r) + os_kernel_version:=$(shell uname -r | sed -e 's/-.*$$//') + export runtime:=cc$(gcc_version)_kernel$(os_kernel_version) +endif + +native_compiler := gcc +export compiler ?= gcc +debugger ?= gdb + +CMD=$(SHELL) -c +CWD=$(shell pwd) +RM?=rm -f +RD?=rmdir +MD?=mkdir -p +NUL= /dev/null +SLASH=/ +MAKE_VERSIONS=sh $(tbb_root)/build/version_info_linux.sh $(CPLUS) $(CPLUS_FLAGS) $(INCLUDES) >version_string.tmp +MAKE_TBBVARS=sh $(tbb_root)/build/generate_tbbvars.sh + +ifdef LD_LIBRARY_PATH + export LD_LIBRARY_PATH := .:$(LD_LIBRARY_PATH) +else + export LD_LIBRARY_PATH := . +endif + +####### Build settings ######################################################## + +OBJ = o +DLL = so + +TBB.DEF = +TBB.DLL = libtbb$(DEBUG_SUFFIX).$(DLL) +TBB.LIB = $(TBB.DLL) +LINK_TBB.LIB = $(TBB.LIB) + +MALLOC.DLL = libtbbmalloc$(DEBUG_SUFFIX).$(DLL) +MALLOC.LIB = $(MALLOC.DLL) + +TBB_NOSTRICT=1 + +TEST_LAUNCHER=sh $(tbb_root)/build/test_launcher.sh diff --git a/dep/tbb/build/Makefile.rml b/dep/tbb/build/Makefile.rml new file mode 100644 index 000000000..1ef95c4fa --- /dev/null +++ b/dep/tbb/build/Makefile.rml @@ -0,0 +1,157 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +tbb_root ?= $(TBB22_INSTALL_DIR) +BUILDING_PHASE=1 +include $(tbb_root)/build/common.inc +DEBUG_SUFFIX=$(findstring _debug,_$(cfg)) + +# default target +default_rml: rml rml_test + +RML_ROOT ?= $(tbb_root)/src/rml +RML_SERVER_ROOT = $(RML_ROOT)/server + +VPATH = $(tbb_root)/src/tbb $(tbb_root)/src/tbb/$(ASSEMBLY_SOURCE) +VPATH += $(RML_ROOT)/server $(RML_ROOT)/client $(RML_ROOT)/test + +include $(tbb_root)/build/common_rules.inc + +#-------------------------------------------------------------------------- +# Define rules for making the RML server shared library and client objects. 
+#-------------------------------------------------------------------------- + +# Object files that make up RML server +RML_SERVER.OBJ = rml_server.$(OBJ) + +# Object files that RML clients need +RML_TBB_CLIENT.OBJ = rml_tbb.$(OBJ) dynamic_link.$(OBJ) +RML_OMP_CLIENT.OBJ = rml_omp.$(OBJ) omp_dynamic_link.$(OBJ) + +RML.OBJ = $(RML_SERVER.OBJ) $(RML_TBB_CLIENT.OBJ) $(RML_OMP_CLIENT.OBJ) +ifeq (windows,$(tbb_os)) +RML_ASM.OBJ = $(if $(findstring intel64,$(arch)),$(TBB_ASM.OBJ)) +endif +ifeq (linux,$(tbb_os)) +RML_ASM.OBJ = $(if $(findstring ia64,$(arch)),$(TBB_ASM.OBJ)) +endif + +RML_TBB_DEP= cache_aligned_allocator_rml.$(OBJ) dynamic_link_rml.$(OBJ) concurrent_vector_rml.$(OBJ) tbb_misc_rml.$(OBJ) +TBB_DEP_NON_RML_TEST= cache_aligned_allocator_rml.$(OBJ) dynamic_link_rml.$(OBJ) $(RML_ASM.OBJ) +TBB_DEP_RML_TEST= $(RML_ASM.OBJ) +ifeq ($(cfg),debug) +RML_TBB_DEP+= spin_mutex_rml.$(OBJ) +TBB_DEP_NON_RML_TEST+= tbb_misc_rml.$(OBJ) +TBB_DEP_RML_TEST+= tbb_misc_rml.$(OBJ) +endif +LIBS += $(LIBDL) + +INCLUDES += $(INCLUDE_KEY)$(RML_ROOT)/include $(INCLUDE_KEY). +T_INCLUDES = $(INCLUDES) $(INCLUDE_KEY)$(tbb_root)/src/test $(INCLUDE_KEY)$(RML_SERVER_ROOT) +WARNING_SUPPRESS += $(RML_WARNING_SUPPRESS) + +# Suppress superfluous warnings for RML compilation +R_CPLUS_FLAGS = $(subst DO_ITT_NOTIFY,DO_ITT_NOTIFY=0,$(CPLUS_FLAGS_NOSTRICT)) $(WARNING_SUPPRESS) \ + $(DEFINE_KEY)TBB_USE_THREADING_TOOLS=0 $(DEFINE_KEY)__TBB_RML_STATIC=1 $(DEFINE_KEY)__TBB_NO_IMPLICIT_LINKAGE=1 + +%.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(R_CPLUS_FLAGS) $(PIC_KEY) $(INCLUDES) $< + +tbb_misc_rml.$(OBJ): version_string.tmp + +RML_TEST.OBJ = test_job_automaton.$(OBJ) test_thread_monitor.$(OBJ) test_rml_tbb.$(OBJ) test_rml_omp.$(OBJ) test_rml_mixed.$(OBJ) + +$(RML_TBB_DEP): %_rml.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(OUTPUTOBJ_KEY)$@ $(R_CPLUS_FLAGS) $(PIC_KEY) $(INCLUDES) $< + +$(RML_TEST.OBJ): %.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(R_CPLUS_FLAGS) $(PIC_KEY) $(T_INCLUDES) $< + +ifneq (,$(RML.DEF)) +rml.def: $(RML.DEF) + $(CMD) "$(CPLUS) $(PREPROC_ONLY) $(RML.DEF) $(filter $(DEFINE_KEY)%,$(CPLUS_FLAGS)) >rml.def 2>$(NUL) || exit 0" + +LIB_LINK_FLAGS += $(EXPORT_KEY)rml.def +$(RML.DLL): rml.def +endif + +$(RML.DLL): BUILDING_LIBRARY = $(RML.DLL) +$(RML.DLL): $(RML_TBB_DEP) $(RML_SERVER.OBJ) $(RML.RES) $(RML_NO_VERSION.DLL) $(RML_ASM.OBJ) + $(LIB_LINK_CMD) $(LIB_OUTPUT_KEY)$(RML.DLL) $(RML_SERVER.OBJ) $(RML_TBB_DEP) $(RML_ASM.OBJ) $(RML.RES) $(LIB_LINK_LIBS) $(LIB_LINK_FLAGS) + +ifneq (,$(RML_NO_VERSION.DLL)) +$(RML_NO_VERSION.DLL): + echo "INPUT ($(RML.DLL))" > $(RML_NO_VERSION.DLL) +endif + +rml: $(RML.DLL) $(RML_TBB_CLIENT.OBJ) $(RML_OMP_CLIENT.OBJ) + +#------------------------------------------------------ +# End of rules for making the RML server shared library +#------------------------------------------------------ + +#------------------------------------------------------ +# Define rules for making the RML unit tests +#------------------------------------------------------ + +add_debug=$(basename $(1))_debug$(suffix $(1)) +cross_suffix=$(if $(crosstest),$(if $(DEBUG_SUFFIX),$(subst _debug,,$(1)),$(call add_debug,$(1))),$(1)) + +RML_TESTS = test_job_automaton.exe test_thread_monitor.exe test_rml_tbb.exe test_rml_omp.exe test_rml_mixed.exe test_rml_omp_c_linkage.exe + +test_rml_tbb.exe: test_rml_tbb.$(OBJ) $(RML_TBB_CLIENT.OBJ) $(TBB_DEP_RML_TEST) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) test_rml_tbb.$(OBJ) $(RML_TBB_CLIENT.OBJ) $(TBB_DEP_RML_TEST) $(LIBS) $(LINK_FLAGS) + +test_rml_omp.exe: test_rml_omp.$(OBJ) 
$(RML_OMP_CLIENT.OBJ) $(TBB_DEP_NON_RML_TEST) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) test_rml_omp.$(OBJ) $(RML_OMP_CLIENT.OBJ) $(TBB_DEP_NON_RML_TEST) $(LIBS) $(LINK_FLAGS) + +test_rml_mixed.exe: test_rml_mixed.$(OBJ) $(RML_TBB_CLIENT.OBJ) $(RML_OMP_CLIENT.OBJ) $(TBB_DEP_RML_TEST) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) test_rml_mixed.$(OBJ) $(RML_TBB_CLIENT.OBJ) $(RML_OMP_CLIENT.OBJ) $(TBB_DEP_RML_TEST) $(LIBS) $(LINK_FLAGS) + +rml_omp_stub.$(OBJ): rml_omp_stub.cpp + $(CPLUS) $(COMPILE_ONLY) $(M_CPLUS_FLAGS) $(WARNING_SUPPRESS) $(T_INCLUDES) $(PIC_KEY) $< + +test_rml_omp_c_linkage.exe: test_rml_omp_c_linkage.$(OBJ) rml_omp_stub.$(OBJ) + $(CONLY) $(C_FLAGS) $(OUTPUT_KEY)$@ test_rml_omp_c_linkage.$(OBJ) rml_omp_stub.$(OBJ) + +test_%.exe: test_%.$(OBJ) $(TBB_DEP_NON_RML_TEST) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $< $(TBB_DEP_NON_RML_TEST) $(LIBS) $(LINK_FLAGS) + +### run_cmd is usually empty +rml_test: $(call cross_suffix,$(RML.DLL)) $(RML_TESTS) + $(run_cmd) ./test_job_automaton.exe + $(run_cmd) ./test_thread_monitor.exe + $(run_cmd) ./test_rml_tbb.exe + $(run_cmd) ./test_rml_omp.exe + $(run_cmd) ./test_rml_mixed.exe + $(run_cmd) ./test_rml_omp_c_linkage.exe + +#------------------------------------------------------ +# End of rules for making the TBBMalloc unit tests +#------------------------------------------------------ + +# Include automatically generated dependences +-include *.d diff --git a/dep/tbb/build/Makefile.tbb b/dep/tbb/build/Makefile.tbb new file mode 100644 index 000000000..9f7484008 --- /dev/null +++ b/dep/tbb/build/Makefile.tbb @@ -0,0 +1,121 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +#------------------------------------------------------------------------------ +# Define rules for making the TBB shared library. 
+#------------------------------------------------------------------------------ + +tbb_root ?= "$(TBB22_INSTALL_DIR)" +BUILDING_PHASE=1 +include $(tbb_root)/build/common.inc +DEBUG_SUFFIX=$(findstring _debug,_$(cfg)) + +#------------------------------------------------------------ +# Define static pattern rules dealing with .cpp source files +#------------------------------------------------------------ +$(warning CONFIG: cfg=$(cfg) arch=$(arch) compiler=$(compiler) os=$(tbb_os) runtime=$(runtime)) + +default_tbb: $(TBB.DLL) +.PHONY: default_tbb tbbvars clean +.PRECIOUS: %.$(OBJ) + +VPATH = $(tbb_root)/src/tbb/$(ASSEMBLY_SOURCE) $(tbb_root)/src/tbb $(tbb_root)/src/old $(tbb_root)/src/rml/client + +CPLUS_FLAGS += $(PIC_KEY) $(DEFINE_KEY)__TBB_BUILD=1 + +ifeq (1,$(TBB_NOSTRICT)) +# GNU 3.2.3 headers have a ISO syntax that is rejected by Intel compiler in -strict_ansi mode. +# The Mac uses gcc, so the list is empty for that platform. +# The files below need the -strict_ansi flag downgraded to -ansi to compile + +KNOWN_NOSTRICT = concurrent_hash_map.o \ + concurrent_queue.o \ + concurrent_vector_v2.o \ + concurrent_vector.o + +endif + +# Object files (that were compiled from C++ code) that gmake up TBB +TBB_CPLUS.OBJ = concurrent_hash_map.$(OBJ) \ + concurrent_queue.$(OBJ) \ + concurrent_vector.$(OBJ) \ + dynamic_link.$(OBJ) \ + itt_notify.$(OBJ) \ + cache_aligned_allocator.$(OBJ) \ + pipeline.$(OBJ) \ + queuing_mutex.$(OBJ) \ + queuing_rw_mutex.$(OBJ) \ + spin_rw_mutex.$(OBJ) \ + spin_mutex.$(OBJ) \ + task.$(OBJ) \ + tbb_misc.$(OBJ) \ + mutex.$(OBJ) \ + recursive_mutex.$(OBJ) \ + tbb_thread.$(OBJ) \ + itt_notify_proxy.$(OBJ) \ + private_server.$(OBJ) \ + rml_tbb.$(OBJ) + +# OLD/Legacy object files for backward binary compatibility +ifeq (,$(findstring $(DEFINE_KEY)TBB_NO_LEGACY,$(CPLUS_FLAGS))) +TBB_CPLUS_OLD.OBJ = \ + concurrent_vector_v2.$(OBJ) \ + concurrent_queue_v2.$(OBJ) \ + spin_rw_mutex_v2.$(OBJ) +endif + +# Object files that gmake up TBB (TBB_ASM.OBJ is platform-specific) +TBB.OBJ = $(TBB_CPLUS.OBJ) $(TBB_CPLUS_OLD.OBJ) $(TBB_ASM.OBJ) + +# Suppress superfluous warnings for TBB compilation +WARNING_KEY += $(WARNING_SUPPRESS) + +CXX_WARN_SUPPRESS = $(RML_WARNING_SUPPRESS) + +include $(tbb_root)/build/common_rules.inc + +ifneq (,$(TBB.DEF)) +tbb.def: $(TBB.DEF) + $(CMD) "$(CPLUS) $(PREPROC_ONLY) $(TBB.DEF) $(INCLUDES) $(filter $(DEFINE_KEY)%,$(CPLUS_FLAGS)) >tbb.def 2>$(NUL) || exit 0" + +LIB_LINK_FLAGS += $(EXPORT_KEY)tbb.def +$(TBB.DLL): tbb.def +endif + +$(TBB.DLL): BUILDING_LIBRARY = $(TBB.DLL) +$(TBB.DLL): $(TBB.OBJ) $(TBB.RES) tbbvars $(TBB_NO_VERSION.DLL) + $(LIB_LINK_CMD) $(LIB_OUTPUT_KEY)$(TBB.DLL) $(TBB.OBJ) $(TBB.RES) $(LIB_LINK_LIBS) $(LIB_LINK_FLAGS) + +ifneq (,$(TBB_NO_VERSION.DLL)) +$(TBB_NO_VERSION.DLL): + echo "INPUT ($(TBB.DLL))" > $(TBB_NO_VERSION.DLL) +endif + +#clean: +# $(RM) *.$(OBJ) *.$(DLL) *.res *.map *.ilk *.pdb *.exp *.manifest *.tmp *.d core core.*[0-9][0-9] + +# Include automatically generated dependences +-include *.d diff --git a/dep/tbb/build/Makefile.tbbmalloc b/dep/tbb/build/Makefile.tbbmalloc new file mode 100644 index 000000000..a6470f809 --- /dev/null +++ b/dep/tbb/build/Makefile.tbbmalloc @@ -0,0 +1,184 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. 
+# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +# default target +default_malloc: malloc malloc_test + +tbb_root ?= $(TBB22_INSTALL_DIR) +BUILDING_PHASE=1 +TEST_RESOURCE = $(TBB.RES) +include $(tbb_root)/build/common.inc +DEBUG_SUFFIX=$(findstring _debug,_$(cfg)) + +MALLOC_ROOT ?= $(tbb_root)/src/tbbmalloc +MALLOC_SOURCE_ROOT ?= $(MALLOC_ROOT) + +VPATH = $(tbb_root)/src/tbb/$(ASSEMBLY_SOURCE) $(tbb_root)/src/tbb $(tbb_root)/src/test +VPATH += $(MALLOC_ROOT) $(MALLOC_SOURCE_ROOT) + +KNOWN_NOSTRICT = test_ScalableAllocator_STL.$(OBJ) test_malloc_compliance.$(OBJ) test_malloc_overload.$(OBJ) + +CPLUS_FLAGS += $(if $(crosstest),$(DEFINE_KEY)__TBBMALLOC_NO_IMPLICIT_LINKAGE=1) + +include $(tbb_root)/build/common_rules.inc + +#------------------------------------------------------ +# Define rules for making the TBBMalloc shared library. +#------------------------------------------------------ + +# Object files that make up TBBMalloc +MALLOC_CPLUS.OBJ = tbbmalloc.$(OBJ) dynamic_link.$(OBJ) +MALLOC_CUSTOM.OBJ += tbb_misc_malloc.$(OBJ) +MALLOC_ASM.OBJ = $(TBB_ASM.OBJ) + +# MALLOC_CPLUS.OBJ is built in two steps due to Intel Compiler Tracker # C69574 +MALLOC.OBJ := $(MALLOC_CPLUS.OBJ) $(MALLOC_ASM.OBJ) $(MALLOC_CUSTOM.OBJ) MemoryAllocator.$(OBJ) itt_notify_proxy.$(OBJ) +MALLOC_CPLUS.OBJ += MemoryAllocator.$(OBJ) +PROXY.OBJ := proxy.$(OBJ) tbb_function_replacement.$(OBJ) +M_CPLUS_FLAGS := $(subst $(WARNING_KEY),,$(M_CPLUS_FLAGS)) $(DEFINE_KEY)__TBB_BUILD=1 +M_INCLUDES = $(INCLUDES) $(INCLUDE_KEY)$(MALLOC_ROOT) $(INCLUDE_KEY)$(MALLOC_SOURCE_ROOT) + +# Suppress superfluous warnings for TBBmalloc compilation +$(MALLOC.OBJ): M_CPLUS_FLAGS += $(WARNING_SUPPRESS) + +itt_notify_proxy.$(OBJ): C_FLAGS += $(PIC_KEY) + +$(PROXY.OBJ): %.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(CPLUS_FLAGS) $(PIC_KEY) $(M_INCLUDES) $< + +$(MALLOC_CPLUS.OBJ): %.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(M_CPLUS_FLAGS) $(PIC_KEY) $(M_INCLUDES) $< + +tbb_misc_malloc.$(OBJ): tbb_misc.cpp version_string.tmp + $(CPLUS) $(COMPILE_ONLY) $(subst -strict_ansi,-ansi,$(M_CPLUS_FLAGS)) $(PIC_KEY) $(OUTPUTOBJ_KEY)$@ $(INCLUDE_KEY). 
$(INCLUDES) $< + +MALLOC_LINK_FLAGS = $(LIB_LINK_FLAGS) +PROXY_LINK_FLAGS = $(LIB_LINK_FLAGS) + +ifneq (,$(MALLOC.DEF)) +tbbmalloc.def: $(MALLOC.DEF) + $(CMD) "$(CPLUS) $(PREPROC_ONLY) $(MALLOC.DEF) $(filter $(DEFINE_KEY)%,$(CPLUS_FLAGS)) >tbbmalloc.def 2>$(NUL) || exit 0" + +MALLOC_LINK_FLAGS += $(EXPORT_KEY)tbbmalloc.def +$(MALLOC.DLL): tbbmalloc.def +endif + +$(MALLOC.DLL): BUILDING_LIBRARY = $(MALLOC.DLL) +$(MALLOC.DLL): $(MALLOC.OBJ) $(MALLOC.RES) $(MALLOC_NO_VERSION.DLL) + $(LIB_LINK_CMD) $(LIB_OUTPUT_KEY)$(MALLOC.DLL) $(MALLOC.OBJ) $(MALLOC.RES) $(LIB_LINK_LIBS) $(MALLOC_LINK_FLAGS) + +ifneq (,$(MALLOCPROXY.DEF)) +tbbmallocproxy.def: $(MALLOCPROXY.DEF) + $(CMD) "$(CPLUS) $(PREPROC_ONLY) $(MALLOCPROXY.DEF) $(filter $(DEFINE_KEY)%,$(CPLUS_FLAGS)) >tbbmallocproxy.def 2>$(NUL) || exit 0" + +PROXY_LINK_FLAGS += $(EXPORT_KEY)tbbmallocproxy.def +$(MALLOCPROXY.DLL): tbbmallocproxy.def +endif + +ifneq (,$(MALLOCPROXY.DLL)) +$(MALLOCPROXY.DLL): BUILDING_LIBRARY = $(MALLOCPROXY.DLL) +$(MALLOCPROXY.DLL): $(PROXY.OBJ) $(MALLOCPROXY_NO_VERSION.DLL) $(MALLOC.DLL) $(MALLOC.RES) + $(LIB_LINK_CMD) $(LIB_OUTPUT_KEY)$(MALLOCPROXY.DLL) $(PROXY.OBJ) $(MALLOC.RES) $(LIB_LINK_LIBS) $(LINK_MALLOC.LIB) $(PROXY_LINK_FLAGS) + +malloc: $(MALLOCPROXY.DLL) +endif + +ifneq (,$(MALLOC_NO_VERSION.DLL)) +$(MALLOC_NO_VERSION.DLL): + echo "INPUT ($(MALLOC.DLL))" > $(MALLOC_NO_VERSION.DLL) +endif + +ifneq (,$(MALLOCPROXY_NO_VERSION.DLL)) +$(MALLOCPROXY_NO_VERSION.DLL): + echo "INPUT ($(MALLOCPROXY.DLL))" > $(MALLOCPROXY_NO_VERSION.DLL) +endif + +malloc: $(MALLOC.DLL) $(MALLOCPROXY.DLL) + +malloc_dll: $(MALLOC.DLL) + +malloc_proxy_dll: $(MALLOCPROXY.DLL) + +.PHONY: malloc malloc_dll malloc_proxy_dll + +#------------------------------------------------------ +# End of rules for making the TBBMalloc shared library +#------------------------------------------------------ + +#------------------------------------------------------ +# Define rules for making the TBBMalloc unit tests +#------------------------------------------------------ + +add_debug=$(basename $(1))_debug$(suffix $(1)) +cross_suffix=$(if $(crosstest),$(if $(DEBUG_SUFFIX),$(subst _debug,,$(1)),$(call add_debug,$(1))),$(1)) + +MALLOC_MAIN_TESTS = test_ScalableAllocator.$(TEST_EXT) test_ScalableAllocator_STL.$(TEST_EXT) test_malloc_compliance.$(TEST_EXT) test_malloc_regression.$(TEST_EXT) +MALLOC_OVERLOAD_TESTS = test_malloc_overload.$(TEST_EXT) test_malloc_overload_proxy.$(TEST_EXT) + +MALLOC_LIB = $(call cross_suffix,$(MALLOC.LIB)) +MALLOC_PROXY_LIB = $(call cross_suffix,$(MALLOCPROXY.LIB)) + +ifeq (windows.gcc,$(tbb_os).$(compiler)) +test_malloc_overload.$(TEST_EXT): LIBS += $(MALLOC_PROXY_LIB) +endif + +test_malloc_overload.$(TEST_EXT): test_malloc_overload.$(OBJ) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $< $(LIBDL) $(LIBS) $(LINK_FLAGS) +test_malloc_overload_proxy.$(TEST_EXT): test_malloc_overload.$(OBJ) $(MALLOC_PROXY_LIB) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $< $(LIBDL) $(MALLOC_PROXY_LIB) $(LIBS) $(LINK_FLAGS) + +test_malloc_whitebox.$(TEST_EXT): test_malloc_whitebox.cpp $(MALLOC_ASM.OBJ) tbb_misc_malloc.$(OBJ) + $(CPLUS) $(OUTPUT_KEY)$@ $(M_CPLUS_FLAGS) $(M_INCLUDES) $^ $(LIBS) $(LINK_FLAGS) + +$(MALLOC_MAIN_TESTS): %.$(TEST_EXT): %.$(OBJ) $(MALLOC_LIB) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $< $(MALLOC_LIB) $(LIBS) $(LINK_FLAGS) + +ifeq (,$(NO_C_TESTS)) +MALLOC_C_TESTS = test_malloc_pure_c.$(TEST_EXT) + +$(MALLOC_C_TESTS): %.$(TEST_EXT): %.$(OBJ) $(MALLOC_LIB) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $^ $(LIBS) $(LINK_FLAGS) +endif + +# 
run_cmd is usually empty +malloc_test: $(call cross_suffix,$(MALLOC.DLL)) $(MALLOC_MAIN_TESTS) $(MALLOC_C_TESTS) $(MALLOC_OVERLOAD_TESTS) test_malloc_whitebox.$(TEST_EXT) $(AUX_TEST_DEPENDENCIES) + $(run_cmd) ./test_malloc_whitebox.$(TEST_EXT) 1:4 + $(run_cmd) $(TEST_LAUNCHER) -l $(call cross_suffix,$(MALLOCPROXY.DLL)) test_malloc_overload.$(TEST_EXT) + $(run_cmd) $(TEST_LAUNCHER) test_malloc_overload_proxy.$(TEST_EXT) + $(run_cmd) $(TEST_LAUNCHER) test_malloc_compliance.$(TEST_EXT) 1:4 + $(run_cmd) ./test_ScalableAllocator.$(TEST_EXT) + $(run_cmd) ./test_ScalableAllocator_STL.$(TEST_EXT) + $(run_cmd) ./test_malloc_regression.$(TEST_EXT) +ifeq (,$(NO_C_TESTS)) + $(run_cmd) ./test_malloc_pure_c.$(TEST_EXT) +endif + +#------------------------------------------------------ +# End of rules for making the TBBMalloc unit tests +#------------------------------------------------------ + +# Include automatically generated dependences +-include *.d diff --git a/dep/tbb/build/Makefile.test b/dep/tbb/build/Makefile.test new file mode 100644 index 000000000..8b9c339fe --- /dev/null +++ b/dep/tbb/build/Makefile.test @@ -0,0 +1,310 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +#------------------------------------------------------------------------------ +# Define rules for making the TBB tests. +#------------------------------------------------------------------------------ +.PHONY: default test_tbb_plain test_tbb_old clean + +default: test_tbb_plain test_tbb_old + +tbb_root ?= $(TBB22_INSTALL_DIR) +BUILDING_PHASE=1 +TEST_RESOURCE = $(TBB.RES) +include $(tbb_root)/build/common.inc +DEBUG_SUFFIX=$(findstring _debug,$(call cross_cfg,_$(cfg))) + +#------------------------------------------------------------ +# Define static pattern rules dealing with .cpp source files +#------------------------------------------------------------ + +VPATH = $(tbb_root)/src/tbb/$(ASSEMBLY_SOURCE) $(tbb_root)/src/tbb $(tbb_root)/src/rml/client $(tbb_root)/src/old $(tbb_root)/src/test $(tbb_root)/src/perf + +CPLUS_FLAGS += $(if $(crosstest),$(DEFINE_KEY)__TBB_NO_IMPLICIT_LINKAGE=1) + +ifeq (1,$(TBB_NOSTRICT)) +# GNU 3.2.3 headers have a ISO syntax that is rejected by Intel compiler in -strict_ansi mode. 
+# The Mac uses gcc 4.0, so the list is empty for that platform. +# The files below need the -strict_ansi flag downgraded to -ansi to compile + +KNOWN_NOSTRICT += \ + test_concurrent_hash_map.o \ + test_concurrent_vector.o \ + test_concurrent_queue.o \ + test_enumerable_thread_specific.o \ + test_handle_perror.o \ + test_cache_aligned_allocator_STL.o \ + test_task_scheduler_init.o \ + test_model_plugin.o \ + test_parallel_do.o \ + test_lambda.o \ + test_eh_algorithms.o \ + test_parallel_sort.o \ + test_parallel_for_each.o \ + test_task_group.o \ + test_tbb_header.o \ + test_combinable.o \ + test_tbb_version.o + +endif + +include $(tbb_root)/build/common_rules.inc + +# Rule for generating executable test +%.$(TEST_EXT): %.$(OBJ) $(TBB.LIB) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $< $(LINK_TBB.LIB) $(LIBS) $(LINK_FLAGS) + +# Rules for generating a test DLL +%.$(DLL).$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(OUTPUTOBJ_KEY)$@ $(CPLUS_FLAGS_NOSTRICT) $(PIC_KEY) $(DEFINE_KEY)_USRDLL $(INCLUDES) $< +%.$(DLL): %.$(DLL).$(OBJ) $(TBB.LIB) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $(PIC_KEY) $< $(LINK_TBB.LIB) $(LIBS) $(LINK_FLAGS) $(DYLIB_KEY) + +# Rules for the tests, which use TBB in a dynamically loadable library +test_model_plugin.$(TEST_EXT): test_model_plugin.$(OBJ) test_model_plugin.$(DLL) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $< $(LIBDL) $(LIBS) $(LINK_FLAGS) + +TASK_CPP_DEPENDENCIES = $(TBB_ASM.OBJ) \ + cache_aligned_allocator.$(OBJ) \ + dynamic_link.$(OBJ) \ + tbb_misc.$(OBJ) \ + tbb_thread.$(OBJ) \ + itt_notify.$(OBJ) \ + mutex.$(OBJ) \ + spin_rw_mutex.$(OBJ) \ + spin_mutex.$(OBJ) \ + private_server.$(OBJ) \ + rml_tbb.$(OBJ) + +ifeq (,$(codecov)) + TASK_CPP_DEPENDENCIES += itt_notify_proxy.$(OBJ) +endif + +# These executables don't depend on the TBB library, but include task.cpp directly +TASK_CPP_DIRECTLY_INCLUDED = test_eh_tasks.$(TEST_EXT) \ + test_task_leaks.$(TEST_EXT) \ + test_task_assertions.$(TEST_EXT) \ + test_assembly.$(TEST_EXT) + +$(TASK_CPP_DIRECTLY_INCLUDED): WARNING_KEY += $(WARNING_SUPPRESS) + +$(TASK_CPP_DIRECTLY_INCLUDED): %.$(TEST_EXT) : %.$(OBJ) $(TASK_CPP_DEPENDENCIES) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $^ $(LIBDL) $(LIBS) $(LINK_FLAGS) + +test_handle_perror.$(TEST_EXT): test_handle_perror.$(OBJ) tbb_misc.$(OBJ) $(TBB_ASM.OBJ) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $^ $(LINK_TBB.LIB) $(LIBS) $(LINK_FLAGS) + +test_tbb_header2.$(OBJ): test_tbb_header.cpp + $(CPLUS) $(COMPILE_ONLY) $(CPLUS_FLAGS_NOSTRICT) $(CXX_ONLY_FLAGS) $(CXX_WARN_SUPPRESS) $(INCLUDES) $(DEFINE_KEY)__TBB_TEST_SECONDARY=1 $< $(OUTPUTOBJ_KEY)$@ + +# Detecting "multiple definition" linker error using the test that covers the whole library +test_tbb_header.$(TEST_EXT): test_tbb_header.$(OBJ) test_tbb_header2.$(OBJ) $(TBB.LIB) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $^ $(LINK_TBB.LIB) $(LIBS) $(LINK_FLAGS) + +# Rules for the tests, which depend on tbbmalloc +test_concurrent_hash_map_string.$(TEST_EXT): test_concurrent_hash_map_string.$(OBJ) + $(CPLUS) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $< $(LINK_TBB.LIB) $(MALLOC.LIB) $(LIBS) $(LINK_FLAGS) + +# These are in alphabetical order +TEST_TBB_PLAIN.EXE = test_assembly.$(TEST_EXT) \ + test_aligned_space.$(TEST_EXT) \ + test_task_assertions.$(TEST_EXT) \ + test_atomic.$(TEST_EXT) \ + test_blocked_range.$(TEST_EXT) \ + test_blocked_range2d.$(TEST_EXT) \ + test_blocked_range3d.$(TEST_EXT) \ + test_compiler.$(TEST_EXT) \ + test_concurrent_queue.$(TEST_EXT) \ + test_concurrent_vector.$(TEST_EXT) \ + test_concurrent_hash_map.$(TEST_EXT) \ + 
test_enumerable_thread_specific.$(TEST_EXT) \ + test_handle_perror.$(TEST_EXT) \ + test_halt.$(TEST_EXT) \ + test_lambda.$(TEST_EXT) \ + test_model_plugin.$(TEST_EXT) \ + test_mutex.$(TEST_EXT) \ + test_mutex_native_threads.$(TEST_EXT) \ + test_rwm_upgrade_downgrade.$(TEST_EXT) \ + test_cache_aligned_allocator_STL.$(TEST_EXT) \ + test_cache_aligned_allocator.$(TEST_EXT) \ + test_parallel_for.$(TEST_EXT) \ + test_parallel_reduce.$(TEST_EXT) \ + test_parallel_sort.$(TEST_EXT) \ + test_parallel_scan.$(TEST_EXT) \ + test_parallel_while.$(TEST_EXT) \ + test_parallel_do.$(TEST_EXT) \ + test_pipeline.$(TEST_EXT) \ + test_pipeline_with_tbf.$(TEST_EXT) \ + test_task_scheduler_init.$(TEST_EXT) \ + test_task_scheduler_observer.$(TEST_EXT) \ + test_task.$(TEST_EXT) \ + test_task_leaks.$(TEST_EXT) \ + test_tbb_thread.$(TEST_EXT) \ + test_tick_count.$(TEST_EXT) \ + test_inits_loop.$(TEST_EXT) \ + test_yield.$(TEST_EXT) \ + test_eh_tasks.$(TEST_EXT) \ + test_eh_algorithms.$(TEST_EXT) \ + test_parallel_invoke.$(TEST_EXT) \ + test_task_group.$(TEST_EXT) \ + test_ittnotify.$(TEST_EXT) \ + test_parallel_for_each.$(TEST_EXT) \ + test_tbb_header.$(TEST_EXT) \ + test_combinable.$(TEST_EXT) \ + test_task_auto_init.$(TEST_EXT) \ + test_tbb_version.$(TEST_EXT) # insert new files right above + +ifdef OPENMP_FLAG + TEST_TBB_PLAIN.EXE += test_tbb_openmp +test_openmp.$(TEST_EXT): test_openmp.cpp + $(CPLUS) $(OPENMP_FLAG) $(OUTPUT_KEY)$@ $(CPLUS_FLAGS) $(INCLUDES) $< $(LIBS) $(LINK_TBB.LIB) $(LINK_FLAGS) +.PHONY: test_tbb_openmp +test_tbb_openmp: test_openmp.$(TEST_EXT) + ./test_openmp.$(TEST_EXT) 1:4 + +endif + +# Run tests that are in TEST_TBB_PLAIN.EXE +# The test are ordered so that simpler components are tested first. +# If a component Y uses component X, then tests for Y should come after tests for X. +# Note that usually run_cmd is empty, and tests run directly +test_tbb_plain: $(TEST_TBB_PLAIN.EXE) + $(run_cmd) ./test_assembly.$(TEST_EXT) + $(run_cmd) ./test_compiler.$(TEST_EXT) + # Yes, 4:8 is intended on the next line. + $(run_cmd) ./test_yield.$(TEST_EXT) 4:8 + $(run_cmd) ./test_handle_perror.$(TEST_EXT) + $(run_cmd) ./test_task_auto_init.$(TEST_EXT) + $(run_cmd) ./test_task_scheduler_init.$(TEST_EXT) 1:4 + $(run_cmd) ./test_task_scheduler_observer.$(TEST_EXT) 1:4 + $(run_cmd) ./test_task_assertions.$(TEST_EXT) + $(run_cmd) ./test_task.$(TEST_EXT) 1:4 + $(run_cmd) ./test_task_leaks.$(TEST_EXT) + $(run_cmd) ./test_atomic.$(TEST_EXT) + $(run_cmd) ./test_cache_aligned_allocator.$(TEST_EXT) + $(run_cmd) ./test_cache_aligned_allocator_STL.$(TEST_EXT) + $(run_cmd) ./test_blocked_range.$(TEST_EXT) 1:4 + $(run_cmd) ./test_blocked_range2d.$(TEST_EXT) 1:4 + $(run_cmd) ./test_blocked_range3d.$(TEST_EXT) 1:4 + $(run_cmd) ./test_parallel_for.$(TEST_EXT) 1:4 + $(run_cmd) ./test_parallel_sort.$(TEST_EXT) 1:4 + $(run_cmd) ./test_aligned_space.$(TEST_EXT) + $(run_cmd) ./test_parallel_reduce.$(TEST_EXT) 1:4 + $(run_cmd) ./test_parallel_scan.$(TEST_EXT) 1:4 + $(run_cmd) ./test_parallel_while.$(TEST_EXT) 1:4 + $(run_cmd) ./test_parallel_do.$(TEST_EXT) 1:4 + $(run_cmd) ./test_inits_loop.$(TEST_EXT) + $(run_cmd) ./test_lambda.$(TEST_EXT) 1:4 + $(run_cmd) ./test_mutex.$(TEST_EXT) 1 + $(run_cmd) ./test_mutex.$(TEST_EXT) 2 + $(run_cmd) ./test_mutex.$(TEST_EXT) 4 + $(run_cmd) ./test_mutex_native_threads.$(TEST_EXT) 1:4 + $(run_cmd) ./test_rwm_upgrade_downgrade.$(TEST_EXT) 4 + # Yes, 4:8 is intended on the next line. 
+ $(run_cmd) ./test_halt.$(TEST_EXT) 4:8 + $(run_cmd) ./test_pipeline.$(TEST_EXT) 1:4 + $(run_cmd) ./test_pipeline_with_tbf.$(TEST_EXT) 1:4 + $(run_cmd) ./test_tick_count.$(TEST_EXT) 1:4 + $(run_cmd) ./test_concurrent_queue.$(TEST_EXT) 1:4 + $(run_cmd) ./test_concurrent_vector.$(TEST_EXT) 1:4 + $(run_cmd) ./test_concurrent_hash_map.$(TEST_EXT) 1:4 + $(run_cmd) ./test_enumerable_thread_specific.$(TEST_EXT) 0:4 + $(run_cmd) ./test_combinable.$(TEST_EXT) 0:4 + $(run_cmd) ./test_model_plugin.$(TEST_EXT) 4 + $(run_cmd) ./test_eh_tasks.$(TEST_EXT) 2:4 + $(run_cmd) ./test_eh_algorithms.$(TEST_EXT) 2:4 + $(run_cmd) ./test_tbb_thread.$(TEST_EXT) + $(run_cmd) ./test_parallel_invoke.$(TEST_EXT) 1:4 + $(run_cmd) ./test_task_group.$(TEST_EXT) 1:4 + $(run_cmd) ./test_ittnotify.$(TEST_EXT) 2:2 + $(run_cmd) ./test_parallel_for_each.$(TEST_EXT) 1:4 + $(run_cmd) ./test_tbb_header.$(TEST_EXT) + $(run_cmd) ./test_tbb_version.$(TEST_EXT) + +CPLUS_FLAGS_DEPRECATED = $(DEFINE_KEY)TBB_DEPRECATED=1 $(subst $(WARNING_KEY),,$(CPLUS_FLAGS_NOSTRICT)) $(WARNING_SUPPRESS) + +TEST_TBB_OLD.OBJ = test_concurrent_vector_v2.$(OBJ) test_concurrent_queue_v2.$(OBJ) test_mutex_v2.$(OBJ) + +TEST_TBB_DEPRECATED.OBJ = test_concurrent_queue_deprecated.$(OBJ) \ + test_concurrent_vector_deprecated.$(OBJ) \ + + +# For deprecated files, we don't mind warnings etc., thus compilation rules are most relaxed +$(TEST_TBB_OLD.OBJ): %.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(CPLUS_FLAGS_DEPRECATED) $(INCLUDES) $< + +%_deprecated.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(OUTPUTOBJ_KEY)$@ $(CPLUS_FLAGS_DEPRECATED) $(INCLUDES) $< + +TEST_TBB_OLD.EXE = $(subst .$(OBJ),.$(TEST_EXT),$(TEST_TBB_OLD.OBJ) $(TEST_TBB_DEPRECATED.OBJ)) + +ifeq (,$(NO_LEGACY_TESTS)) +test_tbb_old: $(TEST_TBB_OLD.EXE) + $(run_cmd) ./test_concurrent_vector_v2.$(TEST_EXT) 1:4 + $(run_cmd) ./test_concurrent_vector_deprecated.$(TEST_EXT) 1:4 + $(run_cmd) ./test_concurrent_queue_v2.$(TEST_EXT) 1:4 + $(run_cmd) ./test_concurrent_queue_deprecated.$(TEST_EXT) 1:4 + $(run_cmd) ./test_mutex_v2.$(TEST_EXT) 1 + $(run_cmd) ./test_mutex_v2.$(TEST_EXT) 2 + $(run_cmd) ./test_mutex_v2.$(TEST_EXT) 4 +else +test_tbb_old: + @echo Legacy tests skipped +endif + +ifneq (,$(codecov)) +codecov_gen: + profmerge + codecov $(if $(findstring -,$(codecov)),$(codecov),) -demang -comp $(tbb_root)/build/codecov.txt +endif + +test_% debug_%: test_%.$(TEST_EXT) $(AUX_TEST_DEPENDENCIES) +ifeq (,$(repeat)) + $(run_cmd) ./$< $(args) +else +ifeq (windows,$(tbb_os)) + for /L %%i in (1,1,$(repeat)) do echo %%i of $(repeat): && $(run_cmd) $< $(args) +else + for ((i=1;i<=$(repeat);++i)); do echo $$i of $(repeat): && $(run_cmd) ./$< $(args); done +endif +endif # repeat +ifneq (,$(codecov)) + profmerge + codecov $(if $(findstring -,$(codecov)),$(codecov),) -demang -comp $(tbb_root)/build/codecov.txt +endif + +time_%: time_%.$(TEST_EXT) $(AUX_TEST_DEPENDENCIES) + $(run_cmd) ./$< $(args) + + +clean_%: + $(RM) $*.$(OBJ) $*.exe $*.$(DLL) $*.$(LIBEXT) $*.res $*.map $*.ilk $*.pdb $*.exp $*.*manifest $*.tmp $*.d + +clean: + $(RM) *.$(OBJ) *.exe *.$(DLL) *.$(LIBEXT) *.res *.map *.ilk *.pdb *.exp *.manifest *.tmp *.d pgopti.* *.dyn core core.*[0-9][0-9] + +# Include automatically generated dependences +-include *.d diff --git a/dep/tbb/build/SunOS.gcc.inc b/dep/tbb/build/SunOS.gcc.inc new file mode 100644 index 000000000..f60073bf3 --- /dev/null +++ b/dep/tbb/build/SunOS.gcc.inc @@ -0,0 +1,99 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. 
+# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +COMPILE_ONLY = -c -MMD +PREPROC_ONLY = -E -x c +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = -fPIC +WARNING_AS_ERROR_KEY = -Werror +WARNING_KEY = -Wall +DYLIB_KEY = -shared +LIBDL = -ldl + +TBB_NOSTRICT = 1 + +CPLUS = g++ +LIB_LINK_FLAGS = -shared +LIBS = -lpthread -lrt -ldl +C_FLAGS = $(CPLUS_FLAGS) -x c + +ifeq ($(cfg), release) + CPLUS_FLAGS = -O2 -DUSE_PTHREAD +endif +ifeq ($(cfg), debug) + CPLUS_FLAGS = -DTBB_USE_DEBUG -g -O0 -DUSE_PTHREAD +endif + +ASM= +ASM_FLAGS= + +TBB_ASM.OBJ= + +ifeq (ia64,$(arch)) +# Position-independent code (PIC) is a must for IA-64 + CPLUS_FLAGS += $(PIC_KEY) +endif + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += -m64 + LIB_LINK_FLAGS += -m64 +endif + +ifeq (ia32,$(arch)) + CPLUS_FLAGS += -m32 + LIB_LINK_FLAGS += -m32 +endif + +# for some gcc versions on Solaris, -m64 may imply V9, but perhaps not everywhere (TODO: verify) +ifeq (sparc,$(arch)) + CPLUS_FLAGS += -mcpu=v9 -m64 + LIB_LINK_FLAGS += -mcpu=v9 -m64 +endif + +#------------------------------------------------------------------------------ +# Setting assembler data. +#------------------------------------------------------------------------------ +ASSEMBLY_SOURCE=$(arch)-gas +ifeq (ia64,$(arch)) + ASM=ias + TBB_ASM.OBJ = atomic_support.o lock_byte.o log2.o pause.o +endif +#------------------------------------------------------------------------------ +# End of setting assembler data. +#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. +#------------------------------------------------------------------------------ + +M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions -fno-schedule-insns2 + +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/SunOS.inc b/dep/tbb/build/SunOS.inc new file mode 100644 index 000000000..a3b378ab7 --- /dev/null +++ b/dep/tbb/build/SunOS.inc @@ -0,0 +1,90 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. 
+# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +ifndef arch + arch:=$(shell uname -p) + ifeq ($(arch),i386) + ifeq ($(shell isainfo -b),64) + arch:=intel64 + else + arch:=ia32 + endif + endif + export arch +# For non-IA systems running Sun OS, 'arch' will contain whatever is printed by uname -p. +# In particular, for SPARC architecture it will contain "sparc". +endif + +ifndef runtime + gcc_version:=$(shell gcc -v 2>&1 | grep 'gcc version' | sed -e 's/^gcc version //' | sed -e 's/ .*$$//') + os_version:=$(shell uname -r) + os_kernel_version:=$(shell uname -r | sed -e 's/-.*$$//') + export runtime:=cc$(gcc_version)_kernel$(os_kernel_version) +endif + +native_compiler := suncc +export compiler ?= suncc +# debugger ?= gdb + +CMD=$(SHELL) -c +CWD=$(shell pwd) +RM?=rm -f +RD?=rmdir +MD?=mkdir -p +NUL= /dev/null +SLASH=/ +MAKE_VERSIONS=bash $(tbb_root)/build/version_info_sunos.sh $(CPLUS) $(CPLUS_FLAGS) $(INCLUDES) >version_string.tmp +MAKE_TBBVARS=bash $(tbb_root)/build/generate_tbbvars.sh + +ifeq ($(compiler),suncc) + export TBB_CUSTOM_VARS_SH=CXXFLAGS="-I$(CWD)/../include -library=stlport4 $(CXXFLAGS) -M$(CWD)/../build/suncc.map.pause" + export TBB_CUSTOM_VARS_CSH=CXXFLAGS "-I$(CWD)/../include -library=stlport4 $(CXXFLAGS) -M$(CWD)/../build/suncc.map.pause" +endif + +ifdef LD_LIBRARY_PATH + export LD_LIBRARY_PATH := .:$(LD_LIBRARY_PATH) +else + export LD_LIBRARY_PATH := . +endif + +####### Build settings ######################################################## + +OBJ = o +DLL = so + +TBB.DEF = +TBB.DLL = libtbb$(DEBUG_SUFFIX).$(DLL) +TBB.LIB = $(TBB.DLL) +LINK_TBB.LIB = $(TBB.LIB) + +MALLOC.DLL = libtbbmalloc$(DEBUG_SUFFIX).$(DLL) +MALLOC.LIB = $(MALLOC.DLL) + +MALLOCPROXY.DLL = libtbbmalloc_proxy$(DEBUG_SUFFIX).$(DLL) + +TBB_NOSTRICT=1 + +TEST_LAUNCHER=sh $(tbb_root)/build/test_launcher.sh diff --git a/dep/tbb/build/SunOS.suncc.inc b/dep/tbb/build/SunOS.suncc.inc new file mode 100644 index 000000000..9aac11756 --- /dev/null +++ b/dep/tbb/build/SunOS.suncc.inc @@ -0,0 +1,95 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. 
+# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +COMPILE_ONLY = -c -xMMD -errtags +PREPROC_ONLY = -E -xMMD +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = -KPIC +DYLIB_KEY = -G +LIBDL = -ldl +# WARNING_AS_ERROR_KEY = -errwarn=%all +WARNING_AS_ERROR_KEY = Warning as error +WARNING_SUPPRESS = -erroff=unassigned,attrskipunsup,badargtype2w,badbinaryopw,wbadasg,wvarhidemem +tbb_strict=0 + +TBB_NOSTRICT = 1 + +CPLUS = CC +CONLY = cc +LIB_LINK_FLAGS = -G -R . -M$(tbb_root)/build/suncc.map.pause +LINK_FLAGS += -M$(tbb_root)/build/suncc.map.pause +LIBS = -lpthread -lrt -R . +C_FLAGS = $(CPLUS_FLAGS) + +ifeq ($(cfg), release) + CPLUS_FLAGS = -mt -xO2 -library=stlport4 -DUSE_PTHREAD $(WARNING_SUPPRESS) +endif +ifeq ($(cfg), debug) + CPLUS_FLAGS = -mt -DTBB_USE_DEBUG -g -library=stlport4 -DUSE_PTHREAD $(WARNING_SUPPRESS) +endif + +ASM= +ASM_FLAGS= + +TBB_ASM.OBJ= + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += -m64 + ASM_FLAGS += -m64 + LIB_LINK_FLAGS += -m64 +endif + +ifeq (ia32,$(arch)) + CPLUS_FLAGS += -m32 + LIB_LINK_FLAGS += -m32 +endif + +# TODO: verify whether -m64 implies V9 on relevant Sun Studio versions +# (those that handle gcc assembler syntax) +ifeq (sparc,$(arch)) + CPLUS_FLAGS += -m64 + LIB_LINK_FLAGS += -m64 +endif + +#------------------------------------------------------------------------------ +# Setting assembler data. +#------------------------------------------------------------------------------ +ASSEMBLY_SOURCE=$(arch)-fbe +#------------------------------------------------------------------------------ +# End of setting assembler data. +#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. +#------------------------------------------------------------------------------ +M_INCLUDES = $(INCLUDES) -I$(MALLOC_ROOT) -I$(MALLOC_SOURCE_ROOT) +M_CPLUS_FLAGS = $(CPLUS_FLAGS) +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. 
+#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/codecov.txt b/dep/tbb/build/codecov.txt new file mode 100644 index 000000000..e22f8059a --- /dev/null +++ b/dep/tbb/build/codecov.txt @@ -0,0 +1,7 @@ +src/tbb +src/tbbmalloc +include/tbb +src/rml/server +src/rml/client +src/rml/include +source/malloc diff --git a/dep/tbb/build/common.inc b/dep/tbb/build/common.inc new file mode 100644 index 000000000..4ccb36ade --- /dev/null +++ b/dep/tbb/build/common.inc @@ -0,0 +1,97 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +ifndef tbb_os + ifeq ($(OS), Windows_NT) + export tbb_os=windows + else + OS:=$(shell uname) + ifeq ($(OS),) + $(error "$(OS) is not supported") + else + export tbb_os=$(OS) + ifeq ($(OS), Linux) + export tbb_os=linux + endif + ifeq ($(OS), Darwin) + export tbb_os=macos + endif + endif # OS successfully detected + endif # !Windows +endif # !tbb_os + +ifeq ($(wildcard $(tbb_root)/build/$(tbb_os).inc),) + $(error "$(tbb_os)" is not supported. Add build/$(tbb_os).inc file with os-specific settings ) +endif + +# detect arch and runtime versions, provide common os-specific definitions +include $(tbb_root)/build/$(tbb_os).inc + +ifeq ($(arch),) + $(error Architecture not detected) +endif +ifeq ($(runtime),) + $(error Runtime version not detected) +endif +ifeq ($(wildcard $(tbb_root)/build/$(tbb_os).$(compiler).inc),) + $(error Compiler "$(compiler)" is not supported on $(tbb_os). 
Add build/$(tbb_os).$(compiler).inc file with compiler-specific settings ) +endif + +# Support for running debug tests to release library and vice versa +flip_cfg=$(subst _flipcfg,_release,$(subst _release,_debug,$(subst _debug,_flipcfg,$(1)))) +cross_cfg = $(if $(crosstest),$(call flip_cfg,$(1)),$(1)) + +ifdef BUILDING_PHASE + # Setting default configuration to release + cfg?=release + # No lambas or other C++0x extensions by default for compilers that implement them as experimental features + lambdas ?= 0 + cpp0x ?= 0 + # include compiler-specific build configurations + -include $(tbb_root)/build/$(tbb_os).$(compiler).inc + ifdef extra_inc + -include $(tbb_root)/build/$(extra_inc) + endif +endif +ifneq ($(BUILDING_PHASE),1) + # definitions for top-level Makefiles + origin_build_dir:=$(origin tbb_build_dir) + tbb_build_dir?=$(tbb_root)$(SLASH)build + tbb_build_prefix?=$(tbb_os)_$(arch)_$(compiler)_$(runtime) + work_dir=$(tbb_build_dir)$(SLASH)$(tbb_build_prefix) + ifneq ($(BUILDING_PHASE),0) + work_dir:=$(work_dir) + # assign new value for tbb_root if path is not absolute (the filter keeps only /* paths) + ifeq ($(filter /% $(SLASH)%, $(subst :, ,$(tbb_root)) ),) + ifeq ($(origin_build_dir),undefined) + override tbb_root:=../.. + else + override tbb_root:=$(CWD)/$(tbb_root) + endif + endif + export tbb_root + endif # BUILDING_PHASE != 0 +endif # BUILDING_PHASE != 1 diff --git a/dep/tbb/build/common_rules.inc b/dep/tbb/build/common_rules.inc new file mode 100644 index 000000000..5957af5ed --- /dev/null +++ b/dep/tbb/build/common_rules.inc @@ -0,0 +1,125 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +.PRECIOUS: %.$(OBJ) %.$(DLL).$(OBJ) %.exe + +ifeq ($(tbb_strict),1) + ifeq ($(WARNING_AS_ERROR_KEY),) + $(error WARNING_AS_ERROR_KEY is empty) + endif + # Do not remove line below! 
+ WARNING_KEY += $(WARNING_AS_ERROR_KEY) +endif + +ifndef TEST_EXT + TEST_EXT = exe +endif + +INCLUDES += $(INCLUDE_KEY)$(tbb_root)/src $(INCLUDE_KEY)$(tbb_root)/src/rml/include $(INCLUDE_KEY)$(tbb_root)/include + +CPLUS_FLAGS += $(WARNING_KEY) $(CXXFLAGS) +LINK_FLAGS += $(LDFLAGS) +LIB_LINK_FLAGS += $(LDFLAGS) +CPLUS_FLAGS_NOSTRICT:=$(subst -strict_ansi,-ansi,$(CPLUS_FLAGS)) + +LIB_LINK_CMD ?= $(CPLUS) $(PIC_KEY) +ifeq ($(origin LIB_OUTPUT_KEY), undefined) + LIB_OUTPUT_KEY = $(OUTPUT_KEY) +endif +ifeq ($(origin LIB_LINK_LIBS), undefined) + LIB_LINK_LIBS = $(LIBDL) $(LIBS) +endif + +CONLY ?= $(CPLUS) + +# The most generic rules +%.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(CPLUS_FLAGS) $(CXX_ONLY_FLAGS) $(CXX_WARN_SUPPRESS) $(INCLUDES) $< + +%.$(OBJ): %.c + $(CONLY) $(COMPILE_ONLY) $(C_FLAGS) $(INCLUDES) $< + +%.$(OBJ): %.asm + $(ASM) $(ASM_FLAGS) $< + +%.$(OBJ): %.s + cpp <$< | grep -v '^#' >$*.tmp + $(ASM) $(ASM_FLAGS) -o $@ $*.tmp + rm $*.tmp + +# Rule for generating .E file if needed for visual inspection +%.E: %.cpp + $(CPLUS) $(CPLUS_FLAGS) $(CXX_ONLY_FLAGS) $(INCLUDES) $(PREPROC_ONLY) $< >$@ + +# TODO Rule for generating .asm file if needed for visual inspection +%.asm: %.cpp + $(CPLUS) /c /Fa $(CPLUS_FLAGS) $(CXX_ONLY_FLAGS) $(INCLUDES) $< + +# TODO Rule for generating .s file if needed for visual inspection +%.s: %.cpp + $(CPLUS) -S $(CPLUS_FLAGS) $(CXX_ONLY_FLAGS) $(INCLUDES) $< + +# Customizations + +ifeq (1,$(TBB_NOSTRICT)) +# GNU 3.2.3 headers have a ISO syntax that is rejected by Intel compiler in -strict_ansi mode. +# The Mac uses gcc, so the list is empty for that platform. +# The files below need the -strict_ansi flag downgraded to -ansi to compile + +$(KNOWN_NOSTRICT): %.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(CPLUS_FLAGS_NOSTRICT) $(CXX_ONLY_FLAGS) $(INCLUDES) $< +endif + +$(KNOWN_WARNINGS): %.$(OBJ): %.cpp + $(CPLUS) $(COMPILE_ONLY) $(subst $(WARNING_KEY),,$(CPLUS_FLAGS_NOSTRICT)) $(CXX_ONLY_FLAGS) $(INCLUDES) $< + +tbb_misc.$(OBJ): tbb_misc.cpp version_string.tmp + $(CPLUS) $(COMPILE_ONLY) $(CPLUS_FLAGS_NOSTRICT) $(CXX_ONLY_FLAGS) $(INCLUDE_KEY). $(INCLUDES) $< + +tbb_misc.E: tbb_misc.cpp version_string.tmp + $(CPLUS) $(CPLUS_FLAGS_NOSTRICT) $(CXX_ONLY_FLAGS) $(INCLUDE_KEY). $(INCLUDES) $(PREPROC_ONLY) $< >$@ + +%.res: %.rc version_string.tmp $(TBB.MANIFEST) + rc /Fo$@ $(INCLUDES) $(filter /D%,$(CPLUS_FLAGS)) $< + +tbbvars: + $(MAKE_TBBVARS) + +ifneq (,$(TBB.MANIFEST)) +$(TBB.MANIFEST): + cmd /C "echo #include ^ >tbbmanifest.c" + cmd /C "echo int main(){return 0;} >>tbbmanifest.c" + cl $(C_FLAGS) tbbmanifest.c + +version_string.tmp: $(TBB.MANIFEST) + $(MAKE_VERSIONS) + cmd /C "echo #define TBB_MANIFEST 1 >> version_string.tmp" + +else +version_string.tmp: + $(MAKE_VERSIONS) +endif + diff --git a/dep/tbb/build/detect.js b/dep/tbb/build/detect.js new file mode 100644 index 000000000..b11c95497 --- /dev/null +++ b/dep/tbb/build/detect.js @@ -0,0 +1,129 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. 
+// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + +function doWork() { + var WshShell = WScript.CreateObject("WScript.Shell"); + + var fso = new ActiveXObject("Scripting.FileSystemObject"); + + var tmpExec; + + if ( WScript.Arguments.Count() > 1 && WScript.Arguments(1) == "gcc" ) { + if ( WScript.Arguments(0) == "/arch" ) { + WScript.Echo( "ia32" ); + } + else if ( WScript.Arguments(0) == "/runtime" ) { + WScript.Echo( "mingw" ); + } + return; + } + + //Compile binary + tmpExec = WshShell.Exec("cmd /c echo int main(){return 0;} >detect.c"); + while ( tmpExec.Status == 0 ) { + WScript.Sleep(100); + } + + tmpExec = WshShell.Exec("cl /MD detect.c /link /MAP"); + while ( tmpExec.Status == 0 ) { + WScript.Sleep(100); + } + + if ( WScript.Arguments(0) == "/arch" ) { + //read compiler banner + var clVersion = tmpExec.StdErr.ReadAll(); + + //detect target architecture + var intel64=/AMD64|EM64T|x64/mgi; + var ia64=/IA-64|Itanium/mgi; + var ia32=/80x86/mgi; + if ( clVersion.match(intel64) ) { + WScript.Echo( "intel64" ); + } else if ( clVersion.match(ia64) ) { + WScript.Echo( "ia64" ); + } else if ( clVersion.match(ia32) ) { + WScript.Echo( "ia32" ); + } else { + WScript.Echo( "unknown" ); + } + } + + if ( WScript.Arguments(0) == "/runtime" ) { + //read map-file + var map = fso.OpenTextFile("detect.map", 1, 0); + var mapContext = map.readAll(); + map.Close(); + + //detect runtime + var vc71=/MSVCR71\.DLL/mgi; + var vc80=/MSVCR80\.DLL/mgi; + var vc90=/MSVCR90\.DLL/mgi; + var vc100=/MSVCR100\.DLL/mgi; + var psdk=/MSVCRT\.DLL/mgi; + if ( mapContext.match(vc71) ) { + WScript.Echo( "vc7.1" ); + } else if ( mapContext.match(vc80) ) { + WScript.Echo( "vc8" ); + } else if ( mapContext.match(vc90) ) { + WScript.Echo( "vc9" ); + } else if ( mapContext.match(vc100) ) { + WScript.Echo( "vc10" ); + } else if ( mapContext.match(psdk) ) { + // Our current naming convention assumes vc7.1 for 64-bit Windows PSDK + WScript.Echo( "vc7.1" ); + } else { + WScript.Echo( "unknown" ); + } + } + + // delete intermediate files + if ( fso.FileExists("detect.c") ) + fso.DeleteFile ("detect.c", false); + if ( fso.FileExists("detect.obj") ) + fso.DeleteFile ("detect.obj", false); + if ( fso.FileExists("detect.map") ) + fso.DeleteFile ("detect.map", false); + if ( fso.FileExists("detect.exe") ) + fso.DeleteFile ("detect.exe", false); + if ( fso.FileExists("detect.exe.manifest") ) + fso.DeleteFile ("detect.exe.manifest", false); +} + +if ( WScript.Arguments.Count() > 0 ) { + + try { + doWork(); + } catch( error ) + { + WScript.Echo( "unknown" ); + WScript.Quit( 0 ); + } + +} else { + + WScript.Echo( "/arch or /runtime should be set" ); +} + diff --git a/dep/tbb/build/generate_tbbvars.bat b/dep/tbb/build/generate_tbbvars.bat new file mode 100644 index 000000000..0a2088589 --- /dev/null +++ 
b/dep/tbb/build/generate_tbbvars.bat @@ -0,0 +1,98 @@ +@echo off +REM +REM Copyright 2005-2009 Intel Corporation. All Rights Reserved. +REM +REM This file is part of Threading Building Blocks. +REM +REM Threading Building Blocks is free software; you can redistribute it +REM and/or modify it under the terms of the GNU General Public License +REM version 2 as published by the Free Software Foundation. +REM +REM Threading Building Blocks is distributed in the hope that it will be +REM useful, but WITHOUT ANY WARRANTY; without even the implied warranty +REM of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +REM GNU General Public License for more details. +REM +REM You should have received a copy of the GNU General Public License +REM along with Threading Building Blocks; if not, write to the Free Software +REM Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +REM +REM As a special exception, you may use this file as part of a free software +REM library without restriction. Specifically, if other files instantiate +REM templates or use macros or inline functions from this file, or you compile +REM this file and link it with other files to produce an executable, this +REM file does not by itself cause the resulting executable to be covered by +REM the GNU General Public License. This exception does not however +REM invalidate any other reasons why the executable file might be covered by +REM the GNU General Public License. +REM +if exist tbbvars.bat exit +echo Generating tbbvars.bat +echo @echo off>tbbvars.bat +setlocal +for %%D in ("%tbb_root%") do set actual_root=%%~fD +if x%1==x goto without + +echo SET TBB22_INSTALL_DIR=%actual_root%>>tbbvars.bat +echo SET TBB_ARCH_PLATFORM=%arch%\%runtime%>>tbbvars.bat +echo SET INCLUDE=%%TBB22_INSTALL_DIR%%\include;%%INCLUDE%%>>tbbvars.bat +echo SET LIB=%%TBB22_INSTALL_DIR%%\build\%1;%%LIB%%>>tbbvars.bat +echo SET PATH=%%TBB22_INSTALL_DIR%%\build\%1;%%PATH%%>>tbbvars.bat + +if exist tbbvars.sh goto skipsh +set fslash_root=%actual_root:\=/% +echo Generating tbbvars.sh +echo #!/bin/sh>tbbvars.sh +echo export TBB22_INSTALL_DIR="%fslash_root%">>tbbvars.sh +echo TBB_ARCH_PLATFORM="%arch%\%runtime%">>tbbvars.sh +echo if [ -z "${PATH}" ]; then>>tbbvars.sh +echo export PATH="${TBB22_INSTALL_DIR}/build/%1">>tbbvars.sh +echo else>>tbbvars.sh +echo export PATH="${TBB22_INSTALL_DIR}/build/%1;$PATH">>tbbvars.sh +echo fi>>tbbvars.sh +echo if [ -z "${LIB}" ]; then>>tbbvars.sh +echo export LIB="${TBB22_INSTALL_DIR}/build/%1">>tbbvars.sh +echo else>>tbbvars.sh +echo export LIB="${TBB22_INSTALL_DIR}/build/%1;$LIB">>tbbvars.sh +echo fi>>tbbvars.sh +echo if [ -z "${INCLUDE}" ]; then>>tbbvars.sh +echo export INCLUDE="${TBB22_INSTALL_DIR}/include">>tbbvars.sh +echo else>>tbbvars.sh +echo export INCLUDE="${TBB22_INSTALL_DIR}/include;$INCLUDE">>tbbvars.sh +echo fi>>tbbvars.sh +:skipsh + +if exist tbbvars.csh goto skipcsh +echo Generating tbbvars.csh +echo #!/bin/csh>tbbvars.csh +echo setenv TBB22_INSTALL_DIR "%actual_root%">>tbbvars.csh +echo setenv TBB_ARCH_PLATFORM "%arch%\%runtime%">>tbbvars.csh +echo if (! $?PATH) then>>tbbvars.csh +echo setenv PATH "${TBB22_INSTALL_DIR}\build\%1">>tbbvars.csh +echo else>>tbbvars.csh +echo setenv PATH "${TBB22_INSTALL_DIR}\build\%1;$PATH">>tbbvars.csh +echo endif>>tbbvars.csh +echo if (! $?LIB) then>>tbbvars.csh +echo setenv LIB "${TBB22_INSTALL_DIR}\build\%1">>tbbvars.csh +echo else>>tbbvars.csh +echo setenv LIB "${TBB22_INSTALL_DIR}\build\%1;$LIB">>tbbvars.csh +echo endif>>tbbvars.csh +echo if (! 
$?INCLUDE) then>>tbbvars.csh +echo setenv INCLUDE "${TBB22_INSTALL_DIR}\include">>tbbvars.csh +echo else>>tbbvars.csh +echo setenv INCLUDE "${TBB22_INSTALL_DIR}\include;$INCLUDE">>tbbvars.csh +echo endif>>tbbvars.csh +) +:skipcsh +exit + +:without +set bin_dir=%CD% +echo SET tbb_root=%actual_root%>>tbbvars.bat +echo SET tbb_bin=%bin_dir%>>tbbvars.bat +echo SET TBB_ARCH_PLATFORM=%arch%\%runtime%>>tbbvars.bat +echo SET INCLUDE="%%tbb_root%%\include";%%INCLUDE%%>>tbbvars.bat +echo SET LIB="%%tbb_bin%%";%%LIB%%>>tbbvars.bat +echo SET PATH="%%tbb_bin%%";%%PATH%%>>tbbvars.bat + +endlocal diff --git a/dep/tbb/build/generate_tbbvars.sh b/dep/tbb/build/generate_tbbvars.sh new file mode 100644 index 000000000..1e1b02c58 --- /dev/null +++ b/dep/tbb/build/generate_tbbvars.sh @@ -0,0 +1,132 @@ +#!/bin/bash +# +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +# Script used to generate tbbvars.[c]sh scripts +bin_dir="$PWD" # +cd "$tbb_root" # keep this comments here +tbb_root="$PWD" # to make it unsensible +cd "$bin_dir" # to EOL encoding +[ "`uname`" = "Darwin" ] && dll_path="DYLD_LIBRARY_PATH" || dll_path="LD_LIBRARY_PATH" # +custom_exp="$CXXFLAGS" # +if [ -z "$TBB_CUSTOM_VARS_SH" ]; then # +custom_exp_sh="" # +else # +custom_exp_sh="export $TBB_CUSTOM_VARS_SH" # +fi # +if [ -z "$TBB_CUSTOM_VARS_CSH" ]; then # +custom_exp_csh="" # +else # +custom_exp_csh="setenv $TBB_CUSTOM_VARS_CSH" # +fi # +if [ -z "$1" ]; then # custom tbb_build_dir, can't make with TBB_INSTALL_DIR +[ -f ./tbbvars.sh ] || cat >./tbbvars.sh <./tbbvars.csh <./tbbvars.sh <./tbbvars.csh < + + +

Overview

+This directory contains the internal Makefile infrastructure for Threading Building Blocks. + +

+See below for how to build TBB and how to port TBB +to a new platform, operating system or architecture. +

+ +

Files

+The files here are not intended to be used directly. See below for usage. +
+
Makefile.tbb +
Main Makefile to build the TBB library. + Invoked via 'make tbb' from top-level Makefile. +
Makefile.tbbmalloc +
Main Makefile to build the TBB scalable memory allocator library as well as its tests. + Invoked via 'make tbbmalloc' from top-level Makefile. +
Makefile.test +
Main Makefile to build and run the tests for the TBB library. + Invoked via 'make test' from top-level Makefile. +
common.inc +
Main common included Makefile that includes OS-specific and compiler-specific Makefiles. +
<os>.inc +
OS-specific Makefile for a particular <os>. +
<os>.<compiler>.inc +
Compiler-specific Makefile for a particular <os> / <compiler> combination. +
*.sh +
Infrastructure utilities for Linux*, Mac OS* X, and UNIX*-related systems. +
*.js, *.bat +
Infrastructure utilities for Windows* systems. +
+ +

To Build

+

+To port TBB to a new platform, operating system or architecture, see the porting directions below. +

+ +

Software prerequisites:

+
    +
  1. C++ compiler for the platform, operating system and architecture of interest. + Either the native compiler for your system, or, optionally, the appropriate Intel® C++ compiler, may be used. +
  2. GNU make utility. On Windows*, if a UNIX* emulator is used to run GNU make, + it should be able to run Windows* utilities and commands. On Linux*, Mac OS* X, etc., + shell commands issued by GNU make should execute in a Bourne or BASH compatible shell. +
+ +

+TBB libraries can be built by performing the following steps. +On systems that support only one ABI (e.g., 32-bit), these steps build the libraries for that ABI. +On systems that support both 64-bit and 32-bit libraries, these steps build the 64-bit libraries +(Linux*, Mac OS* X, and related systems) or whichever ABI is selected in the development environment (Windows* systems). +

+
    +
  1. Change to the top-level directory of the installed software. +
  2. If using the Intel® C++ compiler, make sure the appropriate compiler is available in your PATH + (e.g., by sourcing the appropriate iccvars script for the compiler to be used). +
  3. Invoke GNU make using no arguments, for example, 'gmake'. +
+ +
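+
+For illustration only, the steps above amount to a shell session like the following
+(the directory name is a placeholder, and on many systems GNU make is invoked as
+'make' rather than 'gmake'):
+
+    cd /path/to/tbb     # top-level directory of the installed sources (placeholder path)
+    gmake               # build the release and debug TBB libraries for the default ABI
+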

+To build TBB libraries for other than the default ABI (e.g., to build 32-bit libraries on Linux*, Mac OS* X, +or related systems that support both 64-bit and 32-bit libraries), perform the following steps. +

+
    +
  1. Change to the top-level directory of the installed software. +
  2. If using the Intel® C++ compiler, make sure the appropriate compiler is available in your PATH + (e.g., by sourcing the appropriate iccvars script for the compiler to be used). +
  3. Invoke GNU make as follows: 'gmake arch=ia32'. +
+ +
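+
+For illustration only, building the 32-bit libraries on a system whose default ABI is 64-bit
+would look like the following (placeholder path; a 32-bit-capable toolchain and libraries
+must be installed):
+
+    cd /path/to/tbb
+    gmake arch=ia32     # build 32-bit release and debug libraries
+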

The default make target will build the release and debug versions of the TBB library.

+

Other targets are available in the top-level Makefile. You might find the following targets useful: +

    +
  • 'make test' will build and run TBB unit-tests; +
  • 'make examples' will build and run TBB examples; +
  • 'make all' will do all of the above. +
+See also the list of other targets below. +

+ +

+By default, the libraries will be built in sub-directories within the build/ directory. +The sub-directories are named according to the operating system, architecture, compiler and software environment used +(the sub-directory names also distinguish release vs. debug libraries). On Linux*, the software environment comprises +the GCC, libc and kernel version used. On Mac OS* X, the software environment comprises the GCC and OS version used. +On Windows, the software environment comprises the Microsoft* Visual Studio* version used. +See below for how to change the default build directory. +

+ +
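+
+For illustration, the pieces of information that linux.inc combines into the build
+sub-directory name (compiler, libc and kernel versions) can be queried by hand with
+essentially the same commands the Makefiles use; the exact directory name is composed
+by the Makefile infrastructure, so treat this only as a sketch:
+
+    gcc --version | head -n 1    # compiler version (the 'cc<version>' component)
+    getconf GNU_LIBC_VERSION     # libc version (the 'libc<version>' component)
+    uname -r                     # kernel version (the 'kernel<version>' component)
+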

+To perform different build and/or test operations, use the following steps. +

+
    +
  1. Change to the top-level directory of the installed software. +
  2. If using the Intel® C++ compiler, make sure the appropriate compiler is available in your PATH + (e.g., by sourcing the appropriate iccvars script for the compiler to be used). +
  3. Invoke GNU make by using one or more of the following commands. +
    +
    make +
    Default build. Equivalent to 'make tbb tbbmalloc'. +
    make all +
    Equivalent to 'make tbb tbbmalloc test examples'. +
    cd src;make release +
    Build and test release libraries only. +
    cd src;make debug +
    Build and test debug libraries only. +
    make tbb +
    Make TBB release and debug libraries. +
    make tbbmalloc +
    Make TBB scalable memory allocator libraries. +
    make test +
    Compile and run the unit-tests. +
    make examples +
    Build libraries and run all examples, like doing 'make debug clean release' from + the general example Makefile. +
    make compiler={icl, icc} [(above options or targets)] +
    Build and run as above, but use Intel® compilers instead of default, native compilers + (e.g., icl instead of cl.exe on Windows* systems, or icc instead of g++ on Linux* or Mac OS* X systems). +
    make arch={ia32, intel64, ia64} [(above options or targets)] +
    Build and run as above, but build libraries for the selected ABI. + This might be useful for cross-compilation; ensure the proper environment is set before running this command. +
    make tbb_root={(TBB directory)} [(above options or targets)] +
    Build and run as above; for use when invoking 'make' from a directory other than + the top-level directory. +
    make tbb_build_dir={(build directory)} [(above options or targets)] +
    Build and run as above, but place the built libraries in the specified directory, rather than in the default + sub-directory within the build/ directory. This command might fail if the sources are installed in a directory whose path contains spaces. +
    make tbb_build_prefix={(build sub-directory)} [(above options or targets)] +
    Build and run as above, but place the built libraries in the specified sub-directory within the build/ directory, + rather than using the default sub-directory name. +
    make [(above options)] clean +
    Remove any executables or intermediate files produced by the above commands. + Includes build directories, object files, libraries and test executables. +
    +
+ +
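+
+For illustration only, a few typical combined invocations of the targets and options listed above,
+all run from the top-level directory (adjust the options to your system):
+
+    make tbb tbbmalloc                      # same as the default 'make'
+    make compiler=icc arch=intel64 test     # use the Intel compiler and run the unit-tests
+    make tbb_build_prefix=mybuild tbb       # use 'mybuild' as the build/ sub-directory prefix
+    make clean                              # remove what the above commands produced
+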

To Port

+

+This section provides information on how to port TBB to a new platform, operating system or architecture. +A subset or a superset of these steps may be required for porting to a given platform. +

+ +

To port the TBB source code:

+
    +
  1. If porting to a new architecture, create a file that describes the architecture-specific details for that architecture. +
      +
    • Create a <os>_<architecture>.h file in the include/tbb/machine directory + that describes these details. +
        +
      • The <os>_<architecture>.h is named after the operating system and architecture as recognized by + include/tbb/tbb_machine.h and the Makefile infrastructure. +
      • This file defines the implementations of synchronization operations, and also the + scheduler yield function, for the operating system and architecture. +
      • Several examples of <os>_<architecture>.h files can be found in the + include/tbb/machine directory. +
          +
        • A minimal implementation defines the 4-byte and 8-byte compare-and-swap operations, + and the scheduler yield function. See include/tbb/machine/mac_ppc.h + for an example of a minimal implementation. +
        • More complex implementations can also be found in the + include/tbb/machine directory; these provide all the individual variants of synchronization operations that TBB uses. + Such implementations are more verbose but may achieve better performance on a given architecture. +
        • In a given implementation, any synchronization operation that is not defined is implemented, by default, + in terms of 4-byte or 8-byte compare-and-swap. More operations can thus be added incrementally to increase + the performance of an implementation. +
        • In most cases, synchronization operations are implemented as inline assembly code; examples also exist + (e.g., for Intel® Itanium® processors) that use out-of-line assembly code in *.s or *.asm files + (see the assembly code sub-directories in the src/tbb directory). +
        +
      +
    • Modify include/tbb/tbb_machine.h, if needed, to invoke the appropriate + <os>_<architecture>.h file in the include/tbb/machine directory. +
    +
  2. Add an implementation of DetectNumberOfWorkers() in src/tbb/tbb_misc.h, + if needed, that returns the number of cores found on the system. This is used to determine the default + number of threads for the TBB task scheduler. +
  3. Either properly define FillDynamicLinks for use in + src/tbb/cache_aligned_allocator.cpp, + or hardcode the allocator to be used. +
  4. Additional types might be required in the union defined in + include/tbb/aligned_space.h + to ensure proper alignment on your platform. +
  5. Changes may be required in include/tbb/tick_count.h + for systems that do not provide gettimeofday. +
+ +

To port the Makefile infrastructure:

+Modify the appropriate files in the Makefile infrastructure to add a new platform, operating system or architecture as needed. +See the Makefile infrastructure files for examples. +
    +
  1. The top-level Makefile includes common.inc to determine the operating system. +
      +
    • To add a new operating system, add the appropriate test to common.inc, + and create the needed <os>.inc and <os>.<compiler>.inc files (see below). +
    +
  2. The <os>.inc file makes OS-specific settings for a particular <os>. +
      +
    • For example, linux.inc makes settings specific to Linux* systems. +
    • This file performs OS-dependent tests to determine the specific platform and/or architecture, + and sets other platform-dependent values. +
    • Add a new <os>.inc file for each new operating system added. +
    +
  3. The <os>.<compiler>.inc file makes compiler-specific settings for a particular + <os> / <compiler> combination. +
      +
    • For example, linux.gcc.inc makes specific settings for using GCC on Linux* systems, + and linux.icc.inc makes specific settings for using the Intel® C++ compiler on Linux* systems. +
    • This file sets particular compiler, assembler and linker options required when using a particular + <os> / <compiler> combination. +
    • Add a new <os>.<compiler>.inc file for each new <os> / <compiler> combination added. +
    +
+ +
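+
+For illustration, the operating-system and architecture detection that common.inc and a new
+<os>.inc typically wrap in $(shell ...) calls boils down to commands such as the following
+(these particular commands are what linux.inc uses; a new port would substitute its own
+equivalents):
+
+    uname       # operating-system name, used to select the <os>.inc file
+    uname -m    # machine architecture, mapped to ia32/intel64/ia64/sparc in linux.inc
+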
+Up to parent directory +

+Copyright © 2005-2009 Intel Corporation. All Rights Reserved. +

+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are +registered trademarks or trademarks of Intel Corporation or its +subsidiaries in the United States and other countries. +

+* Other names and brands may be claimed as the property of others. + + diff --git a/dep/tbb/build/linux.gcc.inc b/dep/tbb/build/linux.gcc.inc new file mode 100644 index 000000000..05b3b3fff --- /dev/null +++ b/dep/tbb/build/linux.gcc.inc @@ -0,0 +1,107 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +COMPILE_ONLY = -c -MMD +PREPROC_ONLY = -E -x c +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = -fPIC +WARNING_AS_ERROR_KEY = -Werror +WARNING_KEY = -Wall +WARNING_SUPPRESS = -Wno-parentheses +RML_WARNING_SUPPRESS = -Wno-non-virtual-dtor +DYLIB_KEY = -shared +LIBDL = -ldl + +TBB_NOSTRICT = 1 + +CPLUS = g++ +CONLY = gcc +LIB_LINK_FLAGS = -shared -Wl,-soname=$(BUILDING_LIBRARY) +LIBS = -lpthread -lrt +C_FLAGS = $(CPLUS_FLAGS) + +ifeq ($(cfg), release) + CPLUS_FLAGS = -DDO_ITT_NOTIFY -O2 -DUSE_PTHREAD +endif +ifeq ($(cfg), debug) + CPLUS_FLAGS = -DTBB_USE_DEBUG -DDO_ITT_NOTIFY -g -O0 -DUSE_PTHREAD +endif + +ifneq (0,$(cpp0x)) + CXX_ONLY_FLAGS = -std=c++0x +endif + +ASM= +ASM_FLAGS= + +TBB_ASM.OBJ= + +ifeq (ia64,$(arch)) +# Position-independent code (PIC) is a must on IA-64, even for regular (not shared) executables + CPLUS_FLAGS += $(PIC_KEY) +endif + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += -m64 + LIB_LINK_FLAGS += -m64 +endif + +ifeq (ia32,$(arch)) + CPLUS_FLAGS += -m32 + LIB_LINK_FLAGS += -m32 +endif + +# for some gcc versions on Solaris, -m64 may imply V9, but perhaps not everywhere (TODO: verify) +ifeq (sparc,$(arch)) + CPLUS_FLAGS += -mcpu=v9 -m64 + LIB_LINK_FLAGS += -mcpu=v9 -m64 +endif + +#------------------------------------------------------------------------------ +# Setting assembler data. +#------------------------------------------------------------------------------ +ASSEMBLY_SOURCE=$(arch)-gas +ifeq (ia64,$(arch)) + ASM=as + ASM_FLAGS += -xexplicit + TBB_ASM.OBJ = atomic_support.o lock_byte.o log2.o pause.o ia64_misc.o +endif +#------------------------------------------------------------------------------ +# End of setting assembler data. +#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. 
+#------------------------------------------------------------------------------ + +M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions -fno-schedule-insns2 + +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/linux.icc.inc b/dep/tbb/build/linux.icc.inc new file mode 100644 index 000000000..9c368cbaa --- /dev/null +++ b/dep/tbb/build/linux.icc.inc @@ -0,0 +1,98 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +COMPILE_ONLY = -c -MMD +PREPROC_ONLY = -E -x c +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = -fPIC +WARNING_AS_ERROR_KEY = -Werror +WARNING_KEY = -w1 +DYLIB_KEY = -shared +LIBDL = -ldl +export COMPILER_VERSION := ICC: $(shell icc -V &1 | grep 'Version') +#TODO: autodetection of arch from COMPILER_VERSION!! + +TBB_NOSTRICT = 1 + +CPLUS = icpc +CONLY = icc + +ifeq (release,$(cfg)) +CPLUS_FLAGS = -O2 -strict_ansi -DUSE_PTHREAD +else +CPLUS_FLAGS = -O0 -g -strict_ansi -DUSE_PTHREAD -DTBB_USE_DEBUG +endif + +ifneq (,$(codecov)) + CPLUS_FLAGS += -prof-genx +else + CPLUS_FLAGS += -DDO_ITT_NOTIFY +endif + +OPENMP_FLAG = -openmp +LIB_LINK_FLAGS = -shared -i-static -Wl,-soname=$(BUILDING_LIBRARY) +LIBS = -lpthread -lrt +C_FLAGS = $(CPLUS_FLAGS) + +ASM= +ASM_FLAGS= + +TBB_ASM.OBJ= + +ifeq (ia64,$(arch)) +# Position-independent code (PIC) is a must on IA-64, even for regular (not shared) executables + CPLUS_FLAGS += $(PIC_KEY) +endif + +ifneq (00,$(lambdas)$(cpp0x)) + CPLUS_FLAGS += -std=c++0x -D_TBB_CPP0X +endif + +#------------------------------------------------------------------------------ +# Setting assembler data. +#------------------------------------------------------------------------------ +ASSEMBLY_SOURCE=$(arch)-gas +ifeq (ia64,$(arch)) + ASM=ias + TBB_ASM.OBJ = atomic_support.o lock_byte.o log2.o pause.o ia64_misc.o +endif +#------------------------------------------------------------------------------ +# End of setting assembler data. 
+#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. +#------------------------------------------------------------------------------ + +M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions + +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ + diff --git a/dep/tbb/build/linux.inc b/dep/tbb/build/linux.inc new file mode 100644 index 000000000..d85844501 --- /dev/null +++ b/dep/tbb/build/linux.inc @@ -0,0 +1,108 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +ifndef arch + uname_m:=$(shell uname -m) + ifeq ($(uname_m),i686) + export arch:=ia32 + endif + ifeq ($(uname_m),ia64) + export arch:=ia64 + endif + ifeq ($(uname_m),x86_64) + export arch:=intel64 + endif + ifeq ($(uname_m),sparc64) + export arch:=sparc + endif +endif + +ifndef runtime + #gcc_version:=$(shell gcc -v 2>&1 | grep 'gcc --version' | sed -e 's/^gcc version //' | sed -e 's/ .*$$//') + gcc_version_full=$(shell gcc --version | grep 'gcc'| egrep -o ' [0-9]+\.[0-9]+\.[0-9]+.*' | sed -e 's/^\ //') + gcc_version=$(shell echo "$(gcc_version_full)" | egrep -o '^[0-9]+\.[0-9]+\.[0-9]+\s*' | head -n 1 | sed -e 's/ *//g') + os_version:=$(shell uname -r) + os_kernel_version:=$(shell uname -r | sed -e 's/-.*$$//') + export os_glibc_version_full:=$(shell getconf GNU_LIBC_VERSION | grep glibc | sed -e 's/^glibc //') + os_glibc_version:=$(shell echo "$(os_glibc_version_full)" | sed -e '2,$$d' -e 's/-.*$$//') + export runtime:=cc$(gcc_version)_libc$(os_glibc_version)_kernel$(os_kernel_version) +endif + +native_compiler := gcc +export compiler ?= gcc +debugger ?= gdb + +CMD=sh -c +CWD=$(shell pwd) +RM?=rm -f +RD?=rmdir +MD?=mkdir -p +NUL= /dev/null +SLASH=/ +MAKE_VERSIONS=sh $(tbb_root)/build/version_info_linux.sh $(CPLUS) $(CPLUS_FLAGS) $(INCLUDES) >version_string.tmp +MAKE_TBBVARS=sh $(tbb_root)/build/generate_tbbvars.sh + +ifdef LD_LIBRARY_PATH + export LD_LIBRARY_PATH := .:$(LD_LIBRARY_PATH) +else + export LD_LIBRARY_PATH := . 
+endif + +####### Build settings ######################################################## + +OBJ = o +DLL = so +LIBEXT = so +SONAME_SUFFIX =$(shell grep TBB_COMPATIBLE_INTERFACE_VERSION $(tbb_root)/include/tbb/tbb_stddef.h | egrep -o [0-9.]+) + +def_prefix = $(if $(findstring 32,$(arch)),lin32,$(if $(findstring intel64,$(arch)),lin64,lin64ipf)) +TBB.DEF = $(tbb_root)/src/tbb/$(def_prefix)-tbb-export.def + +EXPORT_KEY = -Wl,--version-script, +TBB.DLL = $(TBB_NO_VERSION.DLL).$(SONAME_SUFFIX) +TBB.LIB = $(TBB.DLL) +TBB_NO_VERSION.DLL=libtbb$(DEBUG_SUFFIX).$(DLL) +LINK_TBB.LIB = $(TBB_NO_VERSION.DLL) + +MALLOC_NO_VERSION.DLL = libtbbmalloc$(DEBUG_SUFFIX).$(DLL) +MALLOC.DEF = $(MALLOC_ROOT)/lin-tbbmalloc-export.def +MALLOC.DLL = $(MALLOC_NO_VERSION.DLL).$(SONAME_SUFFIX) +MALLOC.LIB = $(MALLOC_NO_VERSION.DLL) +LINK_MALLOC.LIB = $(MALLOC_NO_VERSION.DLL) + +MALLOCPROXY_NO_VERSION.DLL = libtbbmalloc_proxy$(DEBUG_SUFFIX).$(DLL) +MALLOCPROXY.DEF = $(MALLOC_ROOT)/$(def_prefix)-proxy-export.def +MALLOCPROXY.DLL = $(MALLOCPROXY_NO_VERSION.DLL).$(SONAME_SUFFIX) +MALLOCPROXY.LIB = $(MALLOCPROXY_NO_VERSION.DLL) + +RML_NO_VERSION.DLL = libirml$(DEBUG_SUFFIX).$(DLL) +RML.DEF = $(RML_SERVER_ROOT)/lin-rml-export.def +RML.DLL = $(RML_NO_VERSION.DLL).1 +RML.LIB = $(RML_NO_VERSION.DLL) + +TBB_NOSTRICT=1 + +TEST_LAUNCHER=sh $(tbb_root)/build/test_launcher.sh diff --git a/dep/tbb/build/macos.gcc.inc b/dep/tbb/build/macos.gcc.inc new file mode 100644 index 000000000..14a90162c --- /dev/null +++ b/dep/tbb/build/macos.gcc.inc @@ -0,0 +1,89 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. 
+ +CPLUS = g++ +CONLY = gcc +COMPILE_ONLY = -c -MMD +PREPROC_ONLY = -E -x c +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = -fPIC +WARNING_AS_ERROR_KEY = -Werror +WARNING_KEY = -Wall +WARNING_SUPPRESS = +DYLIB_KEY = -dynamiclib +EXPORT_KEY = -Wl,-exported_symbols_list, +LIBDL = -ldl + +LIBS = -lpthread +LINK_FLAGS = +LIB_LINK_FLAGS = -dynamiclib +C_FLAGS = $(CPLUS_FLAGS) + +ifeq ($(cfg), release) + CPLUS_FLAGS = -O2 +else + CPLUS_FLAGS = -g -O0 -DTBB_USE_DEBUG +endif + +CPLUS_FLAGS += -DUSE_PTHREAD + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += -m64 + LINK_FLAGS += -m64 + LIB_LINK_FLAGS += -m64 +endif + +ifeq (ia32,$(arch)) + CPLUS_FLAGS += -m32 + LINK_FLAGS += -m32 + LIB_LINK_FLAGS += -m32 +endif + +ifeq (ppc64,$(arch)) + CPLUS_FLAGS += -arch ppc64 + LINK_FLAGS += -arch ppc64 + LIB_LINK_FLAGS += -arch ppc64 +endif + +ifeq (ppc,$(arch)) + CPLUS_FLAGS += -arch ppc + LINK_FLAGS += -arch ppc + LIB_LINK_FLAGS += -arch ppc +endif + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. +#------------------------------------------------------------------------------ + +M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions -fno-schedule-insns2 + +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ + diff --git a/dep/tbb/build/macos.icc.inc b/dep/tbb/build/macos.icc.inc new file mode 100644 index 000000000..7507ec07c --- /dev/null +++ b/dep/tbb/build/macos.icc.inc @@ -0,0 +1,75 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +CPLUS = icpc +CONLY = icc +COMPILE_ONLY = -c -MMD +PREPROC_ONLY = -E -x c +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = -fPIC +WARNING_AS_ERROR_KEY = -Werror +WARNING_KEY = -w1 +DYLIB_KEY = -dynamiclib +EXPORT_KEY = -Wl,-exported_symbols_list, +LIBDL = -ldl +export COMPILER_VERSION := $(shell icc -V &1 | grep 'Version') +#TODO: autodetection of arch from COMPILER_VERSION!! 
+ +OPENMP_FLAG = -openmp +LIBS = -lpthread +LINK_FLAGS = +LIB_LINK_FLAGS = -dynamiclib -i-static +C_FLAGS = $(CPLUS_FLAGS) + +ifeq ($(cfg), release) + CPLUS_FLAGS = -O2 -fno-omit-frame-pointer +else + CPLUS_FLAGS = -g -O0 -DTBB_USE_DEBUG +endif + +CPLUS_FLAGS += -DUSE_PTHREAD + +ifneq (,$(codecov)) + CPLUS_FLAGS += -prof-genx +endif + +ifneq (00,$(lambdas)$(cpp0x)) + CPLUS_FLAGS += -std=c++0x -D_TBB_CPP0X +endif + + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. +#------------------------------------------------------------------------------ + +M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions + +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/macos.inc b/dep/tbb/build/macos.inc new file mode 100644 index 000000000..4e2f4dbcf --- /dev/null +++ b/dep/tbb/build/macos.inc @@ -0,0 +1,85 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. 
+ +####### Detections and Commands ############################################### +ifndef arch + ifeq ($(shell /usr/sbin/sysctl -n hw.machine),Power Macintosh) + ifeq ($(shell /usr/sbin/sysctl -n hw.optional.64bitops),1) + export arch:=ppc64 + else + export arch:=ppc32 + endif + else + ifeq ($(shell /usr/sbin/sysctl -n hw.optional.x86_64 2>/dev/null),1) + export arch:=intel64 + else + export arch:=ia32 + endif + endif +endif + +ifndef runtime + #gcc_version:=$(shell gcc -v 2>&1 | grep 'gcc version' | sed -e 's/^gcc version //' | sed -e 's/ .*$$//' ) + gcc_version_full=$(shell gcc --version | grep 'gcc'| egrep -o ' [0-9]+\.[0-9]+\.[0-9]+.*' | sed -e 's/^\ //') + gcc_version=$(shell echo "$(gcc_version_full)" | egrep -o '^[0-9]+\.[0-9]+\.[0-9]+\s*' | head -n 1 | sed -e 's/ *//g') + os_version:=$(shell /usr/bin/sw_vers -productVersion) + export runtime:=cc$(gcc_version)_os$(os_version) +endif + +native_compiler := gcc +export compiler ?= gcc +debugger ?= gdb + +CMD=$(SHELL) -c +CWD=$(shell pwd) +RM?=rm -f +RD?=rmdir +MD?=mkdir -p +NUL= /dev/null +SLASH=/ +MAKE_VERSIONS=sh $(tbb_root)/build/version_info_macos.sh $(CPLUS) $(CPLUS_FLAGS) $(INCLUDES) >version_string.tmp +MAKE_TBBVARS=sh $(tbb_root)/build/generate_tbbvars.sh + +####### Build settings ######################################################## + +OBJ=o +DLL=dylib +LIBEXT=dylib + +def_prefix = $(if $(findstring 32,$(arch)),mac32,mac64) + +TBB.DEF = $(tbb_root)/src/tbb/$(def_prefix)-tbb-export.def +TBB.DLL = libtbb$(DEBUG_SUFFIX).$(DLL) +TBB.LIB = $(TBB.DLL) +LINK_TBB.LIB = $(TBB.LIB) + +MALLOC.DEF = $(MALLOC_ROOT)/$(def_prefix)-tbbmalloc-export.def +MALLOC.DLL = libtbbmalloc$(DEBUG_SUFFIX).$(DLL) +MALLOC.LIB = $(MALLOC.DLL) + +TBB_NOSTRICT=1 + +TEST_LAUNCHER=sh $(tbb_root)/build/test_launcher.sh diff --git a/dep/tbb/build/suncc.map.pause b/dep/tbb/build/suncc.map.pause new file mode 100644 index 000000000..a92d08eb1 --- /dev/null +++ b/dep/tbb/build/suncc.map.pause @@ -0,0 +1 @@ +hwcap_1 = OVERRIDE; \ No newline at end of file diff --git a/dep/tbb/build/test_launcher.bat b/dep/tbb/build/test_launcher.bat new file mode 100644 index 000000000..bc52a4414 --- /dev/null +++ b/dep/tbb/build/test_launcher.bat @@ -0,0 +1,36 @@ +@echo off +REM +REM Copyright 2005-2009 Intel Corporation. All Rights Reserved. +REM +REM This file is part of Threading Building Blocks. +REM +REM Threading Building Blocks is free software; you can redistribute it +REM and/or modify it under the terms of the GNU General Public License +REM version 2 as published by the Free Software Foundation. +REM +REM Threading Building Blocks is distributed in the hope that it will be +REM useful, but WITHOUT ANY WARRANTY; without even the implied warranty +REM of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +REM GNU General Public License for more details. +REM +REM You should have received a copy of the GNU General Public License +REM along with Threading Building Blocks; if not, write to the Free Software +REM Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +REM +REM As a special exception, you may use this file as part of a free software +REM library without restriction. Specifically, if other files instantiate +REM templates or use macros or inline functions from this file, or you compile +REM this file and link it with other files to produce an executable, this +REM file does not by itself cause the resulting executable to be covered by +REM the GNU General Public License. 
This exception does not however +REM invalidate any other reasons why the executable file might be covered by +REM the GNU General Public License. +REM + +REM no LD_PRELOAD under Windows +if "%1"=="-l" ( + echo skip + exit +) + +%* diff --git a/dep/tbb/build/test_launcher.sh b/dep/tbb/build/test_launcher.sh new file mode 100644 index 000000000..0f691ba7c --- /dev/null +++ b/dep/tbb/build/test_launcher.sh @@ -0,0 +1,42 @@ +#!/bin/sh +# +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +while getopts "l:" flag # +do # + if [ `uname` != 'Linux' ] ; then # + echo 'skip' # + exit # + fi # + LD_PRELOAD=$OPTARG # + shift `expr $OPTIND - 1` # +done # +# Set stack limit +ulimit -s 10240 # +# Run the command line passed via parameters +export LD_PRELOAD # +./$* # diff --git a/dep/tbb/build/version_info_linux.sh b/dep/tbb/build/version_info_linux.sh new file mode 100644 index 000000000..87d75516e --- /dev/null +++ b/dep/tbb/build/version_info_linux.sh @@ -0,0 +1,42 @@ +#!/bin/sh +# +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. 
This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +# Script used to generate version info string +echo "#define __TBB_VERSION_STRINGS \\" +echo '"TBB:' "BUILD_HOST\t\t"`hostname -s`" ("`uname -m`")"'" ENDL \' +# find OS name in *-release and issue* files by filtering blank lines and lsb-release content out +echo '"TBB:' "BUILD_OS\t\t"`lsb_release -sd 2>/dev/null | grep -ih '[a-z] ' - /etc/*release /etc/issue 2>/dev/null | head -1 | sed -e 's/["\\\\]//g'`'" ENDL \' +echo '"TBB:' "BUILD_KERNEL\t"`uname -srv`'" ENDL \' +echo '"TBB:' "BUILD_GCC\t\t"`g++ -v &1 | grep 'gcc.*version'`'" ENDL \' +[ -z "$COMPILER_VERSION" ] || echo '"TBB:' "BUILD_COMPILER\t"$COMPILER_VERSION'" ENDL \' +echo '"TBB:' "BUILD_GLIBC\t"`getconf GNU_LIBC_VERSION | grep glibc | sed -e 's/^glibc //'`'" ENDL \' +echo '"TBB:' "BUILD_LD\t\t"`ld -v 2>&1 | grep 'version'`'" ENDL \' +echo '"TBB:' "BUILD_TARGET\t$arch on $runtime"'" ENDL \' +echo '"TBB:' "BUILD_COMMAND\t"$*'" ENDL \' +echo "" +echo "#define __TBB_DATETIME \""`date -u`"\"" diff --git a/dep/tbb/build/version_info_macos.sh b/dep/tbb/build/version_info_macos.sh new file mode 100644 index 000000000..d6a40afbb --- /dev/null +++ b/dep/tbb/build/version_info_macos.sh @@ -0,0 +1,39 @@ +#!/bin/sh +# +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. 
+ +# Script used to generate version info string +echo "#define __TBB_VERSION_STRINGS \\" +echo '"TBB:' "BUILD_HOST\t\t"`hostname -s`" ("`arch`")"'" ENDL \' +echo '"TBB:' "BUILD_OS\t\t"`sw_vers -productName`" version "`sw_vers -productVersion`'" ENDL \' +echo '"TBB:' "BUILD_KERNEL\t"`uname -v`'" ENDL \' +echo '"TBB:' "BUILD_GCC\t\t"`gcc -v &1 | grep 'version'`'" ENDL \' +[ -z "$COMPILER_VERSION" ] || echo '"TBB:' "BUILD_COMPILER\t"$COMPILER_VERSION'" ENDL \' +echo '"TBB:' "BUILD_TARGET\t$arch on $runtime"'" ENDL \' +echo '"TBB:' "BUILD_COMMAND\t"$*'" ENDL \' +echo "" +echo "#define __TBB_DATETIME \""`date -u`"\"" diff --git a/dep/tbb/build/version_info_sunos.sh b/dep/tbb/build/version_info_sunos.sh new file mode 100644 index 000000000..16341165a --- /dev/null +++ b/dep/tbb/build/version_info_sunos.sh @@ -0,0 +1,39 @@ +#!/bin/sh +# +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +# Script used to generate version info string +echo "#define __TBB_VERSION_STRINGS \\" +echo '"TBB:' "BUILD_HOST\t"`hostname`" ("`arch`")"'" ENDL \' +echo '"TBB:' "BUILD_OS\t\t"`uname`'" ENDL \' +echo '"TBB:' "BUILD_KERNEL\t"`uname -srv`'" ENDL \' +echo '"TBB:' "BUILD_SUNCC\t"`CC -V &1 | grep 'C++'`'" ENDL \' +[ -z "$COMPILER_VERSION" ] || echo '"TBB: ' "BUILD_COMPILER\t"$COMPILER_VERSION'" ENDL \' +echo '"TBB:' "BUILD_TARGET\t$arch on $runtime"'" ENDL \' +echo '"TBB:' "BUILD_COMMAND\t"$*'" ENDL \' +echo "" +echo "#define __TBB_DATETIME \""`date -u`"\"" diff --git a/dep/tbb/build/version_info_windows.js b/dep/tbb/build/version_info_windows.js new file mode 100644 index 000000000..1d1efb9f8 --- /dev/null +++ b/dep/tbb/build/version_info_windows.js @@ -0,0 +1,136 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + +var WshShell = WScript.CreateObject("WScript.Shell"); + +var tmpExec; + +WScript.Echo("#define __TBB_VERSION_STRINGS \\"); + +//Getting BUILD_HOST +WScript.echo( "\"TBB: BUILD_HOST\\t\\t" + + WshShell.ExpandEnvironmentStrings("%COMPUTERNAME%") + + "\" ENDL \\" ); + +//Getting BUILD_OS +tmpExec = WshShell.Exec("cmd /c ver"); +while ( tmpExec.Status == 0 ) { + WScript.Sleep(100); +} +tmpExec.StdOut.ReadLine(); + +WScript.echo( "\"TBB: BUILD_OS\\t\\t" + + tmpExec.StdOut.ReadLine() + + "\" ENDL \\" ); + +if ( WScript.Arguments(0).toLowerCase().match("gcc") ) { + tmpExec = WshShell.Exec("gcc --version"); + WScript.echo( "\"TBB: BUILD_COMPILER\\t" + + tmpExec.StdOut.ReadLine() + + "\" ENDL \\" ); + +} else { // MS / Intel compilers + //Getting BUILD_CL + tmpExec = WshShell.Exec("cmd /c echo #define 0 0>empty.cpp"); + tmpExec = WshShell.Exec("cl -c empty.cpp "); + while ( tmpExec.Status == 0 ) { + WScript.Sleep(100); + } + var clVersion = tmpExec.StdErr.ReadLine(); + WScript.echo( "\"TBB: BUILD_CL\\t\\t" + + clVersion + + "\" ENDL \\" ); + + //Getting BUILD_COMPILER + if ( WScript.Arguments(0).toLowerCase().match("icl") ) { + tmpExec = WshShell.Exec("icl -c empty.cpp "); + while ( tmpExec.Status == 0 ) { + WScript.Sleep(100); + } + WScript.echo( "\"TBB: BUILD_COMPILER\\t" + + tmpExec.StdErr.ReadLine() + + "\" ENDL \\" ); + } else { + WScript.echo( "\"TBB: BUILD_COMPILER\\t\\t" + + clVersion + + "\" ENDL \\" ); + } + tmpExec = WshShell.Exec("cmd /c del /F /Q empty.obj empty.cpp"); +} + +//Getting BUILD_TARGET +WScript.echo( "\"TBB: BUILD_TARGET\\t" + + WScript.Arguments(1) + + "\" ENDL \\" ); + +//Getting BUILD_COMMAND +WScript.echo( "\"TBB: BUILD_COMMAND\\t" + WScript.Arguments(2) + "\" ENDL" ); + +//Getting __TBB_DATETIME and __TBB_VERSION_YMD +var date = new Date(); +WScript.echo( "#define __TBB_DATETIME \"" + date.toUTCString() + "\"" ); +WScript.echo( "#define __TBB_VERSION_YMD " + date.getUTCFullYear() + ", " + + (date.getUTCMonth() > 8 ? (date.getUTCMonth()+1):("0"+(date.getUTCMonth()+1))) + + (date.getUTCDate() > 9 ? 
date.getUTCDate():("0"+date.getUTCDate())) ); + + +/* + +Original strings + +#define __TBB_VERSION_STRINGS \ +"TBB: BUILD_HOST\t\tvpolin-mobl1 (ia32)" ENDL \ +"TBB: BUILD_OS\t\tMicrosoft Windows XP [Version 5.1.2600]" ENDL \ +"TBB: BUILD_CL\t\tMicrosoft (R) 32-bit C/C++ Optimizing Compiler Version 13.10.3077 for 80x86" ENDL \ +"TBB: BUILD_COMPILER\tIntel(R) C++ Compiler for 32-bit applications, Version 9.1 Build 20070109Z Package ID: W_CC_C_9.1.034 " ENDL \ +"TBB: BUILD_TARGET\t" ENDL \ +"TBB: BUILD_COMMAND\t" ENDL \ + +#define __TBB_DATETIME "Mon Jun 4 10:16:07 UTC 2007" +#define __TBB_VERSION_YMD 2007, 0604 + + + +# The script must be run from two directory levels below this level. +x='"TBB: ' +y='" ENDL \' +echo "#define __TBB_VERSION_STRINGS \\" +echo $x "BUILD_HOST\t\t"`hostname`" ("`../../arch.exe`")"$y +echo $x "BUILD_OS\t\t"`../../win_version.bat|grep -i 'Version'`$y +echo >empty.cpp +echo $x "BUILD_CL\t\t"`cl -c empty.cpp 2>&1 | grep -i Version`$y +echo $x "BUILD_COMPILER\t"`icl -c empty.cpp 2>&1 | grep -i Version`$y +echo $x "BUILD_TARGET\t"$TBB_ARCH$y +echo $x "BUILD_COMMAND\t"$*$y +echo "" +# A workaround for MKS 8.6 where `date -u` crashes. +date -u > date.tmp +echo "#define __TBB_DATETIME \""`cat date.tmp`"\"" +echo "#define __TBB_VERSION_YMD "`date '+%Y, %m%d'` +rm empty.cpp +rm empty.obj +rm date.tmp +*/ diff --git a/dep/tbb/build/version_info_winlrb.js b/dep/tbb/build/version_info_winlrb.js new file mode 100644 index 000000000..67f2a2920 --- /dev/null +++ b/dep/tbb/build/version_info_winlrb.js @@ -0,0 +1,91 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. 
+ +var WshShell = WScript.CreateObject("WScript.Shell"); + +var tmpExec; + +WScript.Echo("#define __TBB_VERSION_STRINGS \\"); + +//Getting BUILD_HOST +WScript.echo( "\"TBB: BUILD_HOST\\t\\t" + + WshShell.ExpandEnvironmentStrings("%COMPUTERNAME%") + + "\" ENDL \\" ); + +//Getting BUILD_OS +tmpExec = WshShell.Exec("cmd /c ver"); +while ( tmpExec.Status == 0 ) { + WScript.Sleep(100); +} +tmpExec.StdOut.ReadLine(); + +WScript.echo( "\"TBB: BUILD_OS\\t\\t" + + tmpExec.StdOut.ReadLine() + + "\" ENDL \\" ); + +var Unknown = "Unknown"; + +WScript.echo( "\"TBB: BUILD_KERNEL\\t" + + Unknown + + "\" ENDL \\" ); + +//Getting BUILD_COMPILER +tmpExec = WshShell.Exec("icc --version"); +while ( tmpExec.Status == 0 ) { + WScript.Sleep(100); +} +var ccVersion = tmpExec.StdErr.ReadLine(); +WScript.echo( "\"TBB: BUILD_GCC\\t" + + ccVersion + + "\" ENDL \\" ); +WScript.echo( "\"TBB: BUILD_COMPILER\\t" + + ccVersion + + "\" ENDL \\" ); + +WScript.echo( "\"TBB: BUILD_GLIBC\\t" + + Unknown + + "\" ENDL \\" ); + +WScript.echo( "\"TBB: BUILD_LD\\t" + + Unknown + + "\" ENDL \\" ); + +//Getting BUILD_TARGET +WScript.echo( "\"TBB: BUILD_TARGET\\t" + + WScript.Arguments(1) + + "\" ENDL \\" ); + +//Getting BUILD_COMMAND +WScript.echo( "\"TBB: BUILD_COMMAND\\t" + WScript.Arguments(2) + "\" ENDL" ); + +//Getting __TBB_DATETIME and __TBB_VERSION_YMD +var date = new Date(); +WScript.echo( "#define __TBB_DATETIME \"" + date.toUTCString() + "\"" ); +WScript.echo( "#define __TBB_VERSION_YMD " + date.getUTCFullYear() + ", " + + (date.getUTCMonth() > 8 ? (date.getUTCMonth()+1):("0"+(date.getUTCMonth()+1))) + + (date.getUTCDate() > 9 ? date.getUTCDate():("0"+date.getUTCDate())) ); + + diff --git a/dep/tbb/build/vsproject/index.html b/dep/tbb/build/vsproject/index.html new file mode 100644 index 000000000..82cad002d --- /dev/null +++ b/dep/tbb/build/vsproject/index.html @@ -0,0 +1,31 @@ + + + +

Overview

+This directory contains the Visual Studio* 2005 solution for building Threading Building Blocks. + + +

Files

+
+
makefile.sln +
Solution file. +
tbb.vcproj +
Library project file. +
tbbmalloc.vcproj +
    Scalable allocator library project file. Allocator sources are expected to be located in the ../../src/tbbmalloc folder. +
tbbmalloc_proxy.vcproj +
Standard allocator replacement project file. +
+ +
+Up to parent directory +

+Copyright © 2005-2009 Intel Corporation. All Rights Reserved. +

+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are +registered trademarks or trademarks of Intel Corporation or its +subsidiaries in the United States and other countries. +

+* Other names and brands may be claimed as the property of others. + + diff --git a/dep/tbb/build/vsproject/makefile.sln b/dep/tbb/build/vsproject/makefile.sln new file mode 100644 index 000000000..2a681d436 --- /dev/null +++ b/dep/tbb/build/vsproject/makefile.sln @@ -0,0 +1,72 @@ +Microsoft Visual Studio Solution File, Format Version 9.00 +# Visual Studio 2005 +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbb", "tbb.vcproj", "{F62787DD-1327-448B-9818-030062BCFAA5}" + ProjectSection(WebsiteProperties) = preProject + Debug.AspNetCompiler.Debug = "True" + Release.AspNetCompiler.Debug = "False" + EndProjectSection +EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbbmalloc", "tbbmalloc.vcproj", "{B15F131E-328A-4D42-ADC2-9FF4CA6306D8}" + ProjectSection(WebsiteProperties) = preProject + Debug.AspNetCompiler.Debug = "True" + Release.AspNetCompiler.Debug = "False" + EndProjectSection + ProjectSection(ProjectDependencies) = postProject + {F62787DD-1327-448B-9818-030062BCFAA5} = {F62787DD-1327-448B-9818-030062BCFAA5} + EndProjectSection +EndProject +Project("{2150E333-8FDC-42A3-9474-1A3956D46DE8}") = "Solution Items", "Solution Items", "{8898CE0B-0BFB-45AE-AA71-83735ED2510D}" + ProjectSection(WebsiteProperties) = preProject + Debug.AspNetCompiler.Debug = "True" + Release.AspNetCompiler.Debug = "False" + EndProjectSection + ProjectSection(SolutionItems) = preProject + index.html = index.html + EndProjectSection +EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbbmalloc_proxy", "tbbmalloc_proxy.vcproj", "{02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}" + ProjectSection(WebsiteProperties) = preProject + Debug.AspNetCompiler.Debug = "True" + Release.AspNetCompiler.Debug = "False" + EndProjectSection + ProjectSection(ProjectDependencies) = postProject + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} = {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} + EndProjectSection +EndProject +Global + GlobalSection(SolutionConfigurationPlatforms) = preSolution + Debug|Win32 = Debug|Win32 + Debug|x64 = Debug|x64 + Release|Win32 = Release|Win32 + Release|x64 = Release|x64 + EndGlobalSection + GlobalSection(ProjectConfigurationPlatforms) = postSolution + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.ActiveCfg = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.Build.0 = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.ActiveCfg = Debug|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.Build.0 = Debug|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.ActiveCfg = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.Build.0 = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.ActiveCfg = Release|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.Build.0 = Release|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.ActiveCfg = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.Build.0 = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.ActiveCfg = Debug|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.Build.0 = Debug|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.ActiveCfg = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.Build.0 = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.ActiveCfg = Release|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.Build.0 = Release|x64 + {02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Debug|Win32.ActiveCfg = Debug|Win32 + {02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Debug|Win32.Build.0 = Debug|Win32 + 
{02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Debug|x64.ActiveCfg = Debug|x64 + {02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Debug|x64.Build.0 = Debug|x64 + {02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Release|Win32.ActiveCfg = Release|Win32 + {02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Release|Win32.Build.0 = Release|Win32 + {02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Release|x64.ActiveCfg = Release|x64 + {02F61511-D5B6-46E6-B4BB-DEAA96E6BCC7}.Release|x64.Build.0 = Release|x64 + EndGlobalSection + GlobalSection(SolutionProperties) = preSolution + HideSolutionNode = FALSE + EndGlobalSection +EndGlobal diff --git a/dep/tbb/build/vsproject/tbb.vcproj b/dep/tbb/build/vsproject/tbb.vcproj new file mode 100644 index 000000000..1024d7ef7 --- /dev/null +++ b/dep/tbb/build/vsproject/tbb.vcproj @@ -0,0 +1,310 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dep/tbb/build/vsproject/tbbmalloc.vcproj b/dep/tbb/build/vsproject/tbbmalloc.vcproj new file mode 100644 index 000000000..26cc44b90 --- /dev/null +++ b/dep/tbb/build/vsproject/tbbmalloc.vcproj @@ -0,0 +1,290 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dep/tbb/build/vsproject/tbbmalloc_proxy.vcproj b/dep/tbb/build/vsproject/tbbmalloc_proxy.vcproj new file mode 100644 index 000000000..57d65f790 --- /dev/null +++ b/dep/tbb/build/vsproject/tbbmalloc_proxy.vcproj @@ -0,0 +1,126 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/dep/tbb/build/vsproject/version_string.tmp b/dep/tbb/build/vsproject/version_string.tmp new file mode 100644 index 000000000..2098d6759 --- /dev/null +++ b/dep/tbb/build/vsproject/version_string.tmp @@ -0,0 +1 @@ +#define __TBB_VERSION_STRINGS "Empty" diff --git a/dep/tbb/build/windows.cl.inc b/dep/tbb/build/windows.cl.inc new file mode 100644 index 000000000..1051ece06 --- /dev/null +++ b/dep/tbb/build/windows.cl.inc @@ -0,0 +1,122 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. 
+# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +#------------------------------------------------------------------------------ +# Define compiler-specific variables. +#------------------------------------------------------------------------------ + + +#------------------------------------------------------------------------------ +# Setting compiler flags. +#------------------------------------------------------------------------------ +CPLUS = cl /nologo +LINK_FLAGS = /link /nologo +LIB_LINK_FLAGS=/link /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO +MS_CRT_KEY = /MD$(if $(findstring debug,$(cfg)),d) +EH_FLAGS = /EHsc /GR + +ifeq ($(cfg), release) + CPLUS_FLAGS = $(MS_CRT_KEY) /O2 /Zi $(EH_FLAGS) /Zc:forScope /Zc:wchar_t + ASM_FLAGS = +ifeq (ia32,$(arch)) + CPLUS_FLAGS += /Oy +endif +endif +ifeq ($(cfg), debug) + CPLUS_FLAGS = $(MS_CRT_KEY) /Od /Ob0 /Zi $(EH_FLAGS) /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG + ASM_FLAGS = /DUSE_FRAME_POINTER +endif + + +COMPILE_ONLY = /c +PREPROC_ONLY = /TC /EP +INCLUDE_KEY = /I +DEFINE_KEY = /D +OUTPUT_KEY = /Fe +OUTPUTOBJ_KEY = /Fo +WARNING_AS_ERROR_KEY = /WX + +ifeq ($(runtime),vc7.1) + WARNING_KEY = /W3 +else + WARNING_KEY = /W4 +endif + +DYLIB_KEY = /DLL +EXPORT_KEY = /DEF: + +ifeq ($(runtime),vc8) + OPENMP_FLAG = /openmp + WARNING_KEY += /Wp64 + CPLUS_FLAGS += /D_USE_RTM_VERSION +endif +ifeq ($(runtime),vc9) + OPENMP_FLAG = /openmp +endif + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += /GS- +endif + + + +CPLUS_FLAGS += /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE \ + /D_WIN32_WINNT=$(_WIN32_WINNT) +C_FLAGS = $(CPLUS_FLAGS) +#------------------------------------------------------------------------------ +# End of setting compiler flags. +#------------------------------------------------------------------------------ + + +#------------------------------------------------------------------------------ +# Setting assembler data. +#------------------------------------------------------------------------------ +ASSEMBLY_SOURCE=$(arch)-masm +ifeq (intel64,$(arch)) + ASM=ml64 + ASM_FLAGS += /DEM64T=1 /c /Zi + TBB_ASM.OBJ = atomic_support.obj +else + ASM=ml + ASM_FLAGS += /c /coff /Zi + TBB_ASM.OBJ = atomic_support.obj lock_byte.obj +endif +#------------------------------------------------------------------------------ +# End of setting assembler data. +#------------------------------------------------------------------------------ + + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. 
+#------------------------------------------------------------------------------ +M_CPLUS_FLAGS = $(subst $(EH_FLAGS),/EHs-,$(CPLUS_FLAGS)) +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# End of define compiler-specific variables. +#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/windows.gcc.inc b/dep/tbb/build/windows.gcc.inc new file mode 100644 index 000000000..b52d2a75b --- /dev/null +++ b/dep/tbb/build/windows.gcc.inc @@ -0,0 +1,122 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. 
+ +#------------------------------------------------------------------------------ +# Overriding settings from windows.inc +#------------------------------------------------------------------------------ + +SLASH= $(strip \) +OBJ = o +LIBEXT = dll # MinGW allows linking with DLLs directly + +TBB.RES = +MALLOC.RES = +TBB.MANIFEST = +MALLOC.MANIFEST = + +# TODO: do better when/if mingw64 support is added +TBB.DEF = $(tbb_root)/src/tbb/lin32-tbb-export.def +MALLOC.DEF = $(MALLOC_ROOT)/win-gcc-tbbmalloc-export.def + +LINK_TBB.LIB = $(TBB.LIB) + +#------------------------------------------------------------------------------ +# End of overridden settings +#------------------------------------------------------------------------------ +# Compiler-specific variables +#------------------------------------------------------------------------------ + +CPLUS = g++ +COMPILE_ONLY = -c -MMD +PREPROC_ONLY = -E -x c +INCLUDE_KEY = -I +DEFINE_KEY = -D +OUTPUT_KEY = -o # +OUTPUTOBJ_KEY = -o # +PIC_KEY = +WARNING_AS_ERROR_KEY = -Werror +WARNING_KEY = -Wall -Wno-uninitialized +WARNING_SUPPRESS = -Wno-parentheses +DYLIB_KEY = -shared +LIBDL = +EXPORT_KEY = -Wl,--version-script, +LIBS = -lpsapi + +#------------------------------------------------------------------------------ +# End of compiler-specific variables +#------------------------------------------------------------------------------ +# Command lines +#------------------------------------------------------------------------------ + +LINK_FLAGS = -Wl,--enable-auto-import +LIB_LINK_FLAGS = $(DYLIB_KEY) + +ifeq ($(cfg), release) + CPLUS_FLAGS = -O2 +endif +ifeq ($(cfg), debug) + CPLUS_FLAGS = -g -O0 -DTBB_USE_DEBUG +endif +CPLUS_FLAGS += -DUSE_WINTHREAD + +# MinGW specific +CPLUS_FLAGS += -D__MSVCRT_VERSION__=0x0700 -msse -mthreads + +CONLY = gcc +C_FLAGS = $(CPLUS_FLAGS) + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += -m64 + LIB_LINK_FLAGS += -m64 +endif + +ifeq (ia32,$(arch)) + CPLUS_FLAGS += -m32 + LIB_LINK_FLAGS += -m32 +endif + +#------------------------------------------------------------------------------ +# End of command lines +#------------------------------------------------------------------------------ +# Setting assembler data +#------------------------------------------------------------------------------ + +ASM= +ASM_FLAGS= +TBB_ASM.OBJ= +ASSEMBLY_SOURCE=$(arch)-gas + +#------------------------------------------------------------------------------ +# End of setting assembler data +#------------------------------------------------------------------------------ +# Setting tbbmalloc data +#------------------------------------------------------------------------------ + +M_CPLUS_FLAGS = $(CPLUS_FLAGS) -fno-rtti -fno-exceptions + +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data +#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/windows.icl.inc b/dep/tbb/build/windows.icl.inc new file mode 100644 index 000000000..386c5d8a5 --- /dev/null +++ b/dep/tbb/build/windows.icl.inc @@ -0,0 +1,144 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. 
+# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +#------------------------------------------------------------------------------ +# Define compiler-specific variables. +#------------------------------------------------------------------------------ + + +#------------------------------------------------------------------------------ +# Setting default configuration to release. +#------------------------------------------------------------------------------ +cfg ?= release +#------------------------------------------------------------------------------ +# End of setting default configuration to release. +#------------------------------------------------------------------------------ + + +#------------------------------------------------------------------------------ +# Setting compiler flags. 
+#------------------------------------------------------------------------------ +CPLUS = icl /nologo $(VCCOMPAT_FLAG) +LINK_FLAGS = /link /nologo +LIB_LINK_FLAGS= /link /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO +MS_CRT_KEY = /MD$(if $(findstring debug,$(cfg)),d) +EH_FLAGS = /EHsc /GR + +ifeq ($(cfg), release) + CPLUS_FLAGS = $(MS_CRT_KEY) /O2 /Zi $(EH_FLAGS) /Zc:forScope /Zc:wchar_t + ASM_FLAGS = +ifeq (ia32,$(arch)) + CPLUS_FLAGS += /Oy +endif +endif +ifeq ($(cfg), debug) + CPLUS_FLAGS = $(MS_CRT_KEY) /Od /Ob0 /Zi $(EH_FLAGS) /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG + LINK_FLAGS += libmmds.lib /NODEFAULTLIB:libmmdd.lib + ASM_FLAGS = /DUSE_FRAME_POINTER +endif + + +COMPILE_ONLY = /c /QMMD +PREPROC_ONLY = /EP /Tc +INCLUDE_KEY = /I +DEFINE_KEY = /D +OUTPUT_KEY = /Fe +OUTPUTOBJ_KEY = /Fo +WARNING_AS_ERROR_KEY = /WX +WARNING_KEY = /W3 +DYLIB_KEY = /DLL +EXPORT_KEY = /DEF: + +ifeq (intel64,$(arch)) + CPLUS_FLAGS += /GS- +endif + +ifneq (,$(codecov)) + CPLUS_FLAGS += /Qprof-genx +else + CPLUS_FLAGS += /DDO_ITT_NOTIFY +endif + +OPENMP_FLAG = /Qopenmp +CPLUS_FLAGS += /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE \ + /D_WIN32_WINNT=$(_WIN32_WINNT) + +ifeq ($(runtime),vc8) + CPLUS_FLAGS += /D_USE_RTM_VERSION +endif + +C_FLAGS = $(CPLUS_FLAGS) + +ifneq (00,$(lambdas)$(cpp0x)) + CPLUS_FLAGS += /Qstd=c++0x /D_TBB_CPP0X +endif + +VCVERSION:=$(runtime) +VCCOMPAT_FLAG := $(if $(findstring vc7.1, $(VCVERSION)),/Qvc7.1) +ifeq ($(VCCOMPAT_FLAG),) + VCCOMPAT_FLAG := $(if $(findstring vc8, $(VCVERSION)),/Qvc8) +endif +ifeq ($(VCCOMPAT_FLAG),) + VCCOMPAT_FLAG := $(if $(findstring vc9, $(VCVERSION)),/Qvc9) +endif +ifeq ($(VCCOMPAT_FLAG),) + $(error VC version not detected correctly: $(VCVERSION) ) +endif +export VCCOMPAT_FLAG +#------------------------------------------------------------------------------ +# End of setting compiler flags. +#------------------------------------------------------------------------------ + + +#------------------------------------------------------------------------------ +# Setting assembler data. +#------------------------------------------------------------------------------ +ASSEMBLY_SOURCE=$(arch)-masm +ifeq (intel64,$(arch)) + ASM=ml64 + ASM_FLAGS += /DEM64T=1 /c /Zi + TBB_ASM.OBJ = atomic_support.obj +else + ASM=ml + ASM_FLAGS += /c /coff /Zi + TBB_ASM.OBJ = atomic_support.obj lock_byte.obj +endif +#------------------------------------------------------------------------------ +# End of setting assembler data. +#------------------------------------------------------------------------------ + + +#------------------------------------------------------------------------------ +# Setting tbbmalloc data. +#------------------------------------------------------------------------------ +M_CPLUS_FLAGS = $(subst $(EH_FLAGS),/EHs-,$(CPLUS_FLAGS)) +#------------------------------------------------------------------------------ +# End of setting tbbmalloc data. +#------------------------------------------------------------------------------ + +#------------------------------------------------------------------------------ +# End of define compiler-specific variables. +#------------------------------------------------------------------------------ diff --git a/dep/tbb/build/windows.inc b/dep/tbb/build/windows.inc new file mode 100644 index 000000000..400864fe6 --- /dev/null +++ b/dep/tbb/build/windows.inc @@ -0,0 +1,100 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. 
+# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +ifdef tbb_build_dir + test_dir:=$(tbb_build_dir) +else + test_dir:=. +endif + +# TODO give an error if archs don't match +ifndef arch + export arch:=$(shell cmd /C "cscript /nologo /E:jscript $(tbb_root)/build/detect.js /arch $(compiler)") +endif + +ifndef runtime + export runtime:=$(shell cmd /C "cscript /nologo /E:jscript $(tbb_root)/build/detect.js /runtime $(compiler)") +endif + +native_compiler := cl +export compiler ?= cl +debugger ?= devenv /debugexe + +CMD=cmd /C +CWD=$(shell cmd /C echo %CD%) +RM=cmd /C del /Q /F +RD=cmd /C rmdir +MD=cmd /c mkdir +SLASH=\\ +NUL = nul + +OBJ = obj +DLL = dll +LIBEXT = lib + +def_prefix = $(if $(findstring ia32,$(arch)),win32,win64) + +# Target Windows version. Do not increase beyond 0x0500 without prior discussion! +# Used as the value for macro definition option in windows.cl.inc etc.
+_WIN32_WINNT=0x0400 + +TBB.DEF = $(tbb_root)/src/tbb/$(def_prefix)-tbb-export.def +TBB.DLL = tbb$(DEBUG_SUFFIX).$(DLL) +TBB.LIB = tbb$(DEBUG_SUFFIX).$(LIBEXT) +TBB.RES = tbb_resource.res +# On Windows, we use #pragma comment to set the proper TBB lib to link with +# But for cross-configuration testing, need to link explicitly +LINK_TBB.LIB = $(if $(crosstest),$(TBB.LIB)) +TBB.MANIFEST = +ifneq ($(filter vc8 vc9,$(runtime)),) + TBB.MANIFEST = tbbmanifest.exe.manifest +endif + +MALLOC.DEF = $(MALLOC_ROOT)/$(def_prefix)-tbbmalloc-export.def +MALLOC.DLL = tbbmalloc$(DEBUG_SUFFIX).$(DLL) +MALLOC.LIB = tbbmalloc$(DEBUG_SUFFIX).$(LIBEXT) +MALLOC.RES = tbbmalloc.res +MALLOC.MANIFEST = +ifneq ($(filter vc8 vc9,$(runtime)),) +MALLOC.MANIFEST = tbbmanifest.exe.manifest +endif +LINK_MALLOC.LIB = $(MALLOC.LIB) + +MALLOCPROXY.DLL = tbbmalloc_proxy$(DEBUG_SUFFIX).$(DLL) +MALLOCPROXY.LIB = tbbmalloc_proxy$(DEBUG_SUFFIX).$(LIBEXT) + +RML.DEF = $(RML_SERVER_ROOT)/$(def_prefix)-rml-export.def +RML.DLL = irml$(DEBUG_SUFFIX).$(DLL) +RML.LIB = irml$(DEBUG_SUFFIX).$(LIBEXT) +RML.RES = irml.res +ifneq ($(runtime),vc7.1) +RML.MANIFEST = tbbmanifest.exe.manifest +endif + +MAKE_VERSIONS = cmd /C cscript /nologo /E:jscript $(subst \,/,$(tbb_root))/build/version_info_windows.js $(compiler) $(arch) $(subst \,/,"$(CPLUS) $(CPLUS_FLAGS) $(INCLUDES)") > version_string.tmp +MAKE_TBBVARS = cmd /C "$(subst /,\,$(tbb_root))\build\generate_tbbvars.bat" + +TEST_LAUNCHER = $(subst /,\,$(tbb_root))\build\test_launcher.bat diff --git a/dep/tbb/build/winlrb.cl.inc b/dep/tbb/build/winlrb.cl.inc new file mode 100644 index 000000000..618dba5bf --- /dev/null +++ b/dep/tbb/build/winlrb.cl.inc @@ -0,0 +1,66 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. 
+ +include $(tbb_root)/build/windows.cl.inc + +ifeq ($(cfg), debug) + CFG_LETTER = d +else + CFG_LETTER = r +endif + +_CPLUS_FLAGS_HOST := $(CPLUS_FLAGS) /I$(LRB_INC_DIR) $(LINK_FLAGS) /LIBPATH:$(LRB_LIB_DIR) xn_host$(LRB_HOST_ARCH)$(CFG_LETTER).lib + +TEST_EXT = dll +CPLUS_FLAGS += /I$(LRB_INC_DIR) /D__LRB__ +LIB_LINK_FLAGS += /LIBPATH:$(LRB_LIB_DIR) xn_lrb$(LRB_HOST_ARCH)$(CFG_LETTER).lib +LINK_FLAGS = $(LIB_LINK_FLAGS) +OPENMP_FLAG = + +ifdef TEST_RESOURCE +LINK_FLAGS += $(TEST_RESOURCE) + +TEST_LAUNCHER_NAME = harness_lrb_host +AUX_TEST_DEPENDENCIES = $(TEST_LAUNCHER_NAME).exe + +$(TEST_LAUNCHER_NAME).exe: $(TEST_LAUNCHER_NAME).cpp + cl /Fe$@ $< $(_CPLUS_FLAGS_HOST) + +NO_LEGACY_TESTS = 1 +NO_C_TESTS = 1 +TEST_LAUNCHER= +endif # TEST_RESOURCE + +#test_model_plugin.%: +# @echo test_model_plugin is not supported for LRB architecture so far + +ifeq ($(BUILDING_PHASE),0) # examples + export RM = del /Q /F + export LIBS = -shared -lthr -z muldefs -L$(work_dir)_debug -L$(work_dir)_release + export UI = con + export x64 = 64 + export CXXFLAGS = -xR -I..\..\..\include +endif # examples diff --git a/dep/tbb/build/winlrb.icc.inc b/dep/tbb/build/winlrb.icc.inc new file mode 100644 index 000000000..427d06c9d --- /dev/null +++ b/dep/tbb/build/winlrb.icc.inc @@ -0,0 +1,49 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + + +include $(tbb_root)/build/winlrb.cl.inc + +TEST_EXT = so +.PRECIOUS: %.$(TEST_EXT) + +include $(tbb_root)/build/freebsd.gcc.inc + +WARNING_KEY = -w1 +CPLUS = icpc +CONLY = icc +#LIBS = -u _read -lcprts -lthr -lc +#LIBS = -lthr +LIBS = -u _read -lcprts -lthr -limf -lc +LINK_FLAGS = -L$(LRB_LIB_DIR) $(DYLIB_KEY) -lxn$(XN_VER)_lrb64$(CFG_LETTER) +CPLUS_FLAGS += -xR $(PIC_KEY) -I$(LRB_INC_DIR) -DXENSIM +C_FLAGS = $(CPLUS_FLAGS) +LIB_LINK_FLAGS = $(LINK_FLAGS) + +ifeq ($(cfg), release) + # workaround for LRB compiler issues + CPLUS_FLAGS := $(subst -O2,-O0, $(CPLUS_FLAGS)) +endif diff --git a/dep/tbb/build/winlrb.inc b/dep/tbb/build/winlrb.inc new file mode 100644 index 000000000..f72c66fde --- /dev/null +++ b/dep/tbb/build/winlrb.inc @@ -0,0 +1,88 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. 
+# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +ifndef XN_VER +export LRBSDK = $(LARRABEE_CORE_LATEST) +export LRB_LIB_DIR = "$(LRBSDK)lib" +export LRB_INC_DIR = "$(LRBSDK)include" + +# Function $(wildcard pattern) does not work with paths containing spaces! +_lrb_lib = $(shell cmd /C "dir /B "$(LRBSDK)lib\libxn*_lrb64d.so") +export XN_VER = $(patsubst libxn%_lrb64d.so,%,$(_lrb_lib)) + +ifeq (1,$(NETSIM_LRB_32_OVERRIDE)) + export LRB_HOST_ARCH = 32 +else + export LRB_HOST_ARCH = 64 +endif + +export run_cmd = harness_lrb_host.exe + +export UI = con + +endif #XN_VER + +include $(tbb_root)/build/windows.inc + +ifneq (1,$(netsim)) +# Target environment is native LRB or LrbFSim + +export compiler = icc +export arch := lrb + +target_machine = $(subst -,_,$(shell icpc -dumpmachine)) +runtime = $(subst _lrb_,_,$(target_machine)) +# -dumpmachine option does not work in R9 Core SDK 5 +ifeq ($(runtime),) + runtime = x86_64_freebsd +endif +export runtime:=$(runtime)_xn$(XN_VER) + +OBJ = o +DLL = so +LIBEXT = so + +TBB.DEF = +TBB.DLL = libtbb$(DEBUG_SUFFIX).$(DLL) +TBB.LIB = $(TBB.DLL) +LINK_TBB.LIB = $(TBB.DLL) +TBB.RES = + +MALLOC.DEF := +MALLOC.DLL = libtbbmalloc$(DEBUG_SUFFIX).$(DLL) +MALLOC.LIB = $(MALLOC.DLL) +MALLOC.RES = + +MAKE_VERSIONS = cmd /C cscript /nologo /E:jscript $(subst \,/,$(tbb_root))/build/version_info_winlrb.js $(compiler) $(arch) $(subst \,/,"$(CPLUS) $(CPLUS_FLAGS) $(INCLUDES)") > version_string.tmp +MAKE_TBBVARS = cmd /C "$(subst /,\,$(tbb_root))\build\generate_tbbvars.bat" + +ifneq (1,$(XENSIM_ENABLED)) + export run_cmd = rem +endif + +TBB_NOSTRICT = 1 + +endif # lrbfsim diff --git a/dep/tbb/include/index.html b/dep/tbb/include/index.html new file mode 100644 index 000000000..f80c5d491 --- /dev/null +++ b/dep/tbb/include/index.html @@ -0,0 +1,24 @@ + + + +

Overview

+Include files for Threading Building Blocks.
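For orientation, a minimal sketch of how a project that puts dep/tbb/include on its header search path might use these files; the array, its size, and the SquareBody functor are illustrative placeholders, not part of the library:

#include "tbb/task_scheduler_init.h"
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

// Hypothetical worker functor: squares each element of a plain array.
struct SquareBody {
    double* data;
    explicit SquareBody( double* d ) : data(d) {}
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        for( size_t i=r.begin(); i!=r.end(); ++i )
            data[i] *= data[i];
    }
};

void square_all( double* data, size_t n ) {
    tbb::task_scheduler_init init;  // start the TBB worker threads
    tbb::parallel_for( tbb::blocked_range<size_t>(0,n), SquareBody(data) );
}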

Directories

+tbb
+    Include files for Threading Building Blocks classes and functions.
+Up to parent directory +

+Copyright © 2005-2009 Intel Corporation. All Rights Reserved. +

+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are +registered trademarks or trademarks of Intel Corporation or its +subsidiaries in the United States and other countries. +
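The first header added below, _concurrent_queue_internal.h, supplies the type-independent machinery behind the public tbb::concurrent_queue container. A rough sketch of how the public interface it supports is typically used; the producer/consumer function names are placeholders, and real code would run them on separate threads:

#include "tbb/concurrent_queue.h"

// One thread enqueues work items, another drains them; the queue
// performs all of the necessary synchronization internally.
void producer( tbb::concurrent_queue<int>& q ) {
    for( int i=0; i<100; ++i )
        q.push(i);
}

void consumer( tbb::concurrent_queue<int>& q ) {
    int item;
    while( q.try_pop(item) ) {
        // process item ...
    }
}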

+* Other names and brands may be claimed as the property of others. + + diff --git a/dep/tbb/include/tbb/_concurrent_queue_internal.h b/dep/tbb/include/tbb/_concurrent_queue_internal.h new file mode 100644 index 000000000..418065dd8 --- /dev/null +++ b/dep/tbb/include/tbb/_concurrent_queue_internal.h @@ -0,0 +1,973 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_concurrent_queue_internal_H +#define __TBB_concurrent_queue_internal_H + +#include "tbb_stddef.h" +#include "tbb_machine.h" +#include "atomic.h" +#include "spin_mutex.h" +#include "cache_aligned_allocator.h" +#include "tbb_exception.h" +#include +#include + +namespace tbb { + +#if !__TBB_TEMPLATE_FRIENDS_BROKEN + +// forward declaration +namespace strict_ppl { +template class concurrent_queue; +} + +template class concurrent_bounded_queue; + +namespace deprecated { +template class concurrent_queue; +} +#endif + +//! For internal use only. +namespace strict_ppl { + +//! @cond INTERNAL +namespace internal { + +using namespace tbb::internal; + +typedef size_t ticket; + +static void* invalid_page; + +template class micro_queue ; +template class micro_queue_pop_finalizer ; +template class concurrent_queue_base_v3; + +//! parts of concurrent_queue_rep that do not have references to micro_queue +/** + * For internal use only. + */ +struct concurrent_queue_rep_base : no_copy { + template friend class micro_queue; + template friend class concurrent_queue_base_v3; + +protected: + //! Approximately n_queue/golden ratio + static const size_t phi = 3; + +public: + // must be power of 2 + static const size_t n_queue = 8; + + //! Prefix on a page + struct page { + page* next; + uintptr_t mask; + }; + + atomic head_counter; + char pad1[NFS_MaxLineSize-sizeof(atomic)]; + atomic tail_counter; + char pad2[NFS_MaxLineSize-sizeof(atomic)]; + + //! Always a power of 2 + size_t items_per_page; + + //! Size of an item + size_t item_size; + + //! number of invalid entries in the queue + atomic n_invalid_entries; + + char pad3[NFS_MaxLineSize-sizeof(size_t)-sizeof(size_t)-sizeof(atomic)]; +} ; + +//! Abstract class to define interface for page allocation/deallocation +/** + * For internal use only. 
+ */ +class concurrent_queue_page_allocator +{ + template friend class micro_queue ; + template friend class micro_queue_pop_finalizer ; +protected: + virtual ~concurrent_queue_page_allocator() {} +private: + virtual concurrent_queue_rep_base::page* allocate_page() = 0; + virtual void deallocate_page( concurrent_queue_rep_base::page* p ) = 0; +} ; + +#if _MSC_VER && !defined(__INTEL_COMPILER) +// unary minus operator applied to unsigned type, result still unsigned +#pragma warning( push ) +#pragma warning( disable: 4146 ) +#endif + +//! A queue using simple locking. +/** For efficient, this class has no constructor. + The caller is expected to zero-initialize it. */ +template +class micro_queue : no_copy { + typedef concurrent_queue_rep_base::page page; + + //! Class used to ensure exception-safety of method "pop" + class destroyer: no_copy { + T& my_value; + public: + destroyer( T& value ) : my_value(value) {} + ~destroyer() {my_value.~T();} + }; + + T& get_ref( page& page, size_t index ) { + return static_cast(static_cast(&page+1))[index]; + } + + void copy_item( page& dst, size_t index, const void* src ) { + new( &get_ref(dst,index) ) T(*static_cast(src)); + } + + void copy_item( page& dst, size_t dindex, const page& src, size_t sindex ) { + new( &get_ref(dst,dindex) ) T( static_cast(static_cast(&src+1))[sindex] ); + } + + void assign_and_destroy_item( void* dst, page& src, size_t index ) { + T& from = get_ref(src,index); + destroyer d(from); + *static_cast(dst) = from; + } + + void spin_wait_until_my_turn( atomic& counter, ticket k, concurrent_queue_rep_base& rb ) const ; + +public: + friend class micro_queue_pop_finalizer; + + atomic head_page; + atomic head_counter; + + atomic tail_page; + atomic tail_counter; + + spin_mutex page_mutex; + + void push( const void* item, ticket k, concurrent_queue_base_v3& base ) ; + + bool pop( void* dst, ticket k, concurrent_queue_base_v3& base ) ; + + micro_queue& assign( const micro_queue& src, concurrent_queue_base_v3& base ) ; + + page* make_copy( concurrent_queue_base_v3& base, const page* src_page, size_t begin_in_page, size_t end_in_page, ticket& g_index ) ; + + void make_invalid( ticket k ) ; +}; + +template +void micro_queue::spin_wait_until_my_turn( atomic& counter, ticket k, concurrent_queue_rep_base& rb ) const { + atomic_backoff backoff; + do { + backoff.pause(); + if( counter&0x1 ) { + ++rb.n_invalid_entries; + throw_bad_last_alloc_exception_v4(); + } + } while( counter!=k ) ; +} + +template +void micro_queue::push( const void* item, ticket k, concurrent_queue_base_v3& base ) { + k &= -concurrent_queue_rep_base::n_queue; + page* p = NULL; + size_t index = k/concurrent_queue_rep_base::n_queue & (base.my_rep->items_per_page-1); + if( !index ) { + try { + concurrent_queue_page_allocator& pa = base; + p = pa.allocate_page(); + } catch (...) { + ++base.my_rep->n_invalid_entries; + make_invalid( k ); + } + p->mask = 0; + p->next = NULL; + } + + if( tail_counter!=k ) spin_wait_until_my_turn( tail_counter, k, *base.my_rep ); + + if( p ) { + spin_mutex::scoped_lock lock( page_mutex ); + if( page* q = tail_page ) + q->next = p; + else + head_page = p; + tail_page = p; + } else { + p = tail_page; + } + + try { + copy_item( *p, index, item ); + // If no exception was thrown, mark item as present. 
+ p->mask |= uintptr_t(1)<n_invalid_entries; + tail_counter += concurrent_queue_rep_base::n_queue; + throw; + } +} + +template +bool micro_queue::pop( void* dst, ticket k, concurrent_queue_base_v3& base ) { + k &= -concurrent_queue_rep_base::n_queue; + if( head_counter!=k ) spin_wait_until_eq( head_counter, k ); + if( tail_counter==k ) spin_wait_while_eq( tail_counter, k ); + page& p = *head_page; + __TBB_ASSERT( &p, NULL ); + size_t index = k/concurrent_queue_rep_base::n_queue & (base.my_rep->items_per_page-1); + bool success = false; + { + micro_queue_pop_finalizer finalizer( *this, base, k+concurrent_queue_rep_base::n_queue, index==base.my_rep->items_per_page-1 ? &p : NULL ); + if( p.mask & uintptr_t(1)<n_invalid_entries; + } + } + return success; +} + +template +micro_queue& micro_queue::assign( const micro_queue& src, concurrent_queue_base_v3& base ) { + head_counter = src.head_counter; + tail_counter = src.tail_counter; + page_mutex = src.page_mutex; + + const page* srcp = src.head_page; + if( srcp ) { + ticket g_index = head_counter; + try { + size_t n_items = (tail_counter-head_counter)/concurrent_queue_rep_base::n_queue; + size_t index = head_counter/concurrent_queue_rep_base::n_queue & (base.my_rep->items_per_page-1); + size_t end_in_first_page = (index+n_itemsitems_per_page)?(index+n_items):base.my_rep->items_per_page; + + head_page = make_copy( base, srcp, index, end_in_first_page, g_index ); + page* cur_page = head_page; + + if( srcp != src.tail_page ) { + for( srcp = srcp->next; srcp!=src.tail_page; srcp=srcp->next ) { + cur_page->next = make_copy( base, srcp, 0, base.my_rep->items_per_page, g_index ); + cur_page = cur_page->next; + } + + __TBB_ASSERT( srcp==src.tail_page, NULL ); + size_t last_index = tail_counter/concurrent_queue_rep_base::n_queue & (base.my_rep->items_per_page-1); + if( last_index==0 ) last_index = base.my_rep->items_per_page; + + cur_page->next = make_copy( base, srcp, 0, last_index, g_index ); + cur_page = cur_page->next; + } + tail_page = cur_page; + } catch (...) { + make_invalid( g_index ); + } + } else { + head_page = tail_page = NULL; + } + return *this; +} + +template +void micro_queue::make_invalid( ticket k ) { + static page dummy = {static_cast((void*)1), 0}; + // mark it so that no more pushes are allowed. 
+ invalid_page = &dummy; + { + spin_mutex::scoped_lock lock( page_mutex ); + tail_counter = k+concurrent_queue_rep_base::n_queue+1; + if( page* q = tail_page ) + q->next = static_cast(invalid_page); + else + head_page = static_cast(invalid_page); + tail_page = static_cast(invalid_page); + } + throw; +} + +template +concurrent_queue_rep_base::page* micro_queue::make_copy( concurrent_queue_base_v3& base, const concurrent_queue_rep_base::page* src_page, size_t begin_in_page, size_t end_in_page, ticket& g_index ) { + concurrent_queue_page_allocator& pa = base; + page* new_page = pa.allocate_page(); + new_page->next = NULL; + new_page->mask = src_page->mask; + for( ; begin_in_page!=end_in_page; ++begin_in_page, ++g_index ) + if( new_page->mask & uintptr_t(1)< +class micro_queue_pop_finalizer: no_copy { + typedef concurrent_queue_rep_base::page page; + ticket my_ticket; + micro_queue& my_queue; + page* my_page; + concurrent_queue_page_allocator& allocator; +public: + micro_queue_pop_finalizer( micro_queue& queue, concurrent_queue_base_v3& b, ticket k, page* p ) : + my_ticket(k), my_queue(queue), my_page(p), allocator(b) + {} + ~micro_queue_pop_finalizer() ; +}; + +template +micro_queue_pop_finalizer::~micro_queue_pop_finalizer() { + page* p = my_page; + if( p ) { + spin_mutex::scoped_lock lock( my_queue.page_mutex ); + page* q = p->next; + my_queue.head_page = q; + if( !q ) { + my_queue.tail_page = NULL; + } + } + my_queue.head_counter = my_ticket; + if( p ) { + allocator.deallocate_page( p ); + } +} + +#if _MSC_VER && !defined(__INTEL_COMPILER) +#pragma warning( pop ) +#endif // warning 4146 is back + +template class concurrent_queue_iterator_rep ; +template class concurrent_queue_iterator_base_v3; + +//! representation of concurrent_queue_base +/** + * the class inherits from concurrent_queue_rep_base and defines an array of micro_queue's + */ +template +struct concurrent_queue_rep : public concurrent_queue_rep_base { + micro_queue array[n_queue]; + + //! Map ticket to an array index + static size_t index( ticket k ) { + return k*phi%n_queue; + } + + micro_queue& choose( ticket k ) { + // The formula here approximates LRU in a cache-oblivious way. + return array[index(k)]; + } +}; + +//! base class of concurrent_queue +/** + * The class implements the interface defined by concurrent_queue_page_allocator + * and has a pointer to an instance of concurrent_queue_rep. + */ +template +class concurrent_queue_base_v3: public concurrent_queue_page_allocator { + //! Internal representation + concurrent_queue_rep* my_rep; + + friend struct concurrent_queue_rep; + friend class micro_queue; + friend class concurrent_queue_iterator_rep; + friend class concurrent_queue_iterator_base_v3; + +protected: + typedef typename concurrent_queue_rep::page page; + +private: + /* override */ virtual page *allocate_page() { + concurrent_queue_rep& r = *my_rep; + size_t n = sizeof(page) + r.items_per_page*r.item_size; + return reinterpret_cast(allocate_block ( n )); + } + + /* override */ virtual void deallocate_page( concurrent_queue_rep_base::page *p ) { + concurrent_queue_rep& r = *my_rep; + size_t n = sizeof(page) + r.items_per_page*r.item_size; + deallocate_block( reinterpret_cast(p), n ); + } + + //! custom allocator + virtual void *allocate_block( size_t n ) = 0; + + //! 
custom de-allocator + virtual void deallocate_block( void *p, size_t n ) = 0; + +protected: + concurrent_queue_base_v3( size_t item_size ) ; + + /* override */ virtual ~concurrent_queue_base_v3() { + size_t nq = my_rep->n_queue; + for( size_t i=0; iarray[i].tail_page==NULL, "pages were not freed properly" ); + cache_aligned_allocator >().deallocate(my_rep,1); + } + + //! Enqueue item at tail of queue + void internal_push( const void* src ) { + concurrent_queue_rep& r = *my_rep; + ticket k = r.tail_counter++; + r.choose(k).push( src, k, *this ); + } + + //! Attempt to dequeue item from queue. + /** NULL if there was no item to dequeue. */ + bool internal_try_pop( void* dst ) ; + + //! Get size of queue; result may be invalid if queue is modified concurrently + size_t internal_size() const ; + + //! check if the queue is empty; thread safe + bool internal_empty() const ; + + //! free any remaining pages + /* note that the name may be misleading, but it remains so due to a historical accident. */ + void internal_finish_clear() ; + + //! throw an exception + void internal_throw_exception() const { + throw std::bad_alloc(); + } + + //! copy internal representation + void assign( const concurrent_queue_base_v3& src ) ; +}; + +template +concurrent_queue_base_v3::concurrent_queue_base_v3( size_t item_size ) { + my_rep = cache_aligned_allocator >().allocate(1); + __TBB_ASSERT( (size_t)my_rep % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->head_counter % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->tail_counter % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->array % NFS_GetLineSize()==0, "alignment error" ); + memset(my_rep,0,sizeof(concurrent_queue_rep)); + my_rep->item_size = item_size; + my_rep->items_per_page = item_size<=8 ? 32 : + item_size<=16 ? 16 : + item_size<=32 ? 8 : + item_size<=64 ? 4 : + item_size<=128 ? 2 : + 1; +} + +template +bool concurrent_queue_base_v3::internal_try_pop( void* dst ) { + concurrent_queue_rep& r = *my_rep; + ticket k; + do { + k = r.head_counter; + for(;;) { + if( r.tail_counter<=k ) { + // Queue is empty + return false; + } + // Queue had item with ticket k when we looked. Attempt to get that item. + ticket tk=k; +#if defined(_MSC_VER) && defined(_Wp64) + #pragma warning (push) + #pragma warning (disable: 4267) +#endif + k = r.head_counter.compare_and_swap( tk+1, tk ); +#if defined(_MSC_VER) && defined(_Wp64) + #pragma warning (pop) +#endif + if( k==tk ) + break; + // Another thread snatched the item, retry. + } + } while( !r.choose( k ).pop( dst, k, *this ) ); + return true; +} + +template +size_t concurrent_queue_base_v3::internal_size() const { + concurrent_queue_rep& r = *my_rep; + __TBB_ASSERT( sizeof(ptrdiff_t)<=sizeof(size_t), NULL ); + ticket hc = r.head_counter; + size_t nie = r.n_invalid_entries; + ticket tc = r.tail_counter; + __TBB_ASSERT( hc!=tc || !nie, NULL ); + ptrdiff_t sz = tc-hc-nie; + return sz<0 ? 0 : size_t(sz); +} + +template +bool concurrent_queue_base_v3::internal_empty() const { + concurrent_queue_rep& r = *my_rep; + ticket tc = r.tail_counter; + ticket hc = r.head_counter; + // if tc!=r.tail_counter, the queue was not empty at some point between the two reads. 
+ return tc==r.tail_counter && tc==hc+r.n_invalid_entries ; +} + +template +void concurrent_queue_base_v3::internal_finish_clear() { + concurrent_queue_rep& r = *my_rep; + size_t nq = r.n_queue; + for( size_t i=0; i +void concurrent_queue_base_v3::assign( const concurrent_queue_base_v3& src ) { + concurrent_queue_rep& r = *my_rep; + r.items_per_page = src.my_rep->items_per_page; + + // copy concurrent_queue_rep. + r.head_counter = src.my_rep->head_counter; + r.tail_counter = src.my_rep->tail_counter; + r.n_invalid_entries = src.my_rep->n_invalid_entries; + + // copy micro_queues + for( size_t i = 0; iarray[i], *this); + + __TBB_ASSERT( r.head_counter==src.my_rep->head_counter && r.tail_counter==src.my_rep->tail_counter, + "the source concurrent queue should not be concurrently modified." ); +} + +template class concurrent_queue_iterator; + +template +class concurrent_queue_iterator_rep: no_assign { +public: + ticket head_counter; + const concurrent_queue_base_v3& my_queue; + typename concurrent_queue_base_v3::page* array[concurrent_queue_rep::n_queue]; + concurrent_queue_iterator_rep( const concurrent_queue_base_v3& queue ) : + head_counter(queue.my_rep->head_counter), + my_queue(queue) + { + for( size_t k=0; k::n_queue; ++k ) + array[k] = queue.my_rep->array[k].head_page; + } + + //! Set item to point to kth element. Return true if at end of queue or item is marked valid; false otherwise. + bool get_item( void*& item, size_t k ) ; +}; + +template +bool concurrent_queue_iterator_rep::get_item( void*& item, size_t k ) { + if( k==my_queue.my_rep->tail_counter ) { + item = NULL; + return true; + } else { + typename concurrent_queue_base_v3::page* p = array[concurrent_queue_rep::index(k)]; + __TBB_ASSERT(p,NULL); + size_t i = k/concurrent_queue_rep::n_queue & (my_queue.my_rep->items_per_page-1); + item = static_cast(static_cast(p+1)) + my_queue.my_rep->item_size*i; + return (p->mask & uintptr_t(1)< +class concurrent_queue_iterator_base_v3 : no_assign { + //! Concurrentconcurrent_queue over which we are iterating. + /** NULL if one past last element in queue. */ + concurrent_queue_iterator_rep* my_rep; + + template + friend bool operator==( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ); + + template + friend bool operator!=( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ); +protected: + //! Pointer to current item + mutable void* my_item; + +public: + //! Default constructor + concurrent_queue_iterator_base_v3() : my_rep(NULL), my_item(NULL) { +#if __GNUC__==4&&__GNUC_MINOR__==3 + // to get around a possible gcc 4.3 bug + __asm__ __volatile__("": : :"memory"); +#endif + } + + //! Copy constructor + concurrent_queue_iterator_base_v3( const concurrent_queue_iterator_base_v3& i ) : my_rep(NULL), my_item(NULL) { + assign(i); + } + + //! Construct iterator pointing to head of queue. + concurrent_queue_iterator_base_v3( const concurrent_queue_base_v3& queue ) ; + +protected: + //! Assignment + void assign( const concurrent_queue_iterator_base_v3& other ) ; + + //! Advance iterator one step towards tail of queue. + void advance() ; + + //! 
Destructor + ~concurrent_queue_iterator_base_v3() { + cache_aligned_allocator >().deallocate(my_rep, 1); + my_rep = NULL; + } +}; + +template +concurrent_queue_iterator_base_v3::concurrent_queue_iterator_base_v3( const concurrent_queue_base_v3& queue ) { + my_rep = cache_aligned_allocator >().allocate(1); + new( my_rep ) concurrent_queue_iterator_rep(queue); + size_t k = my_rep->head_counter; + if( !my_rep->get_item(my_item, k) ) advance(); +} + +template +void concurrent_queue_iterator_base_v3::assign( const concurrent_queue_iterator_base_v3& other ) { + if( my_rep!=other.my_rep ) { + if( my_rep ) { + cache_aligned_allocator >().deallocate(my_rep, 1); + my_rep = NULL; + } + if( other.my_rep ) { + my_rep = cache_aligned_allocator >().allocate(1); + new( my_rep ) concurrent_queue_iterator_rep( *other.my_rep ); + } + } + my_item = other.my_item; +} + +template +void concurrent_queue_iterator_base_v3::advance() { + __TBB_ASSERT( my_item, "attempt to increment iterator past end of queue" ); + size_t k = my_rep->head_counter; + const concurrent_queue_base_v3& queue = my_rep->my_queue; +#if TBB_USE_ASSERT + void* tmp; + my_rep->get_item(tmp,k); + __TBB_ASSERT( my_item==tmp, NULL ); +#endif /* TBB_USE_ASSERT */ + size_t i = k/concurrent_queue_rep::n_queue & (queue.my_rep->items_per_page-1); + if( i==queue.my_rep->items_per_page-1 ) { + typename concurrent_queue_base_v3::page*& root = my_rep->array[concurrent_queue_rep::index(k)]; + root = root->next; + } + // advance k + my_rep->head_counter = ++k; + if( !my_rep->get_item(my_item, k) ) advance(); +} + +template +static inline const concurrent_queue_iterator_base_v3& add_constness( const concurrent_queue_iterator_base_v3& q ) +{ + return *reinterpret_cast *>(&q) ; +} + +//! Meets requirements of a forward iterator for STL. +/** Value is either the T or const T type of the container. + @ingroup containers */ +template +class concurrent_queue_iterator: public concurrent_queue_iterator_base_v3, + public std::iterator { +#if !__TBB_TEMPLATE_FRIENDS_BROKEN + template + friend class ::tbb::strict_ppl::concurrent_queue; +#else +public: // workaround for MSVC +#endif + //! Construct iterator pointing to head of queue. + concurrent_queue_iterator( const concurrent_queue_base_v3& queue ) : + concurrent_queue_iterator_base_v3(queue) + { + } + +public: + concurrent_queue_iterator() {} + + //! Copy constructor + concurrent_queue_iterator( const concurrent_queue_iterator& other ) : + concurrent_queue_iterator_base_v3(other) + { + } + + template + concurrent_queue_iterator( const concurrent_queue_iterator& other ) : + concurrent_queue_iterator_base_v3(add_constness(other)) + { + } + + //! Iterator assignment + concurrent_queue_iterator& operator=( const concurrent_queue_iterator& other ) { + assign(other); + return *this; + } + + //! Reference to current item + Value& operator*() const { + return *static_cast(this->my_item); + } + + Value* operator->() const {return &operator*();} + + //! Advance to next item in queue + concurrent_queue_iterator& operator++() { + this->advance(); + return *this; + } + + //! Post increment + Value* operator++(int) { + Value* result = &operator*(); + operator++(); + return result; + } +}; // concurrent_queue_iterator + + +template +bool operator==( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ) { + return i.my_item==j.my_item; +} + +template +bool operator!=( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ) { + return i.my_item!=j.my_item; +} + +} // namespace internal + +//! 
@endcond + +} // namespace strict_ppl + +//! @cond INTERNAL +namespace internal { + +class concurrent_queue_rep; +class concurrent_queue_iterator_rep; +class concurrent_queue_iterator_base_v3; +template class concurrent_queue_iterator; + +//! For internal use only. +/** Type-independent portion of concurrent_queue. + @ingroup containers */ +class concurrent_queue_base_v3: no_copy { + //! Internal representation + concurrent_queue_rep* my_rep; + + friend class concurrent_queue_rep; + friend struct micro_queue; + friend class micro_queue_pop_finalizer; + friend class concurrent_queue_iterator_rep; + friend class concurrent_queue_iterator_base_v3; +protected: + //! Prefix on a page + struct page { + page* next; + uintptr_t mask; + }; + + //! Capacity of the queue + ptrdiff_t my_capacity; + + //! Always a power of 2 + size_t items_per_page; + + //! Size of an item + size_t item_size; + +private: + virtual void copy_item( page& dst, size_t index, const void* src ) = 0; + virtual void assign_and_destroy_item( void* dst, page& src, size_t index ) = 0; +protected: + __TBB_EXPORTED_METHOD concurrent_queue_base_v3( size_t item_size ); + virtual __TBB_EXPORTED_METHOD ~concurrent_queue_base_v3(); + + //! Enqueue item at tail of queue + void __TBB_EXPORTED_METHOD internal_push( const void* src ); + + //! Dequeue item from head of queue + void __TBB_EXPORTED_METHOD internal_pop( void* dst ); + + //! Attempt to enqueue item onto queue. + bool __TBB_EXPORTED_METHOD internal_push_if_not_full( const void* src ); + + //! Attempt to dequeue item from queue. + /** NULL if there was no item to dequeue. */ + bool __TBB_EXPORTED_METHOD internal_pop_if_present( void* dst ); + + //! Get size of queue + ptrdiff_t __TBB_EXPORTED_METHOD internal_size() const; + + //! Check if the queue is emtpy + bool __TBB_EXPORTED_METHOD internal_empty() const; + + //! Set the queue capacity + void __TBB_EXPORTED_METHOD internal_set_capacity( ptrdiff_t capacity, size_t element_size ); + + //! custom allocator + virtual page *allocate_page() = 0; + + //! custom de-allocator + virtual void deallocate_page( page *p ) = 0; + + //! free any remaining pages + /* note that the name may be misleading, but it remains so due to a historical accident. */ + void __TBB_EXPORTED_METHOD internal_finish_clear() ; + + //! throw an exception + void __TBB_EXPORTED_METHOD internal_throw_exception() const; + + //! copy internal representation + void __TBB_EXPORTED_METHOD assign( const concurrent_queue_base_v3& src ) ; + +private: + virtual void copy_page_item( page& dst, size_t dindex, const page& src, size_t sindex ) = 0; +}; + +//! Type-independent portion of concurrent_queue_iterator. +/** @ingroup containers */ +class concurrent_queue_iterator_base_v3 { + //! Concurrentconcurrent_queue over which we are iterating. + /** NULL if one past last element in queue. */ + concurrent_queue_iterator_rep* my_rep; + + template + friend bool operator==( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ); + + template + friend bool operator!=( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ); +protected: + //! Pointer to current item + mutable void* my_item; + + //! Default constructor + concurrent_queue_iterator_base_v3() : my_rep(NULL), my_item(NULL) {} + + //! Copy constructor + concurrent_queue_iterator_base_v3( const concurrent_queue_iterator_base_v3& i ) : my_rep(NULL), my_item(NULL) { + assign(i); + } + + //! Construct iterator pointing to head of queue. 
+ __TBB_EXPORTED_METHOD concurrent_queue_iterator_base_v3( const concurrent_queue_base_v3& queue ); + + //! Assignment + void __TBB_EXPORTED_METHOD assign( const concurrent_queue_iterator_base_v3& i ); + + //! Advance iterator one step towards tail of queue. + void __TBB_EXPORTED_METHOD advance(); + + //! Destructor + __TBB_EXPORTED_METHOD ~concurrent_queue_iterator_base_v3(); +}; + +typedef concurrent_queue_iterator_base_v3 concurrent_queue_iterator_base; + +//! Meets requirements of a forward iterator for STL. +/** Value is either the T or const T type of the container. + @ingroup containers */ +template +class concurrent_queue_iterator: public concurrent_queue_iterator_base, + public std::iterator { +#if !defined(_MSC_VER) || defined(__INTEL_COMPILER) + template + friend class ::tbb::concurrent_bounded_queue; + + template + friend class ::tbb::deprecated::concurrent_queue; +#else +public: // workaround for MSVC +#endif + //! Construct iterator pointing to head of queue. + concurrent_queue_iterator( const concurrent_queue_base_v3& queue ) : + concurrent_queue_iterator_base_v3(queue) + { + } + +public: + concurrent_queue_iterator() {} + + /** If Value==Container::value_type, then this routine is the copy constructor. + If Value==const Container::value_type, then this routine is a conversion constructor. */ + concurrent_queue_iterator( const concurrent_queue_iterator& other ) : + concurrent_queue_iterator_base_v3(other) + {} + + //! Iterator assignment + concurrent_queue_iterator& operator=( const concurrent_queue_iterator& other ) { + assign(other); + return *this; + } + + //! Reference to current item + Value& operator*() const { + return *static_cast(my_item); + } + + Value* operator->() const {return &operator*();} + + //! Advance to next item in queue + concurrent_queue_iterator& operator++() { + advance(); + return *this; + } + + //! Post increment + Value* operator++(int) { + Value* result = &operator*(); + operator++(); + return result; + } +}; // concurrent_queue_iterator + + +template +bool operator==( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ) { + return i.my_item==j.my_item; +} + +template +bool operator!=( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ) { + return i.my_item!=j.my_item; +} + +} // namespace internal; + +//! @endcond + +} // namespace tbb + +#endif /* __TBB_concurrent_queue_internal_H */ diff --git a/dep/tbb/include/tbb/_tbb_windef.h b/dep/tbb/include/tbb/_tbb_windef.h new file mode 100644 index 000000000..ceb697dc3 --- /dev/null +++ b/dep/tbb/include/tbb/_tbb_windef.h @@ -0,0 +1,84 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_tbb_windef_H +#error Do not #include this file directly. Use "#include tbb/tbb_stddef.h" instead. +#endif /* __TBB_tbb_windef_H */ + +// Check that the target Windows version has all API calls requried for TBB. +// Do not increase the version in condition beyond 0x0500 without prior discussion! +#if defined(_WIN32_WINNT) && _WIN32_WINNT<0x0400 +#error TBB is unable to run on old Windows versions; _WIN32_WINNT must be 0x0400 or greater. +#endif + +#if !defined(_MT) +#error TBB requires linkage with multithreaded C/C++ runtime library. \ + Choose multithreaded DLL runtime in project settings, or use /MD[d] compiler switch. +#elif !defined(_DLL) +#pragma message("Warning: Using TBB together with static C/C++ runtime library is not recommended. " \ + "Consider switching your project to multithreaded DLL runtime used by TBB.") +#endif + +// Workaround for the problem with MVSC headers failing to define namespace std +namespace std { + using ::size_t; using ::ptrdiff_t; +} + +#define __TBB_STRING_AUX(x) #x +#define __TBB_STRING(x) __TBB_STRING_AUX(x) + +// Default setting of TBB_USE_DEBUG +#ifdef TBB_USE_DEBUG +# if TBB_USE_DEBUG +# if !defined(_DEBUG) +# pragma message(__FILE__ "(" __TBB_STRING(__LINE__) ") : Warning: Recommend using /MDd if compiling with TBB_USE_DEBUG!=0") +# endif +# else +# if defined(_DEBUG) +# pragma message(__FILE__ "(" __TBB_STRING(__LINE__) ") : Warning: Recommend using /MD if compiling with TBB_USE_DEBUG==0") +# endif +# endif +#else +# ifdef _DEBUG +# define TBB_USE_DEBUG 1 +# endif +#endif + +#if __TBB_BUILD && !defined(__TBB_NO_IMPLICIT_LINKAGE) +#define __TBB_NO_IMPLICIT_LINKAGE 1 +#endif + +#if _MSC_VER + #if !__TBB_NO_IMPLICIT_LINKAGE + #ifdef _DEBUG + #pragma comment(lib, "tbb_debug.lib") + #else + #pragma comment(lib, "tbb.lib") + #endif + #endif +#endif diff --git a/dep/tbb/include/tbb/aligned_space.h b/dep/tbb/include/tbb/aligned_space.h new file mode 100644 index 000000000..f9a08df5a --- /dev/null +++ b/dep/tbb/include/tbb/aligned_space.h @@ -0,0 +1,55 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_aligned_space_H +#define __TBB_aligned_space_H + +#include "tbb_stddef.h" +#include "tbb_machine.h" + +namespace tbb { + +//! Block of space aligned sufficiently to construct an array T with N elements. +/** The elements are not constructed or destroyed by this class. + @ingroup memory_allocation */ +template +class aligned_space { +private: + typedef __TBB_TypeWithAlignmentAtLeastAsStrict(T) element_type; + element_type array[(sizeof(T)*N+sizeof(element_type)-1)/sizeof(element_type)]; +public: + //! Pointer to beginning of array + T* begin() {return reinterpret_cast(this);} + + //! Pointer to one past last element in array. + T* end() {return begin()+N;} +}; + +} // namespace tbb + +#endif /* __TBB_aligned_space_H */ diff --git a/dep/tbb/include/tbb/atomic.h b/dep/tbb/include/tbb/atomic.h new file mode 100644 index 000000000..8f3517f1e --- /dev/null +++ b/dep/tbb/include/tbb/atomic.h @@ -0,0 +1,397 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_atomic_H +#define __TBB_atomic_H + +#include +#include "tbb_stddef.h" + +#if _MSC_VER +#define __TBB_LONG_LONG __int64 +#else +#define __TBB_LONG_LONG long long +#endif /* _MSC_VER */ + +#include "tbb_machine.h" + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warnings + #pragma warning (push) + #pragma warning (disable: 4244 4267) +#endif + +namespace tbb { + +//! Specifies memory fencing. +enum memory_semantics { + //! For internal use only. + __TBB_full_fence, + //! Acquire fence + acquire, + //! Release fence + release +}; + +//! 
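// [Editorial illustration; not part of the upstream TBB header or of this patch]
// aligned_space<T,N> above only reserves suitably aligned raw storage; the caller
// placement-constructs the elements and, for non-trivial types, destroys them.
// A minimal sketch:

#include "tbb/aligned_space.h"
#include <new>

void aligned_space_sketch() {
    tbb::aligned_space<double,4> storage;
    for( double* p = storage.begin(); p != storage.end(); ++p )
        new( p ) double( 0.0 );  // construct in place; double needs no explicit destruction
}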
@cond INTERNAL +namespace internal { + +#if __GNUC__ || __SUNPRO_CC +#define __TBB_DECL_ATOMIC_FIELD(t,f,a) t f __attribute__ ((aligned(a))); +#elif defined(__INTEL_COMPILER)||_MSC_VER >= 1300 +#define __TBB_DECL_ATOMIC_FIELD(t,f,a) __declspec(align(a)) t f; +#else +#error Do not know syntax for forcing alignment. +#endif /* __GNUC__ */ + +template +struct atomic_rep; // Primary template declared, but never defined. + +template<> +struct atomic_rep<1> { // Specialization + typedef int8_t word; + int8_t value; +}; +template<> +struct atomic_rep<2> { // Specialization + typedef int16_t word; + __TBB_DECL_ATOMIC_FIELD(int16_t,value,2) +}; +template<> +struct atomic_rep<4> { // Specialization +#if _MSC_VER && __TBB_WORDSIZE==4 + // Work-around that avoids spurious /Wp64 warnings + typedef intptr_t word; +#else + typedef int32_t word; +#endif + __TBB_DECL_ATOMIC_FIELD(int32_t,value,4) +}; +template<> +struct atomic_rep<8> { // Specialization + typedef int64_t word; + __TBB_DECL_ATOMIC_FIELD(int64_t,value,8) +}; + +template +struct atomic_traits; // Primary template declared, but not defined. + +#define __TBB_DECL_FENCED_ATOMIC_PRIMITIVES(S,M) \ + template<> struct atomic_traits { \ + typedef atomic_rep::word word; \ + inline static word compare_and_swap( volatile void* location, word new_value, word comparand ) {\ + return __TBB_CompareAndSwap##S##M(location,new_value,comparand); \ + } \ + inline static word fetch_and_add( volatile void* location, word addend ) { \ + return __TBB_FetchAndAdd##S##M(location,addend); \ + } \ + inline static word fetch_and_store( volatile void* location, word value ) {\ + return __TBB_FetchAndStore##S##M(location,value); \ + } \ + }; + +#define __TBB_DECL_ATOMIC_PRIMITIVES(S) \ + template \ + struct atomic_traits { \ + typedef atomic_rep::word word; \ + inline static word compare_and_swap( volatile void* location, word new_value, word comparand ) {\ + return __TBB_CompareAndSwap##S(location,new_value,comparand); \ + } \ + inline static word fetch_and_add( volatile void* location, word addend ) { \ + return __TBB_FetchAndAdd##S(location,addend); \ + } \ + inline static word fetch_and_store( volatile void* location, word value ) {\ + return __TBB_FetchAndStore##S(location,value); \ + } \ + }; + +#if __TBB_DECL_FENCED_ATOMICS +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(1,__TBB_full_fence) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(2,__TBB_full_fence) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(4,__TBB_full_fence) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(8,__TBB_full_fence) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(1,acquire) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(2,acquire) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(4,acquire) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(8,acquire) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(1,release) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(2,release) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(4,release) +__TBB_DECL_FENCED_ATOMIC_PRIMITIVES(8,release) +#else +__TBB_DECL_ATOMIC_PRIMITIVES(1) +__TBB_DECL_ATOMIC_PRIMITIVES(2) +__TBB_DECL_ATOMIC_PRIMITIVES(4) +__TBB_DECL_ATOMIC_PRIMITIVES(8) +#endif + +//! Additive inverse of 1 for type T. +/** Various compilers issue various warnings if -1 is used with various integer types. + The baroque expression below avoids all the warnings (we hope). */ +#define __TBB_MINUS_ONE(T) (T(T(0)-T(1))) + +//! Base class that provides basic functionality for atomic without fetch_and_add. +/** Works for any type T that has the same size as an integral type, has a trivial constructor/destructor, + and can be copied/compared by memcpy/memcmp. 
*/ +template +struct atomic_impl { +protected: + atomic_rep rep; +private: + //! Union type used to convert type T to underlying integral type. + union converter { + T value; + typename atomic_rep::word bits; + }; +public: + typedef T value_type; + + template + value_type fetch_and_store( value_type value ) { + converter u, w; + u.value = value; + w.bits = internal::atomic_traits::fetch_and_store(&rep.value,u.bits); + return w.value; + } + + value_type fetch_and_store( value_type value ) { + return fetch_and_store<__TBB_full_fence>(value); + } + + template + value_type compare_and_swap( value_type value, value_type comparand ) { + converter u, v, w; + u.value = value; + v.value = comparand; + w.bits = internal::atomic_traits::compare_and_swap(&rep.value,u.bits,v.bits); + return w.value; + } + + value_type compare_and_swap( value_type value, value_type comparand ) { + return compare_and_swap<__TBB_full_fence>(value,comparand); + } + + operator value_type() const volatile { // volatile qualifier here for backwards compatibility + converter w; + w.bits = __TBB_load_with_acquire( rep.value ); + return w.value; + } + +protected: + value_type store_with_release( value_type rhs ) { + converter u; + u.value = rhs; + __TBB_store_with_release(rep.value,u.bits); + return rhs; + } +}; + +//! Base class that provides basic functionality for atomic with fetch_and_add. +/** I is the underlying type. + D is the difference type. + StepType should be char if I is an integral type, and T if I is a T*. */ +template +struct atomic_impl_with_arithmetic: atomic_impl { +public: + typedef I value_type; + + template + value_type fetch_and_add( D addend ) { + return value_type(internal::atomic_traits::fetch_and_add( &this->rep.value, addend*sizeof(StepType) )); + } + + value_type fetch_and_add( D addend ) { + return fetch_and_add<__TBB_full_fence>(addend); + } + + template + value_type fetch_and_increment() { + return fetch_and_add(1); + } + + value_type fetch_and_increment() { + return fetch_and_add(1); + } + + template + value_type fetch_and_decrement() { + return fetch_and_add(__TBB_MINUS_ONE(D)); + } + + value_type fetch_and_decrement() { + return fetch_and_add(__TBB_MINUS_ONE(D)); + } + +public: + value_type operator+=( D addend ) { + return fetch_and_add(addend)+addend; + } + + value_type operator-=( D addend ) { + // Additive inverse of addend computed using binary minus, + // instead of unary minus, for sake of avoiding compiler warnings. + return operator+=(D(0)-addend); + } + + value_type operator++() { + return fetch_and_add(1)+1; + } + + value_type operator--() { + return fetch_and_add(__TBB_MINUS_ONE(D))-1; + } + + value_type operator++(int) { + return fetch_and_add(1); + } + + value_type operator--(int) { + return fetch_and_add(__TBB_MINUS_ONE(D)); + } +}; + +#if __TBB_WORDSIZE == 4 +// Plaforms with 32-bit hardware require special effort for 64-bit loads and stores. 
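// [Editorial illustration; not part of the upstream TBB header or of this patch]
// The operations defined above in atomic_impl and atomic_impl_with_arithmetic surface
// through tbb::atomic<T>, declared further down in this header. A minimal sketch of a
// shared counter:

#include "tbb/atomic.h"

void atomic_counter_sketch() {
    tbb::atomic<long> counter;
    counter = 0;                                // plain assignment is a release store
    long before = counter.fetch_and_add( 5 );   // returns the value held before the addition (0)
    ++counter;                                  // fetch_and_increment; counter is now 6
    counter.compare_and_swap( 10, 6 );          // store 10 only if the value is still 6
    (void)before;
}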
+#if defined(__INTEL_COMPILER)||!defined(_MSC_VER)||_MSC_VER>=1400 + +template<> +inline atomic_impl<__TBB_LONG_LONG>::operator atomic_impl<__TBB_LONG_LONG>::value_type() const volatile { + return __TBB_Load8(&rep.value); +} + +template<> +inline atomic_impl::operator atomic_impl::value_type() const volatile { + return __TBB_Load8(&rep.value); +} + +template<> +inline atomic_impl<__TBB_LONG_LONG>::value_type atomic_impl<__TBB_LONG_LONG>::store_with_release( value_type rhs ) { + __TBB_Store8(&rep.value,rhs); + return rhs; +} + +template<> +inline atomic_impl::value_type atomic_impl::store_with_release( value_type rhs ) { + __TBB_Store8(&rep.value,rhs); + return rhs; +} + +#endif /* defined(__INTEL_COMPILER)||!defined(_MSC_VER)||_MSC_VER>=1400 */ +#endif /* __TBB_WORDSIZE==4 */ + +} /* Internal */ +//! @endcond + +//! Primary template for atomic. +/** See the Reference for details. + @ingroup synchronization */ +template +struct atomic: internal::atomic_impl { + T operator=( T rhs ) { + // "this" required here in strict ISO C++ because store_with_release is a dependent name + return this->store_with_release(rhs); + } + atomic& operator=( const atomic& rhs ) {this->store_with_release(rhs); return *this;} +}; + +#define __TBB_DECL_ATOMIC(T) \ + template<> struct atomic: internal::atomic_impl_with_arithmetic { \ + T operator=( T rhs ) {return store_with_release(rhs);} \ + atomic& operator=( const atomic& rhs ) {store_with_release(rhs); return *this;} \ + }; + +#if defined(__INTEL_COMPILER)||!defined(_MSC_VER)||_MSC_VER>=1400 +__TBB_DECL_ATOMIC(__TBB_LONG_LONG) +__TBB_DECL_ATOMIC(unsigned __TBB_LONG_LONG) +#else +// Some old versions of MVSC cannot correctly compile templates with "long long". +#endif /* defined(__INTEL_COMPILER)||!defined(_MSC_VER)||_MSC_VER>=1400 */ + +__TBB_DECL_ATOMIC(long) +__TBB_DECL_ATOMIC(unsigned long) + +#if defined(_MSC_VER) && __TBB_WORDSIZE==4 +/* Special version of __TBB_DECL_ATOMIC that avoids gratuitous warnings from cl /Wp64 option. + It is identical to __TBB_DECL_ATOMIC(unsigned) except that it replaces operator=(T) + with an operator=(U) that explicitly converts the U to a T. Types T and U should be + type synonyms on the platform. Type U should be the wider variant of T from the + perspective of /Wp64. */ +#define __TBB_DECL_ATOMIC_ALT(T,U) \ + template<> struct atomic: internal::atomic_impl_with_arithmetic { \ + T operator=( U rhs ) {return store_with_release(T(rhs));} \ + atomic& operator=( const atomic& rhs ) {store_with_release(rhs); return *this;} \ + }; +__TBB_DECL_ATOMIC_ALT(unsigned,size_t) +__TBB_DECL_ATOMIC_ALT(int,ptrdiff_t) +#else +__TBB_DECL_ATOMIC(unsigned) +__TBB_DECL_ATOMIC(int) +#endif /* defined(_MSC_VER) && __TBB_WORDSIZE==4 */ + +__TBB_DECL_ATOMIC(unsigned short) +__TBB_DECL_ATOMIC(short) +__TBB_DECL_ATOMIC(char) +__TBB_DECL_ATOMIC(signed char) +__TBB_DECL_ATOMIC(unsigned char) + +#if !defined(_MSC_VER)||defined(_NATIVE_WCHAR_T_DEFINED) +__TBB_DECL_ATOMIC(wchar_t) +#endif /* _MSC_VER||!defined(_NATIVE_WCHAR_T_DEFINED) */ + +//! Specialization for atomic with arithmetic and operator->. +template struct atomic: internal::atomic_impl_with_arithmetic { + T* operator=( T* rhs ) { + // "this" required here in strict ISO C++ because store_with_release is a dependent name + return this->store_with_release(rhs); + } + atomic& operator=( const atomic& rhs ) { + this->store_with_release(rhs); return *this; + } + T* operator->() const { + return (*this); + } +}; + +//! Specialization for atomic, for sake of not allowing arithmetic or operator->. 
+template<> struct atomic: internal::atomic_impl { + void* operator=( void* rhs ) { + // "this" required here in strict ISO C++ because store_with_release is a dependent name + return this->store_with_release(rhs); + } + atomic& operator=( const atomic& rhs ) { + this->store_with_release(rhs); return *this; + } +}; + +} // namespace tbb + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + #pragma warning (pop) +#endif // warnings 4244, 4267 are back + +#endif /* __TBB_atomic_H */ diff --git a/dep/tbb/include/tbb/blocked_range.h b/dep/tbb/include/tbb/blocked_range.h new file mode 100644 index 000000000..fd20aa0c4 --- /dev/null +++ b/dep/tbb/include/tbb/blocked_range.h @@ -0,0 +1,129 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_blocked_range_H +#define __TBB_blocked_range_H + +#include "tbb_stddef.h" + +namespace tbb { + +/** \page range_req Requirements on range concept + Class \c R implementing the concept of range must define: + - \code R::R( const R& ); \endcode Copy constructor + - \code R::~R(); \endcode Destructor + - \code bool R::is_divisible() const; \endcode True if range can be partitioned into two subranges + - \code bool R::empty() const; \endcode True if range is empty + - \code R::R( R& r, split ); \endcode Split range \c r into two subranges. +**/ + +//! A range over which to iterate. +/** @ingroup algorithms */ +template +class blocked_range { +public: + //! Type of a value + /** Called a const_iterator for sake of algorithms that need to treat a blocked_range + as an STL container. */ + typedef Value const_iterator; + + //! Type for size of a range + typedef std::size_t size_type; + + //! Construct range with default-constructed values for begin and end. + /** Requires that Value have a default constructor. */ + blocked_range() : my_begin(), my_end() {} + + //! Construct range over half-open interval [begin,end), with the given grainsize. + blocked_range( Value begin_, Value end_, size_type grainsize_=1 ) : + my_end(end_), my_begin(begin_), my_grainsize(grainsize_) + { + __TBB_ASSERT( my_grainsize>0, "grainsize must be positive" ); + } + + //! Beginning of range. + const_iterator begin() const {return my_begin;} + + //! One past last value in range. 
+ const_iterator end() const {return my_end;} + + //! Size of the range + /** Unspecified if end() + friend class blocked_range2d; + + template + friend class blocked_range3d; +}; + +} // namespace tbb + +#endif /* __TBB_blocked_range_H */ diff --git a/dep/tbb/include/tbb/blocked_range2d.h b/dep/tbb/include/tbb/blocked_range2d.h new file mode 100644 index 000000000..d0e48b936 --- /dev/null +++ b/dep/tbb/include/tbb/blocked_range2d.h @@ -0,0 +1,97 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_blocked_range2d_H +#define __TBB_blocked_range2d_H + +#include "tbb_stddef.h" +#include "blocked_range.h" + +namespace tbb { + +//! A 2-dimensional range that models the Range concept. +/** @ingroup algorithms */ +template +class blocked_range2d { +public: + //! Type for size of an iteation range + typedef blocked_range row_range_type; + typedef blocked_range col_range_type; + +private: + row_range_type my_rows; + col_range_type my_cols; + +public: + + blocked_range2d( RowValue row_begin, RowValue row_end, typename row_range_type::size_type row_grainsize, + ColValue col_begin, ColValue col_end, typename col_range_type::size_type col_grainsize ) : + my_rows(row_begin,row_end,row_grainsize), + my_cols(col_begin,col_end,col_grainsize) + { + } + + blocked_range2d( RowValue row_begin, RowValue row_end, + ColValue col_begin, ColValue col_end ) : + my_rows(row_begin,row_end), + my_cols(col_begin,col_end) + { + } + + //! True if range is empty + bool empty() const { + // Yes, it is a logical OR here, not AND. + return my_rows.empty() || my_cols.empty(); + } + + //! True if range is divisible into two pieces. + bool is_divisible() const { + return my_rows.is_divisible() || my_cols.is_divisible(); + } + + blocked_range2d( blocked_range2d& r, split ) : + my_rows(r.my_rows), + my_cols(r.my_cols) + { + if( my_rows.size()*double(my_cols.grainsize()) < my_cols.size()*double(my_rows.grainsize()) ) { + my_cols.my_begin = col_range_type::do_split(r.my_cols); + } else { + my_rows.my_begin = row_range_type::do_split(r.my_rows); + } + } + + //! The rows of the iteration space + const row_range_type& rows() const {return my_rows;} + + //! 
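// [Editorial illustration; not part of the upstream TBB header or of this patch]
// blocked_range models the Range concept documented above (copy constructor, splitting
// constructor, empty(), is_divisible()); tbb::parallel_for from tbb/parallel_for.h
// (added elsewhere in this patch) splits it recursively and hands the leaf subranges to
// worker threads. A minimal sketch:

#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"

struct ScaleBody {
    double* data;
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        for( size_t i = r.begin(); i != r.end(); ++i )
            data[i] *= 2.0;   // each leaf subrange is processed by exactly one worker
    }
};

void scale_sketch( double* a, size_t n ) {
    ScaleBody body = { a };
    tbb::parallel_for( tbb::blocked_range<size_t>( 0, n ), body );
}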
The columns of the iteration space + const col_range_type& cols() const {return my_cols;} +}; + +} // namespace tbb + +#endif /* __TBB_blocked_range2d_H */ diff --git a/dep/tbb/include/tbb/blocked_range3d.h b/dep/tbb/include/tbb/blocked_range3d.h new file mode 100644 index 000000000..6b6742f55 --- /dev/null +++ b/dep/tbb/include/tbb/blocked_range3d.h @@ -0,0 +1,116 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_blocked_range3d_H +#define __TBB_blocked_range3d_H + +#include "tbb_stddef.h" +#include "blocked_range.h" + +namespace tbb { + +//! A 3-dimensional range that models the Range concept. +/** @ingroup algorithms */ +template +class blocked_range3d { +public: + //! Type for size of an iteation range + typedef blocked_range page_range_type; + typedef blocked_range row_range_type; + typedef blocked_range col_range_type; + +private: + page_range_type my_pages; + row_range_type my_rows; + col_range_type my_cols; + +public: + + blocked_range3d( PageValue page_begin, PageValue page_end, + RowValue row_begin, RowValue row_end, + ColValue col_begin, ColValue col_end ) : + my_pages(page_begin,page_end), + my_rows(row_begin,row_end), + my_cols(col_begin,col_end) + { + } + + blocked_range3d( PageValue page_begin, PageValue page_end, typename page_range_type::size_type page_grainsize, + RowValue row_begin, RowValue row_end, typename row_range_type::size_type row_grainsize, + ColValue col_begin, ColValue col_end, typename col_range_type::size_type col_grainsize ) : + my_pages(page_begin,page_end,page_grainsize), + my_rows(row_begin,row_end,row_grainsize), + my_cols(col_begin,col_end,col_grainsize) + { + } + + //! True if range is empty + bool empty() const { + // Yes, it is a logical OR here, not AND. + return my_pages.empty() || my_rows.empty() || my_cols.empty(); + } + + //! True if range is divisible into two pieces. 
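// [Editorial illustration; not part of the upstream TBB header or of this patch]
// The splitting constructor above divides a 2-D range along whichever axis is
// proportionally larger; a body reads the two axes back through rows() and cols().
// A minimal sketch that zero-fills a row-major matrix:

#include "tbb/blocked_range2d.h"
#include "tbb/parallel_for.h"

struct FillBody {
    double* m;       // row-major matrix
    size_t  ld;      // leading dimension (number of columns)
    void operator()( const tbb::blocked_range2d<size_t>& r ) const {
        for( size_t i = r.rows().begin(); i != r.rows().end(); ++i )
            for( size_t j = r.cols().begin(); j != r.cols().end(); ++j )
                m[i*ld + j] = 0.0;
    }
};

void fill_sketch( double* m, size_t rows, size_t cols ) {
    FillBody body = { m, cols };
    tbb::parallel_for( tbb::blocked_range2d<size_t>( 0, rows, 0, cols ), body );
}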
+ bool is_divisible() const { + return my_pages.is_divisible() || my_rows.is_divisible() || my_cols.is_divisible(); + } + + blocked_range3d( blocked_range3d& r, split ) : + my_pages(r.my_pages), + my_rows(r.my_rows), + my_cols(r.my_cols) + { + if( my_pages.size()*double(my_rows.grainsize()) < my_rows.size()*double(my_pages.grainsize()) ) { + if ( my_rows.size()*double(my_cols.grainsize()) < my_cols.size()*double(my_rows.grainsize()) ) { + my_cols.my_begin = col_range_type::do_split(r.my_cols); + } else { + my_rows.my_begin = row_range_type::do_split(r.my_rows); + } + } else { + if ( my_pages.size()*double(my_cols.grainsize()) < my_cols.size()*double(my_pages.grainsize()) ) { + my_cols.my_begin = col_range_type::do_split(r.my_cols); + } else { + my_pages.my_begin = page_range_type::do_split(r.my_pages); + } + } + } + + //! The pages of the iteration space + const page_range_type& pages() const {return my_pages;} + + //! The rows of the iteration space + const row_range_type& rows() const {return my_rows;} + + //! The columns of the iteration space + const col_range_type& cols() const {return my_cols;} + +}; + +} // namespace tbb + +#endif /* __TBB_blocked_range3d_H */ diff --git a/dep/tbb/include/tbb/cache_aligned_allocator.h b/dep/tbb/include/tbb/cache_aligned_allocator.h new file mode 100644 index 000000000..449dcb1ed --- /dev/null +++ b/dep/tbb/include/tbb/cache_aligned_allocator.h @@ -0,0 +1,133 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_cache_aligned_allocator_H +#define __TBB_cache_aligned_allocator_H + +#include +#include "tbb_stddef.h" + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + //! Cache/sector line size. + /** @ingroup memory_allocation */ + size_t __TBB_EXPORTED_FUNC NFS_GetLineSize(); + + //! Allocate memory on cache/sector line boundary. + /** @ingroup memory_allocation */ + void* __TBB_EXPORTED_FUNC NFS_Allocate( size_t n_element, size_t element_size, void* hint ); + + //! Free memory allocated by NFS_Allocate. + /** Freeing a NULL pointer is allowed, but has no effect. + @ingroup memory_allocation */ + void __TBB_EXPORTED_FUNC NFS_Free( void* ); +} +//! 
@endcond + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Workaround for erroneous "unreferenced parameter" warning in method destroy. + #pragma warning (push) + #pragma warning (disable: 4100) +#endif + +//! Meets "allocator" requirements of ISO C++ Standard, Section 20.1.5 +/** The members are ordered the same way they are in section 20.4.1 + of the ISO C++ standard. + @ingroup memory_allocation */ +template +class cache_aligned_allocator { +public: + typedef typename internal::allocator_type::value_type value_type; + typedef value_type* pointer; + typedef const value_type* const_pointer; + typedef value_type& reference; + typedef const value_type& const_reference; + typedef size_t size_type; + typedef ptrdiff_t difference_type; + template struct rebind { + typedef cache_aligned_allocator other; + }; + + cache_aligned_allocator() throw() {} + cache_aligned_allocator( const cache_aligned_allocator& ) throw() {} + template cache_aligned_allocator(const cache_aligned_allocator&) throw() {} + + pointer address(reference x) const {return &x;} + const_pointer address(const_reference x) const {return &x;} + + //! Allocate space for n objects, starting on a cache/sector line. + pointer allocate( size_type n, const void* hint=0 ) { + // The "hint" argument is always ignored in NFS_Allocate thus const_cast shouldn't hurt + return pointer(internal::NFS_Allocate( n, sizeof(value_type), const_cast(hint) )); + } + + //! Free block of memory that starts on a cache line + void deallocate( pointer p, size_type ) { + internal::NFS_Free(p); + } + + //! Largest value for which method allocate might succeed. + size_type max_size() const throw() { + return (~size_t(0)-internal::NFS_MaxLineSize)/sizeof(value_type); + } + + //! Copy-construct value at location pointed to by p. + void construct( pointer p, const value_type& value ) {new(static_cast(p)) value_type(value);} + + //! Destroy value at location pointed to by p. + void destroy( pointer p ) {p->~value_type();} +}; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning (pop) +#endif // warning 4100 is back + +//! Analogous to std::allocator, as defined in ISO C++ Standard, Section 20.4.1 +/** @ingroup memory_allocation */ +template<> +class cache_aligned_allocator { +public: + typedef void* pointer; + typedef const void* const_pointer; + typedef void value_type; + template struct rebind { + typedef cache_aligned_allocator other; + }; +}; + +template +inline bool operator==( const cache_aligned_allocator&, const cache_aligned_allocator& ) {return true;} + +template +inline bool operator!=( const cache_aligned_allocator&, const cache_aligned_allocator& ) {return false;} + +} // namespace tbb + +#endif /* __TBB_cache_aligned_allocator_H */ diff --git a/dep/tbb/include/tbb/combinable.h b/dep/tbb/include/tbb/combinable.h new file mode 100644 index 000000000..9122ffa8e --- /dev/null +++ b/dep/tbb/include/tbb/combinable.h @@ -0,0 +1,78 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
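// [Editorial illustration; not part of the upstream TBB header or of this patch]
// cache_aligned_allocator above is a drop-in STL allocator: every allocation starts on
// its own cache line (via NFS_Allocate), which avoids false sharing between hot data
// structures at the cost of some padding. A minimal sketch:

#include "tbb/cache_aligned_allocator.h"
#include <vector>

void allocator_sketch() {
    std::vector<double, tbb::cache_aligned_allocator<double> > hot_data;
    hot_data.resize( 1024, 0.0 );   // storage obtained through NFS_Allocate()
}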
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_combinable_H +#define __TBB_combinable_H + +#include "tbb/enumerable_thread_specific.h" +#include "tbb/cache_aligned_allocator.h" + +namespace tbb { +/** \name combinable + **/ +//@{ +//! Thread-local storage with optional reduction +/** @ingroup containers */ + template + class combinable { + private: + typedef typename tbb::cache_aligned_allocator my_alloc; + + typedef typename tbb::enumerable_thread_specific my_ets_type; + my_ets_type my_ets; + + public: + + combinable() { } + + template + combinable( finit _finit) : my_ets(_finit) { } + + //! destructor + ~combinable() { + } + + combinable(const combinable& other) : my_ets(other.my_ets) { } + + combinable & operator=( const combinable & other) { my_ets = other.my_ets; return *this; } + + void clear() { my_ets.clear(); } + + T& local() { return my_ets.local(); } + + T& local(bool & exists) { return my_ets.local(exists); } + + template< typename FCombine> + T combine(FCombine fcombine) { return my_ets.combine(fcombine); } + + template + void combine_each(FCombine fcombine) { my_ets.combine_each(fcombine); } + + }; +} // namespace tbb +#endif /* __TBB_combinable_H */ diff --git a/dep/tbb/include/tbb/compat/ppl.h b/dep/tbb/include/tbb/compat/ppl.h new file mode 100644 index 000000000..998bd0015 --- /dev/null +++ b/dep/tbb/include/tbb/compat/ppl.h @@ -0,0 +1,58 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
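// [Editorial illustration; not part of the upstream TBB header or of this patch]
// combinable<T> above keeps one lazily created T per thread (local()) and reduces the
// per-thread copies afterwards with combine(). A minimal sketch of a per-thread partial
// sum; the body type and function names are illustrative only:

#include "tbb/combinable.h"
#include "tbb/blocked_range.h"
#include "tbb/parallel_for.h"
#include <functional>

static tbb::combinable<long> partial_sums;

struct SumBody {
    const int* data;
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        long& local = partial_sums.local();   // this thread's private slot
        for( size_t i = r.begin(); i != r.end(); ++i )
            local += data[i];
    }
};

long sum_sketch( const int* data, size_t n ) {
    partial_sums.clear();
    SumBody body = { data };
    tbb::parallel_for( tbb::blocked_range<size_t>( 0, n ), body );
    return partial_sums.combine( std::plus<long>() );
}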
+*/ + +#ifndef __TBB_compat_ppl_H +#define __TBB_compat_ppl_H + +#include "../task_group.h" +#include "../parallel_invoke.h" +#include "../parallel_for_each.h" +#include "../parallel_for.h" + +namespace Concurrency { + + using tbb::task_handle; + using tbb::task_group_status; + using tbb::task_group; + using tbb::structured_task_group; + using tbb::missing_wait; + using tbb::make_task; + + using tbb::not_complete; + using tbb::complete; + using tbb::canceled; + + using tbb::is_current_task_group_canceling; + + using tbb::parallel_invoke; + using tbb::strict_ppl::parallel_for; + using tbb::parallel_for_each; + +} // namespace Concurrency + +#endif /* __TBB_compat_ppl_H */ diff --git a/dep/tbb/include/tbb/concurrent_hash_map.h b/dep/tbb/include/tbb/concurrent_hash_map.h new file mode 100644 index 000000000..ea4138fcc --- /dev/null +++ b/dep/tbb/include/tbb/concurrent_hash_map.h @@ -0,0 +1,1262 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_concurrent_hash_map_H +#define __TBB_concurrent_hash_map_H + +#include +#include +#include // Need std::pair +#include // Need std::memset +#include +#include "tbb_stddef.h" +#include "cache_aligned_allocator.h" +#include "tbb_allocator.h" +#include "spin_rw_mutex.h" +#include "atomic.h" +#include "aligned_space.h" +#if TBB_USE_PERFORMANCE_WARNINGS +#include +#endif + +namespace tbb { + +template struct tbb_hash_compare; +template, typename A = tbb_allocator > > +class concurrent_hash_map; + +//! @cond INTERNAL +namespace internal { + //! ITT instrumented routine that loads pointer from location pointed to by src. + void* __TBB_EXPORTED_FUNC itt_load_pointer_with_acquire_v3( const void* src ); + //! ITT instrumented routine that stores src into location pointed to by dst. + void __TBB_EXPORTED_FUNC itt_store_pointer_with_release_v3( void* dst, void* src ); + //! Routine that loads pointer from location pointed to by src without causing ITT to report a race. + void* __TBB_EXPORTED_FUNC itt_load_pointer_v3( const void* src ); + + //! Type of a hash code. + typedef size_t hashcode_t; + //! Node base type + struct hash_map_node_base : no_copy { + //! Mutex type + typedef spin_rw_mutex mutex_t; + //! 
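// [Editorial illustration; not part of the upstream TBB header or of this patch]
// The compat header above lets PPL-style code run on top of TBB: Concurrency::task_group
// and Concurrency::parallel_for are the tbb::task_group and tbb::strict_ppl::parallel_for
// aliases just introduced. A minimal sketch; process_index and background_work are
// placeholder functions:

#include "tbb/compat/ppl.h"

void process_index( size_t /*i*/ ) {}   // per-index work would go here
void background_work() {}               // independent work would go here

void ppl_compat_sketch() {
    // Index-space loop through the PPL-style overload of parallel_for.
    Concurrency::parallel_for( size_t(0), size_t(100), &process_index );

    // Fork/join of an independent task through the task_group alias.
    Concurrency::task_group g;
    g.run( &background_work );
    g.wait();
}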
Scoped lock type for mutex + typedef mutex_t::scoped_lock scoped_t; + //! Next node in chain + hash_map_node_base *next; + mutex_t mutex; + }; + //! Incompleteness flag value + static hash_map_node_base *const rehash_req = reinterpret_cast(size_t(3)); + //! Rehashed empty bucket flag + static hash_map_node_base *const empty_rehashed = reinterpret_cast(size_t(0)); + //! base class of concurrent_hash_map + class hash_map_base { + public: + //! Size type + typedef size_t size_type; + //! Type of a hash code. + typedef size_t hashcode_t; + //! Segment index type + typedef size_t segment_index_t; + //! Node base type + typedef hash_map_node_base node_base; + //! Bucket type + struct bucket : no_copy { + //! Mutex type for buckets + typedef spin_rw_mutex mutex_t; + //! Scoped lock type for mutex + typedef mutex_t::scoped_lock scoped_t; + mutex_t mutex; + node_base *node_list; + }; + //! Count of segments in the first block + static size_type const embedded_block = 1; + //! Count of segments in the first block + static size_type const embedded_buckets = 1< my_mask; + //! Segment pointers table. Also prevents false sharing between my_mask and my_size + segments_table_t my_table; + //! Size of container in stored items + atomic my_size; // It must be in separate cache line from my_mask due to performance effects + //! Zero segment + bucket my_embedded_segment[embedded_buckets]; + + //! Constructor + hash_map_base() { + std::memset( this, 0, pointers_per_table*sizeof(segment_ptr_t) // 32*4=128 or 64*8=512 + + sizeof(my_size) + sizeof(my_mask) // 4+4 or 8+8 + + embedded_buckets*sizeof(bucket) ); // n*8 or n*16 + for( size_type i = 0; i < embedded_block; i++ ) // fill the table + my_table[i] = my_embedded_segment + segment_base(i); + my_mask = embedded_buckets - 1; + __TBB_ASSERT( embedded_block <= first_block, "The first block number must include embedded blocks"); + } + + //! @return segment index of given index in the array + static segment_index_t segment_index_of( size_type index ) { + return segment_index_t( __TBB_Log2( index|1 ) ); + } + + //! @return the first array index of given segment + static segment_index_t segment_base( segment_index_t k ) { + return (segment_index_t(1)<(ptr) > size_t(63); + } + + //! Initialize buckets + static void init_buckets( segment_ptr_t ptr, size_type sz, bool is_initial ) { + if( is_initial ) std::memset(ptr, 0, sz*sizeof(bucket) ); + else for(size_type i = 0; i < sz; i++, ptr++) { + *reinterpret_cast(&ptr->mutex) = 0; + ptr->node_list = rehash_req; + } + } + + //! Add node @arg n to bucket @arg b + static void add_to_bucket( bucket *b, node_base *n ) { + __TBB_ASSERT(b->node_list != rehash_req, NULL); + n->next = b->node_list; + b->node_list = n; // its under lock and flag is set + } + + //! Exception safety helper + struct enable_segment_failsafe { + segment_ptr_t *my_segment_ptr; + enable_segment_failsafe(segments_table_t &table, segment_index_t k) : my_segment_ptr(&table[k]) {} + ~enable_segment_failsafe() { + if( my_segment_ptr ) *my_segment_ptr = 0; // indicate no allocation in progress + } + }; + + //! 
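// [Editorial note; not part of the upstream TBB header or of this patch]
// A worked example of the segment arithmetic above, assuming embedded_block == 1
// (as declared above) and therefore embedded_buckets == 2:
//   buckets 0..1  -> segment 0 (embedded storage), segment_base(0) == 0
//   buckets 2..3  -> segment 1, segment_base(1) == 2, segment_size(1) == 2
//   buckets 4..7  -> segment 2, segment_base(2) == 4, segment_size(2) == 4
//   buckets 8..15 -> segment 3, segment_base(3) == 8, segment_size(3) == 8
// For example, segment_index_of(5) == __TBB_Log2(5|1) == 2, and 5 - segment_base(2) == 1,
// so bucket 5 is the second slot of segment 2.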
Enable segment + void enable_segment( segment_index_t k, bool is_initial = false ) { + __TBB_ASSERT( k, "Zero segment must be embedded" ); + enable_segment_failsafe watchdog( my_table, k ); + cache_aligned_allocator alloc; + size_type sz; + __TBB_ASSERT( !is_valid(my_table[k]), "Wrong concurrent assignment"); + if( k >= first_block ) { + sz = segment_size( k ); + segment_ptr_t ptr = alloc.allocate( sz ); + init_buckets( ptr, sz, is_initial ); +#if TBB_USE_THREADING_TOOLS + // TODO: actually, fence and notification are unnecessary here and below + itt_store_pointer_with_release_v3( my_table + k, ptr ); +#else + my_table[k] = ptr;// my_mask has release fence +#endif + sz <<= 1;// double it to get entire capacity of the container + } else { // the first block + __TBB_ASSERT( k == embedded_block, "Wrong segment index" ); + sz = segment_size( first_block ); + segment_ptr_t ptr = alloc.allocate( sz - embedded_buckets ); + init_buckets( ptr, sz - embedded_buckets, is_initial ); + ptr -= segment_base(embedded_block); + for(segment_index_t i = embedded_block; i < first_block; i++) // calc the offsets +#if TBB_USE_THREADING_TOOLS + itt_store_pointer_with_release_v3( my_table + i, ptr + segment_base(i) ); +#else + my_table[i] = ptr + segment_base(i); +#endif + } +#if TBB_USE_THREADING_TOOLS + itt_store_pointer_with_release_v3( &my_mask, (void*)(sz-1) ); +#else + my_mask = sz - 1; +#endif + watchdog.my_segment_ptr = 0; + } + + //! Get bucket by (masked) hashcode + bucket *get_bucket( hashcode_t h ) const throw() { // TODO: add throw() everywhere? + segment_index_t s = segment_index_of( h ); + h -= segment_base(s); + segment_ptr_t seg = my_table[s]; + __TBB_ASSERT( is_valid(seg), "hashcode must be cut by valid mask for allocated segments" ); + return &seg[h]; + } + + //! Check for mask race + // Splitting into two functions should help inlining + inline bool check_mask_race( const hashcode_t h, hashcode_t &m ) const { + hashcode_t m_now, m_old = m; +#if TBB_USE_THREADING_TOOLS + m_now = (hashcode_t) itt_load_pointer_with_acquire_v3( &my_mask ); +#else + m_now = my_mask; +#endif + if( m_old != m_now ) + return check_rehashing_collision( h, m_old, m = m_now ); + return false; + } + + //! Process mask race, check for rehashing collision + bool check_rehashing_collision( const hashcode_t h, hashcode_t m_old, hashcode_t m ) const { + __TBB_ASSERT(m_old != m, NULL); // TODO?: m arg could be optimized out by passing h = h&m + if( (h & m_old) != (h & m) ) { // mask changed for this hashcode, rare event + // condition above proves that 'h' has some other bits set beside 'm_old' + // find next applicable mask after m_old //TODO: look at bsl instruction + for( ++m_old; !(h & m_old); m_old <<= 1 ); // at maximum few rounds depending on the first block size + m_old = (m_old<<1) - 1; // get full mask from a bit + __TBB_ASSERT((m_old&(m_old+1))==0 && m_old <= m, NULL); + // check whether it is rehashing/ed +#if TBB_USE_THREADING_TOOLS + if( itt_load_pointer_with_acquire_v3(&( get_bucket(h & m_old)->node_list )) != rehash_req ) +#else + if( __TBB_load_with_acquire(get_bucket( h & m_old )->node_list) != rehash_req ) +#endif + return true; + } + return false; + } + + //! Insert a node and check for load factor. @return segment index to enable. 
+ segment_index_t insert_new_node( bucket *b, node_base *n, hashcode_t mask ) { + size_type sz = ++my_size; // prefix form is to enforce allocation after the first item inserted + add_to_bucket( b, n ); + // check load factor + if( sz >= mask ) { // TODO: add custom load_factor + segment_index_t new_seg = segment_index_of( mask+1 ); + __TBB_ASSERT( is_valid(my_table[new_seg-1]), "new allocations must not publish new mask until segment has allocated"); +#if TBB_USE_THREADING_TOOLS + if( !itt_load_pointer_v3(my_table+new_seg) +#else + if( !my_table[new_seg] +#endif + && __TBB_CompareAndSwapW(&my_table[new_seg], 2, 0) == 0 ) + return new_seg; // The value must be processed + } + return 0; + } + + //! Prepare enough segments for number of buckets + void reserve(size_type buckets) { + if( !buckets-- ) return; + bool is_initial = !my_size; + for( size_type m = my_mask; buckets > m; m = my_mask ) + enable_segment( segment_index_of( m+1 ), is_initial ); + } + //! Swap hash_map_bases + void internal_swap(hash_map_base &table) { + std::swap(this->my_mask, table.my_mask); + std::swap(this->my_size, table.my_size); + for(size_type i = 0; i < embedded_buckets; i++) + std::swap(this->my_embedded_segment[i].node_list, table.my_embedded_segment[i].node_list); + for(size_type i = embedded_block; i < pointers_per_table; i++) + std::swap(this->my_table[i], table.my_table[i]); + } + }; + + template + class hash_map_range; + + //! Meets requirements of a forward iterator for STL */ + /** Value is either the T or const T type of the container. + @ingroup containers */ + template + class hash_map_iterator + : public std::iterator + { + typedef Container map_type; + typedef typename Container::node node; + typedef hash_map_base::node_base node_base; + typedef hash_map_base::bucket bucket; + + template + friend bool operator==( const hash_map_iterator& i, const hash_map_iterator& j ); + + template + friend bool operator!=( const hash_map_iterator& i, const hash_map_iterator& j ); + + template + friend ptrdiff_t operator-( const hash_map_iterator& i, const hash_map_iterator& j ); + + template + friend class internal::hash_map_iterator; + + template + friend class internal::hash_map_range; + + void advance_to_next_bucket() { // TODO?: refactor to iterator_base class + size_t k = my_index+1; + while( my_bucket && k <= my_map->my_mask ) { + // Following test uses 2's-complement wizardry + if( k& (k-2) ) // not the beginning of a segment + ++my_bucket; + else my_bucket = my_map->get_bucket( k ); + my_node = static_cast( my_bucket->node_list ); + if( hash_map_base::is_valid(my_node) ) { + my_index = k; return; + } + ++k; + } + my_bucket = 0; my_node = 0; my_index = k; // the end + } +#if !defined(_MSC_VER) || defined(__INTEL_COMPILER) + template + friend class tbb::concurrent_hash_map; +#else + public: // workaround +#endif + //! concurrent_hash_map over which we are iterating. + const Container *my_map; + + //! Index in hash table for current item + size_t my_index; + + //! Pointer to bucket + const bucket *my_bucket; + + //! Pointer to node that has current item + node *my_node; + + hash_map_iterator( const Container &map, size_t index, const bucket *b, node_base *n ); + + public: + //! 
Construct undefined iterator + hash_map_iterator() {} + hash_map_iterator( const hash_map_iterator &other ) : + my_map(other.my_map), + my_index(other.my_index), + my_bucket(other.my_bucket), + my_node(other.my_node) + {} + Value& operator*() const { + __TBB_ASSERT( hash_map_base::is_valid(my_node), "iterator uninitialized or at end of container?" ); + return my_node->item; + } + Value* operator->() const {return &operator*();} + hash_map_iterator& operator++(); + + //! Post increment + Value* operator++(int) { + Value* result = &operator*(); + operator++(); + return result; + } + }; + + template + hash_map_iterator::hash_map_iterator( const Container &map, size_t index, const bucket *b, node_base *n ) : + my_map(&map), + my_index(index), + my_bucket(b), + my_node( static_cast(n) ) + { + if( b && !hash_map_base::is_valid(n) ) + advance_to_next_bucket(); + } + + template + hash_map_iterator& hash_map_iterator::operator++() { + my_node = static_cast( my_node->next ); + if( !my_node ) advance_to_next_bucket(); + return *this; + } + + template + bool operator==( const hash_map_iterator& i, const hash_map_iterator& j ) { + return i.my_node == j.my_node && i.my_map == j.my_map; + } + + template + bool operator!=( const hash_map_iterator& i, const hash_map_iterator& j ) { + return i.my_node != j.my_node || i.my_map != j.my_map; + } + + //! Range class used with concurrent_hash_map + /** @ingroup containers */ + template + class hash_map_range { + typedef typename Iterator::map_type map_type; + Iterator my_begin; + Iterator my_end; + mutable Iterator my_midpoint; + size_t my_grainsize; + //! Set my_midpoint to point approximately half way between my_begin and my_end. + void set_midpoint() const; + template friend class hash_map_range; + public: + //! Type for size of a range + typedef std::size_t size_type; + typedef typename Iterator::value_type value_type; + typedef typename Iterator::reference reference; + typedef typename Iterator::difference_type difference_type; + typedef Iterator iterator; + + //! True if range is empty. + bool empty() const {return my_begin==my_end;} + + //! True if range can be partitioned into two subranges. + bool is_divisible() const { + return my_midpoint!=my_end; + } + //! Split range. + hash_map_range( hash_map_range& r, split ) : + my_end(r.my_end), + my_grainsize(r.my_grainsize) + { + r.my_end = my_begin = r.my_midpoint; + __TBB_ASSERT( !empty(), "Splitting despite the range is not divisible" ); + __TBB_ASSERT( !r.empty(), "Splitting despite the range is not divisible" ); + set_midpoint(); + r.set_midpoint(); + } + //! type conversion + template + hash_map_range( hash_map_range& r) : + my_begin(r.my_begin), + my_end(r.my_end), + my_midpoint(r.my_midpoint), + my_grainsize(r.my_grainsize) + {} +#if TBB_DEPRECATED + //! Init range with iterators and grainsize specified + hash_map_range( const Iterator& begin_, const Iterator& end_, size_type grainsize = 1 ) : + my_begin(begin_), + my_end(end_), + my_grainsize(grainsize) + { + if(!my_end.my_index && !my_end.my_bucket) // end + my_end.my_index = my_end.my_map->my_mask + 1; + set_midpoint(); + __TBB_ASSERT( grainsize>0, "grainsize must be positive" ); + } +#endif + //! 
Init range with container and grainsize specified + hash_map_range( const map_type &map, size_type grainsize = 1 ) : + my_begin( Iterator( map, 0, map.my_embedded_segment, map.my_embedded_segment->node_list ) ), + my_end( Iterator( map, map.my_mask + 1, 0, 0 ) ), + my_grainsize( grainsize ) + { + __TBB_ASSERT( grainsize>0, "grainsize must be positive" ); + set_midpoint(); + } + const Iterator& begin() const {return my_begin;} + const Iterator& end() const {return my_end;} + //! The grain size for this range. + size_type grainsize() const {return my_grainsize;} + }; + + template + void hash_map_range::set_midpoint() const { + // Split by groups of nodes + size_t m = my_end.my_index-my_begin.my_index; + if( m > my_grainsize ) { + m = my_begin.my_index + m/2u; + hash_map_base::bucket *b = my_begin.my_map->get_bucket(m); + my_midpoint = Iterator(*my_begin.my_map,m,b,b->node_list); + } else { + my_midpoint = my_end; + } + __TBB_ASSERT( my_begin.my_index <= my_midpoint.my_index, + "my_begin is after my_midpoint" ); + __TBB_ASSERT( my_midpoint.my_index <= my_end.my_index, + "my_midpoint is after my_end" ); + __TBB_ASSERT( my_begin != my_midpoint || my_begin == my_end, + "[my_begin, my_midpoint) range should not be empty" ); + } +} // namespace internal +//! @endcond + +//! Hash multiplier +static const size_t hash_multiplier = sizeof(size_t)==4? 2654435769U : 11400714819323198485ULL; +//! Hasher functions +template +inline static size_t tbb_hasher( const T& t ) { + return static_cast( t ) * hash_multiplier; +} +template +inline static size_t tbb_hasher( P* ptr ) { + size_t const h = reinterpret_cast( ptr ); + return (h >> 3) ^ h; +} +template +inline static size_t tbb_hasher( const std::basic_string& s ) { + size_t h = 0; + for( const E* c = s.c_str(); *c; c++ ) + h = static_cast(*c) ^ (h * hash_multiplier); + return h; +} +template +inline static size_t tbb_hasher( const std::pair& p ) { + return tbb_hasher(p.first) ^ tbb_hasher(p.second); +} + +//! hash_compare - default argument +template +struct tbb_hash_compare { + static size_t hash( const T& t ) { return tbb_hasher(t); } + static bool equal( const T& a, const T& b ) { return a == b; } +}; + +//! Unordered map from Key to T. +/** concurrent_hash_map is associative container with concurrent access. + +@par Compatibility + The class meets all Container Requirements from C++ Standard (See ISO/IEC 14882:2003(E), clause 23.1). + +@par Exception Safety + - Hash function is not permitted to throw an exception. User-defined types Key and T are forbidden from throwing an exception in destructors. + - If exception happens during insert() operations, it has no effect (unless exception raised by HashCompare::hash() function during grow_segment). + - If exception happens during operator=() operation, the container can have a part of source items, and methods size() and empty() can return wrong results. + +@par Changes since TBB 2.1 + - Replaced internal algorithm and data structure. Patent is pending. 
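// [Editorial illustration; not part of the upstream TBB header or of this patch]
// tbb_hash_compare above is the default hashing policy; a map keyed by a user-defined
// type can supply its own policy with the same two members. PointKey and
// PointKeyHashCompare below are illustrative names only:

struct PointKey {
    int x, y;
};

struct PointKeyHashCompare {
    static size_t hash( const PointKey& k ) {
        // Reuse the library's integer hasher for each coordinate.
        return tbb::tbb_hasher( k.x ) ^ ( tbb::tbb_hasher( k.y ) * 3 );
    }
    static bool equal( const PointKey& a, const PointKey& b ) {
        return a.x == b.x && a.y == b.y;
    }
};
// The policy is passed as the third template argument:
//   tbb::concurrent_hash_map<PointKey, double, PointKeyHashCompare> table;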
+ - Added buckets number argument for constructor + +@par Changes since TBB 2.0 + - Fixed exception-safety + - Added template argument for allocator + - Added allocator argument in constructors + - Added constructor from a range of iterators + - Added several new overloaded insert() methods + - Added get_allocator() + - Added swap() + - Added count() + - Added overloaded erase(accessor &) and erase(const_accessor&) + - Added equal_range() [const] + - Added [const_]pointer, [const_]reference, and allocator_type types + - Added global functions: operator==(), operator!=(), and swap() + + @ingroup containers */ +template +class concurrent_hash_map : protected internal::hash_map_base { + template + friend class internal::hash_map_iterator; + + template + friend class internal::hash_map_range; + +public: + typedef Key key_type; + typedef T mapped_type; + typedef std::pair value_type; + typedef internal::hash_map_base::size_type size_type; + typedef ptrdiff_t difference_type; + typedef value_type *pointer; + typedef const value_type *const_pointer; + typedef value_type &reference; + typedef const value_type &const_reference; + typedef internal::hash_map_iterator iterator; + typedef internal::hash_map_iterator const_iterator; + typedef internal::hash_map_range range_type; + typedef internal::hash_map_range const_range_type; + typedef Allocator allocator_type; + +protected: + friend class const_accessor; + struct node; + typedef typename Allocator::template rebind::other node_allocator_type; + node_allocator_type my_allocator; + HashCompare my_hash_compare; + + struct node : public node_base { + value_type item; + node( const Key &key ) : item(key, T()) {} + node( const Key &key, const T &t ) : item(key, t) {} + // exception-safe allocation, see C++ Standard 2003, clause 5.3.4p17 + void *operator new( size_t /*size*/, node_allocator_type &a ) { + void *ptr = a.allocate(1); + if(!ptr) throw std::bad_alloc(); + return ptr; + } + // match placement-new form above to be called if exception thrown in constructor + void operator delete( void *ptr, node_allocator_type &a ) {return a.deallocate(static_cast(ptr),1); } + }; + + void delete_node( node_base *n ) { + my_allocator.destroy( static_cast(n) ); + my_allocator.deallocate( static_cast(n), 1); + } + + node *search_bucket( const key_type &key, bucket *b ) const { + node *n = static_cast( b->node_list ); + while( is_valid(n) && !my_hash_compare.equal(key, n->item.first) ) + n = static_cast( n->next ); + __TBB_ASSERT(n != internal::rehash_req, "Search can be executed only for rehashed bucket"); + return n; + } + + //! bucket accessor is to find, rehash, acquire a lock, and access a bucket + class bucket_accessor : public bucket::scoped_t { + bool my_is_writer; // TODO: use it from base type + bucket *my_b; + public: + bucket_accessor( concurrent_hash_map *base, const hashcode_t h, bool writer = false ) { acquire( base, h, writer ); } + //! 
find a bucket by masked hashcode, optionally rehash, and acquire the lock + inline void acquire( concurrent_hash_map *base, const hashcode_t h, bool writer = false ) { + my_b = base->get_bucket( h ); +#if TBB_USE_THREADING_TOOLS + // TODO: actually, notification is unnecessary here, just hiding double-check + if( itt_load_pointer_with_acquire_v3(&my_b->node_list) == internal::rehash_req +#else + if( __TBB_load_with_acquire(my_b->node_list) == internal::rehash_req +#endif + && try_acquire( my_b->mutex, /*write=*/true ) ) + { + if( my_b->node_list == internal::rehash_req ) base->rehash_bucket( my_b, h ); //recursive rehashing + my_is_writer = true; + } + else bucket::scoped_t::acquire( my_b->mutex, /*write=*/my_is_writer = writer ); + __TBB_ASSERT( my_b->node_list != internal::rehash_req, NULL); + } + //! check whether bucket is locked for write + bool is_writer() { return my_is_writer; } + //! get bucket pointer + bucket *operator() () { return my_b; } + // TODO: optimize out + bool upgrade_to_writer() { my_is_writer = true; return bucket::scoped_t::upgrade_to_writer(); } + }; + + // TODO refactor to hash_base + void rehash_bucket( bucket *b_new, const hashcode_t h ) { + __TBB_ASSERT( *(intptr_t*)(&b_new->mutex), "b_new must be locked (for write)"); + __TBB_ASSERT( h > 1, "The lowermost buckets can't be rehashed" ); + __TBB_store_with_release(b_new->node_list, internal::empty_rehashed); // mark rehashed + hashcode_t mask = ( 1u<<__TBB_Log2( h ) ) - 1; // get parent mask from the topmost bit + + bucket_accessor b_old( this, h & mask ); + + mask = (mask<<1) | 1; // get full mask for new bucket + __TBB_ASSERT( (mask&(mask+1))==0 && (h & mask) == h, NULL ); + restart: + for( node_base **p = &b_old()->node_list, *n = __TBB_load_with_acquire(*p); is_valid(n); n = *p ) { + hashcode_t c = my_hash_compare.hash( static_cast(n)->item.first ); + if( (c & mask) == h ) { + if( !b_old.is_writer() ) + if( !b_old.upgrade_to_writer() ) { + goto restart; // node ptr can be invalid due to concurrent erase + } + *p = n->next; // exclude from b_old + add_to_bucket( b_new, n ); + } else p = &n->next; // iterate to next item + } + } + +public: + + class accessor; + //! Combines data access, locking, and garbage collection. + class const_accessor { + friend class concurrent_hash_map; + friend class accessor; + void operator=( const accessor & ) const; // Deny access + const_accessor( const accessor & ); // Deny access + public: + //! Type of value + typedef const typename concurrent_hash_map::value_type value_type; + + //! True if result is empty. + bool empty() const {return !my_node;} + + //! Set to null + void release() { + if( my_node ) { + my_lock.release(); + my_node = 0; + } + } + + //! Return reference to associated value in hash table. + const_reference operator*() const { + __TBB_ASSERT( my_node, "attempt to dereference empty accessor" ); + return my_node->item; + } + + //! Return pointer to associated value in hash table. + const_pointer operator->() const { + return &operator*(); + } + + //! Create empty result + const_accessor() : my_node(NULL) {} + + //! Destroy result after releasing the underlying reference. + ~const_accessor() { + my_node = NULL; // my_lock.release() is called in scoped_lock destructor + } + private: + node *my_node; + typename node::scoped_t my_lock; + hashcode_t my_hash; + }; + + //! Allows write access to elements and combines data access, locking, and garbage collection. + class accessor: public const_accessor { + public: + //! 
Type of value + typedef typename concurrent_hash_map::value_type value_type; + + //! Return reference to associated value in hash table. + reference operator*() const { + __TBB_ASSERT( this->my_node, "attempt to dereference empty accessor" ); + return this->my_node->item; + } + + //! Return pointer to associated value in hash table. + pointer operator->() const { + return &operator*(); + } + }; + + //! Construct empty table. + concurrent_hash_map(const allocator_type &a = allocator_type()) + : my_allocator(a) + {} + + //! Construct empty table with n preallocated buckets. This number serves also as initial concurrency level. + concurrent_hash_map(size_type n, const allocator_type &a = allocator_type()) + : my_allocator(a) + { + reserve( n ); + } + + //! Copy constructor + concurrent_hash_map( const concurrent_hash_map& table, const allocator_type &a = allocator_type()) + : my_allocator(a) + { + internal_copy(table); + } + + //! Construction with copying iteration range and given allocator instance + template + concurrent_hash_map(I first, I last, const allocator_type &a = allocator_type()) + : my_allocator(a) + { + reserve( std::distance(first, last) ); // TODO: load_factor? + internal_copy(first, last); + } + + //! Assignment + concurrent_hash_map& operator=( const concurrent_hash_map& table ) { + if( this!=&table ) { + clear(); + internal_copy(table); + } + return *this; + } + + + //! Clear table + void clear(); + + //! Clear table and destroy it. + ~concurrent_hash_map() { clear(); } + + //------------------------------------------------------------------------ + // Parallel algorithm support + //------------------------------------------------------------------------ + range_type range( size_type grainsize=1 ) { + return range_type( *this, grainsize ); + } + const_range_type range( size_type grainsize=1 ) const { + return const_range_type( *this, grainsize ); + } + + //------------------------------------------------------------------------ + // STL support - not thread-safe methods + //------------------------------------------------------------------------ + iterator begin() {return iterator(*this,0,my_embedded_segment,my_embedded_segment->node_list);} + iterator end() {return iterator(*this,0,0,0);} + const_iterator begin() const {return const_iterator(*this,0,my_embedded_segment,my_embedded_segment->node_list);} + const_iterator end() const {return const_iterator(*this,0,0,0);} + std::pair equal_range( const Key& key ) { return internal_equal_range(key, end()); } + std::pair equal_range( const Key& key ) const { return internal_equal_range(key, end()); } + + //! Number of items in table. + size_type size() const { return my_size; } + + //! True if size()==0. + bool empty() const { return my_size == 0; } + + //! Upper bound on size. + size_type max_size() const {return (~size_type(0))/sizeof(node);} + + //! return allocator object + allocator_type get_allocator() const { return this->my_allocator; } + + //! swap two instances. Iterators are invalidated + void swap(concurrent_hash_map &table); + + //------------------------------------------------------------------------ + // concurrent map operations + //------------------------------------------------------------------------ + + //! Return count of items (0 or 1) + size_type count( const Key &key ) const { + return const_cast(this)->lookup(/*insert*/false, key, NULL, NULL, /*write=*/false ); + } + + //! Find item and acquire a read lock on the item. + /** Return true if item is found, false otherwise. 
*/ + bool find( const_accessor &result, const Key &key ) const { + result.release(); + return const_cast(this)->lookup(/*insert*/false, key, NULL, &result, /*write=*/false ); + } + + //! Find item and acquire a write lock on the item. + /** Return true if item is found, false otherwise. */ + bool find( accessor &result, const Key &key ) { + result.release(); + return lookup(/*insert*/false, key, NULL, &result, /*write=*/true ); + } + + //! Insert item (if not already present) and acquire a read lock on the item. + /** Returns true if item is new. */ + bool insert( const_accessor &result, const Key &key ) { + result.release(); + return lookup(/*insert*/true, key, NULL, &result, /*write=*/false ); + } + + //! Insert item (if not already present) and acquire a write lock on the item. + /** Returns true if item is new. */ + bool insert( accessor &result, const Key &key ) { + result.release(); + return lookup(/*insert*/true, key, NULL, &result, /*write=*/true ); + } + + //! Insert item by copying if there is no such key present already and acquire a read lock on the item. + /** Returns true if item is new. */ + bool insert( const_accessor &result, const value_type &value ) { + result.release(); + return lookup(/*insert*/true, value.first, &value.second, &result, /*write=*/false ); + } + + //! Insert item by copying if there is no such key present already and acquire a write lock on the item. + /** Returns true if item is new. */ + bool insert( accessor &result, const value_type &value ) { + result.release(); + return lookup(/*insert*/true, value.first, &value.second, &result, /*write=*/true ); + } + + //! Insert item by copying if there is no such key present already + /** Returns true if item is inserted. */ + bool insert( const value_type &value ) { + return lookup(/*insert*/true, value.first, &value.second, NULL, /*write=*/false ); + } + + //! Insert range [first, last) + template + void insert(I first, I last) { + for(; first != last; ++first) + insert( *first ); + } + + //! Erase item. + /** Return true if item was erased by particularly this call. */ + bool erase( const Key& key ); + + //! Erase item by const_accessor. + /** Return true if item was erased by particularly this call. */ + bool erase( const_accessor& item_accessor ) { + return exclude( item_accessor, /*readonly=*/ true ); + } + + //! Erase item by accessor. + /** Return true if item was erased by particularly this call. */ + bool erase( accessor& item_accessor ) { + return exclude( item_accessor, /*readonly=*/ false ); + } + +protected: + //! Insert or find item and optionally acquire a lock on the item. + bool lookup( bool op_insert, const Key &key, const T *t, const_accessor *result, bool write ); + + //! delete item by accessor + bool exclude( const_accessor &item_accessor, bool readonly ); + + //! Returns an iterator for an item defined by the key, or for the next item after it (if upper==true) + template + std::pair internal_equal_range( const Key& key, I end ) const; + + //! Copy "source" to *this, where *this must start out empty. + void internal_copy( const concurrent_hash_map& source ); + + template + void internal_copy(I first, I last); + + //! Fast find when no concurrent erasure is used. For internal use inside TBB only! + /** Return pointer to item with given key, or NULL if no such item exists. + Must not be called concurrently with erasure operations. 
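The accessor-based operations above combine lookup, insertion, and per-element locking in a single call. The following is an illustrative sketch of typical usage, not part of the patched header; the StringTable typedef and the example function are assumed names:

#include <string>
#include "tbb/concurrent_hash_map.h"

typedef tbb::concurrent_hash_map<std::string, int> StringTable;

int accessor_sketch( StringTable &table ) {
    {   // insert "apple" if absent and hold a write lock while updating it
        StringTable::accessor a;
        table.insert( a, std::string("apple") );    // returns true if the key was new
        a->second += 1;
    }                                               // write lock released here
    int observed = 0;
    {   // read-only lookup; the element stays read-locked while ca is alive
        StringTable::const_accessor ca;
        if( table.find( ca, std::string("apple") ) )
            observed = ca->second;
    }
    table.erase( std::string("apple") );            // erase by key
    return observed;
}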
*/ + const_pointer internal_fast_find( const Key& key ) const { + hashcode_t h = my_hash_compare.hash( key ); +#if TBB_USE_THREADING_TOOLS + hashcode_t m = (hashcode_t) itt_load_pointer_with_acquire_v3( &my_mask ); +#else + hashcode_t m = my_mask; +#endif + node *n; + restart: + __TBB_ASSERT((m&(m+1))==0, NULL); + bucket *b = get_bucket( h & m ); +#if TBB_USE_THREADING_TOOLS + // TODO: actually, notification is unnecessary here, just hiding double-check + if( itt_load_pointer_with_acquire_v3(&b->node_list) == internal::rehash_req ) +#else + if( __TBB_load_with_acquire(b->node_list) == internal::rehash_req ) +#endif + { + bucket::scoped_t lock; + if( lock.try_acquire( b->mutex, /*write=*/true ) ) { + if( b->node_list == internal::rehash_req) + const_cast(this)->rehash_bucket( b, h & m ); //recursive rehashing + } + else lock.acquire( b->mutex, /*write=*/false ); + __TBB_ASSERT(b->node_list!=internal::rehash_req,NULL); + } + n = search_bucket( key, b ); + if( n ) + return &n->item; + else if( check_mask_race( h, m ) ) + goto restart; + return 0; + } +}; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Suppress "conditional expression is constant" warning. + #pragma warning( push ) + #pragma warning( disable: 4127 ) +#endif + +template +bool concurrent_hash_map::lookup( bool op_insert, const Key &key, const T *t, const_accessor *result, bool write ) { + __TBB_ASSERT( !result || !result->my_node, NULL ); + segment_index_t grow_segment; + bool return_value; + node *n, *tmp_n = 0; + hashcode_t const h = my_hash_compare.hash( key ); +#if TBB_USE_THREADING_TOOLS + hashcode_t m = (hashcode_t) itt_load_pointer_with_acquire_v3( &my_mask ); +#else + hashcode_t m = my_mask; +#endif + restart: + {//lock scope + __TBB_ASSERT((m&(m+1))==0, NULL); + return_value = false; + // get bucket + bucket_accessor b( this, h & m ); + + // find a node + n = search_bucket( key, b() ); + if( op_insert ) { + // [opt] insert a key + if( !n ) { + if( !tmp_n ) { + if(t) tmp_n = new( my_allocator ) node(key, *t); + else tmp_n = new( my_allocator ) node(key); + } + if( !b.is_writer() && !b.upgrade_to_writer() ) { // TODO: improved insertion + // Rerun search_list, in case another thread inserted the item during the upgrade. + n = search_bucket( key, b() ); + if( is_valid(n) ) { // unfortunately, it did + b.downgrade_to_reader(); + goto exists; + } + } + if( check_mask_race(h, m) ) + goto restart; // b.release() is done in ~b(). + // insert and set flag to grow the container + grow_segment = insert_new_node( b(), n = tmp_n, m ); + tmp_n = 0; + return_value = true; + } else { + exists: grow_segment = 0; + } + } else { // find or count + if( !n ) { + if( check_mask_race( h, m ) ) + goto restart; // b.release() is done in ~b(). TODO: replace by continue + return false; + } + return_value = true; + grow_segment = 0; + } + if( !result ) goto check_growth; + // TODO: the following seems as generic/regular operation + // acquire the item + if( !result->my_lock.try_acquire( n->mutex, write ) ) { + // we are unlucky, prepare for longer wait + internal::atomic_backoff trials; + do { + if( !trials.bounded_pause() ) { + // the wait takes really long, restart the operation + b.release(); + __TBB_ASSERT( !op_insert || !return_value, "Can't acquire new item in locked bucket?" 
); + __TBB_Yield(); + m = my_mask; + goto restart; + } + } while( !result->my_lock.try_acquire( n->mutex, write ) ); + } + }//lock scope + result->my_node = n; + result->my_hash = h; +check_growth: + // [opt] grow the container + if( grow_segment ) + enable_segment( grow_segment ); + if( tmp_n ) // if op_insert only + delete_node( tmp_n ); + return return_value; +} + +template +template +std::pair concurrent_hash_map::internal_equal_range( const Key& key, I end ) const { + hashcode_t h = my_hash_compare.hash( key ); + hashcode_t m = my_mask; + __TBB_ASSERT((m&(m+1))==0, NULL); + h &= m; + bucket *b = get_bucket( h ); + while( b->node_list == internal::rehash_req ) { + m = ( 1u<<__TBB_Log2( h ) ) - 1; // get parent mask from the topmost bit + b = get_bucket( h &= m ); + } + node *n = search_bucket( key, b ); + if( !n ) + return std::make_pair(end, end); + iterator lower(*this, h, b, n), upper(lower); + return std::make_pair(lower, ++upper); +} + +template +bool concurrent_hash_map::exclude( const_accessor &item_accessor, bool readonly ) { + __TBB_ASSERT( item_accessor.my_node, NULL ); + node_base *const n = item_accessor.my_node; + item_accessor.my_node = NULL; // we ought release accessor anyway + hashcode_t const h = item_accessor.my_hash; + hashcode_t m = my_mask; + do { + // get bucket + bucket_accessor b( this, h & m, /*writer=*/true ); + node_base **p = &b()->node_list; + while( *p && *p != n ) + p = &(*p)->next; + if( !*p ) { // someone else was the first + if( check_mask_race( h, m ) ) + continue; + item_accessor.my_lock.release(); + return false; + } + __TBB_ASSERT( *p == n, NULL ); + *p = n->next; // remove from container + my_size--; + break; + } while(true); + if( readonly ) // need to get exclusive lock + item_accessor.my_lock.upgrade_to_writer(); // return value means nothing here + item_accessor.my_lock.release(); + delete_node( n ); // Only one thread can delete it due to write lock on the chain_mutex + return true; +} + +template +bool concurrent_hash_map::erase( const Key &key ) { + node_base *n; + hashcode_t const h = my_hash_compare.hash( key ); + hashcode_t m = my_mask; +restart: + {//lock scope + // get bucket + bucket_accessor b( this, h & m ); + search: + node_base **p = &b()->node_list; + n = *p; + while( is_valid(n) && !my_hash_compare.equal(key, static_cast(n)->item.first ) ) { + p = &n->next; + n = *p; + } + if( !n ) { // not found, but mask could be changed + if( check_mask_race( h, m ) ) + goto restart; + return false; + } + else if( !b.is_writer() && !b.upgrade_to_writer() ) { + if( check_mask_race( h, m ) ) // contended upgrade, check mask + goto restart; + goto search; + } + *p = n->next; + my_size--; + } + { + typename node::scoped_t item_locker( n->mutex, /*write=*/true ); + } + // note: there should be no threads pretending to acquire this mutex again, do not try to upgrade const_accessor! 
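An element can also be erased through an accessor that is already held, which avoids a second lookup. A minimal sketch, reusing the assumed StringTable typedef from the sketch above:

// Erase an element while holding a write lock on it.
void erase_if_zero( StringTable &table, const std::string &key ) {
    StringTable::accessor a;
    if( table.find( a, key ) && a->second == 0 )
        table.erase( a );   // removes the element and releases the accessor
}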
+ delete_node( n ); // Only one thread can delete it due to write lock on the bucket + return true; +} + +template +void concurrent_hash_map::swap(concurrent_hash_map &table) { + std::swap(this->my_allocator, table.my_allocator); + std::swap(this->my_hash_compare, table.my_hash_compare); + internal_swap(table); +} + +template +void concurrent_hash_map::clear() { + hashcode_t m = my_mask; + __TBB_ASSERT((m&(m+1))==0, NULL); +#if TBB_USE_DEBUG || TBB_USE_PERFORMANCE_WARNINGS +#if TBB_USE_PERFORMANCE_WARNINGS + int size = int(my_size), buckets = int(m)+1, empty_buckets = 0, overpopulated_buckets = 0; // usage statistics + static bool reported = false; +#endif + // check consistency + for( segment_index_t b = 0; b <= m; b++ ) { + node_base *n = get_bucket(b)->node_list; +#if TBB_USE_PERFORMANCE_WARNINGS + if( n == internal::empty_rehashed ) empty_buckets++; + else if( n == internal::rehash_req ) buckets--; + else if( n->next ) overpopulated_buckets++; +#endif + for(; is_valid(n); n = n->next ) { + hashcode_t h = my_hash_compare.hash( static_cast(n)->item.first ); + h &= m; + __TBB_ASSERT( h == b || get_bucket(h)->node_list == internal::rehash_req, "Rehashing is not finished until serial stage due to concurrent or unexpectedly terminated operation" ); + } + } +#if TBB_USE_PERFORMANCE_WARNINGS + if( buckets > size) empty_buckets -= buckets - size; + else overpopulated_buckets -= size - buckets; // TODO: load_factor? + if( !reported && buckets >= 512 && ( 2*empty_buckets >= size || 2*overpopulated_buckets > size ) ) { + internal::runtime_warning( + "Performance is not optimal because the hash function produces bad randomness in lower bits in %s.\nSize: %d Empties: %d Overlaps: %d", + typeid(*this).name(), size, empty_buckets, overpopulated_buckets ); + reported = true; + } +#endif +#endif//TBB_USE_DEBUG || TBB_USE_PERFORMANCE_WARNINGS + my_size = 0; + segment_index_t s = segment_index_of( m ); + __TBB_ASSERT( s+1 == pointers_per_table || !my_table[s+1], "wrong mask or concurrent grow" ); + cache_aligned_allocator alloc; + do { + __TBB_ASSERT( is_valid( my_table[s] ), "wrong mask or concurrent grow" ); + segment_ptr_t buckets = my_table[s]; + size_type sz = segment_size( s ? s : 1 ); + for( segment_index_t i = 0; i < sz; i++ ) + for( node_base *n = buckets[i].node_list; is_valid(n); n = buckets[i].node_list ) { + buckets[i].node_list = n->next; + delete_node( n ); + } + if( s >= first_block) // the first segment or the next + alloc.deallocate( buckets, sz ); + else if( s == embedded_block && embedded_block != first_block ) + alloc.deallocate( buckets, segment_size(first_block)-embedded_buckets ); + if( s >= embedded_block ) my_table[s] = 0; + } while(s-- > 0); + my_mask = embedded_buckets - 1; +} + +template +void concurrent_hash_map::internal_copy( const concurrent_hash_map& source ) { + reserve( source.my_size ); // TODO: load_factor? 
+ hashcode_t mask = source.my_mask; + if( my_mask == mask ) { // optimized version + bucket *dst = 0, *src = 0; + for( hashcode_t k = 0; k <= mask; k++ ) { + if( k & (k-2) ) ++dst,src++; // not the beginning of a segment + else { dst = get_bucket( k ); src = source.get_bucket( k ); } + __TBB_ASSERT( dst->node_list != internal::rehash_req, "Invalid bucket in destination table"); + node *n = static_cast( src->node_list ); + if( n == internal::rehash_req ) { // source is not rehashed, items are in previous buckets + bucket_accessor b( this, k ); + rehash_bucket( b(), k ); // TODO: use without synchronization + } else for(; n; n = static_cast( n->next ) ) { + add_to_bucket( dst, new( my_allocator ) node(n->item.first, n->item.second) ); + ++my_size; // TODO: replace by non-atomic op + } + } + } else internal_copy( source.begin(), source.end() ); +} + +template +template +void concurrent_hash_map::internal_copy(I first, I last) { + hashcode_t m = my_mask; + for(; first != last; ++first) { + hashcode_t h = my_hash_compare.hash( first->first ); + bucket *b = get_bucket( h & m ); + __TBB_ASSERT( b->node_list != internal::rehash_req, "Invalid bucket in destination table"); + node *n = new( my_allocator ) node(first->first, first->second); + add_to_bucket( b, n ); + ++my_size; // TODO: replace by non-atomic op + } +} + +template +inline bool operator==(const concurrent_hash_map &a, const concurrent_hash_map &b) { + if(a.size() != b.size()) return false; + typename concurrent_hash_map::const_iterator i(a.begin()), i_end(a.end()); + typename concurrent_hash_map::const_iterator j, j_end(b.end()); + for(; i != i_end; ++i) { + j = b.equal_range(i->first).first; + if( j == j_end || !(i->second == j->second) ) return false; + } + return true; +} + +template +inline bool operator!=(const concurrent_hash_map &a, const concurrent_hash_map &b) +{ return !(a == b); } + +template +inline void swap(concurrent_hash_map &a, concurrent_hash_map &b) +{ a.swap( b ); } + +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning( pop ) +#endif // warning 4127 is back + +} // namespace tbb + +#endif /* __TBB_concurrent_hash_map_H */ diff --git a/dep/tbb/include/tbb/concurrent_queue.h b/dep/tbb/include/tbb/concurrent_queue.h new file mode 100644 index 000000000..f344a8471 --- /dev/null +++ b/dep/tbb/include/tbb/concurrent_queue.h @@ -0,0 +1,409 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_concurrent_queue_H +#define __TBB_concurrent_queue_H + +#include "_concurrent_queue_internal.h" + +namespace tbb { + +namespace strict_ppl { + +//! A high-performance thread-safe non-blocking concurrent queue. +/** Multiple threads may each push and pop concurrently. + Assignment construction is not allowed. + @ingroup containers */ +template > +class concurrent_queue: public internal::concurrent_queue_base_v3 { + template friend class internal::concurrent_queue_iterator; + + //! Allocator type + typedef typename A::template rebind::other page_allocator_type; + page_allocator_type my_allocator; + + //! Allocates a block of size n (bytes) + /*overide*/ virtual void *allocate_block( size_t n ) { + void *b = reinterpret_cast(my_allocator.allocate( n )); + if( !b ) this->internal_throw_exception(); + return b; + } + + //! Returns a block of size n (bytes) + /*override*/ virtual void deallocate_block( void *b, size_t n ) { + my_allocator.deallocate( reinterpret_cast(b), n ); + } + +public: + //! Element type in the queue. + typedef T value_type; + + //! Reference type + typedef T& reference; + + //! Const reference type + typedef const T& const_reference; + + //! Integral type for representing size of the queue. + typedef size_t size_type; + + //! Difference type for iterator + typedef ptrdiff_t difference_type; + + //! Allocator type + typedef A allocator_type; + + //! Construct empty queue + explicit concurrent_queue(const allocator_type& a = allocator_type()) : + internal::concurrent_queue_base_v3( sizeof(T) ), my_allocator( a ) + { + } + + //! [begin,end) constructor + template + concurrent_queue( InputIterator begin, InputIterator end, const allocator_type& a = allocator_type()) : + internal::concurrent_queue_base_v3( sizeof(T) ), my_allocator( a ) + { + for( ; begin != end; ++begin ) + internal_push(&*begin); + } + + //! Copy constructor + concurrent_queue( const concurrent_queue& src, const allocator_type& a = allocator_type()) : + internal::concurrent_queue_base_v3( sizeof(T) ), my_allocator( a ) + { + assign( src ); + } + + //! Destroy queue + ~concurrent_queue(); + + //! Enqueue an item at tail of queue. + void push( const T& source ) { + internal_push( &source ); + } + + //! Attempt to dequeue an item from head of queue. + /** Does not wait for item to become available. + Returns true if successful; false otherwise. */ + bool try_pop( T& result ) { + return internal_try_pop( &result ); + } + + //! Return the number of items in the queue; thread unsafe + size_type unsafe_size() const {return this->internal_size();} + + //! Equivalent to size()==0. + bool empty() const {return this->internal_empty();} + + //! Clear the queue. not thread-safe. + void clear() ; + + //! Return allocator object + allocator_type get_allocator() const { return this->my_allocator; } + + typedef internal::concurrent_queue_iterator iterator; + typedef internal::concurrent_queue_iterator const_iterator; + + //------------------------------------------------------------------------ + // The iterators are intended only for debugging. 
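As a quick illustration of the non-blocking interface described above (a sketch with assumed names, not part of the patched header): producers call push() and consumers poll with try_pop().

#include "tbb/concurrent_queue.h"

void queue_sketch() {
    // tbb::concurrent_queue aliases strict_ppl::concurrent_queue unless TBB_DEPRECATED is set.
    tbb::concurrent_queue<int> q;
    for( int i = 0; i < 10; ++i )
        q.push( i );                 // thread-safe enqueue
    int item;
    while( q.try_pop( item ) ) {     // non-blocking; returns false when the queue is empty
        // process item
    }
}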
They are slow and not thread safe. + //------------------------------------------------------------------------ + iterator unsafe_begin() {return iterator(*this);} + iterator unsafe_end() {return iterator();} + const_iterator unsafe_begin() const {return const_iterator(*this);} + const_iterator unsafe_end() const {return const_iterator();} +} ; + +template +concurrent_queue::~concurrent_queue() { + clear(); + this->internal_finish_clear(); +} + +template +void concurrent_queue::clear() { + while( !empty() ) { + T value; + internal_try_pop(&value); + } +} + +} // namespace strict_ppl + +//! A high-performance thread-safe blocking concurrent bounded queue. +/** This is the pre-PPL TBB concurrent queue which supports boundedness and blocking semantics. + Note that method names agree with the PPL-style concurrent queue. + Multiple threads may each push and pop concurrently. + Assignment construction is not allowed. + @ingroup containers */ +template > +class concurrent_bounded_queue: public internal::concurrent_queue_base_v3 { + template friend class internal::concurrent_queue_iterator; + + //! Allocator type + typedef typename A::template rebind::other page_allocator_type; + page_allocator_type my_allocator; + + //! Class used to ensure exception-safety of method "pop" + class destroyer: internal::no_copy { + T& my_value; + public: + destroyer( T& value ) : my_value(value) {} + ~destroyer() {my_value.~T();} + }; + + T& get_ref( page& page, size_t index ) { + __TBB_ASSERT( index(static_cast(&page+1))[index]; + } + + /*override*/ virtual void copy_item( page& dst, size_t index, const void* src ) { + new( &get_ref(dst,index) ) T(*static_cast(src)); + } + + /*override*/ virtual void copy_page_item( page& dst, size_t dindex, const page& src, size_t sindex ) { + new( &get_ref(dst,dindex) ) T( static_cast(static_cast(&src+1))[sindex] ); + } + + /*override*/ virtual void assign_and_destroy_item( void* dst, page& src, size_t index ) { + T& from = get_ref(src,index); + destroyer d(from); + *static_cast(dst) = from; + } + + /*overide*/ virtual page *allocate_page() { + size_t n = sizeof(page) + items_per_page*item_size; + page *p = reinterpret_cast(my_allocator.allocate( n )); + if( !p ) internal_throw_exception(); + return p; + } + + /*override*/ virtual void deallocate_page( page *p ) { + size_t n = sizeof(page) + items_per_page*item_size; + my_allocator.deallocate( reinterpret_cast(p), n ); + } + +public: + //! Element type in the queue. + typedef T value_type; + + //! Allocator type + typedef A allocator_type; + + //! Reference type + typedef T& reference; + + //! Const reference type + typedef const T& const_reference; + + //! Integral type for representing size of the queue. + /** Notice that the size_type is a signed integral type. + This is because the size can be negative if there are pending pops without corresponding pushes. */ + typedef std::ptrdiff_t size_type; + + //! Difference type for iterator + typedef std::ptrdiff_t difference_type; + + //! Construct empty queue + explicit concurrent_bounded_queue(const allocator_type& a = allocator_type()) : + concurrent_queue_base_v3( sizeof(T) ), my_allocator( a ) + { + } + + //! Copy constructor + concurrent_bounded_queue( const concurrent_bounded_queue& src, const allocator_type& a = allocator_type()) : + concurrent_queue_base_v3( sizeof(T) ), my_allocator( a ) + { + assign( src ); + } + + //! 
[begin,end) constructor + template + concurrent_bounded_queue( InputIterator begin, InputIterator end, const allocator_type& a = allocator_type()) : + concurrent_queue_base_v3( sizeof(T) ), my_allocator( a ) + { + for( ; begin != end; ++begin ) + internal_push_if_not_full(&*begin); + } + + //! Destroy queue + ~concurrent_bounded_queue(); + + //! Enqueue an item at tail of queue. + void push( const T& source ) { + internal_push( &source ); + } + + //! Dequeue item from head of queue. + /** Block until an item becomes available, and then dequeue it. */ + void pop( T& destination ) { + internal_pop( &destination ); + } + + //! Enqueue an item at tail of queue if queue is not already full. + /** Does not wait for queue to become not full. + Returns true if item is pushed; false if queue was already full. */ + bool try_push( const T& source ) { + return internal_push_if_not_full( &source ); + } + + //! Attempt to dequeue an item from head of queue. + /** Does not wait for item to become available. + Returns true if successful; false otherwise. */ + bool try_pop( T& destination ) { + return internal_pop_if_present( &destination ); + } + + //! Return number of pushes minus number of pops. + /** Note that the result can be negative if there are pops waiting for the + corresponding pushes. The result can also exceed capacity() if there + are push operations in flight. */ + size_type size() const {return internal_size();} + + //! Equivalent to size()<=0. + bool empty() const {return internal_empty();} + + //! Maximum number of allowed elements + size_type capacity() const { + return my_capacity; + } + + //! Set the capacity + /** Setting the capacity to 0 causes subsequent try_push operations to always fail, + and subsequent push operations to block forever. */ + void set_capacity( size_type capacity ) { + internal_set_capacity( capacity, sizeof(T) ); + } + + //! return allocator object + allocator_type get_allocator() const { return this->my_allocator; } + + //! clear the queue. not thread-safe. + void clear() ; + + typedef internal::concurrent_queue_iterator iterator; + typedef internal::concurrent_queue_iterator const_iterator; + + //------------------------------------------------------------------------ + // The iterators are intended only for debugging. They are slow and not thread safe. + //------------------------------------------------------------------------ + iterator unsafe_begin() {return iterator(*this);} + iterator unsafe_end() {return iterator();} + const_iterator unsafe_begin() const {return const_iterator(*this);} + const_iterator unsafe_end() const {return const_iterator();} + +}; + +template +concurrent_bounded_queue::~concurrent_bounded_queue() { + clear(); + internal_finish_clear(); +} + +template +void concurrent_bounded_queue::clear() { + while( !empty() ) { + T value; + internal_pop_if_present(&value); + } +} + +namespace deprecated { + +//! A high-performance thread-safe blocking concurrent bounded queue. +/** This is the pre-PPL TBB concurrent queue which support boundedness and blocking semantics. + Note that method names agree with the PPL-style concurrent queue. + Multiple threads may each push and pop concurrently. + Assignment construction is not allowed. + @ingroup containers */ +template > +class concurrent_queue: public concurrent_bounded_queue { +#if !__TBB_TEMPLATE_FRIENDS_BROKEN + template friend class internal::concurrent_queue_iterator; +#endif + +public: + //! 
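The bounded variant adds capacity control and a blocking pop(). A minimal sketch with assumed names; a real consumer would normally run on its own thread:

#include "tbb/concurrent_queue.h"

void bounded_sketch() {
    tbb::concurrent_bounded_queue<int> q;
    q.set_capacity( 4 );            // try_push() fails once 4 items are pending
    for( int i = 0; i < 4; ++i )
        q.push( i );                // would block if the queue were full
    bool pushed = q.try_push( 99 ); // false: capacity reached
    int item;
    q.pop( item );                  // blocks until an item is available
    (void)pushed;
}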
Construct empty queue + explicit concurrent_queue(const A& a = A()) : + concurrent_bounded_queue( a ) + { + } + + //! Copy constructor + concurrent_queue( const concurrent_queue& src, const A& a = A()) : + concurrent_bounded_queue( src, a ) + { + } + + //! [begin,end) constructor + template + concurrent_queue( InputIterator begin, InputIterator end, const A& a = A()) : + concurrent_bounded_queue( begin, end, a ) + { + } + + //! Enqueue an item at tail of queue if queue is not already full. + /** Does not wait for queue to become not full. + Returns true if item is pushed; false if queue was already full. */ + bool push_if_not_full( const T& source ) { + return try_push( source ); + } + + //! Attempt to dequeue an item from head of queue. + /** Does not wait for item to become available. + Returns true if successful; false otherwise. + @deprecated Use try_pop() + */ + bool pop_if_present( T& destination ) { + return try_pop( destination ); + } + + typedef typename concurrent_bounded_queue::iterator iterator; + typedef typename concurrent_bounded_queue::const_iterator const_iterator; + // + //------------------------------------------------------------------------ + // The iterators are intended only for debugging. They are slow and not thread safe. + //------------------------------------------------------------------------ + iterator begin() {return this->unsafe_begin();} + iterator end() {return this->unsafe_end();} + const_iterator begin() const {return this->unsafe_begin();} + const_iterator end() const {return this->unsafe_end();} +}; + +} + + +#if TBB_DEPRECATED +using deprecated::concurrent_queue; +#else +using strict_ppl::concurrent_queue; +#endif + +} // namespace tbb + +#endif /* __TBB_concurrent_queue_H */ diff --git a/dep/tbb/include/tbb/concurrent_vector.h b/dep/tbb/include/tbb/concurrent_vector.h new file mode 100644 index 000000000..383c04489 --- /dev/null +++ b/dep/tbb/include/tbb/concurrent_vector.h @@ -0,0 +1,1049 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#ifndef __TBB_concurrent_vector_H +#define __TBB_concurrent_vector_H + +#include "tbb_stddef.h" +#include +#include +#include +#include +#include "atomic.h" +#include "cache_aligned_allocator.h" +#include "blocked_range.h" + +#include "tbb_machine.h" + +#if _MSC_VER==1500 && !__INTEL_COMPILER + // VS2008/VC9 seems to have an issue; limits pull in math.h + #pragma warning( push ) + #pragma warning( disable: 4985 ) +#endif +#include /* std::numeric_limits */ +#if _MSC_VER==1500 && !__INTEL_COMPILER + #pragma warning( pop ) +#endif + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) && defined(_Wp64) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (push) + #pragma warning (disable: 4267) +#endif + +namespace tbb { + +template > +class concurrent_vector; + + +//! @cond INTERNAL +namespace internal { + + //! Bad allocation marker + static void *const vector_allocation_error_flag = reinterpret_cast(size_t(63)); + //! Routine that loads pointer from location pointed to by src without any fence, without causing ITT to report a race. + void* __TBB_EXPORTED_FUNC itt_load_pointer_v3( const void* src ); + + //! Base class of concurrent vector implementation. + /** @ingroup containers */ + class concurrent_vector_base_v3 { + protected: + + // Basic types declarations + typedef size_t segment_index_t; + typedef size_t size_type; + + // Using enumerations due to Mac linking problems of static const variables + enum { + // Size constants + default_initial_segments = 1, // 2 initial items + //! Number of slots for segment's pointers inside the class + pointers_per_short_table = 3, // to fit into 8 words of entire structure + pointers_per_long_table = sizeof(segment_index_t) * 8 // one segment per bit + }; + + // Segment pointer. Can be zero-initialized + struct segment_t { + void* array; +#if TBB_USE_ASSERT + ~segment_t() { + __TBB_ASSERT( array <= internal::vector_allocation_error_flag, "should have been freed by clear" ); + } +#endif /* TBB_USE_ASSERT */ + }; + + // Data fields + + //! allocator function pointer + void* (*vector_allocator_ptr)(concurrent_vector_base_v3 &, size_t); + + //! count of segments in the first block + atomic my_first_block; + + //! Requested size of vector + atomic my_early_size; + + //! Pointer to the segments table + atomic my_segment; + + //! embedded storage of segment pointers + segment_t my_storage[pointers_per_short_table]; + + // Methods + + concurrent_vector_base_v3() { + my_early_size = 0; + my_first_block = 0; // here is not default_initial_segments + for( segment_index_t i = 0; i < pointers_per_short_table; i++) + my_storage[i].array = NULL; + my_segment = my_storage; + } + __TBB_EXPORTED_METHOD ~concurrent_vector_base_v3(); + + static segment_index_t segment_index_of( size_type index ) { + return segment_index_t( __TBB_Log2( index|1 ) ); + } + + static segment_index_t segment_base( segment_index_t k ) { + return (segment_index_t(1)< + class vector_iterator + { + //! concurrent_vector over which we are iterating. + Container* my_vector; + + //! Index into the vector + size_t my_index; + + //! 
Caches my_vector->internal_subscript(my_index) + /** NULL if cached value is not available */ + mutable Value* my_item; + + template + friend vector_iterator operator+( ptrdiff_t offset, const vector_iterator& v ); + + template + friend bool operator==( const vector_iterator& i, const vector_iterator& j ); + + template + friend bool operator<( const vector_iterator& i, const vector_iterator& j ); + + template + friend ptrdiff_t operator-( const vector_iterator& i, const vector_iterator& j ); + + template + friend class internal::vector_iterator; + +#if !defined(_MSC_VER) || defined(__INTEL_COMPILER) + template + friend class tbb::concurrent_vector; +#else +public: // workaround for MSVC +#endif + + vector_iterator( const Container& vector, size_t index, void *ptr = 0 ) : + my_vector(const_cast(&vector)), + my_index(index), + my_item(static_cast(ptr)) + {} + + public: + //! Default constructor + vector_iterator() : my_vector(NULL), my_index(~size_t(0)), my_item(NULL) {} + + vector_iterator( const vector_iterator& other ) : + my_vector(other.my_vector), + my_index(other.my_index), + my_item(other.my_item) + {} + + vector_iterator operator+( ptrdiff_t offset ) const { + return vector_iterator( *my_vector, my_index+offset ); + } + vector_iterator &operator+=( ptrdiff_t offset ) { + my_index+=offset; + my_item = NULL; + return *this; + } + vector_iterator operator-( ptrdiff_t offset ) const { + return vector_iterator( *my_vector, my_index-offset ); + } + vector_iterator &operator-=( ptrdiff_t offset ) { + my_index-=offset; + my_item = NULL; + return *this; + } + Value& operator*() const { + Value* item = my_item; + if( !item ) { + item = my_item = &my_vector->internal_subscript(my_index); + } + __TBB_ASSERT( item==&my_vector->internal_subscript(my_index), "corrupt cache" ); + return *item; + } + Value& operator[]( ptrdiff_t k ) const { + return my_vector->internal_subscript(my_index+k); + } + Value* operator->() const {return &operator*();} + + //! Pre increment + vector_iterator& operator++() { + size_t k = ++my_index; + if( my_item ) { + // Following test uses 2's-complement wizardry + if( (k& (k-2))==0 ) { + // k is a power of two that is at least k-2 + my_item= NULL; + } else { + ++my_item; + } + } + return *this; + } + + //! Pre decrement + vector_iterator& operator--() { + __TBB_ASSERT( my_index>0, "operator--() applied to iterator already at beginning of concurrent_vector" ); + size_t k = my_index--; + if( my_item ) { + // Following test uses 2's-complement wizardry + if( (k& (k-2))==0 ) { + // k is a power of two that is at least k-2 + my_item= NULL; + } else { + --my_item; + } + } + return *this; + } + + //! Post increment + vector_iterator operator++(int) { + vector_iterator result = *this; + operator++(); + return result; + } + + //! 
Post decrement + vector_iterator operator--(int) { + vector_iterator result = *this; + operator--(); + return result; + } + + // STL support + + typedef ptrdiff_t difference_type; + typedef Value value_type; + typedef Value* pointer; + typedef Value& reference; + typedef std::random_access_iterator_tag iterator_category; + }; + + template + vector_iterator operator+( ptrdiff_t offset, const vector_iterator& v ) { + return vector_iterator( *v.my_vector, v.my_index+offset ); + } + + template + bool operator==( const vector_iterator& i, const vector_iterator& j ) { + return i.my_index==j.my_index && i.my_vector == j.my_vector; + } + + template + bool operator!=( const vector_iterator& i, const vector_iterator& j ) { + return !(i==j); + } + + template + bool operator<( const vector_iterator& i, const vector_iterator& j ) { + return i.my_index + bool operator>( const vector_iterator& i, const vector_iterator& j ) { + return j + bool operator>=( const vector_iterator& i, const vector_iterator& j ) { + return !(i + bool operator<=( const vector_iterator& i, const vector_iterator& j ) { + return !(j + ptrdiff_t operator-( const vector_iterator& i, const vector_iterator& j ) { + return ptrdiff_t(i.my_index)-ptrdiff_t(j.my_index); + } + + template + class allocator_base { + public: + typedef typename A::template + rebind::other allocator_type; + allocator_type my_allocator; + + allocator_base(const allocator_type &a = allocator_type() ) : my_allocator(a) {} + }; + +} // namespace internal +//! @endcond + +//! Concurrent vector container +/** concurrent_vector is a container having the following main properties: + - It provides random indexed access to its elements. The index of the first element is 0. + - It ensures safe concurrent growing its size (different threads can safely append new elements). + - Adding new elements does not invalidate existing iterators and does not change indices of existing items. + +@par Compatibility + The class meets all Container Requirements and Reversible Container Requirements from + C++ Standard (See ISO/IEC 14882:2003(E), clause 23.1). But it doesn't meet + Sequence Requirements due to absence of insert() and erase() methods. + +@par Exception Safety + Methods working with memory allocation and/or new elements construction can throw an + exception if allocator fails to allocate memory or element's default constructor throws one. + Concurrent vector's element of type T must conform to the following requirements: + - Throwing an exception is forbidden for destructor of T. + - Default constructor of T must not throw an exception OR its non-virtual destructor must safely work when its object memory is zero-initialized. + . + Otherwise, the program's behavior is undefined. +@par + If an exception happens inside growth or assignment operation, an instance of the vector becomes invalid unless it is stated otherwise in the method documentation. + Invalid state means: + - There are no guaranties that all items were initialized by a constructor. The rest of items is zero-filled, including item where exception happens. + - An invalid vector instance cannot be repaired; it is unable to grow anymore. + - Size and capacity reported by the vector are incorrect, and calculated as if the failed operation were successful. + - Attempt to access not allocated elements using operator[] or iterators results in access violation or segmentation fault exception, and in case of using at() method a C++ exception is thrown. + . 
+ If a concurrent grow operation successfully completes, all the elements it has added to the vector will remain valid and accessible even if one of subsequent grow operations fails. + +@par Fragmentation + Unlike an STL vector, a concurrent_vector does not move existing elements if it needs + to allocate more memory. The container is divided into a series of contiguous arrays of + elements. The first reservation, growth, or assignment operation determines the size of + the first array. Using small number of elements as initial size incurs fragmentation that + may increase element access time. Internal layout can be optimized by method compact() that + merges several smaller arrays into one solid. + +@par Changes since TBB 2.1 + - Fixed guarantees of concurrent_vector::size() and grow_to_at_least() methods to assure elements are allocated. + - Methods end()/rbegin()/back() are partly thread-safe since they use size() to get the end of vector + - Added resize() methods (not thread-safe) + - Added cbegin/cend/crbegin/crend methods + - Changed return type of methods grow* and push_back to iterator + +@par Changes since TBB 2.0 + - Implemented exception-safety guaranties + - Added template argument for allocator + - Added allocator argument in constructors + - Faster index calculation + - First growth call specifies a number of segments to be merged in the first allocation. + - Fixed memory blow up for swarm of vector's instances of small size + - Added grow_by(size_type n, const_reference t) growth using copying constructor to init new items. + - Added STL-like constructors. + - Added operators ==, < and derivatives + - Added at() method, approved for using after an exception was thrown inside the vector + - Added get_allocator() method. + - Added assign() methods + - Added compact() method to defragment first segments + - Added swap() method + - range() defaults on grainsize = 1 supporting auto grainsize algorithms. 
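To make the fragmentation note above concrete, here is an illustrative sketch (assumed usage, not part of the patched header): reserving capacity up front keeps elements in fewer, larger arrays, while shrink_to_fit()/compact() defragments a vector that grew from a small initial size.

#include "tbb/concurrent_vector.h"

void layout_sketch() {
    tbb::concurrent_vector<double> v;
    v.reserve( 1024 );              // size the first allocation for 1024 elements
    for( int i = 0; i < 1024; ++i )
        v.push_back( 0.0 );         // safe to call concurrently from several threads
    v.shrink_to_fit();              // not thread-safe: merges small segments into one array
}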
+ + @ingroup containers */ +template +class concurrent_vector: protected internal::allocator_base, + private internal::concurrent_vector_base { +private: + template + class generic_range_type: public blocked_range { + public: + typedef T value_type; + typedef T& reference; + typedef const T& const_reference; + typedef I iterator; + typedef ptrdiff_t difference_type; + generic_range_type( I begin_, I end_, size_t grainsize = 1) : blocked_range(begin_,end_,grainsize) {} + template + generic_range_type( const generic_range_type& r) : blocked_range(r.begin(),r.end(),r.grainsize()) {} + generic_range_type( generic_range_type& r, split ) : blocked_range(r,split()) {} + }; + + template + friend class internal::vector_iterator; +public: + //------------------------------------------------------------------------ + // STL compatible types + //------------------------------------------------------------------------ + typedef internal::concurrent_vector_base_v3::size_type size_type; + typedef typename internal::allocator_base::allocator_type allocator_type; + + typedef T value_type; + typedef ptrdiff_t difference_type; + typedef T& reference; + typedef const T& const_reference; + typedef T *pointer; + typedef const T *const_pointer; + + typedef internal::vector_iterator iterator; + typedef internal::vector_iterator const_iterator; + +#if !defined(_MSC_VER) || _CPPLIB_VER>=300 + // Assume ISO standard definition of std::reverse_iterator + typedef std::reverse_iterator reverse_iterator; + typedef std::reverse_iterator const_reverse_iterator; +#else + // Use non-standard std::reverse_iterator + typedef std::reverse_iterator reverse_iterator; + typedef std::reverse_iterator const_reverse_iterator; +#endif /* defined(_MSC_VER) && (_MSC_VER<1300) */ + + //------------------------------------------------------------------------ + // Parallel algorithm support + //------------------------------------------------------------------------ + typedef generic_range_type range_type; + typedef generic_range_type const_range_type; + + //------------------------------------------------------------------------ + // STL compatible constructors & destructors + //------------------------------------------------------------------------ + + //! Construct empty vector. + explicit concurrent_vector(const allocator_type &a = allocator_type()) + : internal::allocator_base(a) + { + vector_allocator_ptr = &internal_allocator; + } + + //! Copying constructor + concurrent_vector( const concurrent_vector& vector, const allocator_type& a = allocator_type() ) + : internal::allocator_base(a) + { + vector_allocator_ptr = &internal_allocator; + try { + internal_copy(vector, sizeof(T), ©_array); + } catch(...) { + segment_t *table = my_segment; + internal_free_segments( reinterpret_cast(table), internal_clear(&destroy_array), my_first_block ); + throw; + } + } + + //! Copying constructor for vector with different allocator type + template + concurrent_vector( const concurrent_vector& vector, const allocator_type& a = allocator_type() ) + : internal::allocator_base(a) + { + vector_allocator_ptr = &internal_allocator; + try { + internal_copy(vector.internal_vector_base(), sizeof(T), ©_array); + } catch(...) { + segment_t *table = my_segment; + internal_free_segments( reinterpret_cast(table), internal_clear(&destroy_array), my_first_block ); + throw; + } + } + + //! 
Construction with initial size specified by argument n + explicit concurrent_vector(size_type n) + { + vector_allocator_ptr = &internal_allocator; + try { + internal_resize( n, sizeof(T), max_size(), NULL, &destroy_array, &initialize_array ); + } catch(...) { + segment_t *table = my_segment; + internal_free_segments( reinterpret_cast(table), internal_clear(&destroy_array), my_first_block ); + throw; + } + } + + //! Construction with initial size specified by argument n, initialization by copying of t, and given allocator instance + concurrent_vector(size_type n, const_reference t, const allocator_type& a = allocator_type()) + : internal::allocator_base(a) + { + vector_allocator_ptr = &internal_allocator; + try { + internal_resize( n, sizeof(T), max_size(), static_cast(&t), &destroy_array, &initialize_array_by ); + } catch(...) { + segment_t *table = my_segment; + internal_free_segments( reinterpret_cast(table), internal_clear(&destroy_array), my_first_block ); + throw; + } + } + + //! Construction with copying iteration range and given allocator instance + template + concurrent_vector(I first, I last, const allocator_type &a = allocator_type()) + : internal::allocator_base(a) + { + vector_allocator_ptr = &internal_allocator; + try { + internal_assign_range(first, last, static_cast::is_integer> *>(0) ); + } catch(...) { + segment_t *table = my_segment; + internal_free_segments( reinterpret_cast(table), internal_clear(&destroy_array), my_first_block ); + throw; + } + } + + //! Assignment + concurrent_vector& operator=( const concurrent_vector& vector ) { + if( this != &vector ) + internal_assign(vector, sizeof(T), &destroy_array, &assign_array, ©_array); + return *this; + } + + //! Assignment for vector with different allocator type + template + concurrent_vector& operator=( const concurrent_vector& vector ) { + if( static_cast( this ) != static_cast( &vector ) ) + internal_assign(vector.internal_vector_base(), + sizeof(T), &destroy_array, &assign_array, ©_array); + return *this; + } + + //------------------------------------------------------------------------ + // Concurrent operations + //------------------------------------------------------------------------ + //! Grow by "delta" elements. +#if TBB_DEPRECATED + /** Returns old size. */ + size_type grow_by( size_type delta ) { + return delta ? internal_grow_by( delta, sizeof(T), &initialize_array, NULL ) : my_early_size; + } +#else + /** Returns iterator pointing to the first new element. */ + iterator grow_by( size_type delta ) { + return iterator(*this, delta ? internal_grow_by( delta, sizeof(T), &initialize_array, NULL ) : my_early_size); + } +#endif + + //! Grow by "delta" elements using copying constuctor. +#if TBB_DEPRECATED + /** Returns old size. */ + size_type grow_by( size_type delta, const_reference t ) { + return delta ? internal_grow_by( delta, sizeof(T), &initialize_array_by, static_cast(&t) ) : my_early_size; + } +#else + /** Returns iterator pointing to the first new element. */ + iterator grow_by( size_type delta, const_reference t ) { + return iterator(*this, delta ? internal_grow_by( delta, sizeof(T), &initialize_array_by, static_cast(&t) ) : my_early_size); + } +#endif + + //! Append minimal sequence of elements such that size()>=n. +#if TBB_DEPRECATED + /** The new elements are default constructed. Blocks until all elements in range [0..n) are allocated. + May return while other elements are being constructed by other threads. 
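Since TBB 2.2 the growth methods return an iterator to the first appended element (unless TBB_DEPRECATED restores the old size-returning form), so a thread can initialize exactly the block it appended. A sketch with assumed names:

#include "tbb/concurrent_vector.h"

void append_block( tbb::concurrent_vector<int> &v, int base ) {
    // Atomically reserve 8 new default-constructed slots; it points at the first of them.
    tbb::concurrent_vector<int>::iterator it = v.grow_by( 8 );
    for( int i = 0; i < 8; ++i, ++it )
        *it = base + i;             // fill only the slots this call appended
}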
*/ + void grow_to_at_least( size_type n ) { + if( n ) internal_grow_to_at_least_with_result( n, sizeof(T), &initialize_array, NULL ); + }; +#else + /** The new elements are default constructed. Blocks until all elements in range [0..n) are allocated. + May return while other elements are being constructed by other threads. + Returns iterator that points to beginning of appended sequence. + If no elements were appended, returns iterator pointing to nth element. */ + iterator grow_to_at_least( size_type n ) { + size_type m=0; + if( n ) { + m = internal_grow_to_at_least_with_result( n, sizeof(T), &initialize_array, NULL ); + if( m>n ) m=n; + } + return iterator(*this, m); + }; +#endif + + //! Push item +#if TBB_DEPRECATED + size_type push_back( const_reference item ) +#else + /** Returns iterator pointing to the new element. */ + iterator push_back( const_reference item ) +#endif + { + size_type k; + void *ptr = internal_push_back(sizeof(T),k); + internal_loop_guide loop(1, ptr); + loop.init(&item); +#if TBB_DEPRECATED + return k; +#else + return iterator(*this, k, ptr); +#endif + } + + //! Get reference to element at given index. + /** This method is thread-safe for concurrent reads, and also while growing the vector, + as long as the calling thread has checked that index<size(). */ + reference operator[]( size_type index ) { + return internal_subscript(index); + } + + //! Get const reference to element at given index. + const_reference operator[]( size_type index ) const { + return internal_subscript(index); + } + + //! Get reference to element at given index. Throws exceptions on errors. + reference at( size_type index ) { + return internal_subscript_with_exceptions(index); + } + + //! Get const reference to element at given index. Throws exceptions on errors. + const_reference at( size_type index ) const { + return internal_subscript_with_exceptions(index); + } + + //! Get range for iterating with parallel algorithms + range_type range( size_t grainsize = 1) { + return range_type( begin(), end(), grainsize ); + } + + //! Get const range for iterating with parallel algorithms + const_range_type range( size_t grainsize = 1 ) const { + return const_range_type( begin(), end(), grainsize ); + } + //------------------------------------------------------------------------ + // Capacity + //------------------------------------------------------------------------ + //! Return size of vector. It may include elements under construction + size_type size() const { + size_type sz = my_early_size, cp = internal_capacity(); + return cp < sz ? cp : sz; + } + + //! Return true if vector is not empty or has elements under construction at least. + bool empty() const {return !my_early_size;} + + //! Maximum size to which array can grow without allocating more memory. Concurrent allocations are not included in the value. + size_type capacity() const {return internal_capacity();} + + //! Allocate enough space to grow to size n without having to allocate more memory later. + /** Like most of the methods provided for STL compatibility, this method is *not* thread safe. + The capacity afterwards may be bigger than the requested reservation. */ + void reserve( size_type n ) { + if( n ) + internal_reserve(n, sizeof(T), max_size()); + } + + //! Resize the vector. Not thread-safe. + void resize( size_type n ) { + internal_resize( n, sizeof(T), max_size(), NULL, &destroy_array, &initialize_array ); + } + + //! Resize the vector, copy t for new elements. Not thread-safe. 
+    void resize( size_type n, const_reference t ) {
+        internal_resize( n, sizeof(T), max_size(), static_cast<const void*>(&t), &destroy_array, &initialize_array_by );
+    }
+
+#if TBB_DEPRECATED
+    //! An alias for shrink_to_fit()
+    void compact() {shrink_to_fit();}
+#endif /* TBB_DEPRECATED */
+
+    //! Optimize memory usage and fragmentation.
+    void shrink_to_fit();
+
+    //! Upper bound on argument to reserve.
+    size_type max_size() const {return (~size_type(0))/sizeof(T);}
+
+    //------------------------------------------------------------------------
+    // STL support
+    //------------------------------------------------------------------------
+
+    //! start iterator
+    iterator begin() {return iterator(*this,0);}
+    //! end iterator
+    iterator end() {return iterator(*this,size());}
+    //! start const iterator
+    const_iterator begin() const {return const_iterator(*this,0);}
+    //! end const iterator
+    const_iterator end() const {return const_iterator(*this,size());}
+    //! start const iterator
+    const_iterator cbegin() const {return const_iterator(*this,0);}
+    //! end const iterator
+    const_iterator cend() const {return const_iterator(*this,size());}
+    //! reverse start iterator
+    reverse_iterator rbegin() {return reverse_iterator(end());}
+    //! reverse end iterator
+    reverse_iterator rend() {return reverse_iterator(begin());}
+    //! reverse start const iterator
+    const_reverse_iterator rbegin() const {return const_reverse_iterator(end());}
+    //! reverse end const iterator
+    const_reverse_iterator rend() const {return const_reverse_iterator(begin());}
+    //! reverse start const iterator
+    const_reverse_iterator crbegin() const {return const_reverse_iterator(end());}
+    //! reverse end const iterator
+    const_reverse_iterator crend() const {return const_reverse_iterator(begin());}
+    //! the first item
+    reference front() {
+        __TBB_ASSERT( size()>0, NULL);
+        return static_cast<T*>(my_segment[0].array)[0];
+    }
+    //! the first item const
+    const_reference front() const {
+        __TBB_ASSERT( size()>0, NULL);
+        return static_cast<const T*>(my_segment[0].array)[0];
+    }
+    //! the last item
+    reference back() {
+        __TBB_ASSERT( size()>0, NULL);
+        return internal_subscript( size()-1 );
+    }
+    //! the last item const
+    const_reference back() const {
+        __TBB_ASSERT( size()>0, NULL);
+        return internal_subscript( size()-1 );
+    }
+    //! return allocator object
+    allocator_type get_allocator() const { return this->my_allocator; }
+
+    //! assign n items by copying t item
+    void assign(size_type n, const_reference t) {
+        clear();
+        internal_resize( n, sizeof(T), max_size(), static_cast<const void*>(&t), &destroy_array, &initialize_array_by );
+    }
+
+    //! assign range [first, last)
+    template<class I>
+    void assign(I first, I last) {
+        clear(); internal_assign_range( first, last, static_cast<is_integer_tag<std::numeric_limits<I>::is_integer> *>(0) );
+    }
+
+    //! swap two instances
+    void swap(concurrent_vector &vector) {
+        if( this != &vector ) {
+            concurrent_vector_base_v3::internal_swap(static_cast<concurrent_vector_base_v3&>(vector));
+            std::swap(this->my_allocator, vector.my_allocator);
+        }
+    }
+
+    //! Clear container while keeping memory allocated.
+    /** To free up the memory, use in conjunction with method compact(). Not thread safe **/
+    void clear() {
+        internal_clear(&destroy_array);
+    }
+
+    //! Clear and destroy vector.
+    ~concurrent_vector() {
+        segment_t *table = my_segment;
+        internal_free_segments( reinterpret_cast<void**>(table), internal_clear(&destroy_array), my_first_block );
+        // base class destructor call should be then
+    }
+
+    const internal::concurrent_vector_base_v3 &internal_vector_base() const { return *this; }
+private:
+    //!
Allocate k items + static void *internal_allocator(internal::concurrent_vector_base_v3 &vb, size_t k) { + return static_cast&>(vb).my_allocator.allocate(k); + } + //! Free k segments from table + void internal_free_segments(void *table[], segment_index_t k, segment_index_t first_block); + + //! Get reference to element at given index. + T& internal_subscript( size_type index ) const; + + //! Get reference to element at given index with errors checks + T& internal_subscript_with_exceptions( size_type index ) const; + + //! assign n items by copying t + void internal_assign_n(size_type n, const_pointer p) { + internal_resize( n, sizeof(T), max_size(), static_cast(p), &destroy_array, p? &initialize_array_by : &initialize_array ); + } + + //! helper class + template class is_integer_tag; + + //! assign integer items by copying when arguments are treated as iterators. See C++ Standard 2003 23.1.1p9 + template + void internal_assign_range(I first, I last, is_integer_tag *) { + internal_assign_n(static_cast(first), &static_cast(last)); + } + //! inline proxy assign by iterators + template + void internal_assign_range(I first, I last, is_integer_tag *) { + internal_assign_iterators(first, last); + } + //! assign by iterators + template + void internal_assign_iterators(I first, I last); + + //! Construct n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC initialize_array( void* begin, const void*, size_type n ); + + //! Construct n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC initialize_array_by( void* begin, const void* src, size_type n ); + + //! Construct n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC copy_array( void* dst, const void* src, size_type n ); + + //! Assign n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC assign_array( void* dst, const void* src, size_type n ); + + //! Destroy n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC destroy_array( void* begin, size_type n ); + + //! Exception-aware helper class for filling a segment by exception-danger operators of user class + class internal_loop_guide : internal::no_copy { + public: + const pointer array; + const size_type n; + size_type i; + internal_loop_guide(size_type ntrials, void *ptr) + : array(static_cast(ptr)), n(ntrials), i(0) {} + void init() { for(; i < n; ++i) new( &array[i] ) T(); } + void init(const void *src) { for(; i < n; ++i) new( &array[i] ) T(*static_cast(src)); } + void copy(const void *src) { for(; i < n; ++i) new( &array[i] ) T(static_cast(src)[i]); } + void assign(const void *src) { for(; i < n; ++i) array[i] = static_cast(src)[i]; } + template void iterate(I &src) { for(; i < n; ++i, ++src) new( &array[i] ) T( *src ); } + ~internal_loop_guide() { + if(i < n) // if exception raised, do zerroing on the rest of items + std::memset(array+i, 0, (n-i)*sizeof(value_type)); + } + }; +}; + +template +void concurrent_vector::shrink_to_fit() { + internal_segments_table old; + try { + if( internal_compact( sizeof(T), &old, &destroy_array, ©_array ) ) + internal_free_segments( old.table, pointers_per_long_table, old.first_block ); // free joined and unnecessary segments + } catch(...) { + if( old.first_block ) // free segment allocated for compacting. 
Only for support of exceptions in ctor of user T[ype] + internal_free_segments( old.table, 1, old.first_block ); + throw; + } +} + +template +void concurrent_vector::internal_free_segments(void *table[], segment_index_t k, segment_index_t first_block) { + // Free the arrays + while( k > first_block ) { + --k; + T* array = static_cast(table[k]); + table[k] = NULL; + if( array > internal::vector_allocation_error_flag ) // check for correct segment pointer + this->my_allocator.deallocate( array, segment_size(k) ); + } + T* array = static_cast(table[0]); + if( array > internal::vector_allocation_error_flag ) { + __TBB_ASSERT( first_block > 0, NULL ); + while(k > 0) table[--k] = NULL; + this->my_allocator.deallocate( array, segment_size(first_block) ); + } +} + +template +T& concurrent_vector::internal_subscript( size_type index ) const { + __TBB_ASSERT( index < my_early_size, "index out of bounds" ); + size_type j = index; + segment_index_t k = segment_base_index_of( j ); + __TBB_ASSERT( my_segment != (segment_t*)my_storage || k < pointers_per_short_table, "index is being allocated" ); + // no need in __TBB_load_with_acquire since thread works in own space or gets +#if TBB_USE_THREADING_TOOLS + T* array = static_cast( tbb::internal::itt_load_pointer_v3(&my_segment[k].array)); +#else + T* array = static_cast(my_segment[k].array); +#endif /* TBB_USE_THREADING_TOOLS */ + __TBB_ASSERT( array != internal::vector_allocation_error_flag, "the instance is broken by bad allocation. Use at() instead" ); + __TBB_ASSERT( array, "index is being allocated" ); + return array[j]; +} + +template +T& concurrent_vector::internal_subscript_with_exceptions( size_type index ) const { + if( index >= my_early_size ) + internal_throw_exception(0); // throw std::out_of_range + size_type j = index; + segment_index_t k = segment_base_index_of( j ); + if( my_segment == (segment_t*)my_storage && k >= pointers_per_short_table ) + internal_throw_exception(1); // throw std::range_error + void *array = my_segment[k].array; // no need in __TBB_load_with_acquire + if( array <= internal::vector_allocation_error_flag ) // check for correct segment pointer + internal_throw_exception(2); // throw std::range_error + return static_cast(array)[j]; +} + +template template +void concurrent_vector::internal_assign_iterators(I first, I last) { + __TBB_ASSERT(my_early_size == 0, NULL); + size_type n = std::distance(first, last); + if( !n ) return; + internal_reserve(n, sizeof(T), max_size()); + my_early_size = n; + segment_index_t k = 0; + size_type sz = segment_size( my_first_block ); + while( sz < n ) { + internal_loop_guide loop(sz, my_segment[k].array); + loop.iterate(first); + n -= sz; + if( !k ) k = my_first_block; + else { ++k; sz <<= 1; } + } + internal_loop_guide loop(n, my_segment[k].array); + loop.iterate(first); +} + +template +void concurrent_vector::initialize_array( void* begin, const void *, size_type n ) { + internal_loop_guide loop(n, begin); loop.init(); +} + +template +void concurrent_vector::initialize_array_by( void* begin, const void *src, size_type n ) { + internal_loop_guide loop(n, begin); loop.init(src); +} + +template +void concurrent_vector::copy_array( void* dst, const void* src, size_type n ) { + internal_loop_guide loop(n, dst); loop.copy(src); +} + +template +void concurrent_vector::assign_array( void* dst, const void* src, size_type n ) { + internal_loop_guide loop(n, dst); loop.assign(src); +} + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warning + #pragma 
warning (push) + #pragma warning (disable: 4189) +#endif +template +void concurrent_vector::destroy_array( void* begin, size_type n ) { + T* array = static_cast(begin); + for( size_type j=n; j>0; --j ) + array[j-1].~T(); // destructors are supposed to not throw any exceptions +} +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + #pragma warning (pop) +#endif // warning 4189 is back + +// concurrent_vector's template functions +template +inline bool operator==(const concurrent_vector &a, const concurrent_vector &b) { + // Simply: return a.size() == b.size() && std::equal(a.begin(), a.end(), b.begin()); + if(a.size() != b.size()) return false; + typename concurrent_vector::const_iterator i(a.begin()); + typename concurrent_vector::const_iterator j(b.begin()); + for(; i != a.end(); ++i, ++j) + if( !(*i == *j) ) return false; + return true; +} + +template +inline bool operator!=(const concurrent_vector &a, const concurrent_vector &b) +{ return !(a == b); } + +template +inline bool operator<(const concurrent_vector &a, const concurrent_vector &b) +{ return (std::lexicographical_compare(a.begin(), a.end(), b.begin(), b.end())); } + +template +inline bool operator>(const concurrent_vector &a, const concurrent_vector &b) +{ return b < a; } + +template +inline bool operator<=(const concurrent_vector &a, const concurrent_vector &b) +{ return !(b < a); } + +template +inline bool operator>=(const concurrent_vector &a, const concurrent_vector &b) +{ return !(a < b); } + +template +inline void swap(concurrent_vector &a, concurrent_vector &b) +{ a.swap( b ); } + +} // namespace tbb + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) && defined(_Wp64) + #pragma warning (pop) +#endif // warning 4267 is back + +#endif /* __TBB_concurrent_vector_H */ diff --git a/dep/tbb/include/tbb/enumerable_thread_specific.h b/dep/tbb/include/tbb/enumerable_thread_specific.h new file mode 100644 index 000000000..123a62f00 --- /dev/null +++ b/dep/tbb/include/tbb/enumerable_thread_specific.h @@ -0,0 +1,880 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#ifndef __TBB_enumerable_thread_specific_H +#define __TBB_enumerable_thread_specific_H + +#include "concurrent_vector.h" +#include "tbb_thread.h" +#include "concurrent_hash_map.h" +#include "cache_aligned_allocator.h" +#if __SUNPRO_CC +#include // for memcpy +#endif + +#if _WIN32||_WIN64 +#include +#else +#include +#endif + +namespace tbb { + + //! enum for selecting between single key and key-per-instance versions + enum ets_key_usage_type { ets_key_per_instance, ets_no_key }; + + //! @cond + namespace internal { + + //! Random access iterator for traversing the thread local copies. + template< typename Container, typename Value > + class enumerable_thread_specific_iterator +#if defined(_WIN64) && defined(_MSC_VER) + // Ensure that Microsoft's internal template function _Val_type works correctly. + : public std::iterator +#endif /* defined(_WIN64) && defined(_MSC_VER) */ + { + //! current position in the concurrent_vector + + Container *my_container; + typename Container::size_type my_index; + mutable Value *my_value; + + template + friend enumerable_thread_specific_iterator operator+( ptrdiff_t offset, + const enumerable_thread_specific_iterator& v ); + + template + friend bool operator==( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ); + + template + friend bool operator<( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ); + + template + friend ptrdiff_t operator-( const enumerable_thread_specific_iterator& i, const enumerable_thread_specific_iterator& j ); + + template + friend class enumerable_thread_specific_iterator; + + public: + + enumerable_thread_specific_iterator( const Container &container, typename Container::size_type index ) : + my_container(&const_cast(container)), my_index(index), my_value(NULL) {} + + //! Default constructor + enumerable_thread_specific_iterator() : my_container(NULL), my_index(0), my_value(NULL) {} + + template + enumerable_thread_specific_iterator( const enumerable_thread_specific_iterator& other ) : + my_container( other.my_container ), my_index( other.my_index), my_value( const_cast(other.my_value) ) {} + + enumerable_thread_specific_iterator operator+( ptrdiff_t offset ) const { + return enumerable_thread_specific_iterator(*my_container, my_index + offset); + } + + enumerable_thread_specific_iterator &operator+=( ptrdiff_t offset ) { + my_index += offset; + my_value = NULL; + return *this; + } + + enumerable_thread_specific_iterator operator-( ptrdiff_t offset ) const { + return enumerable_thread_specific_iterator( *my_container, my_index-offset ); + } + + enumerable_thread_specific_iterator &operator-=( ptrdiff_t offset ) { + my_index -= offset; + my_value = NULL; + return *this; + } + + Value& operator*() const { + Value* value = my_value; + if( !value ) { + value = my_value = &(*my_container)[my_index].value; + } + __TBB_ASSERT( value==&(*my_container)[my_index].value, "corrupt cache" ); + return *value; + } + + Value& operator[]( ptrdiff_t k ) const { + return (*my_container)[my_index + k].value; + } + + Value* operator->() const {return &operator*();} + + enumerable_thread_specific_iterator& operator++() { + ++my_index; + my_value = NULL; + return *this; + } + + enumerable_thread_specific_iterator& operator--() { + --my_index; + my_value = NULL; + return *this; + } + + //! 
Post increment + enumerable_thread_specific_iterator operator++(int) { + enumerable_thread_specific_iterator result = *this; + ++my_index; + my_value = NULL; + return result; + } + + //! Post decrement + enumerable_thread_specific_iterator operator--(int) { + enumerable_thread_specific_iterator result = *this; + --my_index; + my_value = NULL; + return result; + } + + // STL support + typedef ptrdiff_t difference_type; + typedef Value value_type; + typedef Value* pointer; + typedef Value& reference; + typedef std::random_access_iterator_tag iterator_category; + }; + + template + enumerable_thread_specific_iterator operator+( ptrdiff_t offset, + const enumerable_thread_specific_iterator& v ) { + return enumerable_thread_specific_iterator( v.my_container, v.my_index + offset ); + } + + template + bool operator==( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ) { + return i.my_index==j.my_index && i.my_container == j.my_container; + } + + template + bool operator!=( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ) { + return !(i==j); + } + + template + bool operator<( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ) { + return i.my_index + bool operator>( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ) { + return j + bool operator>=( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ) { + return !(i + bool operator<=( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ) { + return !(j + ptrdiff_t operator-( const enumerable_thread_specific_iterator& i, + const enumerable_thread_specific_iterator& j ) { + return i.my_index-j.my_index; + } + + template + class segmented_iterator +#if defined(_WIN64) && defined(_MSC_VER) + : public std::iterator +#endif + { + template + friend bool operator==(const segmented_iterator& i, const segmented_iterator& j); + + template + friend bool operator!=(const segmented_iterator& i, const segmented_iterator& j); + + template + friend class segmented_iterator; + + public: + + segmented_iterator() {my_segcont = NULL;} + + segmented_iterator( const SegmentedContainer& _segmented_container ) : + my_segcont(const_cast(&_segmented_container)), + outer_iter(my_segcont->end()) { } + + ~segmented_iterator() {} + + typedef typename SegmentedContainer::iterator outer_iterator; + typedef typename SegmentedContainer::value_type InnerContainer; + typedef typename InnerContainer::iterator inner_iterator; + + // STL support + typedef ptrdiff_t difference_type; + typedef Value value_type; + typedef typename SegmentedContainer::size_type size_type; + typedef Value* pointer; + typedef Value& reference; + typedef std::input_iterator_tag iterator_category; + + // Copy Constructor + template + segmented_iterator(const segmented_iterator& other) : + my_segcont(other.my_segcont), + outer_iter(other.outer_iter), + // can we assign a default-constructed iterator to inner if we're at the end? + inner_iter(other.inner_iter) + {} + + // assignment + template + segmented_iterator& operator=( const segmented_iterator& other) { + if(this != &other) { + my_segcont = other.my_segcont; + outer_iter = other.outer_iter; + if(outer_iter != my_segcont->end()) inner_iter = other.inner_iter; + } + return *this; + } + + // allow assignment of outer iterator to segmented iterator. 
Once it is + // assigned, move forward until a non-empty inner container is found or + // the end of the outer container is reached. + segmented_iterator& operator=(const outer_iterator& new_outer_iter) { + __TBB_ASSERT(my_segcont != NULL, NULL); + // check that this iterator points to something inside the segmented container + for(outer_iter = new_outer_iter ;outer_iter!=my_segcont->end(); ++outer_iter) { + if( !outer_iter->empty() ) { + inner_iter = outer_iter->begin(); + break; + } + } + return *this; + } + + // pre-increment + segmented_iterator& operator++() { + advance_me(); + return *this; + } + + // post-increment + segmented_iterator operator++(int) { + segmented_iterator tmp = *this; + operator++(); + return tmp; + } + + bool operator==(const outer_iterator& other_outer) const { + __TBB_ASSERT(my_segcont != NULL, NULL); + return (outer_iter == other_outer && + (outer_iter == my_segcont->end() || inner_iter == outer_iter->begin())); + } + + bool operator!=(const outer_iterator& other_outer) const { + return !operator==(other_outer); + + } + + // (i)* RHS + reference operator*() const { + __TBB_ASSERT(my_segcont != NULL, NULL); + __TBB_ASSERT(outer_iter != my_segcont->end(), "Dereferencing a pointer at end of container"); + __TBB_ASSERT(inner_iter != outer_iter->end(), NULL); // should never happen + return *inner_iter; + } + + // i-> + pointer operator->() const { return &operator*();} + + private: + SegmentedContainer* my_segcont; + outer_iterator outer_iter; + inner_iterator inner_iter; + + void advance_me() { + __TBB_ASSERT(my_segcont != NULL, NULL); + __TBB_ASSERT(outer_iter != my_segcont->end(), NULL); // not true if there are no inner containers + __TBB_ASSERT(inner_iter != outer_iter->end(), NULL); // not true if the inner containers are all empty. + ++inner_iter; + while(inner_iter == outer_iter->end() && ++outer_iter != my_segcont->end()) { + inner_iter = outer_iter->begin(); + } + } + }; // segmented_iterator + + template + bool operator==( const segmented_iterator& i, + const segmented_iterator& j ) { + if(i.my_segcont != j.my_segcont) return false; + if(i.my_segcont == NULL) return true; + if(i.outer_iter != j.outer_iter) return false; + if(i.outer_iter == i.my_segcont->end()) return true; + return i.inner_iter == j.inner_iter; + } + + // != + template + bool operator!=( const segmented_iterator& i, + const segmented_iterator& j ) { + return !(i==j); + } + + // empty template for following specializations + template + struct tls_manager {}; + + //! Struct that doesn't use a key + template <> + struct tls_manager { + typedef size_t tls_key_t; + static inline void create_key( tls_key_t &) { } + static inline void destroy_key( tls_key_t & ) { } + static inline void set_tls( tls_key_t &, void * ) { } + static inline void * get_tls( tls_key_t & ) { return (size_t)0; } + }; + + //! 
Struct to use native TLS support directly + template <> + struct tls_manager { +#if _WIN32||_WIN64 + typedef DWORD tls_key_t; + static inline void create_key( tls_key_t &k) { k = TlsAlloc(); } + static inline void destroy_key( tls_key_t &k) { TlsFree(k); } + static inline void set_tls( tls_key_t &k, void * value) { TlsSetValue(k, (LPVOID)value); } + static inline void * get_tls( tls_key_t &k ) { return (void *)TlsGetValue(k); } +#else + typedef pthread_key_t tls_key_t; + static inline void create_key( tls_key_t &k) { pthread_key_create(&k, NULL); } + static inline void destroy_key( tls_key_t &k) { pthread_key_delete(k); } + static inline void set_tls( tls_key_t &k, void * value) { pthread_setspecific(k, value); } + static inline void * get_tls( tls_key_t &k ) { return pthread_getspecific(k); } +#endif + }; + + class thread_hash_compare { + public: + // using hack suggested by Arch to get value for thread id for hashing... +#if _WIN32||_WIN64 + typedef DWORD thread_key; +#else + typedef pthread_t thread_key; +#endif + static thread_key my_thread_key(const tbb_thread::id j) { + thread_key key_val; + memcpy(&key_val, &j, sizeof(thread_key)); + return key_val; + } + + bool equal( const thread_key j, const thread_key k) const { + return j == k; + } + unsigned long hash(const thread_key k) const { + return (unsigned long)k; + } + }; + + // storage for initialization function pointer + template + struct callback_base { + virtual T apply( ) = 0; + virtual void destroy( ) = 0; + // need to be able to create copies of callback_base for copy constructor + virtual callback_base* make_copy() = 0; + // need virtual destructor to satisfy GCC compiler warning + virtual ~callback_base() { } + }; + + template + struct callback_leaf : public callback_base { + typedef Functor my_callback_type; + typedef callback_leaf my_type; + typedef my_type* callback_pointer; + typedef typename tbb::tbb_allocator my_allocator_type; + Functor f; + callback_leaf( const Functor& f_) : f(f_) { + } + + static callback_pointer new_callback(const Functor& f_ ) { + void* new_void = my_allocator_type().allocate(1); + callback_pointer new_cb = new (new_void) callback_leaf(f_); // placement new + return new_cb; + } + + /* override */ callback_pointer make_copy() { + return new_callback( f ); + } + + /* override */ void destroy( ) { + callback_pointer my_ptr = this; + my_allocator_type().destroy(my_ptr); + my_allocator_type().deallocate(my_ptr,1); + } + /* override */ T apply() { return f(); } // does copy construction of returned value. + }; + + template + class ets_concurrent_hash_map : public tbb::concurrent_hash_map { + public: + typedef tbb::concurrent_hash_map base_type; + typedef typename base_type::const_pointer const_pointer; + typedef typename base_type::key_type key_type; + const_pointer find( const key_type &k ) { + return internal_fast_find( k ); + } // make public + }; + + } // namespace internal + //! @endcond + + //! The thread local class template + template , + ets_key_usage_type ETS_key_type=ets_no_key > + class enumerable_thread_specific { + + template friend class enumerable_thread_specific; + + typedef internal::tls_manager< ETS_key_type > my_tls_manager; + + //! The padded elements; padded to avoid false sharing + template + struct padded_element { + U value; + char padding[ ( (sizeof(U) - 1) / internal::NFS_MaxLineSize + 1 ) * internal::NFS_MaxLineSize - sizeof(U) ]; + padded_element(const U &v) : value(v) {} + padded_element() {} + }; + + //! 
A generic range, used to create range objects from the iterators + template + class generic_range_type: public blocked_range { + public: + typedef T value_type; + typedef T& reference; + typedef const T& const_reference; + typedef I iterator; + typedef ptrdiff_t difference_type; + generic_range_type( I begin_, I end_, size_t grainsize = 1) : blocked_range(begin_,end_,grainsize) {} + template + generic_range_type( const generic_range_type& r) : blocked_range(r.begin(),r.end(),r.grainsize()) {} + generic_range_type( generic_range_type& r, split ) : blocked_range(r,split()) {} + }; + + typedef typename Allocator::template rebind< padded_element >::other padded_allocator_type; + typedef tbb::concurrent_vector< padded_element, padded_allocator_type > internal_collection_type; + typedef typename internal_collection_type::size_type hash_table_index_type; // storing array indices rather than iterators to simplify + // copying the hash table that correlates thread IDs with concurrent vector elements. + + typedef typename Allocator::template rebind< std::pair< typename internal::thread_hash_compare::thread_key, hash_table_index_type > >::other hash_element_allocator; + typedef internal::ets_concurrent_hash_map< typename internal::thread_hash_compare::thread_key, hash_table_index_type, internal::thread_hash_compare, hash_element_allocator > thread_to_index_type; + + typename my_tls_manager::tls_key_t my_key; + + void reset_key() { + my_tls_manager::destroy_key(my_key); + my_tls_manager::create_key(my_key); + } + + internal::callback_base *my_finit_callback; + + // need to use a pointed-to exemplar because T may not be assignable. + // using tbb_allocator instead of padded_element_allocator because we may be + // copying an exemplar from one instantiation of ETS to another with a different + // allocator. + typedef typename tbb::tbb_allocator > exemplar_allocator_type; + static padded_element * create_exemplar(const T& my_value) { + padded_element *new_exemplar = 0; + // void *new_space = padded_allocator_type().allocate(1); + void *new_space = exemplar_allocator_type().allocate(1); + new_exemplar = new(new_space) padded_element(my_value); + return new_exemplar; + } + + static padded_element *create_exemplar( ) { + // void *new_space = padded_allocator_type().allocate(1); + void *new_space = exemplar_allocator_type().allocate(1); + padded_element *new_exemplar = new(new_space) padded_element( ); + return new_exemplar; + } + + static void free_exemplar(padded_element *my_ptr) { + // padded_allocator_type().destroy(my_ptr); + // padded_allocator_type().deallocate(my_ptr,1); + exemplar_allocator_type().destroy(my_ptr); + exemplar_allocator_type().deallocate(my_ptr,1); + } + + padded_element* my_exemplar_ptr; + + internal_collection_type my_locals; + thread_to_index_type my_hash_tbl; + + public: + + //! 
Basic types + typedef Allocator allocator_type; + typedef T value_type; + typedef T& reference; + typedef const T& const_reference; + typedef T* pointer; + typedef const T* const_pointer; + typedef typename internal_collection_type::size_type size_type; + typedef typename internal_collection_type::difference_type difference_type; + + // Iterator types + typedef typename internal::enumerable_thread_specific_iterator< internal_collection_type, value_type > iterator; + typedef typename internal::enumerable_thread_specific_iterator< internal_collection_type, const value_type > const_iterator; + + // Parallel range types + typedef generic_range_type< iterator > range_type; + typedef generic_range_type< const_iterator > const_range_type; + + //! Default constructor, which leads to default construction of local copies + enumerable_thread_specific() : my_finit_callback(0) { + my_exemplar_ptr = create_exemplar(); + my_tls_manager::create_key(my_key); + } + + //! construction with initializer method + // Finit should be a function taking 0 parameters and returning a T + template + enumerable_thread_specific( Finit _finit ) + { + my_finit_callback = internal::callback_leaf::new_callback( _finit ); + my_tls_manager::create_key(my_key); + my_exemplar_ptr = 0; // don't need exemplar if function is provided + } + + //! Constuction with exemplar, which leads to copy construction of local copies + enumerable_thread_specific(const T &_exemplar) : my_finit_callback(0) { + my_exemplar_ptr = create_exemplar(_exemplar); + my_tls_manager::create_key(my_key); + } + + //! Destructor + ~enumerable_thread_specific() { + my_tls_manager::destroy_key(my_key); + if(my_finit_callback) { + my_finit_callback->destroy(); + } + if(my_exemplar_ptr) + { + free_exemplar(my_exemplar_ptr); + } + } + + //! returns reference to local, discarding exists + reference local() { + bool exists; + return local(exists); + } + + //! Returns reference to calling thread's local copy, creating one if necessary + reference local(bool& exists) { + if ( pointer local_ptr = static_cast(my_tls_manager::get_tls(my_key)) ) { + exists = true; + return *local_ptr; + } + hash_table_index_type local_index; + typename internal::thread_hash_compare::thread_key my_t_key = internal::thread_hash_compare::my_thread_key(tbb::this_tbb_thread::get_id()); + { + typename thread_to_index_type::const_pointer my_existing_entry; + my_existing_entry = my_hash_tbl.find(my_t_key); + if(my_existing_entry) { + exists = true; + local_index = my_existing_entry->second; + } + else { + + // see if the table entry can be found by accessor + typename thread_to_index_type::accessor a; + if(!my_hash_tbl.insert(a, my_t_key)) { + exists = true; + local_index = a->second; + } + else { + // create new entry + exists = false; + if(my_finit_callback) { + // convert iterator to array index +#if TBB_DEPRECATED + local_index = my_locals.push_back(my_finit_callback->apply()); +#else + local_index = my_locals.push_back(my_finit_callback->apply()) - my_locals.begin(); +#endif + } + else { + // convert iterator to array index +#if TBB_DEPRECATED + local_index = my_locals.push_back(*my_exemplar_ptr); +#else + local_index = my_locals.push_back(*my_exemplar_ptr) - my_locals.begin(); +#endif + } + // insert into hash table + a->second = local_index; + } + } + } + + reference local_ref = (my_locals[local_index].value); + my_tls_manager::set_tls( my_key, static_cast(&local_ref) ); + return local_ref; + } // local + + //! 
Get the number of local copies + size_type size() const { return my_locals.size(); } + + //! true if there have been no local copies created + bool empty() const { return my_locals.empty(); } + + //! begin iterator + iterator begin() { return iterator( my_locals, 0 ); } + //! end iterator + iterator end() { return iterator(my_locals, my_locals.size() ); } + + //! begin const iterator + const_iterator begin() const { return const_iterator(my_locals, 0); } + + //! end const iterator + const_iterator end() const { return const_iterator(my_locals, my_locals.size()); } + + //! Get range for parallel algorithms + range_type range( size_t grainsize=1 ) { return range_type( begin(), end(), grainsize ); } + + //! Get const range for parallel algorithms + const_range_type range( size_t grainsize=1 ) const { return const_range_type( begin(), end(), grainsize ); } + + //! Destroys local copies + void clear() { + my_locals.clear(); + my_hash_tbl.clear(); + reset_key(); + // callback is not destroyed + // exemplar is not destroyed + } + + // STL container methods + // copy constructor + + private: + + template + void + internal_copy_construct( const enumerable_thread_specific& other) { + typedef typename tbb::enumerable_thread_specific other_type; + for(typename other_type::const_iterator ci = other.begin(); ci != other.end(); ++ci) { + my_locals.push_back(*ci); + } + if(other.my_finit_callback) { + my_finit_callback = other.my_finit_callback->make_copy(); + } + else { + my_finit_callback = 0; + } + if(other.my_exemplar_ptr) { + my_exemplar_ptr = create_exemplar(other.my_exemplar_ptr->value); + } + else { + my_exemplar_ptr = 0; + } + my_tls_manager::create_key(my_key); + } + + public: + + template + enumerable_thread_specific( const enumerable_thread_specific& other ) : my_hash_tbl(other.my_hash_tbl) + { // Have to do push_back because the contained elements are not necessarily assignable. + internal_copy_construct(other); + } + + // non-templatized version + enumerable_thread_specific( const enumerable_thread_specific& other ) : my_hash_tbl(other.my_hash_tbl) + { + internal_copy_construct(other); + } + + private: + + template + enumerable_thread_specific & + internal_assign(const enumerable_thread_specific& other) { + typedef typename tbb::enumerable_thread_specific other_type; + if(static_cast( this ) != static_cast( &other )) { + this->clear(); // resets TLS key + my_hash_tbl = other.my_hash_tbl; + // cannot use assign because T may not be assignable. 
+ for(typename other_type::const_iterator ci = other.begin(); ci != other.end(); ++ci) { + my_locals.push_back(*ci); + } + + if(my_finit_callback) { + my_finit_callback->destroy(); + my_finit_callback = 0; + } + if(my_exemplar_ptr) { + free_exemplar(my_exemplar_ptr); + my_exemplar_ptr = 0; + } + if(other.my_finit_callback) { + my_finit_callback = other.my_finit_callback->make_copy(); + } + + if(other.my_exemplar_ptr) { + my_exemplar_ptr = create_exemplar(other.my_exemplar_ptr->value); + } + } + return *this; + } + + public: + + // assignment + enumerable_thread_specific& operator=(const enumerable_thread_specific& other) { + return internal_assign(other); + } + + template + enumerable_thread_specific& operator=(const enumerable_thread_specific& other) + { + return internal_assign(other); + } + + private: + + // combine_func_t has signature T(T,T) or T(const T&, const T&) + template + T internal_combine(typename internal_collection_type::const_range_type r, combine_func_t f_combine) { + if(r.is_divisible()) { + typename internal_collection_type::const_range_type r2(r,split()); + return f_combine(internal_combine(r2, f_combine), internal_combine(r, f_combine)); + } + if(r.size() == 1) { + return r.begin()->value; + } + typename internal_collection_type::const_iterator i2 = r.begin(); + ++i2; + return f_combine(r.begin()->value, i2->value); + } + + public: + + // combine_func_t has signature T(T,T) or T(const T&, const T&) + template + T combine(combine_func_t f_combine) { + if(my_locals.begin() == my_locals.end()) { + if(my_finit_callback) { + return my_finit_callback->apply(); + } + return (*my_exemplar_ptr).value; + } + typename internal_collection_type::const_range_type r(my_locals.begin(), my_locals.end(), (size_t)2); + return internal_combine(r, f_combine); + } + + // combine_func_t has signature void(T) or void(const T&) + template + void combine_each(combine_func_t f_combine) { + for(const_iterator ci = begin(); ci != end(); ++ci) { + f_combine( *ci ); + } + } + }; // enumerable_thread_specific + + template< typename Container > + class flattened2d { + + // This intermediate typedef is to address issues with VC7.1 compilers + typedef typename Container::value_type conval_type; + + public: + + //! 
Basic types + typedef typename conval_type::size_type size_type; + typedef typename conval_type::difference_type difference_type; + typedef typename conval_type::allocator_type allocator_type; + typedef typename conval_type::value_type value_type; + typedef typename conval_type::reference reference; + typedef typename conval_type::const_reference const_reference; + typedef typename conval_type::pointer pointer; + typedef typename conval_type::const_pointer const_pointer; + + typedef typename internal::segmented_iterator iterator; + typedef typename internal::segmented_iterator const_iterator; + + flattened2d( const Container &c, typename Container::const_iterator b, typename Container::const_iterator e ) : + my_container(const_cast(&c)), my_begin(b), my_end(e) { } + + flattened2d( const Container &c ) : + my_container(const_cast(&c)), my_begin(c.begin()), my_end(c.end()) { } + + iterator begin() { return iterator(*my_container) = my_begin; } + iterator end() { return iterator(*my_container) = my_end; } + const_iterator begin() const { return const_iterator(*my_container) = my_begin; } + const_iterator end() const { return const_iterator(*my_container) = my_end; } + + size_type size() const { + size_type tot_size = 0; + for(typename Container::const_iterator i = my_begin; i != my_end; ++i) { + tot_size += i->size(); + } + return tot_size; + } + + private: + + Container *my_container; + typename Container::const_iterator my_begin; + typename Container::const_iterator my_end; + + }; + + template + flattened2d flatten2d(const Container &c, const typename Container::const_iterator b, const typename Container::const_iterator e) { + return flattened2d(c, b, e); + } + + template + flattened2d flatten2d(const Container &c) { + return flattened2d(c); + } + +} // namespace tbb + +#endif diff --git a/dep/tbb/include/tbb/index.html b/dep/tbb/include/tbb/index.html new file mode 100644 index 000000000..fa0596588 --- /dev/null +++ b/dep/tbb/include/tbb/index.html @@ -0,0 +1,28 @@ + + + +

Overview

+Include files for Threading Building Blocks classes and functions. + +
Click here to see all files in the directory. + +

Directories

+
+
machine +
Include files for low-level architecture specific functionality. +
compat +
Include files for source level compatibility with other frameworks. +
+ +
+Up to parent directory +

+Copyright © 2005-2009 Intel Corporation. All Rights Reserved. +

+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are +registered trademarks or trademarks of Intel Corporation or its +subsidiaries in the United States and other countries. +

+* Other names and brands may be claimed as the property of others. + + diff --git a/dep/tbb/include/tbb/machine/ibm_aix51.h b/dep/tbb/include/tbb/machine/ibm_aix51.h new file mode 100644 index 000000000..439011540 --- /dev/null +++ b/dep/tbb/include/tbb/machine/ibm_aix51.h @@ -0,0 +1,52 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#define __TBB_WORDSIZE 8 +#define __TBB_BIG_ENDIAN 1 + +#include +#include +#include + +extern "C" { + +int32_t __TBB_machine_cas_32 (volatile void* ptr, int32_t value, int32_t comparand); +int64_t __TBB_machine_cas_64 (volatile void* ptr, int64_t value, int64_t comparand); +#define __TBB_fence_for_acquire() __TBB_machine_flush () +#define __TBB_fence_for_release() __TBB_machine_flush () + +} + +#define __TBB_CompareAndSwap4(P,V,C) __TBB_machine_cas_32(P,V,C) +#define __TBB_CompareAndSwap8(P,V,C) __TBB_machine_cas_64(P,V,C) +#define __TBB_CompareAndSwapW(P,V,C) __TBB_machine_cas_64(P,V,C) +#define __TBB_Yield() sched_yield() diff --git a/dep/tbb/include/tbb/machine/linux_common.h b/dep/tbb/include/tbb/machine/linux_common.h new file mode 100644 index 000000000..35bff2592 --- /dev/null +++ b/dep/tbb/include/tbb/machine/linux_common.h @@ -0,0 +1,95 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#include +#include +#include + +// Definition of __TBB_Yield() +#define __TBB_Yield() sched_yield() + +/* Futex definitions */ +#include + +#if defined(SYS_futex) + +#define __TBB_USE_FUTEX 1 +#include +#include +// Unfortunately, some versions of Linux do not have a header that defines FUTEX_WAIT and FUTEX_WAKE. + +#ifdef FUTEX_WAIT +#define __TBB_FUTEX_WAIT FUTEX_WAIT +#else +#define __TBB_FUTEX_WAIT 0 +#endif + +#ifdef FUTEX_WAKE +#define __TBB_FUTEX_WAKE FUTEX_WAKE +#else +#define __TBB_FUTEX_WAKE 1 +#endif + +#ifndef __TBB_ASSERT +#error machine specific headers must be included after tbb_stddef.h +#endif + +namespace tbb { + +namespace internal { + +inline int futex_wait( void *futex, int comparand ) { + int r = ::syscall( SYS_futex,futex,__TBB_FUTEX_WAIT,comparand,NULL,NULL,0 ); +#if TBB_USE_ASSERT + int e = errno; + __TBB_ASSERT( r==0||r==EWOULDBLOCK||(r==-1&&(e==EAGAIN||e==EINTR)), "futex_wait failed." ); +#endif /* TBB_USE_ASSERT */ + return r; +} + +inline int futex_wakeup_one( void *futex ) { + int r = ::syscall( SYS_futex,futex,__TBB_FUTEX_WAKE,1,NULL,NULL,0 ); + __TBB_ASSERT( r==0||r==1, "futex_wakeup_one: more than one thread woken up?" ); + return r; +} + +inline int futex_wakeup_all( void *futex ) { + int r = ::syscall( SYS_futex,futex,__TBB_FUTEX_WAKE,INT_MAX,NULL,NULL,0 ); + __TBB_ASSERT( r>=0, "futex_wakeup_all: error in waking up threads" ); + return r; +} + +} /* namespace internal */ + +} /* namespace tbb */ + +#endif /* SYS_futex */ diff --git a/dep/tbb/include/tbb/machine/linux_ia32.h b/dep/tbb/include/tbb/machine/linux_ia32.h new file mode 100644 index 000000000..514e3d79d --- /dev/null +++ b/dep/tbb/include/tbb/machine/linux_ia32.h @@ -0,0 +1,253 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. 
This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#if !__MINGW32__ +#include "linux_common.h" +#endif + +#define __TBB_WORDSIZE 4 +#define __TBB_BIG_ENDIAN 0 + +#define __TBB_release_consistency_helper() __asm__ __volatile__("": : :"memory") + +inline void __TBB_rel_acq_fence() { __asm__ __volatile__("mfence": : :"memory"); } + +#define __MACHINE_DECL_ATOMICS(S,T,X) \ +static inline T __TBB_machine_cmpswp##S (volatile void *ptr, T value, T comparand ) \ +{ \ + T result; \ + \ + __asm__ __volatile__("lock\ncmpxchg" X " %2,%1" \ + : "=a"(result), "=m"(*(T *)ptr) \ + : "q"(value), "0"(comparand), "m"(*(T *)ptr) \ + : "memory"); \ + return result; \ +} \ + \ +static inline T __TBB_machine_fetchadd##S(volatile void *ptr, T addend) \ +{ \ + T result; \ + __asm__ __volatile__("lock\nxadd" X " %0,%1" \ + : "=r"(result), "=m"(*(T *)ptr) \ + : "0"(addend), "m"(*(T *)ptr) \ + : "memory"); \ + return result; \ +} \ + \ +static inline T __TBB_machine_fetchstore##S(volatile void *ptr, T value) \ +{ \ + T result; \ + __asm__ __volatile__("lock\nxchg" X " %0,%1" \ + : "=r"(result), "=m"(*(T *)ptr) \ + : "0"(value), "m"(*(T *)ptr) \ + : "memory"); \ + return result; \ +} \ + +__MACHINE_DECL_ATOMICS(1,int8_t,"") +__MACHINE_DECL_ATOMICS(2,int16_t,"") +__MACHINE_DECL_ATOMICS(4,int32_t,"l") + +static inline int64_t __TBB_machine_cmpswp8 (volatile void *ptr, int64_t value, int64_t comparand ) +{ + int64_t result; +#if __PIC__ + /* compiling position-independent code */ + // EBX register preserved for compliancy with position-independent code rules on IA32 + __asm__ __volatile__ ( + "pushl %%ebx\n\t" + "movl (%%ecx),%%ebx\n\t" + "movl 4(%%ecx),%%ecx\n\t" + "lock\n\t cmpxchg8b %1\n\t" + "popl %%ebx" + : "=A"(result), "=m"(*(int64_t *)ptr) + : "m"(*(int64_t *)ptr) + , "0"(comparand) + , "c"(&value) + : "memory", "esp" +#if __INTEL_COMPILER + ,"ebx" +#endif + ); +#else /* !__PIC__ */ + union { + int64_t i64; + int32_t i32[2]; + }; + i64 = value; + __asm__ __volatile__ ( + "lock\n\t cmpxchg8b %1\n\t" + : "=A"(result), "=m"(*(int64_t *)ptr) + : "m"(*(int64_t *)ptr) + , "0"(comparand) + , "b"(i32[0]), "c"(i32[1]) + : "memory" + ); +#endif /* __PIC__ */ + return result; +} + +static inline int32_t __TBB_machine_lg( uint32_t x ) { + int32_t j; + __asm__ ("bsr %1,%0" : "=r"(j) : "r"(x)); + return j; +} + +static inline void __TBB_machine_or( volatile void *ptr, uint32_t addend ) { + __asm__ __volatile__("lock\norl %1,%0" : "=m"(*(uint32_t *)ptr) : "r"(addend), "m"(*(uint32_t *)ptr) : "memory"); +} + +static inline void __TBB_machine_and( volatile void *ptr, uint32_t addend ) { + __asm__ __volatile__("lock\nandl %1,%0" : "=m"(*(uint32_t *)ptr) : "r"(addend), "m"(*(uint32_t *)ptr) : "memory"); +} + +static inline void __TBB_machine_pause( int32_t delay ) { + for (int32_t i = 0; i < delay; i++) { + __asm__ __volatile__("pause;"); + } + return; +} + +static inline int64_t __TBB_machine_load8 (const volatile void *ptr) { + int64_t result; + if( ((uint32_t)ptr&7u)==0 ) { + // Aligned load + __asm__ __volatile__ ( "fildq %1\n\t" + "fistpq %0" : "=m"(result) : "m"(*(uint64_t *)ptr) : "memory" ); + } else { + // Unaligned load + result = __TBB_machine_cmpswp8((void*)ptr,0,0); + } + return result; +} + +//! 
Handles misaligned 8-byte store +/** Defined in tbb_misc.cpp */ +extern "C" void __TBB_machine_store8_slow( volatile void *ptr, int64_t value ); +extern "C" void __TBB_machine_store8_slow_perf_warning( volatile void *ptr ); + +static inline void __TBB_machine_store8(volatile void *ptr, int64_t value) { + if( ((uint32_t)ptr&7u)==0 ) { + // Aligned store + __asm__ __volatile__ ( "fildq %1\n\t" + "fistpq %0" : "=m"(*(int64_t *)ptr) : "m"(value) : "memory" ); + } else { + // Unaligned store +#if TBB_USE_PERFORMANCE_WARNINGS + __TBB_machine_store8_slow_perf_warning(ptr); +#endif /* TBB_USE_PERFORMANCE_WARNINGS */ + __TBB_machine_store8_slow(ptr,value); + } +} + +template +struct __TBB_machine_load_store { + static inline T load_with_acquire(const volatile T& location) { + T to_return = location; + __asm__ __volatile__("" : : : "memory" ); // Compiler fence to keep operations from migrating upwards + return to_return; + } + + static inline void store_with_release(volatile T &location, T value) { + __asm__ __volatile__("" : : : "memory" ); // Compiler fence to keep operations from migrating upwards + location = value; + } +}; + +template +struct __TBB_machine_load_store { + static inline T load_with_acquire(const volatile T& location) { + T to_return = __TBB_machine_load8((volatile void *)&location); + __asm__ __volatile__("" : : : "memory" ); // Compiler fence to keep operations from migrating upwards + return to_return; + } + + static inline void store_with_release(volatile T &location, T value) { + __asm__ __volatile__("" : : : "memory" ); // Compiler fence to keep operations from migrating downwards + __TBB_machine_store8((volatile void *)&location,(int64_t)value); + } +}; + +template +inline T __TBB_machine_load_with_acquire(const volatile T &location) { + return __TBB_machine_load_store::load_with_acquire(location); +} + +template +inline void __TBB_machine_store_with_release(volatile T &location, V value) { + __TBB_machine_load_store::store_with_release(location,value); +} + +#define __TBB_load_with_acquire(L) __TBB_machine_load_with_acquire((L)) +#define __TBB_store_with_release(L,V) __TBB_machine_store_with_release((L),(V)) + +// Machine specific atomic operations + +#define __TBB_CompareAndSwap1(P,V,C) __TBB_machine_cmpswp1(P,V,C) +#define __TBB_CompareAndSwap2(P,V,C) __TBB_machine_cmpswp2(P,V,C) +#define __TBB_CompareAndSwap4(P,V,C) __TBB_machine_cmpswp4(P,V,C) +#define __TBB_CompareAndSwap8(P,V,C) __TBB_machine_cmpswp8(P,V,C) +#define __TBB_CompareAndSwapW(P,V,C) __TBB_machine_cmpswp4(P,V,C) + +#define __TBB_FetchAndAdd1(P,V) __TBB_machine_fetchadd1(P,V) +#define __TBB_FetchAndAdd2(P,V) __TBB_machine_fetchadd2(P,V) +#define __TBB_FetchAndAdd4(P,V) __TBB_machine_fetchadd4(P,V) +#define __TBB_FetchAndAddW(P,V) __TBB_machine_fetchadd4(P,V) + +#define __TBB_FetchAndStore1(P,V) __TBB_machine_fetchstore1(P,V) +#define __TBB_FetchAndStore2(P,V) __TBB_machine_fetchstore2(P,V) +#define __TBB_FetchAndStore4(P,V) __TBB_machine_fetchstore4(P,V) +#define __TBB_FetchAndStoreW(P,V) __TBB_machine_fetchstore4(P,V) + +#define __TBB_Store8(P,V) __TBB_machine_store8(P,V) +#define __TBB_Load8(P) __TBB_machine_load8(P) + +#define __TBB_AtomicOR(P,V) __TBB_machine_or(P,V) +#define __TBB_AtomicAND(P,V) __TBB_machine_and(P,V) + + +// Those we chose not to implement (they will be implemented generically using CMPSWP8) +#undef __TBB_FetchAndAdd8 +#undef __TBB_FetchAndStore8 + +// Definition of other functions +#define __TBB_Pause(V) __TBB_machine_pause(V) +#define __TBB_Log2(V) __TBB_machine_lg(V) + +// Special 
atomic functions +#define __TBB_FetchAndAddWrelease(P,V) __TBB_FetchAndAddW(P,V) +#define __TBB_FetchAndIncrementWacquire(P) __TBB_FetchAndAddW(P,1) +#define __TBB_FetchAndDecrementWrelease(P) __TBB_FetchAndAddW(P,-1) + +// Use generic definitions from tbb_machine.h +#undef __TBB_TryLockByte +#undef __TBB_LockByte diff --git a/dep/tbb/include/tbb/machine/linux_ia64.h b/dep/tbb/include/tbb/machine/linux_ia64.h new file mode 100644 index 000000000..59347b5cd --- /dev/null +++ b/dep/tbb/include/tbb/machine/linux_ia64.h @@ -0,0 +1,169 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#include "linux_common.h" +#include + +#define __TBB_WORDSIZE 8 +#define __TBB_BIG_ENDIAN 0 +#define __TBB_DECL_FENCED_ATOMICS 1 + +// Most of the functions will be in a .s file + +extern "C" { + int8_t __TBB_machine_cmpswp1__TBB_full_fence (volatile void *ptr, int8_t value, int8_t comparand); + int8_t __TBB_machine_fetchadd1__TBB_full_fence (volatile void *ptr, int8_t addend); + int8_t __TBB_machine_fetchadd1acquire(volatile void *ptr, int8_t addend); + int8_t __TBB_machine_fetchadd1release(volatile void *ptr, int8_t addend); + int8_t __TBB_machine_fetchstore1acquire(volatile void *ptr, int8_t value); + int8_t __TBB_machine_fetchstore1release(volatile void *ptr, int8_t value); + + int16_t __TBB_machine_cmpswp2__TBB_full_fence (volatile void *ptr, int16_t value, int16_t comparand); + int16_t __TBB_machine_fetchadd2__TBB_full_fence (volatile void *ptr, int16_t addend); + int16_t __TBB_machine_fetchadd2acquire(volatile void *ptr, int16_t addend); + int16_t __TBB_machine_fetchadd2release(volatile void *ptr, int16_t addend); + int16_t __TBB_machine_fetchstore2acquire(volatile void *ptr, int16_t value); + int16_t __TBB_machine_fetchstore2release(volatile void *ptr, int16_t value); + + int32_t __TBB_machine_fetchstore4__TBB_full_fence (volatile void *ptr, int32_t value); + int32_t __TBB_machine_fetchstore4acquire(volatile void *ptr, int32_t value); + int32_t __TBB_machine_fetchstore4release(volatile void *ptr, int32_t value); + int32_t __TBB_machine_fetchadd4acquire(volatile void *ptr, int32_t addend); + int32_t __TBB_machine_fetchadd4release(volatile void *ptr, int32_t addend); + + int64_t __TBB_machine_cmpswp8__TBB_full_fence (volatile void *ptr, int64_t value, int64_t comparand); + int64_t __TBB_machine_fetchstore8__TBB_full_fence (volatile void *ptr, int64_t value); + int64_t __TBB_machine_fetchstore8acquire(volatile void *ptr, int64_t value); + int64_t __TBB_machine_fetchstore8release(volatile void *ptr, int64_t value); + int64_t __TBB_machine_fetchadd8acquire(volatile void *ptr, int64_t addend); + int64_t __TBB_machine_fetchadd8release(volatile void *ptr, int64_t addend); + + int8_t __TBB_machine_cmpswp1acquire(volatile void *ptr, int8_t value, int8_t comparand); + int8_t __TBB_machine_cmpswp1release(volatile void *ptr, int8_t value, int8_t comparand); + int8_t __TBB_machine_fetchstore1__TBB_full_fence (volatile void *ptr, int8_t value); + + int16_t __TBB_machine_cmpswp2acquire(volatile void *ptr, int16_t value, int16_t comparand); + int16_t __TBB_machine_cmpswp2release(volatile void *ptr, int16_t value, int16_t comparand); + int16_t __TBB_machine_fetchstore2__TBB_full_fence (volatile void *ptr, int16_t value); + + int32_t __TBB_machine_cmpswp4__TBB_full_fence (volatile void *ptr, int32_t value, int32_t comparand); + int32_t __TBB_machine_cmpswp4acquire(volatile void *ptr, int32_t value, int32_t comparand); + int32_t __TBB_machine_cmpswp4release(volatile void *ptr, int32_t value, int32_t comparand); + int32_t __TBB_machine_fetchadd4__TBB_full_fence (volatile void *ptr, int32_t value); + + int64_t __TBB_machine_cmpswp8acquire(volatile void *ptr, int64_t value, int64_t comparand); + int64_t __TBB_machine_cmpswp8release(volatile void *ptr, int64_t value, int64_t comparand); + int64_t __TBB_machine_fetchadd8__TBB_full_fence (volatile void *ptr, int64_t value); + + int64_t __TBB_machine_lg(uint64_t value); + void __TBB_machine_pause(int32_t delay); + bool __TBB_machine_trylockbyte( 
volatile unsigned char &ptr ); + int64_t __TBB_machine_lockbyte( volatile unsigned char &ptr ); + + //! Retrieves the current RSE backing store pointer. IA64 specific. + void* __TBB_get_bsp(); +} + +#define __TBB_CompareAndSwap1(P,V,C) __TBB_machine_cmpswp1__TBB_full_fence(P,V,C) +#define __TBB_CompareAndSwap2(P,V,C) __TBB_machine_cmpswp2__TBB_full_fence(P,V,C) + +#define __TBB_FetchAndAdd1(P,V) __TBB_machine_fetchadd1__TBB_full_fence(P,V) +#define __TBB_FetchAndAdd1acquire(P,V) __TBB_machine_fetchadd1acquire(P,V) +#define __TBB_FetchAndAdd1release(P,V) __TBB_machine_fetchadd1release(P,V) +#define __TBB_FetchAndAdd2(P,V) __TBB_machine_fetchadd2__TBB_full_fence(P,V) +#define __TBB_FetchAndAdd2acquire(P,V) __TBB_machine_fetchadd2acquire(P,V) +#define __TBB_FetchAndAdd2release(P,V) __TBB_machine_fetchadd2release(P,V) +#define __TBB_FetchAndAdd4acquire(P,V) __TBB_machine_fetchadd4acquire(P,V) +#define __TBB_FetchAndAdd4release(P,V) __TBB_machine_fetchadd4release(P,V) +#define __TBB_FetchAndAdd8acquire(P,V) __TBB_machine_fetchadd8acquire(P,V) +#define __TBB_FetchAndAdd8release(P,V) __TBB_machine_fetchadd8release(P,V) + +#define __TBB_FetchAndStore1acquire(P,V) __TBB_machine_fetchstore1acquire(P,V) +#define __TBB_FetchAndStore1release(P,V) __TBB_machine_fetchstore1release(P,V) +#define __TBB_FetchAndStore2acquire(P,V) __TBB_machine_fetchstore2acquire(P,V) +#define __TBB_FetchAndStore2release(P,V) __TBB_machine_fetchstore2release(P,V) +#define __TBB_FetchAndStore4acquire(P,V) __TBB_machine_fetchstore4acquire(P,V) +#define __TBB_FetchAndStore4release(P,V) __TBB_machine_fetchstore4release(P,V) +#define __TBB_FetchAndStore8acquire(P,V) __TBB_machine_fetchstore8acquire(P,V) +#define __TBB_FetchAndStore8release(P,V) __TBB_machine_fetchstore8release(P,V) + +#define __TBB_CompareAndSwap1acquire(P,V,C) __TBB_machine_cmpswp1acquire(P,V,C) +#define __TBB_CompareAndSwap1release(P,V,C) __TBB_machine_cmpswp1release(P,V,C) +#define __TBB_CompareAndSwap2acquire(P,V,C) __TBB_machine_cmpswp2acquire(P,V,C) +#define __TBB_CompareAndSwap2release(P,V,C) __TBB_machine_cmpswp2release(P,V,C) +#define __TBB_CompareAndSwap4(P,V,C) __TBB_machine_cmpswp4__TBB_full_fence(P,V,C) +#define __TBB_CompareAndSwap4acquire(P,V,C) __TBB_machine_cmpswp4acquire(P,V,C) +#define __TBB_CompareAndSwap4release(P,V,C) __TBB_machine_cmpswp4release(P,V,C) +#define __TBB_CompareAndSwap8(P,V,C) __TBB_machine_cmpswp8__TBB_full_fence(P,V,C) +#define __TBB_CompareAndSwap8acquire(P,V,C) __TBB_machine_cmpswp8acquire(P,V,C) +#define __TBB_CompareAndSwap8release(P,V,C) __TBB_machine_cmpswp8release(P,V,C) + +#define __TBB_FetchAndAdd4(P,V) __TBB_machine_fetchadd4__TBB_full_fence(P,V) +#define __TBB_FetchAndAdd8(P,V) __TBB_machine_fetchadd8__TBB_full_fence(P,V) + +#define __TBB_FetchAndStore1(P,V) __TBB_machine_fetchstore1__TBB_full_fence(P,V) +#define __TBB_FetchAndStore2(P,V) __TBB_machine_fetchstore2__TBB_full_fence(P,V) +#define __TBB_FetchAndStore4(P,V) __TBB_machine_fetchstore4__TBB_full_fence(P,V) +#define __TBB_FetchAndStore8(P,V) __TBB_machine_fetchstore8__TBB_full_fence(P,V) + +#define __TBB_FetchAndIncrementWacquire(P) __TBB_FetchAndAdd8acquire(P,1) +#define __TBB_FetchAndDecrementWrelease(P) __TBB_FetchAndAdd8release(P,-1) + +#ifndef __INTEL_COMPILER +/* Even though GCC imbues volatile loads with acquire semantics, + it sometimes moves loads over the acquire fence. The + fences defined here stop such incorrect code motion. 
*/ +#define __TBB_release_consistency_helper() __asm__ __volatile__("": : :"memory") +#define __TBB_rel_acq_fence() __asm__ __volatile__("mf": : :"memory") +#else +#define __TBB_release_consistency_helper() +#define __TBB_rel_acq_fence() __mf() +#endif /* __INTEL_COMPILER */ + +// Special atomic functions +#define __TBB_CompareAndSwapW(P,V,C) __TBB_CompareAndSwap8(P,V,C) +#define __TBB_FetchAndStoreW(P,V) __TBB_FetchAndStore8(P,V) +#define __TBB_FetchAndAddW(P,V) __TBB_FetchAndAdd8(P,V) +#define __TBB_FetchAndAddWrelease(P,V) __TBB_FetchAndAdd8release(P,V) + +// Not needed +#undef __TBB_Store8 +#undef __TBB_Load8 + +// Definition of Lock functions +#define __TBB_TryLockByte(P) __TBB_machine_trylockbyte(P) +#define __TBB_LockByte(P) __TBB_machine_lockbyte(P) + +// Definition of other utility functions +#define __TBB_Pause(V) __TBB_machine_pause(V) +#define __TBB_Log2(V) __TBB_machine_lg(V) + diff --git a/dep/tbb/include/tbb/machine/linux_intel64.h b/dep/tbb/include/tbb/machine/linux_intel64.h new file mode 100644 index 000000000..55bca95eb --- /dev/null +++ b/dep/tbb/include/tbb/machine/linux_intel64.h @@ -0,0 +1,139 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#include "linux_common.h" + +#define __TBB_WORDSIZE 8 +#define __TBB_BIG_ENDIAN 0 + +#define __TBB_release_consistency_helper() __asm__ __volatile__("": : :"memory") + +#ifndef __TBB_rel_acq_fence +inline void __TBB_rel_acq_fence() { __asm__ __volatile__("mfence": : :"memory"); } +#endif + +#define __MACHINE_DECL_ATOMICS(S,T,X) \ +static inline T __TBB_machine_cmpswp##S (volatile void *ptr, T value, T comparand ) \ +{ \ + T result; \ + \ + __asm__ __volatile__("lock\ncmpxchg" X " %2,%1" \ + : "=a"(result), "=m"(*(T *)ptr) \ + : "q"(value), "0"(comparand), "m"(*(T *)ptr) \ + : "memory"); \ + return result; \ +} \ + \ +static inline T __TBB_machine_fetchadd##S(volatile void *ptr, T addend) \ +{ \ + T result; \ + __asm__ __volatile__("lock\nxadd" X " %0,%1" \ + : "=r"(result),"=m"(*(T *)ptr) \ + : "0"(addend), "m"(*(T *)ptr) \ + : "memory"); \ + return result; \ +} \ + \ +static inline T __TBB_machine_fetchstore##S(volatile void *ptr, T value) \ +{ \ + T result; \ + __asm__ __volatile__("lock\nxchg" X " %0,%1" \ + : "=r"(result),"=m"(*(T *)ptr) \ + : "0"(value), "m"(*(T *)ptr) \ + : "memory"); \ + return result; \ +} \ + +__MACHINE_DECL_ATOMICS(1,int8_t,"") +__MACHINE_DECL_ATOMICS(2,int16_t,"") +__MACHINE_DECL_ATOMICS(4,int32_t,"") +__MACHINE_DECL_ATOMICS(8,int64_t,"q") + +static inline int64_t __TBB_machine_lg( uint64_t x ) { + int64_t j; + __asm__ ("bsr %1,%0" : "=r"(j) : "r"(x)); + return j; +} + +static inline void __TBB_machine_or( volatile void *ptr, uint64_t addend ) { + __asm__ __volatile__("lock\norq %1,%0" : "=m"(*(uint64_t *)ptr) : "r"(addend), "m"(*(uint64_t *)ptr) : "memory"); +} + +static inline void __TBB_machine_and( volatile void *ptr, uint64_t addend ) { + __asm__ __volatile__("lock\nandq %1,%0" : "=m"(*(uint64_t *)ptr) : "r"(addend), "m"(*(uint64_t *)ptr) : "memory"); +} + +static inline void __TBB_machine_pause( int32_t delay ) { + for (int32_t i = 0; i < delay; i++) { + __asm__ __volatile__("pause;"); + } + return; +} + +// Machine specific atomic operations + +#define __TBB_CompareAndSwap1(P,V,C) __TBB_machine_cmpswp1(P,V,C) +#define __TBB_CompareAndSwap2(P,V,C) __TBB_machine_cmpswp2(P,V,C) +#define __TBB_CompareAndSwap4(P,V,C) __TBB_machine_cmpswp4(P,V,C) +#define __TBB_CompareAndSwap8(P,V,C) __TBB_machine_cmpswp8(P,V,C) +#define __TBB_CompareAndSwapW(P,V,C) __TBB_machine_cmpswp8(P,V,C) + +#define __TBB_FetchAndAdd1(P,V) __TBB_machine_fetchadd1(P,V) +#define __TBB_FetchAndAdd2(P,V) __TBB_machine_fetchadd2(P,V) +#define __TBB_FetchAndAdd4(P,V) __TBB_machine_fetchadd4(P,V) +#define __TBB_FetchAndAdd8(P,V) __TBB_machine_fetchadd8(P,V) +#define __TBB_FetchAndAddW(P,V) __TBB_machine_fetchadd8(P,V) + +#define __TBB_FetchAndStore1(P,V) __TBB_machine_fetchstore1(P,V) +#define __TBB_FetchAndStore2(P,V) __TBB_machine_fetchstore2(P,V) +#define __TBB_FetchAndStore4(P,V) __TBB_machine_fetchstore4(P,V) +#define __TBB_FetchAndStore8(P,V) __TBB_machine_fetchstore8(P,V) +#define __TBB_FetchAndStoreW(P,V) __TBB_machine_fetchstore8(P,V) + +#define __TBB_Store8(P,V) (*P = V) +#define __TBB_Load8(P) (*P) + +#define __TBB_AtomicOR(P,V) __TBB_machine_or(P,V) +#define __TBB_AtomicAND(P,V) __TBB_machine_and(P,V) + +// Definition of other functions +#define __TBB_Pause(V) __TBB_machine_pause(V) +#define __TBB_Log2(V) __TBB_machine_lg(V) + +// Special atomic functions +#define __TBB_FetchAndAddWrelease(P,V) __TBB_FetchAndAddW(P,V) +#define __TBB_FetchAndIncrementWacquire(P) 
__TBB_FetchAndAddW(P,1) +#define __TBB_FetchAndDecrementWrelease(P) __TBB_FetchAndAddW(P,-1) + +// Use generic definitions from tbb_machine.h +#undef __TBB_TryLockByte +#undef __TBB_LockByte diff --git a/dep/tbb/include/tbb/machine/mac_ppc.h b/dep/tbb/include/tbb/machine/mac_ppc.h new file mode 100644 index 000000000..6d6b1befe --- /dev/null +++ b/dep/tbb/include/tbb/machine/mac_ppc.h @@ -0,0 +1,85 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#include +#include + +#include // sched_yield + +inline int32_t __TBB_machine_cmpswp4 (volatile void *ptr, int32_t value, int32_t comparand ) +{ + int32_t result; + + __asm__ __volatile__("sync\n" + "0: lwarx %0,0,%2\n\t" /* load w/ reservation */ + "cmpw %0,%4\n\t" /* compare against comparand */ + "bne- 1f\n\t" /* exit if not same */ + "stwcx. %3,0,%2\n\t" /* store new_value */ + "bne- 0b\n" /* retry if reservation lost */ + "1: sync" /* the exit */ + : "=&r"(result), "=m"(* (int32_t*) ptr) + : "r"(ptr), "r"(value), "r"(comparand), "m"(* (int32_t*) ptr) + : "cr0"); + return result; +} + +inline int64_t __TBB_machine_cmpswp8 (volatile void *ptr, int64_t value, int64_t comparand ) +{ + int64_t result; + __asm__ __volatile__("sync\n" + "0: ldarx %0,0,%2\n\t" /* load w/ reservation */ + "cmpd %0,%4\n\t" /* compare against comparand */ + "bne- 1f\n\t" /* exit if not same */ + "stdcx. 
%3,0,%2\n\t" /* store new_value */ + "bne- 0b\n" /* retry if reservation lost */ + "1: sync" /* the exit */ + : "=&b"(result), "=m"(* (int64_t*) ptr) + : "r"(ptr), "r"(value), "r"(comparand), "m"(* (int64_t*) ptr) + : "cr0"); + return result; +} + +#define __TBB_BIG_ENDIAN 1 + +#if defined(powerpc64) || defined(__powerpc64__) || defined(__ppc64__) +#define __TBB_WORDSIZE 8 +#define __TBB_CompareAndSwapW(P,V,C) __TBB_machine_cmpswp8(P,V,C) +#else +#define __TBB_WORDSIZE 4 +#define __TBB_CompareAndSwapW(P,V,C) __TBB_machine_cmpswp4(P,V,C) +#endif + +#define __TBB_CompareAndSwap4(P,V,C) __TBB_machine_cmpswp4(P,V,C) +#define __TBB_CompareAndSwap8(P,V,C) __TBB_machine_cmpswp8(P,V,C) +#define __TBB_Yield() sched_yield() +#define __TBB_rel_acq_fence() __asm__ __volatile__("lwsync": : :"memory") +#define __TBB_release_consistency_helper() __TBB_rel_acq_fence() diff --git a/dep/tbb/include/tbb/machine/windows_ia32.h b/dep/tbb/include/tbb/machine/windows_ia32.h new file mode 100644 index 000000000..69c961a24 --- /dev/null +++ b/dep/tbb/include/tbb/machine/windows_ia32.h @@ -0,0 +1,242 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
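The compare-and-swap primitives declared in the machine headers above (for example __TBB_machine_cmpswp4 on PowerPC, built from lwarx/stwcx. with a retry when the reservation is lost) all follow the same contract: they return the value previously stored at the location, and the swap took effect only if that value equals the comparand. A minimal sketch of how a read-modify-write loop is typically layered on such a primitive; the helper name and the retry loop are illustrative only, not part of the patch:

static inline int32_t example_fetch_and_add_via_cas( volatile int32_t* location, int32_t addend ) {
    int32_t observed, prior;
    do {
        observed = *location;                                    // snapshot the current value
        prior = __TBB_machine_cmpswp4( location, observed + addend, observed );
    } while( prior != observed );                                // another thread intervened; retry
    return observed;                                             // value before the addition
}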
+*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#if defined(__INTEL_COMPILER) +#define __TBB_release_consistency_helper() __asm { __asm nop } +#elif _MSC_VER >= 1300 +extern "C" void _ReadWriteBarrier(); +#pragma intrinsic(_ReadWriteBarrier) +#define __TBB_release_consistency_helper() _ReadWriteBarrier() +#else +#error Unsupported compiler - need to define __TBB_release_consistency_helper to support it +#endif + +inline void __TBB_rel_acq_fence() { __asm { __asm mfence } } + +#define __TBB_WORDSIZE 4 +#define __TBB_BIG_ENDIAN 0 + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (push) + #pragma warning (disable: 4244 4267) +#endif + +extern "C" { + __int64 __TBB_EXPORTED_FUNC __TBB_machine_cmpswp8 (volatile void *ptr, __int64 value, __int64 comparand ); + __int64 __TBB_EXPORTED_FUNC __TBB_machine_fetchadd8 (volatile void *ptr, __int64 addend ); + __int64 __TBB_EXPORTED_FUNC __TBB_machine_fetchstore8 (volatile void *ptr, __int64 value ); + void __TBB_EXPORTED_FUNC __TBB_machine_store8 (volatile void *ptr, __int64 value ); + __int64 __TBB_EXPORTED_FUNC __TBB_machine_load8 (const volatile void *ptr); +} + +template +struct __TBB_machine_load_store { + static inline T load_with_acquire(const volatile T& location) { + T to_return = location; + __TBB_release_consistency_helper(); + return to_return; + } + + static inline void store_with_release(volatile T &location, T value) { + __TBB_release_consistency_helper(); + location = value; + } +}; + +template +struct __TBB_machine_load_store { + static inline T load_with_acquire(const volatile T& location) { + return __TBB_machine_load8((volatile void *)&location); + } + + static inline void store_with_release(T &location, T value) { + __TBB_machine_store8((volatile void *)&location,(__int64)value); + } +}; + +template +inline T __TBB_machine_load_with_acquire(const volatile T &location) { + return __TBB_machine_load_store::load_with_acquire(location); +} + +template +inline void __TBB_machine_store_with_release(T& location, V value) { + __TBB_machine_load_store::store_with_release(location,value); +} + +//! Overload that exists solely to avoid /Wp64 warnings. 
+inline void __TBB_machine_store_with_release(size_t& location, size_t value) { + __TBB_machine_load_store::store_with_release(location,value); +} + +#define __TBB_load_with_acquire(L) __TBB_machine_load_with_acquire((L)) +#define __TBB_store_with_release(L,V) __TBB_machine_store_with_release((L),(V)) + +#define __TBB_DEFINE_ATOMICS(S,T,U,A,C) \ +static inline T __TBB_machine_cmpswp##S ( volatile void * ptr, U value, U comparand ) { \ + T result; \ + volatile T *p = (T *)ptr; \ + __TBB_release_consistency_helper(); \ + __asm \ + { \ + __asm mov edx, p \ + __asm mov C , value \ + __asm mov A , comparand \ + __asm lock cmpxchg [edx], C \ + __asm mov result, A \ + } \ + __TBB_release_consistency_helper(); \ + return result; \ +} \ +\ +static inline T __TBB_machine_fetchadd##S ( volatile void * ptr, U addend ) { \ + T result; \ + volatile T *p = (T *)ptr; \ + __TBB_release_consistency_helper(); \ + __asm \ + { \ + __asm mov edx, p \ + __asm mov A, addend \ + __asm lock xadd [edx], A \ + __asm mov result, A \ + } \ + __TBB_release_consistency_helper(); \ + return result; \ +}\ +\ +static inline T __TBB_machine_fetchstore##S ( volatile void * ptr, U value ) { \ + T result; \ + volatile T *p = (T *)ptr; \ + __TBB_release_consistency_helper(); \ + __asm \ + { \ + __asm mov edx, p \ + __asm mov A, value \ + __asm lock xchg [edx], A \ + __asm mov result, A \ + } \ + __TBB_release_consistency_helper(); \ + return result; \ +} + +__TBB_DEFINE_ATOMICS(1, __int8, __int8, al, cl) +__TBB_DEFINE_ATOMICS(2, __int16, __int16, ax, cx) +__TBB_DEFINE_ATOMICS(4, __int32, ptrdiff_t, eax, ecx) + +static inline __int32 __TBB_machine_lg( unsigned __int64 i ) { + unsigned __int32 j; + __asm + { + bsr eax, i + mov j, eax + } + return j; +} + +static inline void __TBB_machine_OR( volatile void *operand, __int32 addend ) { + __asm + { + mov eax, addend + mov edx, [operand] + lock or [edx], eax + } +} + +static inline void __TBB_machine_AND( volatile void *operand, __int32 addend ) { + __asm + { + mov eax, addend + mov edx, [operand] + lock and [edx], eax + } +} + +static inline void __TBB_machine_pause (__int32 delay ) { + _asm + { + mov eax, delay + L1: + pause + add eax, -1 + jne L1 + } + return; +} + +#define __TBB_CompareAndSwap1(P,V,C) __TBB_machine_cmpswp1(P,V,C) +#define __TBB_CompareAndSwap2(P,V,C) __TBB_machine_cmpswp2(P,V,C) +#define __TBB_CompareAndSwap4(P,V,C) __TBB_machine_cmpswp4(P,V,C) +#define __TBB_CompareAndSwap8(P,V,C) __TBB_machine_cmpswp8(P,V,C) +#define __TBB_CompareAndSwapW(P,V,C) __TBB_machine_cmpswp4(P,V,C) + +#define __TBB_FetchAndAdd1(P,V) __TBB_machine_fetchadd1(P,V) +#define __TBB_FetchAndAdd2(P,V) __TBB_machine_fetchadd2(P,V) +#define __TBB_FetchAndAdd4(P,V) __TBB_machine_fetchadd4(P,V) +#define __TBB_FetchAndAdd8(P,V) __TBB_machine_fetchadd8(P,V) +#define __TBB_FetchAndAddW(P,V) __TBB_machine_fetchadd4(P,V) + +#define __TBB_FetchAndStore1(P,V) __TBB_machine_fetchstore1(P,V) +#define __TBB_FetchAndStore2(P,V) __TBB_machine_fetchstore2(P,V) +#define __TBB_FetchAndStore4(P,V) __TBB_machine_fetchstore4(P,V) +#define __TBB_FetchAndStore8(P,V) __TBB_machine_fetchstore8(P,V) +#define __TBB_FetchAndStoreW(P,V) __TBB_machine_fetchstore4(P,V) + +// Should define this: +#define __TBB_Store8(P,V) __TBB_machine_store8(P,V) +#define __TBB_Load8(P) __TBB_machine_load8(P) +#define __TBB_AtomicOR(P,V) __TBB_machine_OR(P,V) +#define __TBB_AtomicAND(P,V) __TBB_machine_AND(P,V) + +// Definition of other functions +extern "C" __declspec(dllimport) int __stdcall SwitchToThread( void ); +#define __TBB_Yield() 
SwitchToThread() +#define __TBB_Pause(V) __TBB_machine_pause(V) +#define __TBB_Log2(V) __TBB_machine_lg(V) + +// Use generic definitions from tbb_machine.h +#undef __TBB_TryLockByte +#undef __TBB_LockByte + +#if defined(_MSC_VER)&&_MSC_VER<1400 + static inline void* __TBB_machine_get_current_teb () { + void* pteb; + __asm mov eax, fs:[0x18] + __asm mov pteb, eax + return pteb; + } +#endif + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + #pragma warning (pop) +#endif // warnings 4244, 4267 are back + diff --git a/dep/tbb/include/tbb/machine/windows_intel64.h b/dep/tbb/include/tbb/machine/windows_intel64.h new file mode 100644 index 000000000..a885aa46d --- /dev/null +++ b/dep/tbb/include/tbb/machine/windows_intel64.h @@ -0,0 +1,132 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
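The load_with_acquire / store_with_release helpers defined in windows_ia32.h above support the usual publication idiom: a writer stores its payload and then sets a flag with release semantics, and a reader that observes the flag through an acquire load is guaranteed to also see the payload. A minimal sketch under that assumption; the Message type and function names are illustrative only:

#include <cstddef>

struct Message {
    int    payload;
    size_t ready;      // 0 = not published, 1 = published
};

void publish( Message& m, int value ) {
    m.payload = value;                               // write the data first
    __TBB_store_with_release( m.ready, size_t(1) );  // then release-store the flag
}

bool try_consume( const Message& m, int& out ) {
    if( __TBB_load_with_acquire( m.ready ) ) {       // acquire-load the flag
        out = m.payload;                             // payload is guaranteed visible here
        return true;
    }
    return false;
}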
+*/ + +#ifndef __TBB_machine_H +#error Do not include this file directly; include tbb_machine.h instead +#endif + +#include +#if !defined(__INTEL_COMPILER) +#pragma intrinsic(_InterlockedOr64) +#pragma intrinsic(_InterlockedAnd64) +#pragma intrinsic(_InterlockedCompareExchange) +#pragma intrinsic(_InterlockedCompareExchange64) +#pragma intrinsic(_InterlockedExchangeAdd) +#pragma intrinsic(_InterlockedExchangeAdd64) +#pragma intrinsic(_InterlockedExchange) +#pragma intrinsic(_InterlockedExchange64) +#endif /* !defined(__INTEL_COMPILER) */ + +#if defined(__INTEL_COMPILER) +#define __TBB_release_consistency_helper() __asm { __asm nop } +inline void __TBB_rel_acq_fence() { __asm { __asm mfence } } +#elif _MSC_VER >= 1300 +extern "C" void _ReadWriteBarrier(); +#pragma intrinsic(_ReadWriteBarrier) +#define __TBB_release_consistency_helper() _ReadWriteBarrier() +#pragma intrinsic(_mm_mfence) +inline void __TBB_rel_acq_fence() { _mm_mfence(); } +#endif + +#define __TBB_WORDSIZE 8 +#define __TBB_BIG_ENDIAN 0 + +// ATTENTION: if you ever change argument types in machine-specific primitives, +// please take care of atomic_word<> specializations in tbb/atomic.h +extern "C" { + __int8 __TBB_EXPORTED_FUNC __TBB_machine_cmpswp1 (volatile void *ptr, __int8 value, __int8 comparand ); + __int8 __TBB_EXPORTED_FUNC __TBB_machine_fetchadd1 (volatile void *ptr, __int8 addend ); + __int8 __TBB_EXPORTED_FUNC __TBB_machine_fetchstore1 (volatile void *ptr, __int8 value ); + __int16 __TBB_EXPORTED_FUNC __TBB_machine_cmpswp2 (volatile void *ptr, __int16 value, __int16 comparand ); + __int16 __TBB_EXPORTED_FUNC __TBB_machine_fetchadd2 (volatile void *ptr, __int16 addend ); + __int16 __TBB_EXPORTED_FUNC __TBB_machine_fetchstore2 (volatile void *ptr, __int16 value ); + void __TBB_EXPORTED_FUNC __TBB_machine_pause (__int32 delay ); +} + + +#if !__INTEL_COMPILER +extern "C" unsigned char _BitScanReverse64( unsigned long* i, unsigned __int64 w ); +#pragma intrinsic(_BitScanReverse64) +#endif + +inline __int64 __TBB_machine_lg( unsigned __int64 i ) { +#if __INTEL_COMPILER + unsigned __int64 j; + __asm + { + bsr rax, i + mov j, rax + } +#else + unsigned long j; + _BitScanReverse64( &j, i ); +#endif + return j; +} + +inline void __TBB_machine_OR( volatile void *operand, intptr_t addend ) { + _InterlockedOr64((__int64*)operand, addend); +} + +inline void __TBB_machine_AND( volatile void *operand, intptr_t addend ) { + _InterlockedAnd64((__int64*)operand, addend); +} + +#define __TBB_CompareAndSwap1(P,V,C) __TBB_machine_cmpswp1(P,V,C) +#define __TBB_CompareAndSwap2(P,V,C) __TBB_machine_cmpswp2(P,V,C) +#define __TBB_CompareAndSwap4(P,V,C) _InterlockedCompareExchange( (long*) P , V , C ) +#define __TBB_CompareAndSwap8(P,V,C) _InterlockedCompareExchange64( (__int64*) P , V , C ) +#define __TBB_CompareAndSwapW(P,V,C) _InterlockedCompareExchange64( (__int64*) P , V , C ) + +#define __TBB_FetchAndAdd1(P,V) __TBB_machine_fetchadd1(P,V) +#define __TBB_FetchAndAdd2(P,V) __TBB_machine_fetchadd2(P,V) +#define __TBB_FetchAndAdd4(P,V) _InterlockedExchangeAdd((long*) P , V ) +#define __TBB_FetchAndAdd8(P,V) _InterlockedExchangeAdd64((__int64*) P , V ) +#define __TBB_FetchAndAddW(P,V) _InterlockedExchangeAdd64((__int64*) P , V ) + +#define __TBB_FetchAndStore1(P,V) __TBB_machine_fetchstore1(P,V) +#define __TBB_FetchAndStore2(P,V) __TBB_machine_fetchstore2(P,V) +#define __TBB_FetchAndStore4(P,V) _InterlockedExchange((long*) P , V ) +#define __TBB_FetchAndStore8(P,V) _InterlockedExchange64((__int64*) P , V ) +#define __TBB_FetchAndStoreW(P,V) 
_InterlockedExchange64((__int64*) P , V ) + +// Not used if wordsize == 8 +#undef __TBB_Store8 +#undef __TBB_Load8 + +#define __TBB_AtomicOR(P,V) __TBB_machine_OR(P,V) +#define __TBB_AtomicAND(P,V) __TBB_machine_AND(P,V) + +extern "C" __declspec(dllimport) int __stdcall SwitchToThread( void ); +#define __TBB_Yield() SwitchToThread() +#define __TBB_Pause(V) __TBB_machine_pause(V) +#define __TBB_Log2(V) __TBB_machine_lg(V) + +// Use generic definitions from tbb_machine.h +#undef __TBB_TryLockByte +#undef __TBB_LockByte diff --git a/dep/tbb/include/tbb/mutex.h b/dep/tbb/include/tbb/mutex.h new file mode 100644 index 000000000..a14735f8b --- /dev/null +++ b/dep/tbb/include/tbb/mutex.h @@ -0,0 +1,236 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_mutex_H +#define __TBB_mutex_H + +#if _WIN32||_WIN64 +#include +#if !defined(_WIN32_WINNT) +// The following Windows API function is declared explicitly; +// otherwise any user would have to specify /D_WIN32_WINNT=0x0400 +extern "C" BOOL WINAPI TryEnterCriticalSection( LPCRITICAL_SECTION ); +#endif + +#else /* if not _WIN32||_WIN64 */ +#include +namespace tbb { namespace internal { +// Use this internal TBB function to throw an exception +extern void handle_perror( int error_code, const char* what ); +} } //namespaces +#endif /* _WIN32||_WIN64 */ + +#include +#include "aligned_space.h" +#include "tbb_stddef.h" +#include "tbb_profiling.h" + +namespace tbb { + +//! Wrapper around the platform's native reader-writer lock. +/** For testing purposes only. + @ingroup synchronization */ +class mutex { +public: + //! Construct unacquired mutex. 
+ mutex() { +#if TBB_USE_ASSERT || TBB_USE_THREADING_TOOLS + internal_construct(); +#else + #if _WIN32||_WIN64 + InitializeCriticalSection(&impl); + #else + int error_code = pthread_mutex_init(&impl,NULL); + if( error_code ) + tbb::internal::handle_perror(error_code,"mutex: pthread_mutex_init failed"); + #endif /* _WIN32||_WIN64*/ +#endif /* TBB_USE_ASSERT */ + }; + + ~mutex() { +#if TBB_USE_ASSERT + internal_destroy(); +#else + #if _WIN32||_WIN64 + DeleteCriticalSection(&impl); + #else + pthread_mutex_destroy(&impl); + + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + }; + + class scoped_lock; + friend class scoped_lock; + + //! The scoped locking pattern + /** It helps to avoid the common problem of forgetting to release lock. + It also nicely provides the "node" for queuing locks. */ + class scoped_lock : internal::no_copy { + public: + //! Construct lock that has not acquired a mutex. + scoped_lock() : my_mutex(NULL) {}; + + //! Acquire lock on given mutex. + /** Upon entry, *this should not be in the "have acquired a mutex" state. */ + scoped_lock( mutex& mutex ) { + acquire( mutex ); + } + + //! Release lock (if lock is held). + ~scoped_lock() { + if( my_mutex ) + release(); + } + + //! Acquire lock on given mutex. + void acquire( mutex& mutex ) { +#if TBB_USE_ASSERT + internal_acquire(mutex); +#else + mutex.lock(); + my_mutex = &mutex; +#endif /* TBB_USE_ASSERT */ + } + + //! Try acquire lock on given mutex. + bool try_acquire( mutex& mutex ) { +#if TBB_USE_ASSERT + return internal_try_acquire (mutex); +#else + bool result = mutex.try_lock(); + if( result ) + my_mutex = &mutex; + return result; +#endif /* TBB_USE_ASSERT */ + } + + //! Release lock + void release() { +#if TBB_USE_ASSERT + internal_release (); +#else + my_mutex->unlock(); + my_mutex = NULL; +#endif /* TBB_USE_ASSERT */ + } + + private: + //! The pointer to the current mutex to work + mutex* my_mutex; + + //! All checks from acquire using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_acquire( mutex& m ); + + //! All checks from try_acquire using mutex.state were moved here + bool __TBB_EXPORTED_METHOD internal_try_acquire( mutex& m ); + + //! All checks from release using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_release(); + + friend class mutex; + }; + + // Mutex traits + static const bool is_rw_mutex = false; + static const bool is_recursive_mutex = false; + static const bool is_fair_mutex = false; + + // ISO C++0x compatibility methods + + //! Acquire lock + void lock() { +#if TBB_USE_ASSERT + aligned_space tmp; + new(tmp.begin()) scoped_lock(*this); +#else + #if _WIN32||_WIN64 + EnterCriticalSection(&impl); + #else + pthread_mutex_lock(&impl); + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + } + + //! Try acquiring lock (non-blocking) + /** Return true if lock acquired; false otherwise. */ + bool try_lock() { +#if TBB_USE_ASSERT + aligned_space tmp; + scoped_lock& s = *tmp.begin(); + s.my_mutex = NULL; + return s.internal_try_acquire(*this); +#else + #if _WIN32||_WIN64 + return TryEnterCriticalSection(&impl)!=0; + #else + return pthread_mutex_trylock(&impl)==0; + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + } + + //! 
Release lock + void unlock() { +#if TBB_USE_ASSERT + aligned_space tmp; + scoped_lock& s = *tmp.begin(); + s.my_mutex = this; + s.internal_release(); +#else + #if _WIN32||_WIN64 + LeaveCriticalSection(&impl); + #else + pthread_mutex_unlock(&impl); + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + } + +private: +#if _WIN32||_WIN64 + CRITICAL_SECTION impl; + enum state_t { + INITIALIZED=0x1234, + DESTROYED=0x789A, + HELD=0x56CD + } state; +#else + pthread_mutex_t impl; +#endif /* _WIN32||_WIN64 */ + + //! All checks from mutex constructor using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_construct(); + + //! All checks from mutex destructor using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_destroy(); +}; + +__TBB_DEFINE_PROFILING_SET_NAME(mutex) + +} // namespace tbb + +#endif /* __TBB_mutex_H */ diff --git a/dep/tbb/include/tbb/null_mutex.h b/dep/tbb/include/tbb/null_mutex.h new file mode 100644 index 000000000..6cf8dc8cf --- /dev/null +++ b/dep/tbb/include/tbb/null_mutex.h @@ -0,0 +1,63 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_null_mutex_H +#define __TBB_null_mutex_H + +namespace tbb { + +//! A mutex which does nothing +/** A null_mutex does no operation and simulates success. + @ingroup synchronization */ +class null_mutex { + //! Deny assignment and copy construction + null_mutex( const null_mutex& ); + void operator=( const null_mutex& ); +public: + //! Represents acquisition of a mutex. + class scoped_lock { + public: + scoped_lock() {} + scoped_lock( null_mutex& ) {} + ~scoped_lock() {} + void acquire( null_mutex& ) {} + bool try_acquire( null_mutex& ) { return true; } + void release() {} + }; + + null_mutex() {} + + // Mutex traits + static const bool is_rw_mutex = false; + static const bool is_recursive_mutex = true; + static const bool is_fair_mutex = true; +}; + +} + +#endif /* __TBB_null_mutex_H */ diff --git a/dep/tbb/include/tbb/null_rw_mutex.h b/dep/tbb/include/tbb/null_rw_mutex.h new file mode 100644 index 000000000..6be42e184 --- /dev/null +++ b/dep/tbb/include/tbb/null_rw_mutex.h @@ -0,0 +1,65 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. 
+ + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_null_rw_mutex_H +#define __TBB_null_rw_mutex_H + +namespace tbb { + +//! A rw mutex which does nothing +/** A null_rw_mutex is a rw mutex that does nothing and simulates successful operation. + @ingroup synchronization */ +class null_rw_mutex { + //! Deny assignment and copy construction + null_rw_mutex( const null_rw_mutex& ); + void operator=( const null_rw_mutex& ); +public: + //! Represents acquisition of a mutex. + class scoped_lock { + public: + scoped_lock() {} + scoped_lock( null_rw_mutex& , bool = true ) {} + ~scoped_lock() {} + void acquire( null_rw_mutex& , bool = true ) {} + bool upgrade_to_writer() { return true; } + bool downgrade_to_reader() { return true; } + bool try_acquire( null_rw_mutex& , bool = true ) { return true; } + void release() {} + }; + + null_rw_mutex() {} + + // Mutex traits + static const bool is_rw_mutex = true; + static const bool is_recursive_mutex = true; + static const bool is_fair_mutex = true; +}; + +} + +#endif /* __TBB_null_rw_mutex_H */ diff --git a/dep/tbb/include/tbb/parallel_do.h b/dep/tbb/include/tbb/parallel_do.h new file mode 100644 index 000000000..922c9684a --- /dev/null +++ b/dep/tbb/include/tbb/parallel_do.h @@ -0,0 +1,508 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
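The mutex wrappers added above share one usage pattern: a scoped_lock acquires the mutex in its constructor and releases it in its destructor, so the lock cannot be leaked on early return or exception, while null_mutex and null_rw_mutex expose the same interface with no effect. A minimal usage sketch; the Counter type is illustrative only, not part of the patch:

#include "tbb/mutex.h"
#include "tbb/null_mutex.h"

// Locking policy as a template parameter: tbb::mutex for thread-safe use,
// tbb::null_mutex to compile the locking away in single-threaded builds.
template<typename Mutex>
class Counter {
    Mutex my_mutex;
    long  my_value;
public:
    Counter() : my_value(0) {}
    void add( long x ) {
        typename Mutex::scoped_lock lock( my_mutex );   // released automatically at scope exit
        my_value += x;
    }
    long value() {
        typename Mutex::scoped_lock lock( my_mutex );
        return my_value;
    }
};

Counter<tbb::mutex>      shared_counter;    // protected by the platform's native mutex
Counter<tbb::null_mutex> private_counter;   // no locking overhead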
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_parallel_do_H +#define __TBB_parallel_do_H + +#include "task.h" +#include "aligned_space.h" +#include + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + template class parallel_do_feeder_impl; + template class do_group_task; + + //! Strips its template type argument from 'cv' and '&' qualifiers + template + struct strip { typedef T type; }; + template + struct strip { typedef T type; }; + template + struct strip { typedef T type; }; + template + struct strip { typedef T type; }; + template + struct strip { typedef T type; }; + // Most of the compilers remove cv-qualifiers from non-reference function argument types. + // But unfortunately there are those that don't. + template + struct strip { typedef T type; }; + template + struct strip { typedef T type; }; + template + struct strip { typedef T type; }; +} // namespace internal +//! @endcond + +//! Class the user supplied algorithm body uses to add new tasks +/** \param Item Work item type **/ +template +class parallel_do_feeder: internal::no_copy +{ + parallel_do_feeder() {} + virtual ~parallel_do_feeder () {} + virtual void internal_add( const Item& item ) = 0; + template friend class internal::parallel_do_feeder_impl; +public: + //! Add a work item to a running parallel_do. + void add( const Item& item ) {internal_add(item);} +}; + +//! @cond INTERNAL +namespace internal { + //! For internal use only. + /** Selects one of the two possible forms of function call member operator. + @ingroup algorithms **/ + template + class parallel_do_operator_selector + { + typedef parallel_do_feeder Feeder; + template + static void internal_call( const Body& obj, A1& arg1, A2&, void (Body::*)(CvItem) const ) { + obj(arg1); + } + template + static void internal_call( const Body& obj, A1& arg1, A2& arg2, void (Body::*)(CvItem, parallel_do_feeder&) const ) { + obj(arg1, arg2); + } + + public: + template + static void call( const Body& obj, A1& arg1, A2& arg2 ) + { + internal_call( obj, arg1, arg2, &Body::operator() ); + } + }; + + //! For internal use only. + /** Executes one iteration of a do. 
+ @ingroup algorithms */ + template + class do_iteration_task: public task + { + typedef parallel_do_feeder_impl feeder_type; + + Item my_value; + feeder_type& my_feeder; + + do_iteration_task( const Item& value, feeder_type& feeder ) : + my_value(value), my_feeder(feeder) + {} + + /*override*/ + task* execute() + { + parallel_do_operator_selector::call(*my_feeder.my_body, my_value, my_feeder); + return NULL; + } + + template friend class parallel_do_feeder_impl; + }; // class do_iteration_task + + template + class do_iteration_task_iter: public task + { + typedef parallel_do_feeder_impl feeder_type; + + Iterator my_iter; + feeder_type& my_feeder; + + do_iteration_task_iter( const Iterator& iter, feeder_type& feeder ) : + my_iter(iter), my_feeder(feeder) + {} + + /*override*/ + task* execute() + { + parallel_do_operator_selector::call(*my_feeder.my_body, *my_iter, my_feeder); + return NULL; + } + + template friend class do_group_task_forward; + template friend class do_group_task_input; + template friend class do_task_iter; + }; // class do_iteration_task_iter + + //! For internal use only. + /** Implements new task adding procedure. + @ingroup algorithms **/ + template + class parallel_do_feeder_impl : public parallel_do_feeder + { + /*override*/ + void internal_add( const Item& item ) + { + typedef do_iteration_task iteration_type; + + iteration_type& t = *new (task::self().allocate_additional_child_of(*my_barrier)) iteration_type(item, *this); + + t.spawn( t ); + } + public: + const Body* my_body; + empty_task* my_barrier; + + parallel_do_feeder_impl() + { + my_barrier = new( task::allocate_root() ) empty_task(); + __TBB_ASSERT(my_barrier, "root task allocation failed"); + } + +#if __TBB_EXCEPTIONS + parallel_do_feeder_impl(tbb::task_group_context &context) + { + my_barrier = new( task::allocate_root(context) ) empty_task(); + __TBB_ASSERT(my_barrier, "root task allocation failed"); + } +#endif + + ~parallel_do_feeder_impl() + { + my_barrier->destroy(*my_barrier); + } + }; // class parallel_do_feeder_impl + + + //! For internal use only + /** Unpacks a block of iterations. 
+ @ingroup algorithms */ + + template + class do_group_task_forward: public task + { + static const size_t max_arg_size = 4; + + typedef parallel_do_feeder_impl feeder_type; + + feeder_type& my_feeder; + Iterator my_first; + size_t my_size; + + do_group_task_forward( Iterator first, size_t size, feeder_type& feeder ) + : my_feeder(feeder), my_first(first), my_size(size) + {} + + /*override*/ task* execute() + { + typedef do_iteration_task_iter iteration_type; + __TBB_ASSERT( my_size>0, NULL ); + task_list list; + task* t; + size_t k=0; + for(;;) { + t = new( allocate_child() ) iteration_type( my_first, my_feeder ); + ++my_first; + if( ++k==my_size ) break; + list.push_back(*t); + } + set_ref_count(int(k+1)); + spawn(list); + spawn_and_wait_for_all(*t); + return NULL; + } + + template friend class do_task_iter; + }; // class do_group_task_forward + + template + class do_group_task_input: public task + { + static const size_t max_arg_size = 4; + + typedef parallel_do_feeder_impl feeder_type; + + feeder_type& my_feeder; + size_t my_size; + aligned_space my_arg; + + do_group_task_input( feeder_type& feeder ) + : my_feeder(feeder), my_size(0) + {} + + /*override*/ task* execute() + { + typedef do_iteration_task_iter iteration_type; + __TBB_ASSERT( my_size>0, NULL ); + task_list list; + task* t; + size_t k=0; + for(;;) { + t = new( allocate_child() ) iteration_type( my_arg.begin() + k, my_feeder ); + if( ++k==my_size ) break; + list.push_back(*t); + } + set_ref_count(int(k+1)); + spawn(list); + spawn_and_wait_for_all(*t); + return NULL; + } + + ~do_group_task_input(){ + for( size_t k=0; k~Item(); + } + + template friend class do_task_iter; + }; // class do_group_task_input + + //! For internal use only. + /** Gets block of iterations and packages them into a do_group_task. + @ingroup algorithms */ + template + class do_task_iter: public task + { + typedef parallel_do_feeder_impl feeder_type; + + public: + do_task_iter( Iterator first, Iterator last , feeder_type& feeder ) : + my_first(first), my_last(last), my_feeder(feeder) + {} + + private: + Iterator my_first; + Iterator my_last; + feeder_type& my_feeder; + + /* Do not merge run(xxx) and run_xxx() methods. They are separated in order + to make sure that compilers will eliminate unused argument of type xxx + (that is will not put it on stack). The sole purpose of this argument + is overload resolution. + + An alternative could be using template functions, but explicit specialization + of member function templates is not supported for non specialized class + templates. Besides template functions would always fall back to the least + efficient variant (the one for input iterators) in case of iterators having + custom tags derived from basic ones. */ + /*override*/ task* execute() + { + typedef typename std::iterator_traits::iterator_category iterator_tag; + return run( (iterator_tag*)NULL ); + } + + /** This is the most restricted variant that operates on input iterators or + iterators with unknown tags (tags not derived from the standard ones). 
**/ + inline task* run( void* ) { return run_for_input_iterator(); } + + task* run_for_input_iterator() { + typedef do_group_task_input block_type; + + block_type& t = *new( allocate_additional_child_of(*my_feeder.my_barrier) ) block_type(my_feeder); + size_t k=0; + while( !(my_first == my_last) ) { + new (t.my_arg.begin() + k) Item(*my_first); + ++my_first; + if( ++k==block_type::max_arg_size ) { + if ( !(my_first == my_last) ) + recycle_to_reexecute(); + break; + } + } + if( k==0 ) { + destroy(t); + return NULL; + } else { + t.my_size = k; + return &t; + } + } + + inline task* run( std::forward_iterator_tag* ) { return run_for_forward_iterator(); } + + task* run_for_forward_iterator() { + typedef do_group_task_forward block_type; + + Iterator first = my_first; + size_t k=0; + while( !(my_first==my_last) ) { + ++my_first; + if( ++k==block_type::max_arg_size ) { + if ( !(my_first==my_last) ) + recycle_to_reexecute(); + break; + } + } + return k==0 ? NULL : new( allocate_additional_child_of(*my_feeder.my_barrier) ) block_type(first, k, my_feeder); + } + + inline task* run( std::random_access_iterator_tag* ) { return run_for_random_access_iterator(); } + + task* run_for_random_access_iterator() { + typedef do_group_task_forward block_type; + typedef do_iteration_task_iter iteration_type; + + size_t k = static_cast(my_last-my_first); + if( k > block_type::max_arg_size ) { + Iterator middle = my_first + k/2; + + empty_task& c = *new( allocate_continuation() ) empty_task; + do_task_iter& b = *new( c.allocate_child() ) do_task_iter(middle, my_last, my_feeder); + recycle_as_child_of(c); + + my_last = middle; + c.set_ref_count(2); + c.spawn(b); + return this; + }else if( k != 0 ) { + task_list list; + task* t; + size_t k1=0; + for(;;) { + t = new( allocate_child() ) iteration_type(my_first, my_feeder); + ++my_first; + if( ++k1==k ) break; + list.push_back(*t); + } + set_ref_count(int(k+1)); + spawn(list); + spawn_and_wait_for_all(*t); + } + return NULL; + } + }; // class do_task_iter + + //! For internal use only. + /** Implements parallel iteration over a range. + @ingroup algorithms */ + template + void run_parallel_do( Iterator first, Iterator last, const Body& body +#if __TBB_EXCEPTIONS + , task_group_context& context +#endif + ) + { + typedef do_task_iter root_iteration_task; +#if __TBB_EXCEPTIONS + parallel_do_feeder_impl feeder(context); +#else + parallel_do_feeder_impl feeder; +#endif + feeder.my_body = &body; + + root_iteration_task &t = *new( feeder.my_barrier->allocate_child() ) root_iteration_task(first, last, feeder); + + feeder.my_barrier->set_ref_count(2); + feeder.my_barrier->spawn_and_wait_for_all(t); + } + + //! For internal use only. + /** Detects types of Body's operator function arguments. + @ingroup algorithms **/ + template + void select_parallel_do( Iterator first, Iterator last, const Body& body, void (Body::*)(Item) const +#if __TBB_EXCEPTIONS + , task_group_context& context +#endif // __TBB_EXCEPTIONS + ) + { + run_parallel_do::type>( first, last, body +#if __TBB_EXCEPTIONS + , context +#endif // __TBB_EXCEPTIONS + ); + } + + //! For internal use only. + /** Detects types of Body's operator function arguments. 
+ @ingroup algorithms **/ + template + void select_parallel_do( Iterator first, Iterator last, const Body& body, void (Body::*)(Item, parallel_do_feeder<_Item>&) const +#if __TBB_EXCEPTIONS + , task_group_context& context +#endif // __TBB_EXCEPTIONS + ) + { + run_parallel_do::type>( first, last, body +#if __TBB_EXCEPTIONS + , context +#endif // __TBB_EXCEPTIONS + ); + } + +} // namespace internal +//! @endcond + + +/** \page parallel_do_body_req Requirements on parallel_do body + Class \c Body implementing the concept of parallel_do body must define: + - \code + B::operator()( + cv_item_type item, + parallel_do_feeder& feeder + ) const + + OR + + B::operator()( cv_item_type& item ) const + \endcode Process item. + May be invoked concurrently for the same \c this but different \c item. + + - \code item_type( const item_type& ) \endcode + Copy a work item. + - \code ~item_type() \endcode Destroy a work item +**/ + +/** \name parallel_do + See also requirements on \ref parallel_do_body_req "parallel_do Body". **/ +//@{ +//! Parallel iteration over a range, with optional addition of more work. +/** @ingroup algorithms */ +template +void parallel_do( Iterator first, Iterator last, const Body& body ) +{ + if ( first == last ) + return; +#if __TBB_EXCEPTIONS + task_group_context context; +#endif // __TBB_EXCEPTIONS + internal::select_parallel_do( first, last, body, &Body::operator() +#if __TBB_EXCEPTIONS + , context +#endif // __TBB_EXCEPTIONS + ); +} + +#if __TBB_EXCEPTIONS +//! Parallel iteration over a range, with optional addition of more work and user-supplied context +/** @ingroup algorithms */ +template +void parallel_do( Iterator first, Iterator last, const Body& body, task_group_context& context ) +{ + if ( first == last ) + return; + internal::select_parallel_do( first, last, body, &Body::operator(), context ); +} +#endif // __TBB_EXCEPTIONS + +//@} + +} // namespace + +#endif /* __TBB_parallel_do_H */ diff --git a/dep/tbb/include/tbb/parallel_for.h b/dep/tbb/include/tbb/parallel_for.h new file mode 100644 index 000000000..8d103e027 --- /dev/null +++ b/dep/tbb/include/tbb/parallel_for.h @@ -0,0 +1,242 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
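parallel_do, as documented above, accepts any body whose const operator() takes either the item alone or the item plus a parallel_do_feeder through which additional work can be queued while the loop is running. A minimal sketch of the feeder form; the item type and the splitting rule are illustrative assumptions:

#include "tbb/parallel_do.h"
#include <vector>

struct ProcessValue {
    // May be invoked concurrently for different items; items queued through the
    // feeder are processed by the same parallel_do call before it returns.
    void operator()( int value, tbb::parallel_do_feeder<int>& feeder ) const {
        if( value > 1 )
            feeder.add( value / 2 );    // illustrative: spawn a smaller follow-up item
        // ... process 'value' ...
    }
};

void run( const std::vector<int>& roots ) {
    tbb::parallel_do( roots.begin(), roots.end(), ProcessValue() );
}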
+*/ + +#ifndef __TBB_parallel_for_H +#define __TBB_parallel_for_H + +#include "task.h" +#include "partitioner.h" +#include "blocked_range.h" +#include +#include // std::invalid_argument +#include // std::invalid_argument text + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + + //! Task type used in parallel_for + /** @ingroup algorithms */ + template + class start_for: public task { + Range my_range; + const Body my_body; + typename Partitioner::partition_type my_partition; + /*override*/ task* execute(); + + //! Constructor for root task. + start_for( const Range& range, const Body& body, Partitioner& partitioner ) : + my_range(range), + my_body(body), + my_partition(partitioner) + { + } + //! Splitting constructor used to generate children. + /** this becomes left child. Newly constructed object is right child. */ + start_for( start_for& parent, split ) : + my_range(parent.my_range,split()), + my_body(parent.my_body), + my_partition(parent.my_partition,split()) + { + my_partition.set_affinity(*this); + } + //! Update affinity info, if any. + /*override*/ void note_affinity( affinity_id id ) { + my_partition.note_affinity( id ); + } + public: + static void run( const Range& range, const Body& body, const Partitioner& partitioner ) { + if( !range.empty() ) { +#if !__TBB_EXCEPTIONS || TBB_JOIN_OUTER_TASK_GROUP + start_for& a = *new(task::allocate_root()) start_for(range,body,const_cast(partitioner)); +#else + // Bound context prevents exceptions from body to affect nesting or sibling algorithms, + // and allows users to handle exceptions safely by wrapping parallel_for in the try-block. + task_group_context context; + start_for& a = *new(task::allocate_root(context)) start_for(range,body,const_cast(partitioner)); +#endif /* __TBB_EXCEPTIONS && !TBB_JOIN_OUTER_TASK_GROUP */ + task::spawn_root_and_wait(a); + } + } +#if __TBB_EXCEPTIONS + static void run( const Range& range, const Body& body, const Partitioner& partitioner, task_group_context& context ) { + if( !range.empty() ) { + start_for& a = *new(task::allocate_root(context)) start_for(range,body,const_cast(partitioner)); + task::spawn_root_and_wait(a); + } + } +#endif /* __TBB_EXCEPTIONS */ + }; + + template + task* start_for::execute() { + if( !my_range.is_divisible() || my_partition.should_execute_range(*this) ) { + my_body( my_range ); + return my_partition.continue_after_execute_range(*this); + } else { + empty_task& c = *new( this->allocate_continuation() ) empty_task; + recycle_as_child_of(c); + c.set_ref_count(2); + bool delay = my_partition.decide_whether_to_delay(); + start_for& b = *new( c.allocate_child() ) start_for(*this,split()); + my_partition.spawn_or_delay(delay,*this,b); + return this; + } + } +} // namespace internal +//! @endcond + + +// Requirements on Range concept are documented in blocked_range.h + +/** \page parallel_for_body_req Requirements on parallel_for body + Class \c Body implementing the concept of parallel_for body must define: + - \code Body::Body( const Body& ); \endcode Copy constructor + - \code Body::~Body(); \endcode Destructor + - \code void Body::operator()( Range& r ) const; \endcode Function call operator applying the body to range \c r. +**/ + +/** \name parallel_for + See also requirements on \ref range_req "Range" and \ref parallel_for_body_req "parallel_for Body". **/ +//@{ + +//! Parallel iteration over range with default partitioner. 
+/** @ingroup algorithms **/ +template +void parallel_for( const Range& range, const Body& body ) { + internal::start_for::run(range,body,__TBB_DEFAULT_PARTITIONER()); +} + +//! Parallel iteration over range with simple partitioner. +/** @ingroup algorithms **/ +template +void parallel_for( const Range& range, const Body& body, const simple_partitioner& partitioner ) { + internal::start_for::run(range,body,partitioner); +} + +//! Parallel iteration over range with auto_partitioner. +/** @ingroup algorithms **/ +template +void parallel_for( const Range& range, const Body& body, const auto_partitioner& partitioner ) { + internal::start_for::run(range,body,partitioner); +} + +//! Parallel iteration over range with affinity_partitioner. +/** @ingroup algorithms **/ +template +void parallel_for( const Range& range, const Body& body, affinity_partitioner& partitioner ) { + internal::start_for::run(range,body,partitioner); +} + +#if __TBB_EXCEPTIONS +//! Parallel iteration over range with simple partitioner and user-supplied context. +/** @ingroup algorithms **/ +template +void parallel_for( const Range& range, const Body& body, const simple_partitioner& partitioner, task_group_context& context ) { + internal::start_for::run(range, body, partitioner, context); +} + +//! Parallel iteration over range with auto_partitioner and user-supplied context. +/** @ingroup algorithms **/ +template +void parallel_for( const Range& range, const Body& body, const auto_partitioner& partitioner, task_group_context& context ) { + internal::start_for::run(range, body, partitioner, context); +} + +//! Parallel iteration over range with affinity_partitioner and user-supplied context. +/** @ingroup algorithms **/ +template +void parallel_for( const Range& range, const Body& body, affinity_partitioner& partitioner, task_group_context& context ) { + internal::start_for::run(range,body,partitioner, context); +} +#endif /* __TBB_EXCEPTIONS */ +//@} + +//! @cond INTERNAL +namespace internal { + //! Calls the function with values from range [begin, end) with a step provided +template +class parallel_for_body : internal::no_assign { + const Function &my_func; + const Index my_begin; + const Index my_step; +public: + parallel_for_body( const Function& _func, Index& _begin, Index& _step) + : my_func(_func), my_begin(_begin), my_step(_step) {} + + void operator()( tbb::blocked_range& r ) const { + for( Index i = r.begin(), k = my_begin + i * my_step; i < r.end(); i++, k = k + my_step) + my_func( k ); + } +}; +} // namespace internal +//! @endcond + +namespace strict_ppl { + +//@{ +//! Parallel iteration over a range of integers with a step provided +template +void parallel_for(Index first, Index last, Index step, const Function& f) { + tbb::task_group_context context; + parallel_for(first, last, step, f, context); +} +template +void parallel_for(Index first, Index last, Index step, const Function& f, tbb::task_group_context &context) { + if (step <= 0 ) throw std::invalid_argument("step should be positive"); + + if (last > first) { + Index end = (last - first) / step; + if (first + end * step < last) end++; + tbb::blocked_range range(static_cast(0), end); + internal::parallel_for_body body(f, first, step); + tbb::parallel_for(range, body, tbb::auto_partitioner(), context); + } +} +//! 
Parallel iteration over a range of integers with a default step value +template +void parallel_for(Index first, Index last, const Function& f) { + tbb::task_group_context context; + parallel_for(first, last, static_cast(1), f, context); +} +template +void parallel_for(Index first, Index last, const Function& f, tbb::task_group_context &context) { + parallel_for(first, last, static_cast(1), f, context); +} + +//@} + +} // namespace strict_ppl + +using strict_ppl::parallel_for; + +} // namespace tbb + +#endif /* __TBB_parallel_for_H */ + diff --git a/dep/tbb/include/tbb/parallel_for_each.h b/dep/tbb/include/tbb/parallel_for_each.h new file mode 100644 index 000000000..fa67b6cbc --- /dev/null +++ b/dep/tbb/include/tbb/parallel_for_each.h @@ -0,0 +1,79 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_parallel_for_each_H +#define __TBB_parallel_for_each_H + +#include "parallel_do.h" + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + // The class calls user function in operator() + template + class parallel_for_each_body : internal::no_assign { + Function &my_func; + public: + parallel_for_each_body(Function &_func) : my_func(_func) {} + parallel_for_each_body(const parallel_for_each_body &_caller) : my_func(_caller.my_func) {} + + void operator() ( typename std::iterator_traits::value_type value ) const { + my_func(value); + } + }; +} // namespace internal +//! @endcond + +/** \name parallel_for_each + **/ +//@{ +//! Calls function f for all items from [first, last) interval using user-supplied context +/** @ingroup algorithms */ +template +Function parallel_for_each(InputIterator first, InputIterator last, Function f, task_group_context &context) { + internal::parallel_for_each_body body(f); + + tbb::parallel_do (first, last, body, context); + return f; +} + +//! 
Uses default context +template +Function parallel_for_each(InputIterator first, InputIterator last, Function f) { + internal::parallel_for_each_body body(f); + + tbb::parallel_do (first, last, body); + return f; +} + +//@} + +} // namespace + +#endif /* __TBB_parallel_for_each_H */ diff --git a/dep/tbb/include/tbb/parallel_invoke.h b/dep/tbb/include/tbb/parallel_invoke.h new file mode 100644 index 000000000..fb425c676 --- /dev/null +++ b/dep/tbb/include/tbb/parallel_invoke.h @@ -0,0 +1,333 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_parallel_invoke_H +#define __TBB_parallel_invoke_H + +#include "task.h" + +namespace tbb { + +//! 
@cond INTERNAL +namespace internal { + // Simple task object, executing user method + template + class function_invoker : public task{ + public: + function_invoker(function& _function) : my_function(_function) {} + private: + function &my_function; + /*override*/ + task* execute() + { + my_function(); + return NULL; + } + }; + + // The class spawns two or three child tasks + template + class spawner : public task { + private: + function1& my_func1; + function2& my_func2; + function3& my_func3; + bool is_recycled; + + task* execute (){ + if(is_recycled){ + return NULL; + }else{ + __TBB_ASSERT(N==2 || N==3, "Number of arguments passed to spawner is wrong"); + set_ref_count(N); + recycle_as_safe_continuation(); + internal::function_invoker* invoker2 = new (allocate_child()) internal::function_invoker(my_func2); + __TBB_ASSERT(invoker2, "Child task allocation failed"); + spawn(*invoker2); + size_t n = N; // To prevent compiler warnings + if (n>2) { + internal::function_invoker* invoker3 = new (allocate_child()) internal::function_invoker(my_func3); + __TBB_ASSERT(invoker3, "Child task allocation failed"); + spawn(*invoker3); + } + my_func1(); + is_recycled = true; + return NULL; + } + } // execute + + public: + spawner(function1& _func1, function2& _func2, function3& _func3) : my_func1(_func1), my_func2(_func2), my_func3(_func3), is_recycled(false) {} + }; + + // Creates and spawns child tasks + class parallel_invoke_helper : public empty_task { + public: + // Dummy functor class + class parallel_invoke_noop { + public: + void operator() () const {} + }; + // Creates a helper object with user-defined number of children expected + parallel_invoke_helper(int number_of_children) + { + set_ref_count(number_of_children + 1); + } + // Adds child task and spawns it + template + void add_child (function &_func) + { + internal::function_invoker* invoker = new (allocate_child()) internal::function_invoker(_func); + __TBB_ASSERT(invoker, "Child task allocation failed"); + spawn(*invoker); + } + + // Adds a task with multiple child tasks and spawns it + // two arguments + template + void add_children (function1& _func1, function2& _func2) + { + // The third argument is dummy, it is ignored actually. + parallel_invoke_noop noop; + internal::spawner<2, function1, function2, parallel_invoke_noop>& sub_root = *new(allocate_child())internal::spawner<2, function1, function2, parallel_invoke_noop>(_func1, _func2, noop); + spawn(sub_root); + } + // three arguments + template + void add_children (function1& _func1, function2& _func2, function3& _func3) + { + internal::spawner<3, function1, function2, function3>& sub_root = *new(allocate_child())internal::spawner<3, function1, function2, function3>(_func1, _func2, _func3); + spawn(sub_root); + } + + // Waits for all child tasks + template + void run_and_finish(F0& f0) + { + internal::function_invoker* invoker = new (allocate_child()) internal::function_invoker(f0); + __TBB_ASSERT(invoker, "Child task allocation failed"); + spawn_and_wait_for_all(*invoker); + } + }; + // The class destroys root if exception occured as well as in normal case + class parallel_invoke_cleaner: internal::no_copy { + public: + parallel_invoke_cleaner(int number_of_children, tbb::task_group_context& context) : root(*new(task::allocate_root(context)) internal::parallel_invoke_helper(number_of_children)) + {} + ~parallel_invoke_cleaner(){ + root.destroy(root); + } + internal::parallel_invoke_helper& root; + }; +} // namespace internal +//! 
@endcond + +/** \name parallel_invoke + **/ +//@{ +//! Executes a list of tasks in parallel and waits for all tasks to complete. +/** @ingroup algorithms */ + +// parallel_invoke with user-defined context +// two arguments +template +void parallel_invoke(F0 f0, F1 f1, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(2, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_child(f1); + + root.run_and_finish(f0); +} + +// three arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(3, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_child(f2); + root.add_child(f1); + + root.run_and_finish(f0); +} + +// four arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(4, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_child(f3); + root.add_child(f2); + root.add_child(f1); + + root.run_and_finish(f0); +} + +// five arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(3, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_children(f4, f3); + root.add_children(f2, f1); + + root.run_and_finish(f0); +} + +// six arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(3, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_children(f5, f4, f3); + root.add_children(f2, f1); + + root.run_and_finish(f0); +} + +// seven arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(3, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_children(f6, f5, f4); + root.add_children(f3, f2, f1); + + root.run_and_finish(f0); +} + +// eight arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6, F7 f7, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(4, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_children(f7, f6, f5); + root.add_children(f4, f3); + root.add_children(f2, f1); + + root.run_and_finish(f0); +} + +// nine arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6, F7 f7, F8 f8, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(4, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_children(f8, f7, f6); + root.add_children(f5, f4, f3); + root.add_children(f2, f1); + + root.run_and_finish(f0); +} + +// ten arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6, F7 f7, F8 f8, F9 f9, tbb::task_group_context& context) { + internal::parallel_invoke_cleaner cleaner(4, context); + internal::parallel_invoke_helper& root = cleaner.root; + + root.add_children(f9, f8, f7); + root.add_children(f6, f5, f4); + root.add_children(f3, f2, f1); + + root.run_and_finish(f0); +} + +// two arguments +template +void parallel_invoke(F0 f0, F1 f1) { + task_group_context context; + parallel_invoke(f0, f1, context); +} +// three arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2) { + task_group_context 
context; + parallel_invoke(f0, f1, f2, context); +} +// four arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3) { + task_group_context context; + parallel_invoke(f0, f1, f2, f3, context); +} +// five arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4) { + task_group_context context; + parallel_invoke(f0, f1, f2, f3, f4, context); +} +// six arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5) { + task_group_context context; + parallel_invoke(f0, f1, f2, f3, f4, f5, context); +} +// seven arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6) { + task_group_context context; + parallel_invoke(f0, f1, f2, f3, f4, f5, f6, context); +} +// eigth arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6, F7 f7) { + task_group_context context; + parallel_invoke(f0, f1, f2, f3, f4, f5, f6, f7, context); +} +// nine arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6, F7 f7, F8 f8) { + task_group_context context; + parallel_invoke(f0, f1, f2, f3, f4, f5, f6, f7, f8, context); +} +// ten arguments +template +void parallel_invoke(F0 f0, F1 f1, F2 f2, F3 f3, F4 f4, F5 f5, F6 f6, F7 f7, F8 f8, F9 f9) { + task_group_context context; + parallel_invoke(f0, f1, f2, f3, f4, f5, f6, f7, f8, f9, context); +} + +//@} + +} // namespace + +#endif /* __TBB_parallel_invoke_H */ diff --git a/dep/tbb/include/tbb/parallel_reduce.h b/dep/tbb/include/tbb/parallel_reduce.h new file mode 100644 index 000000000..030017394 --- /dev/null +++ b/dep/tbb/include/tbb/parallel_reduce.h @@ -0,0 +1,387 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_parallel_reduce_H +#define __TBB_parallel_reduce_H + +#include "task.h" +#include "aligned_space.h" +#include "partitioner.h" +#include + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + + //! ITT instrumented routine that stores src into location pointed to by dst. + void __TBB_EXPORTED_FUNC itt_store_pointer_with_release_v3( void* dst, void* src ); + + //! ITT instrumented routine that loads pointer from location pointed to by src. 
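A brief sketch of the parallel_invoke overloads defined above; the three function names are placeholders for any nullary callables.

#include "tbb/parallel_invoke.h"

// Illustrative nullary functions; function objects work the same way.
void load_textures() { /* ... */ }
void load_sounds()   { /* ... */ }
void load_scripts()  { /* ... */ }

void load_assets() {
    // Runs the three calls potentially in parallel and returns when all complete.
    tbb::parallel_invoke( load_textures, load_sounds, load_scripts );
}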
+ void* __TBB_EXPORTED_FUNC itt_load_pointer_with_acquire_v3( const void* src ); + + template inline void parallel_reduce_store_body( T*& dst, T* src ) { +#if TBB_USE_THREADING_TOOLS + itt_store_pointer_with_release_v3(&dst,src); +#else + __TBB_store_with_release(dst,src); +#endif /* TBB_USE_THREADING_TOOLS */ + } + + template inline T* parallel_reduce_load_body( T*& src ) { +#if TBB_USE_THREADING_TOOLS + return static_cast(itt_load_pointer_with_acquire_v3(&src)); +#else + return __TBB_load_with_acquire(src); +#endif /* TBB_USE_THREADING_TOOLS */ + } + + //! 0 if root, 1 if a left child, 2 if a right child. + /** Represented as a char, not enum, for compactness. */ + typedef char reduction_context; + + //! Task type use to combine the partial results of parallel_reduce with affinity_partitioner. + /** @ingroup algorithms */ + template + class finish_reduce: public task { + //! Pointer to body, or NULL if the left child has not yet finished. + Body* my_body; + bool has_right_zombie; + const reduction_context my_context; + aligned_space zombie_space; + finish_reduce( char context ) : + my_body(NULL), + has_right_zombie(false), + my_context(context) + { + } + task* execute() { + if( has_right_zombie ) { + // Right child was stolen. + Body* s = zombie_space.begin(); + my_body->join( *s ); + s->~Body(); + } + if( my_context==1 ) + parallel_reduce_store_body( static_cast(parent())->my_body, my_body ); + return NULL; + } + template + friend class start_reduce; + }; + + //! Task type used to split the work of parallel_reduce with affinity_partitioner. + /** @ingroup algorithms */ + template + class start_reduce: public task { + typedef finish_reduce finish_type; + Body* my_body; + Range my_range; + typename Partitioner::partition_type my_partition; + reduction_context my_context; + /*override*/ task* execute(); + template + friend class finish_reduce; + + //! Constructor used for root task + start_reduce( const Range& range, Body* body, Partitioner& partitioner ) : + my_body(body), + my_range(range), + my_partition(partitioner), + my_context(0) + { + } + //! Splitting constructor used to generate children. + /** this becomes left child. Newly constructed object is right child. */ + start_reduce( start_reduce& parent, split ) : + my_body(parent.my_body), + my_range(parent.my_range,split()), + my_partition(parent.my_partition,split()), + my_context(2) + { + my_partition.set_affinity(*this); + parent.my_context = 1; + } + //! Update affinity info, if any + /*override*/ void note_affinity( affinity_id id ) { + my_partition.note_affinity( id ); + } + +public: + static void run( const Range& range, Body& body, Partitioner& partitioner ) { + if( !range.empty() ) { +#if !__TBB_EXCEPTIONS || TBB_JOIN_OUTER_TASK_GROUP + task::spawn_root_and_wait( *new(task::allocate_root()) start_reduce(range,&body,partitioner) ); +#else + // Bound context prevents exceptions from body to affect nesting or sibling algorithms, + // and allows users to handle exceptions safely by wrapping parallel_for in the try-block. 
+ task_group_context context; + task::spawn_root_and_wait( *new(task::allocate_root(context)) start_reduce(range,&body,partitioner) ); +#endif /* __TBB_EXCEPTIONS && !TBB_JOIN_OUTER_TASK_GROUP */ + } + } +#if __TBB_EXCEPTIONS + static void run( const Range& range, Body& body, Partitioner& partitioner, task_group_context& context ) { + if( !range.empty() ) + task::spawn_root_and_wait( *new(task::allocate_root(context)) start_reduce(range,&body,partitioner) ); + } +#endif /* __TBB_EXCEPTIONS */ + }; + + template + task* start_reduce::execute() { + if( my_context==2 ) { + finish_type* p = static_cast(parent() ); + if( !parallel_reduce_load_body(p->my_body) ) { + my_body = new( p->zombie_space.begin() ) Body(*my_body,split()); + p->has_right_zombie = true; + } + } + if( !my_range.is_divisible() || my_partition.should_execute_range(*this) ) { + (*my_body)( my_range ); + if( my_context==1 ) + parallel_reduce_store_body(static_cast(parent())->my_body, my_body ); + return my_partition.continue_after_execute_range(*this); + } else { + finish_type& c = *new( allocate_continuation()) finish_type(my_context); + recycle_as_child_of(c); + c.set_ref_count(2); + bool delay = my_partition.decide_whether_to_delay(); + start_reduce& b = *new( c.allocate_child() ) start_reduce(*this,split()); + my_partition.spawn_or_delay(delay,*this,b); + return this; + } + } + + //! Auxiliary class for parallel_reduce; for internal use only. + /** The adaptor class that implements \ref parallel_reduce_body_req "parallel_reduce Body" + using given \ref parallel_reduce_lambda_req "anonymous function objects". + **/ + /** @ingroup algorithms */ + template + class lambda_reduce_body { + +//FIXME: decide if my_real_body, my_reduction, and identity_element should be copied or referenced +// (might require some performance measurements) + + const Value& identity_element; + const RealBody& my_real_body; + const Reduction& my_reduction; + Value my_value; + lambda_reduce_body& operator= ( const lambda_reduce_body& other ); + public: + lambda_reduce_body( const Value& identity, const RealBody& body, const Reduction& reduction ) + : identity_element(identity) + , my_real_body(body) + , my_reduction(reduction) + , my_value(identity) + { } + lambda_reduce_body( const lambda_reduce_body& other ) + : identity_element(other.identity_element) + , my_real_body(other.my_real_body) + , my_reduction(other.my_reduction) + , my_value(other.my_value) + { } + lambda_reduce_body( lambda_reduce_body& other, tbb::split ) + : identity_element(other.identity_element) + , my_real_body(other.my_real_body) + , my_reduction(other.my_reduction) + , my_value(other.identity_element) + { } + void operator()(Range& range) { + my_value = my_real_body(range, const_cast(my_value)); + } + void join( lambda_reduce_body& rhs ) { + my_value = my_reduction(const_cast(my_value), const_cast(rhs.my_value)); + } + Value result() const { + return my_value; + } + }; + +} // namespace internal +//! @endcond + +// Requirements on Range concept are documented in blocked_range.h + +/** \page parallel_reduce_body_req Requirements on parallel_reduce body + Class \c Body implementing the concept of parallel_reduce body must define: + - \code Body::Body( Body&, split ); \endcode Splitting constructor. 
+ Must be able to run concurrently with operator() and method \c join + - \code Body::~Body(); \endcode Destructor + - \code void Body::operator()( Range& r ); \endcode Function call operator applying body to range \c r + and accumulating the result + - \code void Body::join( Body& b ); \endcode Join results. + The result in \c b should be merged into the result of \c this +**/ + +/** \page parallel_reduce_lambda_req Requirements on parallel_reduce anonymous function objects (lambda functions) + TO BE DOCUMENTED +**/ + +/** \name parallel_reduce + See also requirements on \ref range_req "Range" and \ref parallel_reduce_body_req "parallel_reduce Body". **/ +//@{ + +//! Parallel iteration with reduction and default partitioner. +/** @ingroup algorithms **/ +template +void parallel_reduce( const Range& range, Body& body ) { + internal::start_reduce::run( range, body, __TBB_DEFAULT_PARTITIONER() ); +} + +//! Parallel iteration with reduction and simple_partitioner +/** @ingroup algorithms **/ +template +void parallel_reduce( const Range& range, Body& body, const simple_partitioner& partitioner ) { + internal::start_reduce::run( range, body, partitioner ); +} + +//! Parallel iteration with reduction and auto_partitioner +/** @ingroup algorithms **/ +template +void parallel_reduce( const Range& range, Body& body, const auto_partitioner& partitioner ) { + internal::start_reduce::run( range, body, partitioner ); +} + +//! Parallel iteration with reduction and affinity_partitioner +/** @ingroup algorithms **/ +template +void parallel_reduce( const Range& range, Body& body, affinity_partitioner& partitioner ) { + internal::start_reduce::run( range, body, partitioner ); +} + +#if __TBB_EXCEPTIONS +//! Parallel iteration with reduction, simple partitioner and user-supplied context. +/** @ingroup algorithms **/ +template +void parallel_reduce( const Range& range, Body& body, const simple_partitioner& partitioner, task_group_context& context ) { + internal::start_reduce::run( range, body, partitioner, context ); +} + +//! Parallel iteration with reduction, auto_partitioner and user-supplied context +/** @ingroup algorithms **/ +template +void parallel_reduce( const Range& range, Body& body, const auto_partitioner& partitioner, task_group_context& context ) { + internal::start_reduce::run( range, body, partitioner, context ); +} + +//! Parallel iteration with reduction, affinity_partitioner and user-supplied context +/** @ingroup algorithms **/ +template +void parallel_reduce( const Range& range, Body& body, affinity_partitioner& partitioner, task_group_context& context ) { + internal::start_reduce::run( range, body, partitioner, context ); +} +#endif /* __TBB_EXCEPTIONS */ + +/** parallel_reduce overloads that work with anonymous function objects + (see also \ref parallel_reduce_lambda_req "requirements on parallel_reduce anonymous function objects"). **/ + +//! Parallel iteration with reduction and default partitioner. +/** @ingroup algorithms **/ +template +Value parallel_reduce( const Range& range, const Value& identity, const RealBody& real_body, const Reduction& reduction ) { + internal::lambda_reduce_body body(identity, real_body, reduction); + internal::start_reduce,const __TBB_DEFAULT_PARTITIONER> + ::run(range, body, __TBB_DEFAULT_PARTITIONER() ); + return body.result(); +} + +//! Parallel iteration with reduction and simple_partitioner. 
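To make the parallel_reduce Body requirements above concrete, a minimal summation sketch; the names and the float payload are invented for the example.

#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

// Hypothetical body: splitting constructor, an operator() that accumulates, and join().
struct SumBody {
    const float* my_data;
    float my_sum;
    SumBody( const float* data ) : my_data(data), my_sum(0.0f) {}
    SumBody( SumBody& other, tbb::split ) : my_data(other.my_data), my_sum(0.0f) {}
    void operator()( const tbb::blocked_range<size_t>& r ) {
        for( size_t i = r.begin(); i != r.end(); ++i )
            my_sum += my_data[i];
    }
    void join( SumBody& rhs ) { my_sum += rhs.my_sum; }
};

float parallel_sum( const float* data, size_t n ) {
    SumBody body( data );
    tbb::parallel_reduce( tbb::blocked_range<size_t>(0, n), body );
    return body.my_sum;
}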
+/** @ingroup algorithms **/ +template +Value parallel_reduce( const Range& range, const Value& identity, const RealBody& real_body, const Reduction& reduction, + const simple_partitioner& partitioner ) { + internal::lambda_reduce_body body(identity, real_body, reduction); + internal::start_reduce,const simple_partitioner> + ::run(range, body, partitioner ); + return body.result(); +} + +//! Parallel iteration with reduction and auto_partitioner +/** @ingroup algorithms **/ +template +Value parallel_reduce( const Range& range, const Value& identity, const RealBody& real_body, const Reduction& reduction, + const auto_partitioner& partitioner ) { + internal::lambda_reduce_body body(identity, real_body, reduction); + internal::start_reduce,const auto_partitioner> + ::run( range, body, partitioner ); + return body.result(); +} + +//! Parallel iteration with reduction and affinity_partitioner +/** @ingroup algorithms **/ +template +Value parallel_reduce( const Range& range, const Value& identity, const RealBody& real_body, const Reduction& reduction, + affinity_partitioner& partitioner ) { + internal::lambda_reduce_body body(identity, real_body, reduction); + internal::start_reduce,affinity_partitioner> + ::run( range, body, partitioner ); + return body.result(); +} + +#if __TBB_EXCEPTIONS +//! Parallel iteration with reduction, simple partitioner and user-supplied context. +/** @ingroup algorithms **/ +template +Value parallel_reduce( const Range& range, const Value& identity, const RealBody& real_body, const Reduction& reduction, + const simple_partitioner& partitioner, task_group_context& context ) { + internal::lambda_reduce_body body(identity, real_body, reduction); + internal::start_reduce,const simple_partitioner> + ::run( range, body, partitioner, context ); + return body.result(); +} + +//! Parallel iteration with reduction, auto_partitioner and user-supplied context +/** @ingroup algorithms **/ +template +Value parallel_reduce( const Range& range, const Value& identity, const RealBody& real_body, const Reduction& reduction, + const auto_partitioner& partitioner, task_group_context& context ) { + internal::lambda_reduce_body body(identity, real_body, reduction); + internal::start_reduce,const auto_partitioner> + ::run( range, body, partitioner, context ); + return body.result(); +} + +//! Parallel iteration with reduction, affinity_partitioner and user-supplied context +/** @ingroup algorithms **/ +template +Value parallel_reduce( const Range& range, const Value& identity, const RealBody& real_body, const Reduction& reduction, + affinity_partitioner& partitioner, task_group_context& context ) { + internal::lambda_reduce_body body(identity, real_body, reduction); + internal::start_reduce,affinity_partitioner> + ::run( range, body, partitioner, context ); + return body.result(); +} +#endif /* __TBB_EXCEPTIONS */ +//@} + +} // namespace tbb + +#endif /* __TBB_parallel_reduce_H */ + diff --git a/dep/tbb/include/tbb/parallel_scan.h b/dep/tbb/include/tbb/parallel_scan.h new file mode 100644 index 000000000..1369bf733 --- /dev/null +++ b/dep/tbb/include/tbb/parallel_scan.h @@ -0,0 +1,351 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. 
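The overloads above also accept separate identity, range-body, and reduction arguments; a hedged sketch of the same summation expressed in that functional form (the functor names are invented).

#include "tbb/parallel_reduce.h"
#include "tbb/blocked_range.h"

// Range body: receives the running value for its subrange and returns the updated value.
struct SumRange {
    const float* data;
    SumRange( const float* d ) : data(d) {}
    float operator()( const tbb::blocked_range<size_t>& r, float running ) const {
        for( size_t i = r.begin(); i != r.end(); ++i )
            running += data[i];
        return running;
    }
};
// Reduction: combines two partial results.
struct SumJoin {
    float operator()( float x, float y ) const { return x + y; }
};

float parallel_sum_functional( const float* data, size_t n ) {
    return tbb::parallel_reduce( tbb::blocked_range<size_t>(0, n),
                                 0.0f, SumRange(data), SumJoin() );
}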
+ + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_parallel_scan_H +#define __TBB_parallel_scan_H + +#include "task.h" +#include "aligned_space.h" +#include +#include "partitioner.h" + +namespace tbb { + +//! Used to indicate that the initial scan is being performed. +/** @ingroup algorithms */ +struct pre_scan_tag { + static bool is_final_scan() {return false;} +}; + +//! Used to indicate that the final scan is being performed. +/** @ingroup algorithms */ +struct final_scan_tag { + static bool is_final_scan() {return true;} +}; + +//! @cond INTERNAL +namespace internal { + + //! Performs final scan for a leaf + /** @ingroup algorithms */ + template + class final_sum: public task { + public: + Body body; + private: + aligned_space range; + //! Where to put result of last subrange, or NULL if not last subrange. + Body* stuff_last; + public: + final_sum( Body& body_ ) : + body(body_,split()) + { + poison_pointer(stuff_last); + } + ~final_sum() { + range.begin()->~Range(); + } + void finish_construction( const Range& range_, Body* stuff_last_ ) { + new( range.begin() ) Range(range_); + stuff_last = stuff_last_; + } + private: + /*override*/ task* execute() { + body( *range.begin(), final_scan_tag() ); + if( stuff_last ) + stuff_last->assign(body); + return NULL; + } + }; + + //! Split work to be done in the scan. + /** @ingroup algorithms */ + template + class sum_node: public task { + typedef final_sum final_sum_type; + public: + final_sum_type *incoming; + final_sum_type *body; + Body *stuff_last; + private: + final_sum_type *left_sum; + sum_node *left; + sum_node *right; + bool left_is_final; + Range range; + sum_node( const Range range_, bool left_is_final_ ) : + left_sum(NULL), + left(NULL), + right(NULL), + left_is_final(left_is_final_), + range(range_) + { + // Poison fields that will be set by second pass. + poison_pointer(body); + poison_pointer(incoming); + } + task* create_child( const Range& range, final_sum_type& f, sum_node* n, final_sum_type* incoming, Body* stuff_last ) { + if( !n ) { + f.recycle_as_child_of( *this ); + f.finish_construction( range, stuff_last ); + return &f; + } else { + n->body = &f; + n->incoming = incoming; + n->stuff_last = stuff_last; + return n; + } + } + /*override*/ task* execute() { + if( body ) { + if( incoming ) + left_sum->body.reverse_join( incoming->body ); + recycle_as_continuation(); + sum_node& c = *this; + task* b = c.create_child(Range(range,split()),*left_sum,right,left_sum,stuff_last); + task* a = left_is_final ? 
NULL : c.create_child(range,*body,left,incoming,NULL); + set_ref_count( (a!=NULL)+(b!=NULL) ); + body = NULL; + if( a ) spawn(*b); + else a = b; + return a; + } else { + return NULL; + } + } + template + friend class start_scan; + + template + friend class finish_scan; + }; + + //! Combine partial results + /** @ingroup algorithms */ + template + class finish_scan: public task { + typedef sum_node sum_node_type; + typedef final_sum final_sum_type; + final_sum_type** const sum; + sum_node_type*& return_slot; + public: + final_sum_type* right_zombie; + sum_node_type& result; + + /*override*/ task* execute() { + __TBB_ASSERT( result.ref_count()==(result.left!=NULL)+(result.right!=NULL), NULL ); + if( result.left ) + result.left_is_final = false; + if( right_zombie && sum ) + ((*sum)->body).reverse_join(result.left_sum->body); + __TBB_ASSERT( !return_slot, NULL ); + if( right_zombie || result.right ) { + return_slot = &result; + } else { + destroy( result ); + } + if( right_zombie && !sum && !result.right ) destroy(*right_zombie); + return NULL; + } + + finish_scan( sum_node_type*& return_slot_, final_sum_type** sum_, sum_node_type& result_ ) : + sum(sum_), + return_slot(return_slot_), + right_zombie(NULL), + result(result_) + { + __TBB_ASSERT( !return_slot, NULL ); + } + }; + + //! Initial task to split the work + /** @ingroup algorithms */ + template + class start_scan: public task { + typedef sum_node sum_node_type; + typedef final_sum final_sum_type; + final_sum_type* body; + /** Non-null if caller is requesting total. */ + final_sum_type** sum; + sum_node_type** return_slot; + /** Null if computing root. */ + sum_node_type* parent_sum; + bool is_final; + bool is_right_child; + Range range; + typename Partitioner::partition_type partition; + /*override*/ task* execute(); + public: + start_scan( sum_node_type*& return_slot_, start_scan& parent, sum_node_type* parent_sum_ ) : + body(parent.body), + sum(parent.sum), + return_slot(&return_slot_), + parent_sum(parent_sum_), + is_final(parent.is_final), + is_right_child(false), + range(parent.range,split()), + partition(parent.partition,split()) + { + __TBB_ASSERT( !*return_slot, NULL ); + } + + start_scan( sum_node_type*& return_slot_, const Range& range_, final_sum_type& body_, const Partitioner& partitioner_) : + body(&body_), + sum(NULL), + return_slot(&return_slot_), + parent_sum(NULL), + is_final(true), + is_right_child(false), + range(range_), + partition(partitioner_) + { + __TBB_ASSERT( !*return_slot, NULL ); + } + + static void run( const Range& range, Body& body, const Partitioner& partitioner ) { + if( !range.empty() ) { + typedef internal::start_scan start_pass1_type; + internal::sum_node* root = NULL; + typedef internal::final_sum final_sum_type; + final_sum_type* temp_body = new(task::allocate_root()) final_sum_type( body ); + start_pass1_type& pass1 = *new(task::allocate_root()) start_pass1_type( + /*return_slot=*/root, + range, + *temp_body, + partitioner ); + task::spawn_root_and_wait( pass1 ); + if( root ) { + root->body = temp_body; + root->incoming = NULL; + root->stuff_last = &body; + task::spawn_root_and_wait( *root ); + } else { + body.assign(temp_body->body); + temp_body->finish_construction( range, NULL ); + temp_body->destroy(*temp_body); + } + } + } + }; + + template + task* start_scan::execute() { + typedef internal::finish_scan finish_pass1_type; + finish_pass1_type* p = parent_sum ? static_cast( parent() ) : NULL; + // Inspecting p->result.left_sum would ordinarily be a race condition. 
+ // But we inspect it only if we are not a stolen task, in which case we + // know that task assigning to p->result.left_sum has completed. + bool treat_as_stolen = is_right_child && (is_stolen_task() || body!=p->result.left_sum); + if( treat_as_stolen ) { + // Invocation is for right child that has been really stolen or needs to be virtually stolen + p->right_zombie = body = new( allocate_root() ) final_sum_type(body->body); + is_final = false; + } + task* next_task = NULL; + if( (is_right_child && !treat_as_stolen) || !range.is_divisible() || partition.should_execute_range(*this) ) { + if( is_final ) + (body->body)( range, final_scan_tag() ); + else if( sum ) + (body->body)( range, pre_scan_tag() ); + if( sum ) + *sum = body; + __TBB_ASSERT( !*return_slot, NULL ); + } else { + sum_node_type* result; + if( parent_sum ) + result = new(allocate_additional_child_of(*parent_sum)) sum_node_type(range,/*left_is_final=*/is_final); + else + result = new(task::allocate_root()) sum_node_type(range,/*left_is_final=*/is_final); + finish_pass1_type& c = *new( allocate_continuation()) finish_pass1_type(*return_slot,sum,*result); + // Split off right child + start_scan& b = *new( c.allocate_child() ) start_scan( /*return_slot=*/result->right, *this, result ); + b.is_right_child = true; + // Left child is recycling of *this. Must recycle this before spawning b, + // otherwise b might complete and decrement c.ref_count() to zero, which + // would cause c.execute() to run prematurely. + recycle_as_child_of(c); + c.set_ref_count(2); + c.spawn(b); + sum = &result->left_sum; + return_slot = &result->left; + is_right_child = false; + next_task = this; + parent_sum = result; + __TBB_ASSERT( !*return_slot, NULL ); + } + return next_task; + } +} // namespace internal +//! @endcond + +// Requirements on Range concept are documented in blocked_range.h + +/** \page parallel_scan_body_req Requirements on parallel_scan body + Class \c Body implementing the concept of parallel_reduce body must define: + - \code Body::Body( Body&, split ); \endcode Splitting constructor. + Split \c b so that \c this and \c b can accumulate separately + - \code Body::~Body(); \endcode Destructor + - \code void Body::operator()( const Range& r, pre_scan_tag ); \endcode + Preprocess iterations for range \c r + - \code void Body::operator()( const Range& r, final_scan_tag ); \endcode + Do final processing for iterations of range \c r + - \code void Body::reverse_join( Body& a ); \endcode + Merge preprocessing state of \c a into \c this, where \c a was + created earlier from \c b by b's splitting constructor +**/ + +/** \name parallel_scan + See also requirements on \ref range_req "Range" and \ref parallel_scan_body_req "parallel_scan Body". **/ +//@{ + +//! Parallel prefix with default partitioner +/** @ingroup algorithms **/ +template +void parallel_scan( const Range& range, Body& body ) { + internal::start_scan::run(range,body,__TBB_DEFAULT_PARTITIONER()); +} + +//! Parallel prefix with simple_partitioner +/** @ingroup algorithms **/ +template +void parallel_scan( const Range& range, Body& body, const simple_partitioner& partitioner ) { + internal::start_scan::run(range,body,partitioner); +} + +//! 
Parallel prefix with auto_partitioner +/** @ingroup algorithms **/ +template +void parallel_scan( const Range& range, Body& body, const auto_partitioner& partitioner ) { + internal::start_scan::run(range,body,partitioner); +} +//@} + +} // namespace tbb + +#endif /* __TBB_parallel_scan_H */ + diff --git a/dep/tbb/include/tbb/parallel_sort.h b/dep/tbb/include/tbb/parallel_sort.h new file mode 100644 index 000000000..38b380dea --- /dev/null +++ b/dep/tbb/include/tbb/parallel_sort.h @@ -0,0 +1,227 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_parallel_sort_H +#define __TBB_parallel_sort_H + +#include "parallel_for.h" +#include "blocked_range.h" +#include +#include +#include + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + +//! Range used in quicksort to split elements into subranges based on a value. +/** The split operation selects a splitter and places all elements less than or equal + to the value in the first range and the remaining elements in the second range. + @ingroup algorithms */ +template +class quick_sort_range: private no_assign { + + inline size_t median_of_three(const RandomAccessIterator &array, size_t l, size_t m, size_t r) const { + return comp(array[l], array[m]) ? ( comp(array[m], array[r]) ? m : ( comp( array[l], array[r]) ? r : l ) ) + : ( comp(array[r], array[m]) ? m : ( comp( array[r], array[l] ) ? 
r : l ) ); + } + + inline size_t pseudo_median_of_nine( const RandomAccessIterator &array, const quick_sort_range &range ) const { + size_t offset = range.size/8u; + return median_of_three(array, + median_of_three(array, 0, offset, offset*2), + median_of_three(array, offset*3, offset*4, offset*5), + median_of_three(array, offset*6, offset*7, range.size - 1) ); + + } + +public: + + static const size_t grainsize = 500; + const Compare ∁ + RandomAccessIterator begin; + size_t size; + + quick_sort_range( RandomAccessIterator begin_, size_t size_, const Compare &comp_ ) : + comp(comp_), begin(begin_), size(size_) {} + + bool empty() const {return size==0;} + bool is_divisible() const {return size>=grainsize;} + + quick_sort_range( quick_sort_range& range, split ) : comp(range.comp) { + RandomAccessIterator array = range.begin; + RandomAccessIterator key0 = range.begin; + size_t m = pseudo_median_of_nine(array, range); + if (m) std::swap ( array[0], array[m] ); + + size_t i=0; + size_t j=range.size; + // Partition interval [i+1,j-1] with key *key0. + for(;;) { + __TBB_ASSERT( i +class quick_sort_pretest_body : internal::no_assign { + const Compare ∁ + +public: + quick_sort_pretest_body(const Compare &_comp) : comp(_comp) {} + + void operator()( const blocked_range& range ) const { + task &my_task = task::self(); + RandomAccessIterator my_end = range.end(); + + int i = 0; + for (RandomAccessIterator k = range.begin(); k != my_end; ++k, ++i) { + if ( i%64 == 0 && my_task.is_cancelled() ) break; + + // The k-1 is never out-of-range because the first chunk starts at begin+serial_cutoff+1 + if ( comp( *(k), *(k-1) ) ) { + my_task.cancel_group_execution(); + break; + } + } + } + +}; + +//! Body class used to sort elements in a range that is smaller than the grainsize. +/** @ingroup algorithms */ +template +struct quick_sort_body { + void operator()( const quick_sort_range& range ) const { + //SerialQuickSort( range.begin, range.size, range.comp ); + std::sort( range.begin, range.begin + range.size, range.comp ); + } +}; + +//! Wrapper method to initiate the sort by calling parallel_for. +/** @ingroup algorithms */ +template +void parallel_quick_sort( RandomAccessIterator begin, RandomAccessIterator end, const Compare& comp ) { + task_group_context my_context; + const int serial_cutoff = 9; + + __TBB_ASSERT( begin + serial_cutoff < end, "min_parallel_size is smaller than serial cutoff?" ); + RandomAccessIterator k; + for ( k = begin ; k != begin + serial_cutoff; ++k ) { + if ( comp( *(k+1), *k ) ) { + goto do_parallel_quick_sort; + } + } + + parallel_for( blocked_range(k+1, end), + quick_sort_pretest_body(comp), + auto_partitioner(), + my_context); + + if (my_context.is_group_execution_cancelled()) +do_parallel_quick_sort: + parallel_for( quick_sort_range(begin, end-begin, comp ), + quick_sort_body(), + auto_partitioner() ); +} + +} // namespace internal +//! @endcond + +/** \page parallel_sort_iter_req Requirements on iterators for parallel_sort + Requirements on value type \c T of \c RandomAccessIterator for \c parallel_sort: + - \code void swap( T& x, T& y ) \endcode Swaps \c x and \c y + - \code bool Compare::operator()( const T& x, const T& y ) \endcode + True if x comes before y; +**/ + +/** \name parallel_sort + See also requirements on \ref parallel_sort_iter_req "iterators for parallel_sort". **/ +//@{ + +//! Sorts the data in [begin,end) using the given comparator +/** The compare function object is used for all comparisons between elements during sorting. 
+ The compare object must define a bool operator() function. + @ingroup algorithms **/ +template +void parallel_sort( RandomAccessIterator begin, RandomAccessIterator end, const Compare& comp) { + const int min_parallel_size = 500; + if( end > begin ) { + if (end - begin < min_parallel_size) { + std::sort(begin, end, comp); + } else { + internal::parallel_quick_sort(begin, end, comp); + } + } +} + +//! Sorts the data in [begin,end) with a default comparator \c std::less +/** @ingroup algorithms **/ +template +inline void parallel_sort( RandomAccessIterator begin, RandomAccessIterator end ) { + parallel_sort( begin, end, std::less< typename std::iterator_traits::value_type >() ); +} + +//! Sorts the data in the range \c [begin,end) with a default comparator \c std::less +/** @ingroup algorithms **/ +template +inline void parallel_sort( T * begin, T * end ) { + parallel_sort( begin, end, std::less< T >() ); +} +//@} + + +} // namespace tbb + +#endif + diff --git a/dep/tbb/include/tbb/parallel_while.h b/dep/tbb/include/tbb/parallel_while.h new file mode 100644 index 000000000..a4ad9e6e2 --- /dev/null +++ b/dep/tbb/include/tbb/parallel_while.h @@ -0,0 +1,194 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_parallel_while +#define __TBB_parallel_while + +#include "task.h" +#include + +namespace tbb { + +template +class parallel_while; + +//! @cond INTERNAL +namespace internal { + + template class while_task; + + //! For internal use only. + /** Executes one iteration of a while. + @ingroup algorithms */ + template + class while_iteration_task: public task { + const Body& my_body; + typename Body::argument_type my_value; + /*override*/ task* execute() { + my_body(my_value); + return NULL; + } + while_iteration_task( const typename Body::argument_type& value, const Body& body ) : + my_body(body), my_value(value) + {} + template friend class while_group_task; + friend class tbb::parallel_while; + }; + + //! For internal use only + /** Unpacks a block of iterations. 
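A short usage sketch for the parallel_sort overloads above; the comparator and the score data are illustrative.

#include "tbb/parallel_sort.h"
#include <vector>

// Any functor returning true when the first argument should come first will do.
struct DescendingOrder {
    bool operator()( float x, float y ) const { return x > y; }
};

void sort_scores( std::vector<float>& scores ) {
    // Default comparator (std::less) ...
    tbb::parallel_sort( scores.begin(), scores.end() );
    // ... or a user-supplied one; ranges below the internal cutoff fall back to std::sort.
    tbb::parallel_sort( scores.begin(), scores.end(), DescendingOrder() );
}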
+ @ingroup algorithms */ + template + class while_group_task: public task { + static const size_t max_arg_size = 4; + const Body& my_body; + size_t size; + typename Body::argument_type my_arg[max_arg_size]; + while_group_task( const Body& body ) : my_body(body), size(0) {} + /*override*/ task* execute() { + typedef while_iteration_task iteration_type; + __TBB_ASSERT( size>0, NULL ); + task_list list; + task* t; + size_t k=0; + for(;;) { + t = new( allocate_child() ) iteration_type(my_arg[k],my_body); + if( ++k==size ) break; + list.push_back(*t); + } + set_ref_count(int(k+1)); + spawn(list); + spawn_and_wait_for_all(*t); + return NULL; + } + template friend class while_task; + }; + + //! For internal use only. + /** Gets block of iterations from a stream and packages them into a while_group_task. + @ingroup algorithms */ + template + class while_task: public task { + Stream& my_stream; + const Body& my_body; + empty_task& my_barrier; + /*override*/ task* execute() { + typedef while_group_task block_type; + block_type& t = *new( allocate_additional_child_of(my_barrier) ) block_type(my_body); + size_t k=0; + while( my_stream.pop_if_present(t.my_arg[k]) ) { + if( ++k==block_type::max_arg_size ) { + // There might be more iterations. + recycle_to_reexecute(); + break; + } + } + if( k==0 ) { + destroy(t); + return NULL; + } else { + t.size = k; + return &t; + } + } + while_task( Stream& stream, const Body& body, empty_task& barrier ) : + my_stream(stream), + my_body(body), + my_barrier(barrier) + {} + friend class tbb::parallel_while; + }; + +} // namespace internal +//! @endcond + +//! Parallel iteration over a stream, with optional addition of more work. +/** The Body b has the requirement: \n + "b(v)" \n + "b.argument_type" \n + where v is an argument_type + @ingroup algorithms */ +template +class parallel_while: internal::no_copy { +public: + //! Construct empty non-running parallel while. + parallel_while() : my_body(NULL), my_barrier(NULL) {} + + //! Destructor cleans up data members before returning. + ~parallel_while() { + if( my_barrier ) { + my_barrier->destroy(*my_barrier); + my_barrier = NULL; + } + } + + //! Type of items + typedef typename Body::argument_type value_type; + + //! Apply body.apply to each item in the stream. + /** A Stream s has the requirements \n + "S::value_type" \n + "s.pop_if_present(value) is convertible to bool */ + template + void run( Stream& stream, const Body& body ); + + //! Add a work item while running. + /** Should be executed only by body.apply or a thread spawned therefrom. 
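A minimal sketch of driving parallel_while with a custom stream and body; both types are invented for the example, and any stream exposing a suitable pop_if_present works.

#include "tbb/parallel_while.h"

// Illustrative stream: pop_if_present() returns false once no items remain.
class CounterStream {
    int my_next;
    int my_limit;
public:
    CounterStream( int limit ) : my_next(0), my_limit(limit) {}
    bool pop_if_present( int& item ) {
        if( my_next >= my_limit ) return false;
        item = my_next++;
        return true;
    }
};

// Illustrative body: argument_type and operator()(argument_type) are the requirements.
class ProcessItem {
public:
    typedef int argument_type;
    void operator()( int item ) const { /* ... work on the item ... */ }
};

void process_first_n( int n ) {
    CounterStream stream( n );
    tbb::parallel_while<ProcessItem> w;
    w.run( stream, ProcessItem() );   // add() may be called from the body while running
}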
*/ + void add( const value_type& item ); + +private: + const Body* my_body; + empty_task* my_barrier; +}; + +template +template +void parallel_while::run( Stream& stream, const Body& body ) { + using namespace internal; + empty_task& barrier = *new( task::allocate_root() ) empty_task(); + my_body = &body; + my_barrier = &barrier; + my_barrier->set_ref_count(2); + while_task& w = *new( my_barrier->allocate_child() ) while_task( stream, body, barrier ); + my_barrier->spawn_and_wait_for_all(w); + my_barrier->destroy(*my_barrier); + my_barrier = NULL; + my_body = NULL; +} + +template +void parallel_while::add( const value_type& item ) { + __TBB_ASSERT(my_barrier,"attempt to add to parallel_while that is not running"); + typedef internal::while_iteration_task iteration_type; + iteration_type& i = *new( task::self().allocate_additional_child_of(*my_barrier) ) iteration_type(item,*my_body); + task::self().spawn( i ); +} + +} // namespace + +#endif /* __TBB_parallel_while */ diff --git a/dep/tbb/include/tbb/partitioner.h b/dep/tbb/include/tbb/partitioner.h new file mode 100644 index 000000000..53e2953c0 --- /dev/null +++ b/dep/tbb/include/tbb/partitioner.h @@ -0,0 +1,228 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_partitioner_H +#define __TBB_partitioner_H + +#include "task.h" + +namespace tbb { +class affinity_partitioner; + +//! @cond INTERNAL +namespace internal { +size_t __TBB_EXPORTED_FUNC get_initial_auto_partitioner_divisor(); + +//! Defines entry points into tbb run-time library; +/** The entry points are the constructor and destructor. */ +class affinity_partitioner_base_v3: no_copy { + friend class tbb::affinity_partitioner; + //! Array that remembers affinities of tree positions to affinity_id. + /** NULL if my_size==0. */ + affinity_id* my_array; + //! Number of elements in my_array. + size_t my_size; + //! Zeros the fields. + affinity_partitioner_base_v3() : my_array(NULL), my_size(0) {} + //! Deallocates my_array. + ~affinity_partitioner_base_v3() {resize(0);} + //! Resize my_array. + /** Retains values if resulting size is the same. */ + void __TBB_EXPORTED_METHOD resize( unsigned factor ); + friend class affinity_partition_type; +}; + +//! 
Provides default methods for partition objects without affinity. +class partition_type_base { +public: + void set_affinity( task & ) {} + void note_affinity( task::affinity_id ) {} + task* continue_after_execute_range( task& ) {return NULL;} + bool decide_whether_to_delay() {return false;} + void spawn_or_delay( bool, task& a, task& b ) { + a.spawn(b); + } +}; + +class affinity_partition_type; + +template class start_for; +template class start_reduce; +template class start_reduce_with_affinity; +template class start_scan; + +} // namespace internal +//! @endcond + +//! A simple partitioner +/** Divides the range until the range is not divisible. + @ingroup algorithms */ +class simple_partitioner { +public: + simple_partitioner() {} +private: + template friend class internal::start_for; + template friend class internal::start_reduce; + template friend class internal::start_scan; + + class partition_type: public internal::partition_type_base { + public: + bool should_execute_range(const task& ) {return false;} + partition_type( const simple_partitioner& ) {} + partition_type( const partition_type&, split ) {} + }; +}; + +//! An auto partitioner +/** The range is initial divided into several large chunks. + Chunks are further subdivided into VICTIM_CHUNKS pieces if they are stolen and divisible. + @ingroup algorithms */ +class auto_partitioner { +public: + auto_partitioner() {} + +private: + template friend class internal::start_for; + template friend class internal::start_reduce; + template friend class internal::start_scan; + + class partition_type: public internal::partition_type_base { + size_t num_chunks; + static const size_t VICTIM_CHUNKS = 4; +public: + bool should_execute_range(const task &t) { + if( num_chunks friend class internal::start_for; + template friend class internal::start_reduce; + template friend class internal::start_reduce_with_affinity; + template friend class internal::start_scan; + + typedef internal::affinity_partition_type partition_type; + friend class internal::affinity_partition_type; +}; + +//! @cond INTERNAL +namespace internal { + +class affinity_partition_type: public no_copy { + //! Must be power of two + static const unsigned factor = 16; + static const size_t VICTIM_CHUNKS = 4; + + internal::affinity_id* my_array; + task_list delay_list; + unsigned map_begin, map_end; + size_t num_chunks; +public: + affinity_partition_type( affinity_partitioner& ap ) { + __TBB_ASSERT( (factor&(factor-1))==0, "factor must be power of two" ); + ap.resize(factor); + my_array = ap.my_array; + map_begin = 0; + map_end = unsigned(ap.my_size); + num_chunks = internal::get_initial_auto_partitioner_divisor(); + } + affinity_partition_type(affinity_partition_type& p, split) : my_array(p.my_array) { + __TBB_ASSERT( p.map_end-p.map_beginfactor ) + d &= 0u-factor; + map_end = e; + map_begin = p.map_end = e-d; + } + + bool should_execute_range(const task &t) { + if( num_chunks < VICTIM_CHUNKS && t.is_stolen_task() ) + num_chunks = VICTIM_CHUNKS; + return num_chunks == 1; + } + + void set_affinity( task &t ) { + if( map_begin + +namespace tbb { + +class pipeline; +class filter; + +//! @cond INTERNAL +namespace internal { + +// The argument for PIPELINE_VERSION should be an integer between 2 and 9 +#define __TBB_PIPELINE_VERSION(x) (unsigned char)(x-2)<<1 + +typedef unsigned long Token; +typedef long tokendiff_t; +class stage_task; +class input_buffer; +class pipeline_root_task; +class pipeline_cleaner; + +} // namespace internal +//! @endcond + +//! A stage in a pipeline. 
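A brief sketch, not part of the imported sources, of how these partitioners are passed to the loop templates. DoubleAll and run_loops are illustrative names.

#include <cstddef>
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"
#include "tbb/partitioner.h"

struct DoubleAll {
    float* a;
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        for( size_t i = r.begin(); i != r.end(); ++i )
            a[i] *= 2.0f;
    }
};

void run_loops( float* a, size_t n ) {
    DoubleAll body; body.a = a;
    // auto_partitioner (the default since TBB 2.2) picks chunk sizes on its own.
    tbb::parallel_for( tbb::blocked_range<size_t>(0,n), body, tbb::auto_partitioner() );
    // simple_partitioner recurses down to the range's grain size (1024 here).
    tbb::parallel_for( tbb::blocked_range<size_t>(0,n,1024), body, tbb::simple_partitioner() );
    // affinity_partitioner remembers the iteration-to-thread mapping between
    // calls, so it must outlive the call and be passed by reference.
    static tbb::affinity_partitioner ap;
    tbb::parallel_for( tbb::blocked_range<size_t>(0,n), body, ap );
}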
+/** @ingroup algorithms */ +class filter: internal::no_copy { +private: + //! Value used to mark "not in pipeline" + static filter* not_in_pipeline() {return reinterpret_cast(internal::intptr(-1));} + + //! The lowest bit 0 is for parallel vs. serial + static const unsigned char filter_is_serial = 0x1; + + //! 4th bit distinguishes ordered vs unordered filters. + /** The bit was not set for parallel filters in TBB 2.1 and earlier, + but is_ordered() function always treats parallel filters as out of order. */ + static const unsigned char filter_is_out_of_order = 0x1<<4; + + //! 5th bit distinguishes thread-bound and regular filters. + static const unsigned char filter_is_bound = 0x1<<5; + + static const unsigned char current_version = __TBB_PIPELINE_VERSION(5); + static const unsigned char version_mask = 0x7<<1; // bits 1-3 are for version +public: + enum mode { + //! processes multiple items in parallel and in no particular order + parallel = current_version | filter_is_out_of_order, + //! processes items one at a time; all such filters process items in the same order + serial_in_order = current_version | filter_is_serial, + //! processes items one at a time and in no particular order + serial_out_of_order = current_version | filter_is_serial | filter_is_out_of_order, + //! @deprecated use serial_in_order instead + serial = serial_in_order + }; +protected: + filter( bool is_serial_ ) : + next_filter_in_pipeline(not_in_pipeline()), + my_input_buffer(NULL), + my_filter_mode(static_cast(is_serial_ ? serial : parallel)), + prev_filter_in_pipeline(not_in_pipeline()), + my_pipeline(NULL), + next_segment(NULL) + {} + + filter( mode filter_mode ) : + next_filter_in_pipeline(not_in_pipeline()), + my_input_buffer(NULL), + my_filter_mode(static_cast(filter_mode)), + prev_filter_in_pipeline(not_in_pipeline()), + my_pipeline(NULL), + next_segment(NULL) + {} + +public: + //! True if filter is serial. + bool is_serial() const { + return bool( my_filter_mode & filter_is_serial ); + } + + //! True if filter must receive stream in order. + bool is_ordered() const { + return (my_filter_mode & (filter_is_out_of_order|filter_is_serial))==filter_is_serial; + } + + //! True if filter is thread-bound. + bool is_bound() const { + return ( my_filter_mode & filter_is_bound )==filter_is_bound; + } + + //! Operate on an item from the input stream, and return item for output stream. + /** Returns NULL if filter is a sink. */ + virtual void* operator()( void* item ) = 0; + + //! Destroy filter. + /** If the filter was added to a pipeline, the pipeline must be destroyed first. */ + virtual __TBB_EXPORTED_METHOD ~filter(); + +#if __TBB_EXCEPTIONS + //! Destroys item if pipeline was cancelled. + /** Required to prevent memory leaks. + Note it can be called concurrently even for serial filters.*/ + virtual void finalize( void* /*item*/ ) {}; +#endif + +private: + //! Pointer to next filter in the pipeline. + filter* next_filter_in_pipeline; + + //! Buffer for incoming tokens, or NULL if not required. + /** The buffer is required if the filter is serial or follows a thread-bound one. */ + internal::input_buffer* my_input_buffer; + + friend class internal::stage_task; + friend class internal::pipeline_root_task; + friend class pipeline; + friend class thread_bound_filter; + + //! Storage for filter mode and dynamically checked implementation version. + const unsigned char my_filter_mode; + + //! Pointer to previous filter in the pipeline. + filter* prev_filter_in_pipeline; + + //! Pointer to the pipeline. 
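As a quick illustration (not part of the imported sources) of the filter modes above together with the pipeline class declared in this header: a serial input stage feeding a parallel stage. InputFilter, DoubleFilter and double_all are illustrative names.

#include "tbb/pipeline.h"

class InputFilter: public tbb::filter {
    int* my_data;
    int my_count;
    int my_next;
public:
    InputFilter( int* data, int count )
        : tbb::filter( tbb::filter::serial_in_order ),
          my_data(data), my_count(count), my_next(0) {}
    /*override*/ void* operator()( void* ) {
        if( my_next == my_count ) return NULL;   // NULL from the first stage ends the run
        return &my_data[my_next++];
    }
};

class DoubleFilter: public tbb::filter {
public:
    DoubleFilter() : tbb::filter( tbb::filter::parallel ) {}
    /*override*/ void* operator()( void* item ) {
        int* p = static_cast<int*>(item);
        *p *= 2;
        return NULL;                             // last stage: nothing to pass on
    }
};

void double_all( int* data, int count ) {
    InputFilter in( data, count );
    DoubleFilter twice;
    tbb::pipeline p;
    p.add_filter( in );
    p.add_filter( twice );
    p.run( 8 );      // at most 8 tokens in flight at once
    p.clear();       // detach the filters from the pipeline
}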
+ pipeline* my_pipeline; + + //! Pointer to the next "segment" of filters, or NULL if not required. + /** In each segment, the first filter is not thread-bound but follows a thread-bound one. */ + filter* next_segment; +}; + +//! A stage in a pipeline served by a user thread. +/** @ingroup algorithms */ +class thread_bound_filter: public filter { +public: + enum result_type { + // item was processed + success, + // item is currently not available + item_not_available, + // there are no more items to process + end_of_stream + }; +protected: + thread_bound_filter(mode filter_mode): + filter(static_cast(filter_mode | filter::filter_is_bound)) + {} +public: + //! If a data item is available, invoke operator() on that item. + /** This interface is non-blocking. + Returns 'success' if an item was processed. + Returns 'item_not_available' if no item can be processed now + but more may arrive in the future, or if token limit is reached. + Returns 'end_of_stream' if there are no more items to process. */ + result_type __TBB_EXPORTED_METHOD try_process_item(); + + //! Wait until a data item becomes available, and invoke operator() on that item. + /** This interface is blocking. + Returns 'success' if an item was processed. + Returns 'end_of_stream' if there are no more items to process. + Never returns 'item_not_available', as it blocks until another return condition applies. */ + result_type __TBB_EXPORTED_METHOD process_item(); + +private: + //! Internal routine for item processing + result_type internal_process_item(bool is_blocking); +}; + +//! A processing pipeling that applies filters to items. +/** @ingroup algorithms */ +class pipeline { +public: + //! Construct empty pipeline. + __TBB_EXPORTED_METHOD pipeline(); + + /** Though the current implementation declares the destructor virtual, do not rely on this + detail. The virtualness is deprecated and may disappear in future versions of TBB. */ + virtual __TBB_EXPORTED_METHOD ~pipeline(); + + //! Add filter to end of pipeline. + void __TBB_EXPORTED_METHOD add_filter( filter& filter_ ); + + //! Run the pipeline to completion. + void __TBB_EXPORTED_METHOD run( size_t max_number_of_live_tokens ); + +#if __TBB_EXCEPTIONS + //! Run the pipeline to completion with user-supplied context. + void __TBB_EXPORTED_METHOD run( size_t max_number_of_live_tokens, tbb::task_group_context& context ); +#endif + + //! Remove all filters from the pipeline. + void __TBB_EXPORTED_METHOD clear(); + +private: + friend class internal::stage_task; + friend class internal::pipeline_root_task; + friend class filter; + friend class thread_bound_filter; + friend class internal::pipeline_cleaner; + + //! Pointer to first filter in the pipeline. + filter* filter_list; + + //! Pointer to location where address of next filter to be added should be stored. + filter* filter_end; + + //! task who's reference count is used to determine when all stages are done. + task* end_counter; + + //! Number of idle tokens waiting for input stage. + atomic input_tokens; + + //! Global counter of tokens + atomic token_counter; + + //! False until fetch_input returns NULL. + bool end_of_input; + + //! True if the pipeline contains a thread-bound filter; false otherwise. + bool has_thread_bound_filters; + + //! Remove filter from pipeline. + void remove_filter( filter& filter_ ); + + //! Not used, but retained to satisfy old export files. + void __TBB_EXPORTED_METHOD inject_token( task& self ); + +#if __TBB_EXCEPTIONS + //! 
Does clean up if pipeline is cancelled or exception occured + void clear_filters(); +#endif +}; + +} // tbb + +#endif /* __TBB_pipeline_H */ diff --git a/dep/tbb/include/tbb/queuing_mutex.h b/dep/tbb/include/tbb/queuing_mutex.h new file mode 100644 index 000000000..a7cb71c1b --- /dev/null +++ b/dep/tbb/include/tbb/queuing_mutex.h @@ -0,0 +1,119 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_queuing_mutex_H +#define __TBB_queuing_mutex_H + +#include +#include "atomic.h" +#include "tbb_profiling.h" + +namespace tbb { + +//! Queuing lock with local-only spinning. +/** @ingroup synchronization */ +class queuing_mutex { +public: + //! Construct unacquired mutex. + queuing_mutex() { + q_tail = NULL; +#if TBB_USE_THREADING_TOOLS + internal_construct(); +#endif + } + + //! The scoped locking pattern + /** It helps to avoid the common problem of forgetting to release lock. + It also nicely provides the "node" for queuing locks. */ + class scoped_lock: internal::no_copy { + //! Initialize fields to mean "no lock held". + void initialize() { + mutex = NULL; +#if TBB_USE_ASSERT + internal::poison_pointer(next); +#endif /* TBB_USE_ASSERT */ + } + public: + //! Construct lock that has not acquired a mutex. + /** Equivalent to zero-initialization of *this. */ + scoped_lock() {initialize();} + + //! Acquire lock on given mutex. + /** Upon entry, *this should not be in the "have acquired a mutex" state. */ + scoped_lock( queuing_mutex& m ) { + initialize(); + acquire(m); + } + + //! Release lock (if lock is held). + ~scoped_lock() { + if( mutex ) release(); + } + + //! Acquire lock on given mutex. + void __TBB_EXPORTED_METHOD acquire( queuing_mutex& m ); + + //! Acquire lock on given mutex if free (i.e. non-blocking) + bool __TBB_EXPORTED_METHOD try_acquire( queuing_mutex& m ); + + //! Release lock. + void __TBB_EXPORTED_METHOD release(); + + private: + //! The pointer to the mutex owned, or NULL if not holding a mutex. + queuing_mutex* mutex; + + //! The pointer to the next competitor for a mutex + scoped_lock *next; + + //! The local spin-wait variable + /** Inverted (0 - blocked, 1 - acquired the mutex) for the sake of + zero-initialization. 
Defining it as an entire word instead of + a byte seems to help performance slightly. */ + internal::uintptr going; + }; + + void __TBB_EXPORTED_METHOD internal_construct(); + + // Mutex traits + static const bool is_rw_mutex = false; + static const bool is_recursive_mutex = false; + static const bool is_fair_mutex = true; + + friend class scoped_lock; +private: + //! The last competitor requesting the lock + atomic q_tail; + +}; + +__TBB_DEFINE_PROFILING_SET_NAME(queuing_mutex) + +} // namespace tbb + +#endif /* __TBB_queuing_mutex_H */ diff --git a/dep/tbb/include/tbb/queuing_rw_mutex.h b/dep/tbb/include/tbb/queuing_rw_mutex.h new file mode 100644 index 000000000..27456f685 --- /dev/null +++ b/dep/tbb/include/tbb/queuing_rw_mutex.h @@ -0,0 +1,161 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_queuing_rw_mutex_H +#define __TBB_queuing_rw_mutex_H + +#include +#include "atomic.h" +#include "tbb_profiling.h" + +namespace tbb { + +//! Reader-writer lock with local-only spinning. +/** Adapted from Krieger, Stumm, et al. pseudocode at + http://www.eecg.toronto.edu/parallel/pubs_abs.html#Krieger_etal_ICPP93 + @ingroup synchronization */ +class queuing_rw_mutex { +public: + //! Construct unacquired mutex. + queuing_rw_mutex() { + q_tail = NULL; +#if TBB_USE_THREADING_TOOLS + internal_construct(); +#endif + } + + //! Destructor asserts if the mutex is acquired, i.e. q_tail is non-NULL + ~queuing_rw_mutex() { +#if TBB_USE_ASSERT + __TBB_ASSERT( !q_tail, "destruction of an acquired mutex"); +#endif + } + + class scoped_lock; + friend class scoped_lock; + + //! The scoped locking pattern + /** It helps to avoid the common problem of forgetting to release lock. + It also nicely provides the "node" for queuing locks. */ + class scoped_lock: internal::no_copy { + //! Initialize fields + void initialize() { + mutex = NULL; +#if TBB_USE_ASSERT + state = 0xFF; // Set to invalid state + internal::poison_pointer(next); + internal::poison_pointer(prev); +#endif /* TBB_USE_ASSERT */ + } + public: + //! Construct lock that has not acquired a mutex. + /** Equivalent to zero-initialization of *this. */ + scoped_lock() {initialize();} + + //! Acquire lock on given mutex. 
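A small usage sketch of the scoped-lock pattern with queuing_mutex, not part of the imported sources; counter, counter_mutex and bump_counter are illustrative names.

#include "tbb/queuing_mutex.h"

long counter = 0;
tbb::queuing_mutex counter_mutex;

void bump_counter() {
    tbb::queuing_mutex::scoped_lock lock( counter_mutex );  // acquired here, FIFO order
    ++counter;
}                                                            // released by ~scoped_lock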
+ /** Upon entry, *this should not be in the "have acquired a mutex" state. */ + scoped_lock( queuing_rw_mutex& m, bool write=true ) { + initialize(); + acquire(m,write); + } + + //! Release lock (if lock is held). + ~scoped_lock() { + if( mutex ) release(); + } + + //! Acquire lock on given mutex. + void acquire( queuing_rw_mutex& m, bool write=true ); + + //! Try acquire lock on given mutex. + bool try_acquire( queuing_rw_mutex& m, bool write=true ); + + //! Release lock. + void release(); + + //! Upgrade reader to become a writer. + /** Returns true if the upgrade happened without re-acquiring the lock and false if opposite */ + bool upgrade_to_writer(); + + //! Downgrade writer to become a reader. + bool downgrade_to_reader(); + + private: + //! The pointer to the current mutex to work + queuing_rw_mutex* mutex; + + //! The pointer to the previous and next competitors for a mutex + scoped_lock * prev, * next; + + typedef unsigned char state_t; + + //! State of the request: reader, writer, active reader, other service states + atomic state; + + //! The local spin-wait variable + /** Corresponds to "spin" in the pseudocode but inverted for the sake of zero-initialization */ + unsigned char going; + + //! A tiny internal lock + unsigned char internal_lock; + + //! Acquire the internal lock + void acquire_internal_lock(); + + //! Try to acquire the internal lock + /** Returns true if lock was successfully acquired. */ + bool try_acquire_internal_lock(); + + //! Release the internal lock + void release_internal_lock(); + + //! Wait for internal lock to be released + void wait_for_release_of_internal_lock(); + + //! A helper function + void unblock_or_wait_on_internal_lock( uintptr_t ); + }; + + void __TBB_EXPORTED_METHOD internal_construct(); + + // Mutex traits + static const bool is_rw_mutex = true; + static const bool is_recursive_mutex = false; + static const bool is_fair_mutex = true; + +private: + //! The last competitor requesting the lock + atomic q_tail; + +}; + +__TBB_DEFINE_PROFILING_SET_NAME(queuing_rw_mutex) + +} // namespace tbb + +#endif /* __TBB_queuing_rw_mutex_H */ diff --git a/dep/tbb/include/tbb/recursive_mutex.h b/dep/tbb/include/tbb/recursive_mutex.h new file mode 100644 index 000000000..1b7a82539 --- /dev/null +++ b/dep/tbb/include/tbb/recursive_mutex.h @@ -0,0 +1,245 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. 
This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_recursive_mutex_H +#define __TBB_recursive_mutex_H + +#if _WIN32||_WIN64 + +#include +#if !defined(_WIN32_WINNT) +// The following Windows API function is declared explicitly; +// otherwise any user would have to specify /D_WIN32_WINNT=0x0400 +extern "C" BOOL WINAPI TryEnterCriticalSection( LPCRITICAL_SECTION ); +#endif + +#else /* if not _WIN32||_WIN64 */ + +#include +namespace tbb { namespace internal { +// Use this internal TBB function to throw an exception + extern void handle_perror( int error_code, const char* what ); +} } //namespaces + +#endif /* _WIN32||_WIN64 */ + +#include +#include "aligned_space.h" +#include "tbb_stddef.h" +#include "tbb_profiling.h" + +namespace tbb { +//! Mutex that allows recursive mutex acquisition. +/** Mutex that allows recursive mutex acquisition. + @ingroup synchronization */ +class recursive_mutex { +public: + //! Construct unacquired recursive_mutex. + recursive_mutex() { +#if TBB_USE_ASSERT || TBB_USE_THREADING_TOOLS + internal_construct(); +#else + #if _WIN32||_WIN64 + InitializeCriticalSection(&impl); + #else + pthread_mutexattr_t mtx_attr; + int error_code = pthread_mutexattr_init( &mtx_attr ); + if( error_code ) + tbb::internal::handle_perror(error_code,"recursive_mutex: pthread_mutexattr_init failed"); + + pthread_mutexattr_settype( &mtx_attr, PTHREAD_MUTEX_RECURSIVE ); + error_code = pthread_mutex_init( &impl, &mtx_attr ); + if( error_code ) + tbb::internal::handle_perror(error_code,"recursive_mutex: pthread_mutex_init failed"); + + pthread_mutexattr_destroy( &mtx_attr ); + #endif /* _WIN32||_WIN64*/ +#endif /* TBB_USE_ASSERT */ + }; + + ~recursive_mutex() { +#if TBB_USE_ASSERT + internal_destroy(); +#else + #if _WIN32||_WIN64 + DeleteCriticalSection(&impl); + #else + pthread_mutex_destroy(&impl); + + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + }; + + class scoped_lock; + friend class scoped_lock; + + //! The scoped locking pattern + /** It helps to avoid the common problem of forgetting to release lock. + It also nicely provides the "node" for queuing locks. */ + class scoped_lock: internal::no_copy { + public: + //! Construct lock that has not acquired a recursive_mutex. + scoped_lock() : my_mutex(NULL) {}; + + //! Acquire lock on given mutex. + scoped_lock( recursive_mutex& mutex ) { +#if TBB_USE_ASSERT + my_mutex = &mutex; +#endif /* TBB_USE_ASSERT */ + acquire( mutex ); + } + + //! Release lock (if lock is held). + ~scoped_lock() { + if( my_mutex ) + release(); + } + + //! Acquire lock on given mutex. + void acquire( recursive_mutex& mutex ) { +#if TBB_USE_ASSERT + internal_acquire( mutex ); +#else + my_mutex = &mutex; + mutex.lock(); +#endif /* TBB_USE_ASSERT */ + } + + //! Try acquire lock on given recursive_mutex. + bool try_acquire( recursive_mutex& mutex ) { +#if TBB_USE_ASSERT + return internal_try_acquire( mutex ); +#else + bool result = mutex.try_lock(); + if( result ) + my_mutex = &mutex; + return result; +#endif /* TBB_USE_ASSERT */ + } + + //! Release lock + void release() { +#if TBB_USE_ASSERT + internal_release(); +#else + my_mutex->unlock(); + my_mutex = NULL; +#endif /* TBB_USE_ASSERT */ + } + + private: + //! The pointer to the current recursive_mutex to work + recursive_mutex* my_mutex; + + //! All checks from acquire using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_acquire( recursive_mutex& m ); + + //! 
All checks from try_acquire using mutex.state were moved here + bool __TBB_EXPORTED_METHOD internal_try_acquire( recursive_mutex& m ); + + //! All checks from release using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_release(); + + friend class recursive_mutex; + }; + + // Mutex traits + static const bool is_rw_mutex = false; + static const bool is_recursive_mutex = true; + static const bool is_fair_mutex = false; + + // C++0x compatibility interface + + //! Acquire lock + void lock() { +#if TBB_USE_ASSERT + aligned_space tmp; + new(tmp.begin()) scoped_lock(*this); +#else + #if _WIN32||_WIN64 + EnterCriticalSection(&impl); + #else + pthread_mutex_lock(&impl); + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + } + + //! Try acquiring lock (non-blocking) + /** Return true if lock acquired; false otherwise. */ + bool try_lock() { +#if TBB_USE_ASSERT + aligned_space tmp; + return (new(tmp.begin()) scoped_lock)->internal_try_acquire(*this); +#else + #if _WIN32||_WIN64 + return TryEnterCriticalSection(&impl)!=0; + #else + return pthread_mutex_trylock(&impl)==0; + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + } + + //! Release lock + void unlock() { +#if TBB_USE_ASSERT + aligned_space tmp; + scoped_lock& s = *tmp.begin(); + s.my_mutex = this; + s.internal_release(); +#else + #if _WIN32||_WIN64 + LeaveCriticalSection(&impl); + #else + pthread_mutex_unlock(&impl); + #endif /* _WIN32||_WIN64 */ +#endif /* TBB_USE_ASSERT */ + } + +private: +#if _WIN32||_WIN64 + CRITICAL_SECTION impl; + enum state_t { + INITIALIZED=0x1234, + DESTROYED=0x789A, + } state; +#else + pthread_mutex_t impl; +#endif /* _WIN32||_WIN64 */ + + //! All checks from mutex constructor using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_construct(); + + //! All checks from mutex destructor using mutex.state were moved here + void __TBB_EXPORTED_METHOD internal_destroy(); +}; + +__TBB_DEFINE_PROFILING_SET_NAME(recursive_mutex) + +} // namespace tbb + +#endif /* __TBB_recursive_mutex_H */ diff --git a/dep/tbb/include/tbb/scalable_allocator.h b/dep/tbb/include/tbb/scalable_allocator.h new file mode 100644 index 000000000..aca27a736 --- /dev/null +++ b/dep/tbb/include/tbb/scalable_allocator.h @@ -0,0 +1,205 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. 
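A short sketch (not part of the imported sources) showing the point of recursion: the holding thread may re-acquire the same recursive_mutex without deadlocking. rec_mutex, depth and descend are illustrative names.

#include "tbb/recursive_mutex.h"

tbb::recursive_mutex rec_mutex;
int depth = 0;

void descend( int levels ) {
    tbb::recursive_mutex::scoped_lock lock( rec_mutex );  // nested acquisition is legal
    ++depth;
    if( levels > 0 )
        descend( levels - 1 );                             // re-enters with the lock held
}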
This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_scalable_allocator_H +#define __TBB_scalable_allocator_H +/** @file */ + +#include /* Need ptrdiff_t and size_t from here. */ + +#if !defined(__cplusplus) && __ICC==1100 + #pragma warning (push) + #pragma warning (disable: 991) +#endif + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +#if _MSC_VER >= 1400 +#define __TBB_EXPORTED_FUNC __cdecl +#else +#define __TBB_EXPORTED_FUNC +#endif + +/** The "malloc" analogue to allocate block of memory of size bytes. + * @ingroup memory_allocation */ +void * __TBB_EXPORTED_FUNC scalable_malloc (size_t size); + +/** The "free" analogue to discard a previously allocated piece of memory. + @ingroup memory_allocation */ +void __TBB_EXPORTED_FUNC scalable_free (void* ptr); + +/** The "realloc" analogue complementing scalable_malloc. + @ingroup memory_allocation */ +void * __TBB_EXPORTED_FUNC scalable_realloc (void* ptr, size_t size); + +/** The "calloc" analogue complementing scalable_malloc. + @ingroup memory_allocation */ +void * __TBB_EXPORTED_FUNC scalable_calloc (size_t nobj, size_t size); + +/** The "posix_memalign" analogue. + @ingroup memory_allocation */ +int __TBB_EXPORTED_FUNC scalable_posix_memalign (void** memptr, size_t alignment, size_t size); + +/** The "_aligned_malloc" analogue. + @ingroup memory_allocation */ +void * __TBB_EXPORTED_FUNC scalable_aligned_malloc (size_t size, size_t alignment); + +/** The "_aligned_realloc" analogue. + @ingroup memory_allocation */ +void * __TBB_EXPORTED_FUNC scalable_aligned_realloc (void* ptr, size_t size, size_t alignment); + +/** The "_aligned_free" analogue. + @ingroup memory_allocation */ +void __TBB_EXPORTED_FUNC scalable_aligned_free (void* ptr); + +/** The analogue of _msize/malloc_size/malloc_usable_size. + Returns the usable size of a memory block previously allocated by scalable_*, + or 0 (zero) if ptr does not point to such a block. + @ingroup memory_allocation */ +size_t __TBB_EXPORTED_FUNC scalable_msize (void* ptr); + +#ifdef __cplusplus +} /* extern "C" */ +#endif /* __cplusplus */ + +#ifdef __cplusplus + +#include /* To use new with the placement argument */ + +/* Ensure that including this header does not cause implicit linkage with TBB */ +#ifndef __TBB_NO_IMPLICIT_LINKAGE + #define __TBB_NO_IMPLICIT_LINKAGE 1 + #include "tbb_stddef.h" + #undef __TBB_NO_IMPLICIT_LINKAGE +#else + #include "tbb_stddef.h" +#endif + + +namespace tbb { + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Workaround for erroneous "unreferenced parameter" warning in method destroy. + #pragma warning (push) + #pragma warning (disable: 4100) +#endif + +//! Meets "allocator" requirements of ISO C++ Standard, Section 20.1.5 +/** The members are ordered the same way they are in section 20.4.1 + of the ISO C++ standard. 
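A minimal sketch of the C-level entry points declared in this header, not part of the imported sources; grow_buffer and release_buffer are illustrative names and error handling is reduced to the usual NULL convention.

#include "tbb/scalable_allocator.h"

void* grow_buffer( void* old, size_t new_size ) {
    void* p = old ? scalable_realloc( old, new_size )   // resize an existing block
                  : scalable_malloc( new_size );        // or allocate a fresh one
    return p;   // NULL on failure, like the standard malloc/realloc
}

void release_buffer( void* p ) {
    scalable_free( p );   // counterpart for blocks obtained from scalable_* routines
}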
+ @ingroup memory_allocation */ +template +class scalable_allocator { +public: + typedef typename internal::allocator_type::value_type value_type; + typedef value_type* pointer; + typedef const value_type* const_pointer; + typedef value_type& reference; + typedef const value_type& const_reference; + typedef size_t size_type; + typedef ptrdiff_t difference_type; + template struct rebind { + typedef scalable_allocator other; + }; + + scalable_allocator() throw() {} + scalable_allocator( const scalable_allocator& ) throw() {} + template scalable_allocator(const scalable_allocator&) throw() {} + + pointer address(reference x) const {return &x;} + const_pointer address(const_reference x) const {return &x;} + + //! Allocate space for n objects. + pointer allocate( size_type n, const void* /*hint*/ =0 ) { + return static_cast( scalable_malloc( n * sizeof(value_type) ) ); + } + + //! Free previously allocated block of memory + void deallocate( pointer p, size_type ) { + scalable_free( p ); + } + + //! Largest value for which method allocate might succeed. + size_type max_size() const throw() { + size_type absolutemax = static_cast(-1) / sizeof (value_type); + return (absolutemax > 0 ? absolutemax : 1); + } + void construct( pointer p, const value_type& val ) { new(static_cast(p)) value_type(val); } + void destroy( pointer p ) {p->~value_type();} +}; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning (pop) +#endif // warning 4100 is back + +//! Analogous to std::allocator, as defined in ISO C++ Standard, Section 20.4.1 +/** @ingroup memory_allocation */ +template<> +class scalable_allocator { +public: + typedef void* pointer; + typedef const void* const_pointer; + typedef void value_type; + template struct rebind { + typedef scalable_allocator other; + }; +}; + +template +inline bool operator==( const scalable_allocator&, const scalable_allocator& ) {return true;} + +template +inline bool operator!=( const scalable_allocator&, const scalable_allocator& ) {return false;} + +} // namespace tbb + +#if _MSC_VER + #if __TBB_BUILD && !defined(__TBBMALLOC_NO_IMPLICIT_LINKAGE) + #define __TBBMALLOC_NO_IMPLICIT_LINKAGE 1 + #endif + + #if !__TBBMALLOC_NO_IMPLICIT_LINKAGE + #ifdef _DEBUG + #pragma comment(lib, "tbbmalloc_debug.lib") + #else + #pragma comment(lib, "tbbmalloc.lib") + #endif + #endif + + +#endif + +#endif /* __cplusplus */ + +#if !defined(__cplusplus) && __ICC==1100 + #pragma warning (pop) +#endif // ICC 11.0 warning 991 is back + +#endif /* __TBB_scalable_allocator_H */ diff --git a/dep/tbb/include/tbb/spin_mutex.h b/dep/tbb/include/tbb/spin_mutex.h new file mode 100644 index 000000000..446821a70 --- /dev/null +++ b/dep/tbb/include/tbb/spin_mutex.h @@ -0,0 +1,192 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
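A small sketch, not part of the imported sources, of plugging this allocator into a standard container so element storage comes from the TBB scalable allocator; IntVector and make_squares are illustrative names.

#include <vector>
#include "tbb/scalable_allocator.h"

typedef std::vector<int, tbb::scalable_allocator<int> > IntVector;

IntVector make_squares( int n ) {
    IntVector v;                 // allocations go through scalable_malloc/scalable_free
    v.reserve( n );
    for( int i = 0; i < n; ++i )
        v.push_back( i * i );
    return v;
}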
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_spin_mutex_H +#define __TBB_spin_mutex_H + +#include +#include +#include "aligned_space.h" +#include "tbb_stddef.h" +#include "tbb_machine.h" +#include "tbb_profiling.h" + +namespace tbb { + +//! A lock that occupies a single byte. +/** A spin_mutex is a spin mutex that fits in a single byte. + It should be used only for locking short critical sections + (typically <20 instructions) when fairness is not an issue. + If zero-initialized, the mutex is considered unheld. + @ingroup synchronization */ +class spin_mutex { + //! 0 if lock is released, 1 if lock is acquired. + unsigned char flag; + +public: + //! Construct unacquired lock. + /** Equivalent to zero-initialization of *this. */ + spin_mutex() : flag(0) { +#if TBB_USE_THREADING_TOOLS + internal_construct(); +#endif + } + + //! Represents acquisition of a mutex. + class scoped_lock : internal::no_copy { + private: + //! Points to currently held mutex, or NULL if no lock is held. + spin_mutex* my_mutex; + + //! Value to store into spin_mutex::flag to unlock the mutex. + internal::uintptr my_unlock_value; + + //! Like acquire, but with ITT instrumentation. + void __TBB_EXPORTED_METHOD internal_acquire( spin_mutex& m ); + + //! Like try_acquire, but with ITT instrumentation. + bool __TBB_EXPORTED_METHOD internal_try_acquire( spin_mutex& m ); + + //! Like release, but with ITT instrumentation. + void __TBB_EXPORTED_METHOD internal_release(); + + friend class spin_mutex; + + public: + //! Construct without acquiring a mutex. + scoped_lock() : my_mutex(NULL), my_unlock_value(0) {} + + //! Construct and acquire lock on a mutex. + scoped_lock( spin_mutex& m ) { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + my_mutex=NULL; + internal_acquire(m); +#else + my_unlock_value = __TBB_LockByte(m.flag); + my_mutex=&m; +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT*/ + } + + //! Acquire lock. + void acquire( spin_mutex& m ) { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + internal_acquire(m); +#else + my_unlock_value = __TBB_LockByte(m.flag); + my_mutex = &m; +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT*/ + } + + //! Try acquiring lock (non-blocking) + /** Return true if lock acquired; false otherwise. */ + bool try_acquire( spin_mutex& m ) { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + return internal_try_acquire(m); +#else + bool result = __TBB_TryLockByte(m.flag); + if( result ) { + my_unlock_value = 0; + my_mutex = &m; + } + return result; +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT*/ + } + + //! Release lock + void release() { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + internal_release(); +#else + __TBB_store_with_release(my_mutex->flag, static_cast(my_unlock_value)); + my_mutex = NULL; +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT */ + } + + //! 
Destroy lock. If holding a lock, releases the lock first. + ~scoped_lock() { + if( my_mutex ) { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + internal_release(); +#else + __TBB_store_with_release(my_mutex->flag, static_cast(my_unlock_value)); +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT */ + } + } + }; + + void __TBB_EXPORTED_METHOD internal_construct(); + + // Mutex traits + static const bool is_rw_mutex = false; + static const bool is_recursive_mutex = false; + static const bool is_fair_mutex = false; + + // ISO C++0x compatibility methods + + //! Acquire lock + void lock() { +#if TBB_USE_THREADING_TOOLS + aligned_space tmp; + new(tmp.begin()) scoped_lock(*this); +#else + __TBB_LockByte(flag); +#endif /* TBB_USE_THREADING_TOOLS*/ + } + + //! Try acquiring lock (non-blocking) + /** Return true if lock acquired; false otherwise. */ + bool try_lock() { +#if TBB_USE_THREADING_TOOLS + aligned_space tmp; + return (new(tmp.begin()) scoped_lock)->internal_try_acquire(*this); +#else + return __TBB_TryLockByte(flag); +#endif /* TBB_USE_THREADING_TOOLS*/ + } + + //! Release lock + void unlock() { +#if TBB_USE_THREADING_TOOLS + aligned_space tmp; + scoped_lock& s = *tmp.begin(); + s.my_mutex = this; + s.my_unlock_value = 0; + s.internal_release(); +#else + __TBB_store_with_release(flag, 0); +#endif /* TBB_USE_THREADING_TOOLS */ + } + + friend class scoped_lock; +}; + +__TBB_DEFINE_PROFILING_SET_NAME(spin_mutex) + +} // namespace tbb + +#endif /* __TBB_spin_mutex_H */ diff --git a/dep/tbb/include/tbb/spin_rw_mutex.h b/dep/tbb/include/tbb/spin_rw_mutex.h new file mode 100644 index 000000000..229745b52 --- /dev/null +++ b/dep/tbb/include/tbb/spin_rw_mutex.h @@ -0,0 +1,229 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_spin_rw_mutex_H +#define __TBB_spin_rw_mutex_H + +#include "tbb_stddef.h" +#include "tbb_machine.h" +#include "tbb_profiling.h" + +namespace tbb { + +class spin_rw_mutex_v3; +typedef spin_rw_mutex_v3 spin_rw_mutex; + +//! Fast, unfair, spinning reader-writer lock with backoff and writer-preference +/** @ingroup synchronization */ +class spin_rw_mutex_v3 { + //! @cond INTERNAL + + //! Internal acquire write lock. + bool __TBB_EXPORTED_METHOD internal_acquire_writer(); + + //! 
Out of line code for releasing a write lock. + /** This code is has debug checking and instrumentation for Intel(R) Thread Checker and Intel(R) Thread Profiler. */ + void __TBB_EXPORTED_METHOD internal_release_writer(); + + //! Internal acquire read lock. + void __TBB_EXPORTED_METHOD internal_acquire_reader(); + + //! Internal upgrade reader to become a writer. + bool __TBB_EXPORTED_METHOD internal_upgrade(); + + //! Out of line code for downgrading a writer to a reader. + /** This code is has debug checking and instrumentation for Intel(R) Thread Checker and Intel(R) Thread Profiler. */ + void __TBB_EXPORTED_METHOD internal_downgrade(); + + //! Internal release read lock. + void __TBB_EXPORTED_METHOD internal_release_reader(); + + //! Internal try_acquire write lock. + bool __TBB_EXPORTED_METHOD internal_try_acquire_writer(); + + //! Internal try_acquire read lock. + bool __TBB_EXPORTED_METHOD internal_try_acquire_reader(); + + //! @endcond +public: + //! Construct unacquired mutex. + spin_rw_mutex_v3() : state(0) { +#if TBB_USE_THREADING_TOOLS + internal_construct(); +#endif + } + +#if TBB_USE_ASSERT + //! Destructor asserts if the mutex is acquired, i.e. state is zero. + ~spin_rw_mutex_v3() { + __TBB_ASSERT( !state, "destruction of an acquired mutex"); + }; +#endif /* TBB_USE_ASSERT */ + + //! The scoped locking pattern + /** It helps to avoid the common problem of forgetting to release lock. + It also nicely provides the "node" for queuing locks. */ + class scoped_lock : internal::no_copy { + public: + //! Construct lock that has not acquired a mutex. + /** Equivalent to zero-initialization of *this. */ + scoped_lock() : mutex(NULL), is_writer(false) {} + + //! Acquire lock on given mutex. + /** Upon entry, *this should not be in the "have acquired a mutex" state. */ + scoped_lock( spin_rw_mutex& m, bool write = true ) : mutex(NULL) { + acquire(m, write); + } + + //! Release lock (if lock is held). + ~scoped_lock() { + if( mutex ) release(); + } + + //! Acquire lock on given mutex. + void acquire( spin_rw_mutex& m, bool write = true ) { + __TBB_ASSERT( !mutex, "holding mutex already" ); + is_writer = write; + mutex = &m; + if( write ) mutex->internal_acquire_writer(); + else mutex->internal_acquire_reader(); + } + + //! Upgrade reader to become a writer. + /** Returns true if the upgrade happened without re-acquiring the lock and false if opposite */ + bool upgrade_to_writer() { + __TBB_ASSERT( mutex, "lock is not acquired" ); + __TBB_ASSERT( !is_writer, "not a reader" ); + is_writer = true; + return mutex->internal_upgrade(); + } + + //! Release lock. + void release() { + __TBB_ASSERT( mutex, "lock is not acquired" ); + spin_rw_mutex *m = mutex; + mutex = NULL; +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + if( is_writer ) m->internal_release_writer(); + else m->internal_release_reader(); +#else + if( is_writer ) __TBB_AtomicAND( &m->state, READERS ); + else __TBB_FetchAndAddWrelease( &m->state, -(intptr_t)ONE_READER); +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT */ + } + + //! Downgrade writer to become a reader. + bool downgrade_to_reader() { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + __TBB_ASSERT( mutex, "lock is not acquired" ); + __TBB_ASSERT( is_writer, "not a writer" ); + mutex->internal_downgrade(); +#else + __TBB_FetchAndAddW( &mutex->state, ((intptr_t)ONE_READER-WRITER)); +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT */ + is_writer = false; + + return true; + } + + //! Try acquire lock on given mutex. 
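A usage sketch of the reader-writer scoped lock, not part of the imported sources: a read-mostly lookup that upgrades to a writer only when it has to insert. cache, cache_mutex and lookup_or_insert are illustrative names.

#include <map>
#include <utility>
#include "tbb/spin_rw_mutex.h"

std::map<int,int> cache;
tbb::spin_rw_mutex cache_mutex;

int lookup_or_insert( int key, int value_if_missing ) {
    tbb::spin_rw_mutex::scoped_lock lock( cache_mutex, /*write=*/false );  // reader
    std::map<int,int>::iterator it = cache.find( key );
    if( it != cache.end() )
        return it->second;
    lock.upgrade_to_writer();   // may release and re-acquire, so re-check afterwards
    it = cache.find( key );
    if( it == cache.end() )
        it = cache.insert( std::make_pair( key, value_if_missing ) ).first;
    return it->second;
}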
+ bool try_acquire( spin_rw_mutex& m, bool write = true ) { + __TBB_ASSERT( !mutex, "holding mutex already" ); + bool result; + is_writer = write; + result = write? m.internal_try_acquire_writer() + : m.internal_try_acquire_reader(); + if( result ) + mutex = &m; + return result; + } + + private: + //! The pointer to the current mutex that is held, or NULL if no mutex is held. + spin_rw_mutex* mutex; + + //! If mutex!=NULL, then is_writer is true if holding a writer lock, false if holding a reader lock. + /** Not defined if not holding a lock. */ + bool is_writer; + }; + + // Mutex traits + static const bool is_rw_mutex = true; + static const bool is_recursive_mutex = false; + static const bool is_fair_mutex = false; + + // ISO C++0x compatibility methods + + //! Acquire writer lock + void lock() {internal_acquire_writer();} + + //! Try acquiring writer lock (non-blocking) + /** Return true if lock acquired; false otherwise. */ + bool try_lock() {return internal_try_acquire_writer();} + + //! Release lock + void unlock() { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + if( state&WRITER ) internal_release_writer(); + else internal_release_reader(); +#else + if( state&WRITER ) __TBB_AtomicAND( &state, READERS ); + else __TBB_FetchAndAddWrelease( &state, -(intptr_t)ONE_READER); +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT */ + } + + // Methods for reader locks that resemble ISO C++0x compatibility methods. + + //! Acquire reader lock + void lock_read() {internal_acquire_reader();} + + //! Try acquiring reader lock (non-blocking) + /** Return true if reader lock acquired; false otherwise. */ + bool try_lock_read() {return internal_try_acquire_reader();} + +private: + typedef intptr_t state_t; + static const state_t WRITER = 1; + static const state_t WRITER_PENDING = 2; + static const state_t READERS = ~(WRITER | WRITER_PENDING); + static const state_t ONE_READER = 4; + static const state_t BUSY = WRITER | READERS; + //! State of lock + /** Bit 0 = writer is holding lock + Bit 1 = request by a writer to acquire lock (hint to readers to wait) + Bit 2..N = number of readers holding lock */ + state_t state; + + void __TBB_EXPORTED_METHOD internal_construct(); +}; + +__TBB_DEFINE_PROFILING_SET_NAME(spin_rw_mutex) + +} // namespace tbb + +#endif /* __TBB_spin_rw_mutex_H */ diff --git a/dep/tbb/include/tbb/task.h b/dep/tbb/include/tbb/task.h new file mode 100644 index 000000000..05a68985c --- /dev/null +++ b/dep/tbb/include/tbb/task.h @@ -0,0 +1,787 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_task_H +#define __TBB_task_H + +#include "tbb_stddef.h" +#include "tbb_machine.h" + +namespace tbb { + +class task; +class task_list; + +#if __TBB_EXCEPTIONS +class task_group_context; +#endif /* __TBB_EXCEPTIONS */ + +//! @cond INTERNAL +namespace internal { + + class scheduler: no_copy { + public: + //! For internal use only + virtual void spawn( task& first, task*& next ) = 0; + + //! For internal use only + virtual void wait_for_all( task& parent, task* child ) = 0; + + //! For internal use only + virtual void spawn_root_and_wait( task& first, task*& next ) = 0; + + //! Pure virtual destructor; + // Have to have it just to shut up overzealous compilation warnings + virtual ~scheduler() = 0; + }; + + //! A reference count + /** Should always be non-negative. A signed type is used so that underflow can be detected. */ + typedef intptr reference_count; + + //! An id as used for specifying affinity. + typedef unsigned short affinity_id; + +#if __TBB_EXCEPTIONS + struct context_list_node_t { + context_list_node_t *my_prev, + *my_next; + }; + + class allocate_root_with_context_proxy: no_assign { + task_group_context& my_context; + public: + allocate_root_with_context_proxy ( task_group_context& ctx ) : my_context(ctx) {} + task& __TBB_EXPORTED_METHOD allocate( size_t size ) const; + void __TBB_EXPORTED_METHOD free( task& ) const; + }; +#endif /* __TBB_EXCEPTIONS */ + + class allocate_root_proxy: no_assign { + public: + static task& __TBB_EXPORTED_FUNC allocate( size_t size ); + static void __TBB_EXPORTED_FUNC free( task& ); + }; + + class allocate_continuation_proxy: no_assign { + public: + task& __TBB_EXPORTED_METHOD allocate( size_t size ) const; + void __TBB_EXPORTED_METHOD free( task& ) const; + }; + + class allocate_child_proxy: no_assign { + public: + task& __TBB_EXPORTED_METHOD allocate( size_t size ) const; + void __TBB_EXPORTED_METHOD free( task& ) const; + }; + + class allocate_additional_child_of_proxy: no_assign { + task& self; + task& parent; + public: + allocate_additional_child_of_proxy( task& self_, task& parent_ ) : self(self_), parent(parent_) {} + task& __TBB_EXPORTED_METHOD allocate( size_t size ) const; + void __TBB_EXPORTED_METHOD free( task& ) const; + }; + + class task_group_base; + + //! Memory prefix to a task object. + /** This class is internal to the library. + Do not reference it directly, except within the library itself. + Fields are ordered in way that preserves backwards compatibility and yields + good packing on typical 32-bit and 64-bit platforms. + @ingroup task_scheduling */ + class task_prefix { + private: + friend class tbb::task; + friend class tbb::task_list; + friend class internal::scheduler; + friend class internal::allocate_root_proxy; + friend class internal::allocate_child_proxy; + friend class internal::allocate_continuation_proxy; + friend class internal::allocate_additional_child_of_proxy; + friend class internal::task_group_base; + +#if __TBB_EXCEPTIONS + //! 
Shared context that is used to communicate asynchronous state changes + /** Currently it is used to broadcast cancellation requests generated both + by users and as the result of unhandled exceptions in the task::execute() + methods. */ + task_group_context *context; +#endif /* __TBB_EXCEPTIONS */ + + //! The scheduler that allocated the task, or NULL if the task is big. + /** Small tasks are pooled by the scheduler that allocated the task. + If a scheduler needs to free a small task allocated by another scheduler, + it returns the task to that other scheduler. This policy avoids + memory space blowup issues for memory allocators that allocate from + thread-specific pools. */ + scheduler* origin; + + //! The scheduler that owns the task. + scheduler* owner; + + //! The task whose reference count includes me. + /** In the "blocking style" of programming, this field points to the parent task. + In the "continuation-passing style" of programming, this field points to the + continuation of the parent. */ + tbb::task* parent; + + //! Reference count used for synchronization. + /** In the "continuation-passing style" of programming, this field is + the difference of the number of allocated children minus the + number of children that have completed. + In the "blocking style" of programming, this field is one more than the difference. */ + reference_count ref_count; + + //! Obsolete. Used to be scheduling depth before TBB 2.2 + /** Retained only for the sake of backward binary compatibility. **/ + int depth; + + //! A task::state_type, stored as a byte for compactness. + /** This state is exposed to users via method task::state(). */ + unsigned char state; + + //! Miscellaneous state that is not directly visible to users, stored as a byte for compactness. + /** 0x0 -> version 1.0 task + 0x1 -> version 3.0 task + 0x2 -> task_proxy + 0x40 -> task has live ref_count */ + unsigned char extra_state; + + affinity_id affinity; + + //! "next" field for list of task + tbb::task* next; + + //! The task corresponding to this task_prefix. + tbb::task& task() {return *reinterpret_cast(this+1);} + }; + +} // namespace internal +//! @endcond + +#if __TBB_EXCEPTIONS + +#if TBB_USE_CAPTURED_EXCEPTION + class tbb_exception; +#else + namespace internal { + class tbb_exception_ptr; + } +#endif /* !TBB_USE_CAPTURED_EXCEPTION */ + +//! Used to form groups of tasks +/** @ingroup task_scheduling + The context services explicit cancellation requests from user code, and unhandled + exceptions intercepted during tasks execution. Intercepting an exception results + in generating internal cancellation requests (which is processed in exactly the + same way as external ones). + + The context is associated with one or more root tasks and defines the cancellation + group that includes all the descendants of the corresponding root task(s). Association + is established when a context object is passed as an argument to the task::allocate_root() + method. See task_group_context::task_group_context for more details. + + The context can be bound to another one, and other contexts can be bound to it, + forming a tree-like structure: parent -> this -> children. Arrows here designate + cancellation propagation direction. If a task in a cancellation group is canceled + all the other tasks in this group and groups bound to it (as children) get canceled too. 
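A brief sketch of the cancellation mechanism described above, not part of the imported sources. FindValue and search are illustrative names, and the sketch assumes the parallel_for overload that accepts an explicit task_group_context (available when __TBB_EXCEPTIONS is enabled).

#include <cstddef>
#include "tbb/task.h"
#include "tbb/parallel_for.h"
#include "tbb/blocked_range.h"

struct FindValue {
    const int* data;
    int target;
    tbb::task_group_context* ctx;
    void operator()( const tbb::blocked_range<size_t>& r ) const {
        for( size_t i = r.begin(); i != r.end(); ++i ) {
            if( ctx->is_group_execution_cancelled() )
                return;                            // some sibling range already found it
            if( data[i] == target )
                ctx->cancel_group_execution();     // cancel the whole group
        }
    }
};

void search( const int* data, size_t n, int target ) {
    tbb::task_group_context ctx;                   // cancellation group for this loop
    FindValue body; body.data = data; body.target = target; body.ctx = &ctx;
    tbb::parallel_for( tbb::blocked_range<size_t>(0,n), body,
                       tbb::auto_partitioner(), ctx );
}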
+ + IMPLEMENTATION NOTE: + When adding new members to task_group_context or changing types of existing ones, + update the size of both padding buffers (_leading_padding and _trailing_padding) + appropriately. See also VERSIONING NOTE at the constructor definition below. **/ +class task_group_context : internal::no_copy +{ +private: +#if TBB_USE_CAPTURED_EXCEPTION + typedef tbb_exception exception_container_type; +#else + typedef internal::tbb_exception_ptr exception_container_type; +#endif + + enum version_traits_word_layout { + traits_offset = 16, + version_mask = 0xFFFF, + traits_mask = 0xFFFFul << traits_offset + }; + +public: + enum kind_type { + isolated, + bound + }; + + enum traits_type { + exact_exception = 0x0001ul << traits_offset, + no_cancellation = 0x0002ul << traits_offset, + concurrent_wait = 0x0004ul << traits_offset, +#if TBB_USE_CAPTURED_EXCEPTION + default_traits = 0 +#else + default_traits = exact_exception +#endif /* !TBB_USE_CAPTURED_EXCEPTION */ + }; + +private: + union { + //! Flavor of this context: bound or isolated. + kind_type my_kind; + uintptr_t _my_kind_aligner; + }; + + //! Pointer to the context of the parent cancellation group. NULL for isolated contexts. + task_group_context *my_parent; + + //! Used to form the thread specific list of contexts without additional memory allocation. + /** A context is included into the list of the current thread when its binding to + its parent happens. Any context can be present in the list of one thread only. **/ + internal::context_list_node_t my_node; + + //! Leading padding protecting accesses to frequently used members from false sharing. + /** Read accesses to the field my_cancellation_requested are on the hot path inside + the scheduler. This padding ensures that this field never shares the same cache + line with a local variable that is frequently written to. **/ + char _leading_padding[internal::NFS_MaxLineSize - + 2 * sizeof(uintptr_t)- sizeof(void*) - sizeof(internal::context_list_node_t)]; + + //! Specifies whether cancellation was request for this task group. + uintptr_t my_cancellation_requested; + + //! Version for run-time checks and behavioral traits of the context. + /** Version occupies low 16 bits, and traits (zero or more ORed enumerators + from the traits_type enumerations) take the next 16 bits. + Original (zeroth) version of the context did not support any traits. **/ + uintptr_t my_version_and_traits; + + //! Pointer to the container storing exception being propagated across this task group. + exception_container_type *my_exception; + + //! Scheduler that registered this context in its thread specific list. + /** This field is not terribly necessary, but it allows to get a small performance + benefit by getting us rid of using thread local storage. We do not care + about extra memory it takes since this data structure is excessively padded anyway. **/ + void *my_owner; + + //! Trailing padding protecting accesses to frequently used members from false sharing + /** \sa _leading_padding **/ + char _trailing_padding[internal::NFS_MaxLineSize - sizeof(intptr_t) - 2 * sizeof(void*)]; + +public: + //! Default & binding constructor. + /** By default a bound context is created. That is this context will be bound + (as child) to the context of the task calling task::allocate_root(this_context) + method. Cancellation requests passed to the parent context are propagated + to all the contexts bound to it. 
+ + If task_group_context::isolated is used as the argument, then the tasks associated + with this context will never be affected by events in any other context. + + Creating isolated contexts involve much less overhead, but they have limited + utility. Normally when an exception occurs in an algorithm that has nested + ones running, it is desirably to have all the nested algorithms canceled + as well. Such a behavior requires nested algorithms to use bound contexts. + + There is one good place where using isolated algorithms is beneficial. It is + a master thread. That is if a particular algorithm is invoked directly from + the master thread (not from a TBB task), supplying it with explicitly + created isolated context will result in a faster algorithm startup. + + VERSIONING NOTE: + Implementation(s) of task_group_context constructor(s) cannot be made + entirely out-of-line because the run-time version must be set by the user + code. This will become critically important for binary compatibility, if + we ever have to change the size of the context object. + + Boosting the runtime version will also be necessary whenever new fields + are introduced in the currently unused padding areas or the meaning of + the existing fields is changed or extended. **/ + task_group_context ( kind_type relation_with_parent = bound, + uintptr_t traits = default_traits ) + : my_kind(relation_with_parent) + , my_version_and_traits(1 | traits) + { + init(); + } + + __TBB_EXPORTED_METHOD ~task_group_context (); + + //! Forcefully reinitializes the context after the task tree it was associated with is completed. + /** Because the method assumes that all the tasks that used to be associated with + this context have already finished, calling it while the context is still + in use somewhere in the task hierarchy leads to undefined behavior. + + IMPORTANT: This method is not thread safe! + + The method does not change the context's parent if it is set. **/ + void __TBB_EXPORTED_METHOD reset (); + + //! Initiates cancellation of all tasks in this cancellation group and its subordinate groups. + /** \return false if cancellation has already been requested, true otherwise. + + Note that canceling never fails. When false is returned, it just means that + another thread (or this one) has already sent cancellation request to this + context or to one of its ancestors (if this context is bound). It is guaranteed + that when this method is concurrently called on the same not yet cancelled + context, true will be returned by one and only one invocation. **/ + bool __TBB_EXPORTED_METHOD cancel_group_execution (); + + //! Returns true if the context received cancellation request. + bool __TBB_EXPORTED_METHOD is_group_execution_cancelled () const; + + //! Records the pending exception, and cancels the task group. + /** May be called only from inside a catch-block. If the context is already + canceled, does nothing. + The method brings the task group associated with this context exactly into + the state it would be in, if one of its tasks threw the currently pending + exception during its execution. In other words, it emulates the actions + of the scheduler's dispatch loop exception handler. **/ + void __TBB_EXPORTED_METHOD register_pending_exception (); + +protected: + //! Out-of-line part of the constructor. + /** Singled out to ensure backward binary compatibility of the future versions. 
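// A minimal sketch (not part of the patch) of the two construction modes described
// above: a default-constructed context is bound, an explicitly isolated one is never
// cancelled by outer algorithms, so cancellation is requested explicitly instead.
// `cancellable_root` is a hypothetical task type; an initialized scheduler is assumed.
class cancellable_root : public tbb::task {
    /*override*/ tbb::task* execute() { /* hypothetical long-running work */ return NULL; }
};

void run_cancellable() {
    tbb::task_group_context ctx( tbb::task_group_context::isolated );
    tbb::task& r = *new( tbb::task::allocate_root(ctx) ) cancellable_root;
    // Another thread (e.g. a GUI callback) may call ctx.cancel_group_execution();
    // exactly one of several concurrent callers is reported `true`.
    tbb::task::spawn_root_and_wait(r);
    if( ctx.is_group_execution_cancelled() )
        ctx.reset();    // legal only here, after the associated task tree has completed
}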
**/ + void __TBB_EXPORTED_METHOD init (); + +private: + friend class task; + friend class internal::allocate_root_with_context_proxy; + + static const kind_type binding_required = bound; + static const kind_type binding_completed = kind_type(bound+1); + + //! Checks if any of the ancestors has a cancellation request outstanding, + //! and propagates it back to descendants. + void propagate_cancellation_from_ancestors (); + + //! For debugging purposes only. + bool is_alive () { +#if TBB_USE_DEBUG + return my_version_and_traits != 0xDeadBeef; +#else + return true; +#endif /* TBB_USE_DEBUG */ + } +}; // class task_group_context + +#endif /* __TBB_EXCEPTIONS */ + +//! Base class for user-defined tasks. +/** @ingroup task_scheduling */ +class task: internal::no_copy { + //! Set reference count + void __TBB_EXPORTED_METHOD internal_set_ref_count( int count ); + + //! Decrement reference count and return true if non-zero. + internal::reference_count __TBB_EXPORTED_METHOD internal_decrement_ref_count(); + +protected: + //! Default constructor. + task() {prefix().extra_state=1;} + +public: + //! Destructor. + virtual ~task() {} + + //! Should be overridden by derived classes. + virtual task* execute() = 0; + + //! Enumeration of task states that the scheduler considers. + enum state_type { + //! task is running, and will be destroyed after method execute() completes. + executing, + //! task to be rescheduled. + reexecute, + //! task is in ready pool, or is going to be put there, or was just taken off. + ready, + //! task object is freshly allocated or recycled. + allocated, + //! task object is on free list, or is going to be put there, or was just taken off. + freed, + //! task to be recycled as continuation + recycle + }; + + //------------------------------------------------------------------------ + // Allocating tasks + //------------------------------------------------------------------------ + + //! Returns proxy for overloaded new that allocates a root task. + static internal::allocate_root_proxy allocate_root() { + return internal::allocate_root_proxy(); + } + +#if __TBB_EXCEPTIONS + //! Returns proxy for overloaded new that allocates a root task associated with user supplied context. + static internal::allocate_root_with_context_proxy allocate_root( task_group_context& ctx ) { + return internal::allocate_root_with_context_proxy(ctx); + } +#endif /* __TBB_EXCEPTIONS */ + + //! Returns proxy for overloaded new that allocates a continuation task of *this. + /** The continuation's parent becomes the parent of *this. */ + internal::allocate_continuation_proxy& allocate_continuation() { + return *reinterpret_cast(this); + } + + //! Returns proxy for overloaded new that allocates a child task of *this. + internal::allocate_child_proxy& allocate_child() { + return *reinterpret_cast(this); + } + + //! Like allocate_child, except that task's parent becomes "t", not this. + /** Typically used in conjunction with schedule_to_reexecute to implement while loops. + Atomically increments the reference count of t.parent() */ + internal::allocate_additional_child_of_proxy allocate_additional_child_of( task& t ) { + return internal::allocate_additional_child_of_proxy(*this,t); + } + + //! Destroy a task. + /** Usually, calling this method is unnecessary, because a task is + implicitly deleted after its execute() method runs. However, + sometimes a task needs to be explicitly deallocated, such as + when a root task is used as the parent in spawn_and_wait_for_all. 
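// A minimal sketch (not part of the patch): a root task used purely as a waitable
// parent, the pattern the destroy() comment above refers to. Children are attached via
// allocate_additional_child_of(), which adjusts the parent's reference count for us.
// `work_task` is a hypothetical tbb::task subclass; an active scheduler is assumed.
class work_task : public tbb::task {
    const int idx;
public:
    work_task( int i ) : idx(i) {}
    /*override*/ tbb::task* execute() { /* hypothetical per-item work using idx */ return NULL; }
};

void run_four_items() {
    tbb::empty_task& root = *new( tbb::task::allocate_root() ) tbb::empty_task;
    root.set_ref_count(1);                       // wait_for_all() waits for the count to drop back to 1
    for( int i = 0; i < 4; ++i )
        root.spawn( *new( root.allocate_additional_child_of(root) ) work_task(i) );
    root.wait_for_all();                         // all four children have completed here
    root.destroy(root);                          // root tasks must be reclaimed explicitly
}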
*/ + void __TBB_EXPORTED_METHOD destroy( task& victim ); + + //------------------------------------------------------------------------ + // Recycling of tasks + //------------------------------------------------------------------------ + + //! Change this to be a continuation of its former self. + /** The caller must guarantee that the task's refcount does not become zero until + after the method execute() returns. Typically, this is done by having + method execute() return a pointer to a child of the task. If the guarantee + cannot be made, use method recycle_as_safe_continuation instead. + + Because of the hazard, this method may be deprecated in the future. */ + void recycle_as_continuation() { + __TBB_ASSERT( prefix().state==executing, "execute not running?" ); + prefix().state = allocated; + } + + //! Recommended to use, safe variant of recycle_as_continuation + /** For safety, it requires additional increment of ref_count. */ + void recycle_as_safe_continuation() { + __TBB_ASSERT( prefix().state==executing, "execute not running?" ); + prefix().state = recycle; + } + + //! Change this to be a child of new_parent. + void recycle_as_child_of( task& new_parent ) { + internal::task_prefix& p = prefix(); + __TBB_ASSERT( prefix().state==executing||prefix().state==allocated, "execute not running, or already recycled" ); + __TBB_ASSERT( prefix().ref_count==0, "no child tasks allowed when recycled as a child" ); + __TBB_ASSERT( p.parent==NULL, "parent must be null" ); + __TBB_ASSERT( new_parent.prefix().state<=recycle, "corrupt parent's state" ); + __TBB_ASSERT( new_parent.prefix().state!=freed, "parent already freed" ); + p.state = allocated; + p.parent = &new_parent; +#if __TBB_EXCEPTIONS + p.context = new_parent.prefix().context; +#endif /* __TBB_EXCEPTIONS */ + } + + //! Schedule this for reexecution after current execute() returns. + /** Requires that this.execute() be running. */ + void recycle_to_reexecute() { + __TBB_ASSERT( prefix().state==executing, "execute not running, or already recycled" ); + __TBB_ASSERT( prefix().ref_count==0, "no child tasks allowed when recycled for reexecution" ); + prefix().state = reexecute; + } + + // All depth-related methods are obsolete, and are retained for the sake + // of backward source compatibility only + intptr_t depth() const {return 0;} + void set_depth( intptr_t ) {} + void add_to_depth( int ) {} + + + //------------------------------------------------------------------------ + // Spawning and blocking + //------------------------------------------------------------------------ + + //! Set reference count + void set_ref_count( int count ) { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + internal_set_ref_count(count); +#else + prefix().ref_count = count; +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT */ + } + + //! Atomically increment reference count. + /** Has acquire semantics */ + void increment_ref_count() { + __TBB_FetchAndIncrementWacquire( &prefix().ref_count ); + } + + //! Atomically decrement reference count. + /** Has release semanics. */ + int decrement_ref_count() { +#if TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT + return int(internal_decrement_ref_count()); +#else + return int(__TBB_FetchAndDecrementWrelease( &prefix().ref_count ))-1; +#endif /* TBB_USE_THREADING_TOOLS||TBB_USE_ASSERT */ + } + + //! Schedule task for execution when a worker becomes available. + /** After all children spawned so far finish their method task::execute, + their parent's method task::execute may start running. 
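// A hedged sketch (not part of the patch) of the recycle_as_safe_continuation() idiom
// noted above: the task turns itself into the continuation of its own child, and the
// extra unit passed to set_ref_count() is the "additional increment" the comment asks
// for. `child_work` and the second-pass logic are hypothetical.
class child_work : public tbb::task {
    /*override*/ tbb::task* execute() { /* hypothetical payload */ return NULL; }
};

class two_phase_task : public tbb::task {
    bool second_pass;
public:
    two_phase_task() : second_pass(false) {}
    /*override*/ tbb::task* execute() {
        if( !second_pass ) {
            recycle_as_safe_continuation();       // re-run *this once the child completes
            set_ref_count(2);                     // 1 child + 1 extra required by the safe variant
            second_pass = true;
            spawn( *new( allocate_child() ) child_work );
        } else {
            // second pass: the child has finished; combine or publish its result here
        }
        return NULL;
    }
};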
Therefore, it + is important to ensure that at least one child has not completed until + the parent is ready to run. */ + void spawn( task& child ) { + prefix().owner->spawn( child, child.prefix().next ); + } + + //! Spawn multiple tasks and clear list. + void spawn( task_list& list ); + + //! Similar to spawn followed by wait_for_all, but more efficient. + void spawn_and_wait_for_all( task& child ) { + prefix().owner->wait_for_all( *this, &child ); + } + + //! Similar to spawn followed by wait_for_all, but more efficient. + void __TBB_EXPORTED_METHOD spawn_and_wait_for_all( task_list& list ); + + //! Spawn task allocated by allocate_root, wait for it to complete, and deallocate it. + /** The thread that calls spawn_root_and_wait must be the same thread + that allocated the task. */ + static void spawn_root_and_wait( task& root ) { + root.prefix().owner->spawn_root_and_wait( root, root.prefix().next ); + } + + //! Spawn root tasks on list and wait for all of them to finish. + /** If there are more tasks than worker threads, the tasks are spawned in + order of front to back. */ + static void spawn_root_and_wait( task_list& root_list ); + + //! Wait for reference count to become one, and set reference count to zero. + /** Works on tasks while waiting. */ + void wait_for_all() { + prefix().owner->wait_for_all( *this, NULL ); + } + + //! The innermost task being executed or destroyed by the current thread at the moment. + static task& __TBB_EXPORTED_FUNC self(); + + //! task on whose behalf this task is working, or NULL if this is a root. + task* parent() const {return prefix().parent;} + +#if __TBB_EXCEPTIONS + //! Shared context that is used to communicate asynchronous state changes + task_group_context* context() {return prefix().context;} +#endif /* __TBB_EXCEPTIONS */ + + //! True if task is owned by different thread than thread that owns its parent. + bool is_stolen_task() const { + internal::task_prefix& p = prefix(); + internal::task_prefix& q = parent()->prefix(); + return p.owner!=q.owner; + } + + //------------------------------------------------------------------------ + // Debugging + //------------------------------------------------------------------------ + + //! Current execution state + state_type state() const {return state_type(prefix().state);} + + //! The internal reference count. + int ref_count() const { +#if TBB_USE_ASSERT + internal::reference_count ref_count = prefix().ref_count; + __TBB_ASSERT( ref_count==int(ref_count), "integer overflow error"); +#endif + return int(prefix().ref_count); + } + + //! Obsolete, and only retained for the sake of backward compatibility. Always returns true. + bool __TBB_EXPORTED_METHOD is_owned_by_current_thread() const; + + //------------------------------------------------------------------------ + // Affinity + //------------------------------------------------------------------------ + + //! An id as used for specifying affinity. + /** Guaranteed to be integral type. Value of 0 means no affinity. */ + typedef internal::affinity_id affinity_id; + + //! Set affinity for this task. + void set_affinity( affinity_id id ) {prefix().affinity = id;} + + //! Current affinity of this task + affinity_id affinity() const {return prefix().affinity;} + + //! Invoked by scheduler to notify task that it ran on unexpected thread. + /** Invoked before method execute() runs, if task is stolen, or task has + affinity but will be executed on another thread. + + The default action does nothing. 
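// A minimal sketch (not part of the patch) of the classic blocking style built from the
// primitives declared above: allocate_root()/allocate_child(), set_ref_count(),
// spawn_and_wait_for_all() and spawn_root_and_wait(). The cutoff value and serial_fib()
// are assumptions for illustration only.
long serial_fib( long n ) { return n < 2 ? n : serial_fib(n-1) + serial_fib(n-2); }

class fib_task : public tbb::task {
    const long n;
    long* const sum;
public:
    fib_task( long n_, long* sum_ ) : n(n_), sum(sum_) {}
    /*override*/ tbb::task* execute() {
        if( n < 16 ) {                           // hypothetical serial cutoff
            *sum = serial_fib(n);
        } else {
            long x, y;
            fib_task& a = *new( allocate_child() ) fib_task(n-1, &x);
            fib_task& b = *new( allocate_child() ) fib_task(n-2, &y);
            set_ref_count(3);                    // two children + one for the wait
            spawn(b);
            spawn_and_wait_for_all(a);           // runs a, then waits for both children
            *sum = x + y;
        }
        return NULL;
    }
};

long parallel_fib( long n ) {
    long sum;
    fib_task& root = *new( tbb::task::allocate_root() ) fib_task(n, &sum);
    tbb::task::spawn_root_and_wait(root);
    return sum;
}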
*/ + virtual void __TBB_EXPORTED_METHOD note_affinity( affinity_id id ); + +#if __TBB_EXCEPTIONS + //! Initiates cancellation of all tasks in this cancellation group and its subordinate groups. + /** \return false if cancellation has already been requested, true otherwise. **/ + bool cancel_group_execution () { return prefix().context->cancel_group_execution(); } + + //! Returns true if the context received cancellation request. + bool is_cancelled () const { return prefix().context->is_group_execution_cancelled(); } +#endif /* __TBB_EXCEPTIONS */ + +private: + friend class task_list; + friend class internal::scheduler; + friend class internal::allocate_root_proxy; +#if __TBB_EXCEPTIONS + friend class internal::allocate_root_with_context_proxy; +#endif /* __TBB_EXCEPTIONS */ + friend class internal::allocate_continuation_proxy; + friend class internal::allocate_child_proxy; + friend class internal::allocate_additional_child_of_proxy; + + friend class internal::task_group_base; + + //! Get reference to corresponding task_prefix. + /** Version tag prevents loader on Linux from using the wrong symbol in debug builds. **/ + internal::task_prefix& prefix( internal::version_tag* = NULL ) const { + return reinterpret_cast(const_cast(this))[-1]; + } +}; // class task + +//! task that does nothing. Useful for synchronization. +/** @ingroup task_scheduling */ +class empty_task: public task { + /*override*/ task* execute() { + return NULL; + } +}; + +//! A list of children. +/** Used for method task::spawn_children + @ingroup task_scheduling */ +class task_list: internal::no_copy { +private: + task* first; + task** next_ptr; + friend class task; +public: + //! Construct empty list + task_list() : first(NULL), next_ptr(&first) {} + + //! Destroys the list, but does not destroy the task objects. + ~task_list() {} + + //! True if list if empty; false otherwise. + bool empty() const {return !first;} + + //! Push task onto back of list. + void push_back( task& task ) { + task.prefix().next = NULL; + *next_ptr = &task; + next_ptr = &task.prefix().next; + } + + //! Pop the front task from the list. + task& pop_front() { + __TBB_ASSERT( !empty(), "attempt to pop item from empty task_list" ); + task* result = first; + first = result->prefix().next; + if( !first ) next_ptr = &first; + return *result; + } + + //! 
Clear the list + void clear() { + first=NULL; + next_ptr=&first; + } +}; + +inline void task::spawn( task_list& list ) { + if( task* t = list.first ) { + prefix().owner->spawn( *t, *list.next_ptr ); + list.clear(); + } +} + +inline void task::spawn_root_and_wait( task_list& root_list ) { + if( task* t = root_list.first ) { + t->prefix().owner->spawn_root_and_wait( *t, *root_list.next_ptr ); + root_list.clear(); + } +} + +} // namespace tbb + +inline void *operator new( size_t bytes, const tbb::internal::allocate_root_proxy& ) { + return &tbb::internal::allocate_root_proxy::allocate(bytes); +} + +inline void operator delete( void* task, const tbb::internal::allocate_root_proxy& ) { + tbb::internal::allocate_root_proxy::free( *static_cast(task) ); +} + +#if __TBB_EXCEPTIONS +inline void *operator new( size_t bytes, const tbb::internal::allocate_root_with_context_proxy& p ) { + return &p.allocate(bytes); +} + +inline void operator delete( void* task, const tbb::internal::allocate_root_with_context_proxy& p ) { + p.free( *static_cast(task) ); +} +#endif /* __TBB_EXCEPTIONS */ + +inline void *operator new( size_t bytes, const tbb::internal::allocate_continuation_proxy& p ) { + return &p.allocate(bytes); +} + +inline void operator delete( void* task, const tbb::internal::allocate_continuation_proxy& p ) { + p.free( *static_cast(task) ); +} + +inline void *operator new( size_t bytes, const tbb::internal::allocate_child_proxy& p ) { + return &p.allocate(bytes); +} + +inline void operator delete( void* task, const tbb::internal::allocate_child_proxy& p ) { + p.free( *static_cast(task) ); +} + +inline void *operator new( size_t bytes, const tbb::internal::allocate_additional_child_of_proxy& p ) { + return &p.allocate(bytes); +} + +inline void operator delete( void* task, const tbb::internal::allocate_additional_child_of_proxy& p ) { + p.free( *static_cast(task) ); +} + +#endif /* __TBB_task_H */ diff --git a/dep/tbb/include/tbb/task_group.h b/dep/tbb/include/tbb/task_group.h new file mode 100644 index 000000000..b3e6cf224 --- /dev/null +++ b/dep/tbb/include/tbb/task_group.h @@ -0,0 +1,228 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#ifndef __TBB_task_group_H +#define __TBB_task_group_H + +#include "task.h" +#include + +namespace tbb { + +template +class task_handle { + F my_func; + +public: + task_handle( const F& f ) : my_func(f) {} + + void operator()() { my_func(); } +}; + +enum task_group_status { + not_complete, + complete, + canceled +}; + +namespace internal { + +// Suppress gratuitous warnings from icc 11.0 when lambda expressions are used in instances of function_task. +//#pragma warning(disable: 588) + +template +class function_task : public task { + F my_func; + /*override*/ task* execute() { + my_func(); + return NULL; + } +public: + function_task( const F& f ) : my_func(f) {} +}; + +template +class task_handle_task : public task { + task_handle& my_handle; + /*override*/ task* execute() { + my_handle(); + return NULL; + } +public: + task_handle_task( task_handle& h ) : my_handle(h) {} +}; + +class task_group_base : internal::no_copy { +protected: + empty_task* my_root; + task_group_context my_context; + + task& owner () { return *my_root; } + + template + task_group_status internal_run_and_wait( F& f ) { + try { + if ( !my_context.is_group_execution_cancelled() ) + f(); + } catch ( ... ) { + my_context.register_pending_exception(); + } + return wait(); + } + + template + void internal_run( F& f ) { + owner().spawn( *new( owner().allocate_additional_child_of(*my_root) ) Task(f) ); + } + +public: + task_group_base( uintptr_t traits = 0 ) + : my_context(task_group_context::bound, task_group_context::default_traits | traits) + { + my_root = new( task::allocate_root(my_context) ) empty_task; + my_root->set_ref_count(1); + } + + template + void run( task_handle& h ) { + internal_run< task_handle, internal::task_handle_task >( h ); + } + + task_group_status wait() { + try { + owner().prefix().owner->wait_for_all( *my_root, NULL ); + } catch ( ... ) { + my_context.reset(); + throw; + } + if ( my_context.is_group_execution_cancelled() ) { + my_context.reset(); + return canceled; + } + return complete; + } + + bool is_canceling() { + return my_context.is_group_execution_cancelled(); + } + + void cancel() { + my_context.cancel_group_execution(); + } +}; // class task_group_base + +} // namespace internal + +class task_group : public internal::task_group_base { +public: + task_group () : task_group_base( task_group_context::concurrent_wait ) {} + + ~task_group() try { + __TBB_ASSERT( my_root->ref_count() != 0, NULL ); + if( my_root->ref_count() > 1 ) + my_root->wait_for_all(); + owner().destroy(*my_root); + } + catch (...) 
{ + owner().destroy(*my_root); + throw; + } + +#if __SUNPRO_CC + template + void run( task_handle& h ) { + internal_run< task_handle, internal::task_handle_task >( h ); + } +#else + using task_group_base::run; +#endif + + template + void run( const F& f ) { + internal_run< const F, internal::function_task >( f ); + } + + template + task_group_status run_and_wait( const F& f ) { + return internal_run_and_wait( f ); + } + + template + task_group_status run_and_wait( task_handle& h ) { + return internal_run_and_wait< task_handle >( h ); + } +}; // class task_group + +class missing_wait : public std::exception { +public: + /*override*/ + const char* what() const throw() { return "wait() was not called on the structured_task_group"; } +}; + +class structured_task_group : public internal::task_group_base { +public: + ~structured_task_group() { + if( my_root->ref_count() > 1 ) { + bool stack_unwinding_in_progress = std::uncaught_exception(); + // Always attempt to do proper cleanup to avoid inevitable memory corruption + // in case of missing wait (for the sake of better testability & debuggability) + if ( !is_canceling() ) + cancel(); + my_root->wait_for_all(); + owner().destroy(*my_root); + if ( !stack_unwinding_in_progress ) + throw missing_wait(); + } + else + owner().destroy(*my_root); + } + + template + task_group_status run_and_wait ( task_handle& h ) { + return internal_run_and_wait< task_handle >( h ); + } + + task_group_status wait() { + __TBB_ASSERT ( my_root->ref_count() != 0, "wait() can be called only once during the structured_task_group lifetime" ); + return task_group_base::wait(); + } +}; // class structured_task_group + +inline +bool is_current_task_group_canceling() { + return task::self().is_cancelled(); +} + +template +task_handle make_task( const F& f ) { + return task_handle( f ); +} + +} // namespace tbb + +#endif /* __TBB_task_group_H */ diff --git a/dep/tbb/include/tbb/task_scheduler_init.h b/dep/tbb/include/tbb/task_scheduler_init.h new file mode 100644 index 000000000..f817ccc37 --- /dev/null +++ b/dep/tbb/include/tbb/task_scheduler_init.h @@ -0,0 +1,106 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
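// A minimal sketch (not part of the patch) of the task_group API defined in the header
// above, using plain function pointers since C++0x lambdas may not be available.
// step1()/step2() are hypothetical; an initialized scheduler is assumed.
#include "tbb/task_group.h"

void step1();
void step2();

void run_both_steps() {
    tbb::task_group g;
    g.run( &step1 );                    // spawned asynchronously as an internal::function_task
    g.run( &step2 );
    if( g.wait() == tbb::canceled ) {
        // one of the steps, or an outer bound context, requested cancellation
    }
}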
+*/ + +#ifndef __TBB_task_scheduler_init_H +#define __TBB_task_scheduler_init_H + +#include "tbb_stddef.h" + +namespace tbb { + +typedef std::size_t stack_size_type; + +//! @cond INTERNAL +namespace internal { + //! Internal to library. Should not be used by clients. + /** @ingroup task_scheduling */ + class scheduler; +} // namespace internal +//! @endcond + +//! Class representing reference to tbb scheduler. +/** A thread must construct a task_scheduler_init, and keep it alive, + during the time that it uses the services of class task. + @ingroup task_scheduling */ +class task_scheduler_init: internal::no_copy { + /** NULL if not currently initialized. */ + internal::scheduler* my_scheduler; +public: + + //! Typedef for number of threads that is automatic. + static const int automatic = -1; + + //! Argument to initialize() or constructor that causes initialization to be deferred. + static const int deferred = -2; + + //! Ensure that scheduler exists for this thread + /** A value of -1 lets tbb decide on the number of threads, which is typically + the number of hardware threads. For production code, the default value of -1 + should be used, particularly if the client code is mixed with third party clients + that might also use tbb. + + The number_of_threads is ignored if any other task_scheduler_inits + currently exist. A thread may construct multiple task_scheduler_inits. + Doing so does no harm because the underlying scheduler is reference counted. */ + void __TBB_EXPORTED_METHOD initialize( int number_of_threads=automatic ); + + //! The overloaded method with stack size parameter + /** Overloading is necessary to preserve ABI compatibility */ + void __TBB_EXPORTED_METHOD initialize( int number_of_threads, stack_size_type thread_stack_size ); + + //! Inverse of method initialize. + void __TBB_EXPORTED_METHOD terminate(); + + //! Shorthand for default constructor followed by call to intialize(number_of_threads). + task_scheduler_init( int number_of_threads=automatic, stack_size_type thread_stack_size=0 ) : my_scheduler(NULL) { + initialize( number_of_threads, thread_stack_size ); + } + + //! Destroy scheduler for this thread if thread has no other live task_scheduler_inits. + ~task_scheduler_init() { + if( my_scheduler ) + terminate(); + internal::poison_pointer( my_scheduler ); + } + //! Returns the number of threads tbb scheduler would create if initialized by default. + /** Result returned by this method does not depend on whether the scheduler + has already been initialized. + + Because tbb 2.0 does not support blocking tasks yet, you may use this method + to boost the number of threads in the tbb's internal pool, if your tasks are + doing I/O operations. The optimal number of additional threads depends on how + much time your tasks spend in the blocked state. */ + static int __TBB_EXPORTED_FUNC default_num_threads (); + + //! Returns true if scheduler is active (initialized); false otherwise + bool is_active() const { return my_scheduler != NULL; } +}; + +} // namespace tbb + +#endif /* __TBB_task_scheduler_init_H */ diff --git a/dep/tbb/include/tbb/task_scheduler_observer.h b/dep/tbb/include/tbb/task_scheduler_observer.h new file mode 100644 index 000000000..ee8bd5df2 --- /dev/null +++ b/dep/tbb/include/tbb/task_scheduler_observer.h @@ -0,0 +1,74 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. 
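// A minimal sketch (not part of the patch) of deferred initialization, as enabled by the
// `deferred` constant above. read_thread_count() is a hypothetical helper returning -1
// for "let TBB decide".
#include "tbb/task_scheduler_init.h"

int read_thread_count();

int run_app() {
    tbb::task_scheduler_init init( tbb::task_scheduler_init::deferred );
    int n = read_thread_count();
    init.initialize( n > 0 ? n : tbb::task_scheduler_init::automatic );
    // ... use tbb::task and the parallel algorithms while `init` stays alive ...
    return 0;                           // destructor terminates this scheduler reference
}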
+ + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_task_scheduler_observer_H +#define __TBB_task_scheduler_observer_H + +#include "atomic.h" + +#if __TBB_SCHEDULER_OBSERVER + +namespace tbb { + +namespace internal { + +class observer_proxy; + +class task_scheduler_observer_v3 { + friend class observer_proxy; + observer_proxy* my_proxy; + atomic my_busy_count; +public: + //! Enable or disable observation + void __TBB_EXPORTED_METHOD observe( bool state=true ); + + //! True if observation is enables; false otherwise. + bool is_observing() const {return my_proxy!=NULL;} + + //! Construct observer with observation disabled. + task_scheduler_observer_v3() : my_proxy(NULL) {my_busy_count=0;} + + //! Called by thread before first steal since observation became enabled + virtual void on_scheduler_entry( bool /*is_worker*/ ) {} + + //! Called by thread when it no longer takes part in task stealing. + virtual void on_scheduler_exit( bool /*is_worker*/ ) {} + + //! Destructor + virtual ~task_scheduler_observer_v3() {observe(false);} +}; + +} // namespace internal + +typedef internal::task_scheduler_observer_v3 task_scheduler_observer; + +} // namespace tbb + +#endif /* __TBB_SCHEDULER_OBSERVER */ + +#endif /* __TBB_task_scheduler_observer_H */ diff --git a/dep/tbb/include/tbb/tbb.h b/dep/tbb/include/tbb/tbb.h new file mode 100644 index 000000000..4bac7bf48 --- /dev/null +++ b/dep/tbb/include/tbb/tbb.h @@ -0,0 +1,76 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
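// A minimal sketch (not part of the patch) of the observer hooks declared above; the
// register/unregister calls stand in for whatever per-thread setup the application
// needs (thread naming, affinity, TLS initialization, ...). Both are hypothetical.
#include "tbb/task_scheduler_observer.h"

void register_worker_thread();
void unregister_worker_thread();

class worker_setup_observer : public tbb::task_scheduler_observer {
public:
    /*override*/ void on_scheduler_entry( bool is_worker ) {
        if( is_worker ) register_worker_thread();
    }
    /*override*/ void on_scheduler_exit( bool is_worker ) {
        if( is_worker ) unregister_worker_thread();
    }
};

// Typical use: construct the observer, then call obs.observe(true) to enable callbacks;
// the base destructor calls observe(false) automatically.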
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_tbb_H +#define __TBB_tbb_H + +/** + This header bulk-includes declarations or definitions of all the functionality + provided by TBB (save for malloc dependent headers). + + If you use only a few TBB constructs, consider including specific headers only. + Any header listed below can be included independently of others. +**/ + +#include "aligned_space.h" +#include "atomic.h" +#include "blocked_range.h" +#include "blocked_range2d.h" +#include "blocked_range3d.h" +#include "cache_aligned_allocator.h" +#include "concurrent_hash_map.h" +#include "concurrent_queue.h" +#include "concurrent_vector.h" +#include "enumerable_thread_specific.h" +#include "mutex.h" +#include "null_mutex.h" +#include "null_rw_mutex.h" +#include "parallel_do.h" +#include "parallel_for.h" +#include "parallel_for_each.h" +#include "parallel_invoke.h" +#include "parallel_reduce.h" +#include "parallel_scan.h" +#include "parallel_sort.h" +#include "partitioner.h" +#include "pipeline.h" +#include "queuing_mutex.h" +#include "queuing_rw_mutex.h" +#include "recursive_mutex.h" +#include "spin_mutex.h" +#include "spin_rw_mutex.h" +#include "task.h" +#include "task_group.h" +#include "task_scheduler_init.h" +#include "task_scheduler_observer.h" +#include "tbb_allocator.h" +#include "tbb_exception.h" +#include "tbb_thread.h" +#include "tick_count.h" + +#endif /* __TBB_tbb_H */ diff --git a/dep/tbb/include/tbb/tbb_allocator.h b/dep/tbb/include/tbb/tbb_allocator.h new file mode 100644 index 000000000..aa1544b96 --- /dev/null +++ b/dep/tbb/include/tbb/tbb_allocator.h @@ -0,0 +1,203 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_tbb_allocator_H +#define __TBB_tbb_allocator_H + +#include +#include +#include "tbb_stddef.h" + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + + //! 
Deallocates memory using FreeHandler + /** The function uses scalable_free if scalable allocator is available and free if not*/ + void __TBB_EXPORTED_FUNC deallocate_via_handler_v3( void *p ); + + //! Allocates memory using MallocHandler + /** The function uses scalable_malloc if scalable allocator is available and malloc if not*/ + void* __TBB_EXPORTED_FUNC allocate_via_handler_v3( size_t n ); + + //! Returns true if standard malloc/free are used to work with memory. + bool __TBB_EXPORTED_FUNC is_malloc_used_v3(); +} +//! @endcond + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Workaround for erroneous "unreferenced parameter" warning in method destroy. + #pragma warning (push) + #pragma warning (disable: 4100) +#endif + +//! Meets "allocator" requirements of ISO C++ Standard, Section 20.1.5 +/** The class selects the best memory allocation mechanism available + from scalable_malloc and standard malloc. + The members are ordered the same way they are in section 20.4.1 + of the ISO C++ standard. + @ingroup memory_allocation */ +template +class tbb_allocator { +public: + typedef typename internal::allocator_type::value_type value_type; + typedef value_type* pointer; + typedef const value_type* const_pointer; + typedef value_type& reference; + typedef const value_type& const_reference; + typedef size_t size_type; + typedef ptrdiff_t difference_type; + template struct rebind { + typedef tbb_allocator other; + }; + + //! Specifies current allocator + enum malloc_type { + scalable, + standard + }; + + tbb_allocator() throw() {} + tbb_allocator( const tbb_allocator& ) throw() {} + template tbb_allocator(const tbb_allocator&) throw() {} + + pointer address(reference x) const {return &x;} + const_pointer address(const_reference x) const {return &x;} + + //! Allocate space for n objects. + pointer allocate( size_type n, const void* /*hint*/ = 0) { + return pointer(internal::allocate_via_handler_v3( n * sizeof(value_type) )); + } + + //! Free previously allocated block of memory. + void deallocate( pointer p, size_type ) { + internal::deallocate_via_handler_v3(p); + } + + //! Largest value for which method allocate might succeed. + size_type max_size() const throw() { + size_type max = static_cast(-1) / sizeof (value_type); + return (max > 0 ? max : 1); + } + + //! Copy-construct value at location pointed to by p. + void construct( pointer p, const value_type& value ) {new(static_cast(p)) value_type(value);} + + //! Destroy value at location pointed to by p. + void destroy( pointer p ) {p->~value_type();} + + //! Returns current allocator + static malloc_type allocator_type() { + return internal::is_malloc_used_v3() ? standard : scalable; + } +}; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning (pop) +#endif // warning 4100 is back + +//! Analogous to std::allocator, as defined in ISO C++ Standard, Section 20.4.1 +/** @ingroup memory_allocation */ +template<> +class tbb_allocator { +public: + typedef void* pointer; + typedef const void* const_pointer; + typedef void value_type; + template struct rebind { + typedef tbb_allocator other; + }; +}; + +template +inline bool operator==( const tbb_allocator&, const tbb_allocator& ) {return true;} + +template +inline bool operator!=( const tbb_allocator&, const tbb_allocator& ) {return false;} + +//! Meets "allocator" requirements of ISO C++ Standard, Section 20.1.5 +/** The class is an adapter over an actual allocator that fills the allocation + using memset function with template argument C as the value. 
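// A minimal sketch (not part of the patch): plugging tbb_allocator into a standard
// container and querying which underlying allocation mechanism was selected at run time.
#include <vector>
#include "tbb/tbb_allocator.h"

typedef std::vector<int, tbb::tbb_allocator<int> > int_vector;

bool using_scalable_malloc() {
    return tbb::tbb_allocator<int>::allocator_type() == tbb::tbb_allocator<int>::scalable;
}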
+ The members are ordered the same way they are in section 20.4.1 + of the ISO C++ standard. + @ingroup memory_allocation */ +template class Allocator = tbb_allocator> +class zero_allocator : public Allocator +{ +public: + typedef Allocator base_allocator_type; + typedef typename base_allocator_type::value_type value_type; + typedef typename base_allocator_type::pointer pointer; + typedef typename base_allocator_type::const_pointer const_pointer; + typedef typename base_allocator_type::reference reference; + typedef typename base_allocator_type::const_reference const_reference; + typedef typename base_allocator_type::size_type size_type; + typedef typename base_allocator_type::difference_type difference_type; + template struct rebind { + typedef zero_allocator other; + }; + + zero_allocator() throw() { } + zero_allocator(const zero_allocator &a) throw() : base_allocator_type( a ) { } + template + zero_allocator(const zero_allocator &a) throw() : base_allocator_type( Allocator( a ) ) { } + + pointer allocate(const size_type n, const void *hint = 0 ) { + pointer ptr = base_allocator_type::allocate( n, hint ); + std::memset( ptr, 0, n * sizeof(value_type) ); + return ptr; + } +}; + +//! Analogous to std::allocator, as defined in ISO C++ Standard, Section 20.4.1 +/** @ingroup memory_allocation */ +template class Allocator> +class zero_allocator : public Allocator { +public: + typedef Allocator base_allocator_type; + typedef typename base_allocator_type::value_type value_type; + typedef typename base_allocator_type::pointer pointer; + typedef typename base_allocator_type::const_pointer const_pointer; + template struct rebind { + typedef zero_allocator other; + }; +}; + +template class B1, typename T2, template class B2> +inline bool operator==( const zero_allocator &a, const zero_allocator &b) { + return static_cast< B1 >(a) == static_cast< B2 >(b); +} +template class B1, typename T2, template class B2> +inline bool operator!=( const zero_allocator &a, const zero_allocator &b) { + return static_cast< B1 >(a) != static_cast< B2 >(b); +} + +} // namespace tbb + +#endif /* __TBB_tbb_allocator_H */ diff --git a/dep/tbb/include/tbb/tbb_config.h b/dep/tbb/include/tbb/tbb_config.h new file mode 100644 index 000000000..fad5bf214 --- /dev/null +++ b/dep/tbb/include/tbb/tbb_config.h @@ -0,0 +1,161 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. 
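// A minimal sketch (not part of the patch): zero_allocator used as an adaptor, per the
// comment above. Pairing it with a concurrent_vector of atomics is the classic use case,
// so that freshly grown elements read as zero-filled memory.
#include "tbb/atomic.h"
#include "tbb/concurrent_vector.h"
#include "tbb/tbb_allocator.h"

typedef tbb::concurrent_vector<tbb::atomic<int>,
                               tbb::zero_allocator<tbb::atomic<int> > > counter_table;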
This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_tbb_config_H +#define __TBB_tbb_config_H + +/** This header is supposed to contain macro definitions and C style comments only. + The macros defined here are intended to control such aspects of TBB build as + - compilation modes + - feature sets + - workarounds presence +**/ + +/** Compilation modes **/ + +#ifndef TBB_USE_DEBUG +#ifdef TBB_DO_ASSERT +#define TBB_USE_DEBUG TBB_DO_ASSERT +#else +#define TBB_USE_DEBUG 0 +#endif /* TBB_DO_ASSERT */ +#else +#define TBB_DO_ASSERT TBB_USE_DEBUG +#endif /* TBB_USE_DEBUG */ + +#ifndef TBB_USE_ASSERT +#ifdef TBB_DO_ASSERT +#define TBB_USE_ASSERT TBB_DO_ASSERT +#else +#define TBB_USE_ASSERT TBB_USE_DEBUG +#endif /* TBB_DO_ASSERT */ +#endif /* TBB_USE_ASSERT */ + +#ifndef TBB_USE_THREADING_TOOLS +#ifdef TBB_DO_THREADING_TOOLS +#define TBB_USE_THREADING_TOOLS TBB_DO_THREADING_TOOLS +#else +#define TBB_USE_THREADING_TOOLS TBB_USE_DEBUG +#endif /* TBB_DO_THREADING_TOOLS */ +#endif /* TBB_USE_THREADING_TOOLS */ + +#ifndef TBB_USE_PERFORMANCE_WARNINGS +#ifdef TBB_PERFORMANCE_WARNINGS +#define TBB_USE_PERFORMANCE_WARNINGS TBB_PERFORMANCE_WARNINGS +#else +#define TBB_USE_PERFORMANCE_WARNINGS TBB_USE_DEBUG +#endif /* TBB_PEFORMANCE_WARNINGS */ +#endif /* TBB_USE_PERFORMANCE_WARNINGS */ + + +/** Feature sets **/ + +#ifndef __TBB_EXCEPTIONS +#define __TBB_EXCEPTIONS 1 +#endif /* __TBB_EXCEPTIONS */ + +#ifndef __TBB_SCHEDULER_OBSERVER +#define __TBB_SCHEDULER_OBSERVER 1 +#endif /* __TBB_SCHEDULER_OBSERVER */ + +#ifndef __TBB_NEW_ITT_NOTIFY +#define __TBB_NEW_ITT_NOTIFY 1 +#endif /* !__TBB_NEW_ITT_NOTIFY */ + + +/* TODO: The following condition should be extended as soon as new compilers/runtimes + with std::exception_ptr support appear. */ +#define __TBB_EXCEPTION_PTR_PRESENT (_MSC_VER >= 1600 || __GXX_EXPERIMENTAL_CXX0X__ && (__GNUC__==4 && __GNUC_MINOR__>=4)) + + +#ifndef TBB_USE_CAPTURED_EXCEPTION + #if __TBB_EXCEPTION_PTR_PRESENT + #define TBB_USE_CAPTURED_EXCEPTION 0 + #else + #define TBB_USE_CAPTURED_EXCEPTION 1 + #endif +#else /* defined TBB_USE_CAPTURED_EXCEPTION */ + #if !TBB_USE_CAPTURED_EXCEPTION && !__TBB_EXCEPTION_PTR_PRESENT + #error Current runtime does not support std::exception_ptr. Set TBB_USE_CAPTURED_EXCEPTION and make sure that your code is ready to catch tbb::captured_exception. + #endif +#endif /* defined TBB_USE_CAPTURED_EXCEPTION */ + + +#ifndef __TBB_DEFAULT_PARTITIONER +#if TBB_DEPRECATED +/** Default partitioner for parallel loop templates in TBB 1.0-2.1 */ +#define __TBB_DEFAULT_PARTITIONER tbb::simple_partitioner +#else +/** Default partitioner for parallel loop templates in TBB 2.2 */ +#define __TBB_DEFAULT_PARTITIONER tbb::auto_partitioner +#endif /* TBB_DEFAULT_PARTITIONER */ +#endif /* !defined(__TBB_DEFAULT_PARTITIONER */ + +/** Workarounds presence **/ + +#if __GNUC__==4 && __GNUC_MINOR__==4 && !defined(__INTEL_COMPILER) + #define __TBB_GCC_WARNING_SUPPRESSION_ENABLED 1 +#endif + +/** Macros of the form __TBB_XXX_BROKEN denote known issues that are caused by + the bugs in compilers, standard or OS specific libraries. They should be + removed as soon as the corresponding bugs are fixed or the buggy OS/compiler + versions go out of the support list. +**/ + +#if defined(_MSC_VER) && _MSC_VER < 0x1500 && !defined(__INTEL_COMPILER) + /** VS2005 and earlier does not allow to declare a template class as a friend + of classes defined in other namespaces. 
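// A minimal sketch (not part of the patch): selecting the captured-exception model at
// build time, per the TBB_USE_CAPTURED_EXCEPTION logic above. The macro must be visible
// before any TBB header is included (a -D compiler flag works equally well).
#define TBB_USE_CAPTURED_EXCEPTION 1    // propagate tbb::captured_exception instead of std::exception_ptr
#include "tbb/task.h"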
**/ + #define __TBB_TEMPLATE_FRIENDS_BROKEN 1 +#endif + +#if __GLIBC__==2 && __GLIBC_MINOR__==3 || __MINGW32__ + /** Some older versions of glibc crash when exception handling happens concurrently. **/ + #define __TBB_EXCEPTION_HANDLING_BROKEN 1 +#endif + +#if (_WIN32||_WIN64) && __INTEL_COMPILER == 1110 + /** That's a bug in Intel compiler 11.1.044/IA-32/Windows, that leads to a worker thread crash on the thread's startup. **/ + #define __TBB_ICL_11_1_CODE_GEN_BROKEN 1 +#endif + +#if __FreeBSD__ + /** The bug in FreeBSD 8.0 results in kernel panic when there is contention + on a mutex created with this attribute. **/ + #define __TBB_PRIO_INHERIT_BROKEN 1 + + /** A bug in FreeBSD 8.0 results in test hanging when an exception occurs + during (concurrent?) object construction by means of placement new operator. **/ + #define __TBB_PLACEMENT_NEW_EXCEPTION_SAFETY_BROKEN 1 +#endif /* __FreeBSD__ */ + +#if __LRB__ +#include "tbb_config_lrb.h" +#endif + +#endif /* __TBB_tbb_config_H */ diff --git a/dep/tbb/include/tbb/tbb_exception.h b/dep/tbb/include/tbb/tbb_exception.h new file mode 100644 index 000000000..621129eef --- /dev/null +++ b/dep/tbb/include/tbb/tbb_exception.h @@ -0,0 +1,297 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_exception_H +#define __TBB_exception_H + +#include "tbb_stddef.h" +#include + +#if __TBB_EXCEPTIONS && !defined(__EXCEPTIONS) && !defined(_CPPUNWIND) && !defined(__SUNPRO_CC) +#error The current compilation environment does not support exception handling. Please set __TBB_EXCEPTIONS to 0 in tbb_config.h +#endif + +namespace tbb { + +//! Exception for concurrent containers +class bad_last_alloc : public std::bad_alloc { +public: + virtual const char* what() const throw() { return "bad allocation in previous or concurrent attempt"; } + virtual ~bad_last_alloc() throw() {} +}; + +namespace internal { +void __TBB_EXPORTED_FUNC throw_bad_last_alloc_exception_v4() ; +} // namespace internal + +} // namespace tbb + +#if __TBB_EXCEPTIONS +#include "tbb_allocator.h" +#include +#include +#include + +namespace tbb { + +//! Interface to be implemented by all exceptions TBB recognizes and propagates across the threads. 
+/** If an unhandled exception of the type derived from tbb::tbb_exception is intercepted + by the TBB scheduler in one of the worker threads, it is delivered to and re-thrown in + the root thread. The root thread is the thread that has started the outermost algorithm + or root task sharing the same task_group_context with the guilty algorithm/task (the one + that threw the exception first). + + Note: when documentation mentions workers with respect to exception handling, + masters are implied as well, because they are completely equivalent in this context. + Consequently a root thread can be master or worker thread. + + NOTE: In case of nested algorithms or complex task hierarchies when the nested + levels share (explicitly or by means of implicit inheritance) the task group + context of the outermost level, the exception may be (re-)thrown multiple times + (ultimately - in each worker on each nesting level) before reaching the root + thread at the outermost level. IMPORTANT: if you intercept an exception derived + from this class on a nested level, you must re-throw it in the catch block by means + of the "throw;" operator. + + TBB provides two implementations of this interface: tbb::captured_exception and + template class tbb::movable_exception. See their declarations for more info. **/ +class tbb_exception : public std::exception +{ + /** No operator new is provided because the TBB usage model assumes dynamic + creation of the TBB exception objects only by means of applying move() + operation on an exception thrown out of TBB scheduler. **/ + void* operator new ( size_t ); + +public: + //! Creates and returns pointer to the deep copy of this exception object. + /** Move semantics is allowed. **/ + virtual tbb_exception* move () throw() = 0; + + //! Destroys objects created by the move() method. + /** Frees memory and calls destructor for this exception object. + Can and must be used only on objects created by the move method. **/ + virtual void destroy () throw() = 0; + + //! Throws this exception object. + /** Make sure that if you have several levels of derivation from this interface + you implement or override this method on the most derived level. The implementation + is as simple as "throw *this;". Failure to do this will result in exception + of a base class type being thrown. **/ + virtual void throw_self () = 0; + + //! Returns RTTI name of the originally intercepted exception + virtual const char* name() const throw() = 0; + + //! Returns the result of originally intercepted exception's what() method. + virtual const char* what() const throw() = 0; + + /** Operator delete is provided only to allow using existing smart pointers + with TBB exception objects obtained as the result of applying move() + operation on an exception thrown out of TBB scheduler. + + When overriding method move() make sure to override operator delete as well + if memory is allocated not by TBB's scalable allocator. **/ + void operator delete ( void* p ) { + internal::deallocate_via_handler_v3(p); + } +}; + +//! This class is used by TBB to propagate information about unhandled exceptions into the root thread. +/** Exception of this type is thrown by TBB in the root thread (thread that started a parallel + algorithm ) if an unhandled exception was intercepted during the algorithm execution in one + of the workers. 
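// A minimal sketch (not part of the patch) of the re-throw rule stated above: a
// tbb_exception caught on a nested level that shares the outer task_group_context must
// be re-thrown with a bare `throw;`. run_nested_algorithm() is a hypothetical function
// running a TBB algorithm inside an outer one.
void run_nested_algorithm();

void nested_level() {
    try {
        run_nested_algorithm();
    } catch( tbb::tbb_exception& e ) {
        // inspect e.name() / e.what() here if needed, then ...
        throw;    // ... let the exception continue towards the root thread
    }
}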
+ \sa tbb::tbb_exception **/ +class captured_exception : public tbb_exception +{ +public: + captured_exception ( const captured_exception& src ) + : tbb_exception(src), my_dynamic(false) + { + set(src.my_exception_name, src.my_exception_info); + } + + captured_exception ( const char* name, const char* info ) + : my_dynamic(false) + { + set(name, info); + } + + __TBB_EXPORTED_METHOD ~captured_exception () throw() { + clear(); + } + + captured_exception& operator= ( const captured_exception& src ) { + if ( this != &src ) { + clear(); + set(src.my_exception_name, src.my_exception_info); + } + return *this; + } + + /*override*/ + captured_exception* __TBB_EXPORTED_METHOD move () throw(); + + /*override*/ + void __TBB_EXPORTED_METHOD destroy () throw(); + + /*override*/ + void throw_self () { throw *this; } + + /*override*/ + const char* __TBB_EXPORTED_METHOD name() const throw(); + + /*override*/ + const char* __TBB_EXPORTED_METHOD what() const throw(); + + void __TBB_EXPORTED_METHOD set ( const char* name, const char* info ) throw(); + void __TBB_EXPORTED_METHOD clear () throw(); + +private: + //! Used only by method clone(). + captured_exception() {} + + //! Functionally equivalent to {captured_exception e(name,info); return e.clone();} + static captured_exception* allocate ( const char* name, const char* info ); + + bool my_dynamic; + const char* my_exception_name; + const char* my_exception_info; +}; + +//! Template that can be used to implement exception that transfers arbitrary ExceptionData to the root thread +/** Code using TBB can instantiate this template with an arbitrary ExceptionData type + and throw this exception object. Such exceptions are intercepted by the TBB scheduler + and delivered to the root thread (). + \sa tbb::tbb_exception **/ +template +class movable_exception : public tbb_exception +{ + typedef movable_exception self_type; + +public: + movable_exception ( const ExceptionData& data ) + : my_exception_data(data) + , my_dynamic(false) + , my_exception_name(typeid(self_type).name()) + {} + + movable_exception ( const movable_exception& src ) throw () + : tbb_exception(src) + , my_exception_data(src.my_exception_data) + , my_dynamic(false) + , my_exception_name(src.my_exception_name) + {} + + ~movable_exception () throw() {} + + const movable_exception& operator= ( const movable_exception& src ) { + if ( this != &src ) { + my_exception_data = src.my_exception_data; + my_exception_name = src.my_exception_name; + } + return *this; + } + + ExceptionData& data () throw() { return my_exception_data; } + + const ExceptionData& data () const throw() { return my_exception_data; } + + /*override*/ const char* name () const throw() { return my_exception_name; } + + /*override*/ const char* what () const throw() { return "tbb::movable_exception"; } + + /*override*/ + movable_exception* move () throw() { + void* e = internal::allocate_via_handler_v3(sizeof(movable_exception)); + if ( e ) { + ::new (e) movable_exception(*this); + ((movable_exception*)e)->my_dynamic = true; + } + return (movable_exception*)e; + } + /*override*/ + void destroy () throw() { + __TBB_ASSERT ( my_dynamic, "Method destroy can be called only on dynamically allocated movable_exceptions" ); + if ( my_dynamic ) { + this->~movable_exception(); + internal::deallocate_via_handler_v3(this); + } + } + /*override*/ + void throw_self () { + throw *this; + } + +protected: + //! User data + ExceptionData my_exception_data; + +private: + //! 
Flag specifying whether this object has been dynamically allocated (by the move method) + bool my_dynamic; + + //! RTTI name of this class + /** We rely on the fact that RTTI names are static string constants. **/ + const char* my_exception_name; +}; + +#if !TBB_USE_CAPTURED_EXCEPTION +namespace internal { + +//! Exception container that preserves the exact copy of the original exception +/** This class can be used only when the appropriate runtime support (mandated + by C++0x) is present **/ +class tbb_exception_ptr { + std::exception_ptr my_ptr; + +public: + static tbb_exception_ptr* allocate (); + static tbb_exception_ptr* allocate ( const tbb_exception& tag ); + //! This overload uses move semantics (i.e. it empties src) + static tbb_exception_ptr* allocate ( captured_exception& src ); + + //! Destroys this objects + /** Note that objects of this type can be created only by the allocate() method. **/ + void destroy () throw(); + + //! Throws the contained exception . + void throw_self () { std::rethrow_exception(my_ptr); } + +private: + tbb_exception_ptr ( const std::exception_ptr& src ) : my_ptr(src) {} + tbb_exception_ptr ( const captured_exception& src ) : my_ptr(std::copy_exception(src)) {} +}; // class tbb::internal::tbb_exception_ptr + +} // namespace internal +#endif /* !TBB_USE_CAPTURED_EXCEPTION */ + +} // namespace tbb + +#endif /* __TBB_EXCEPTIONS */ + +#endif /* __TBB_exception_H */ diff --git a/dep/tbb/include/tbb/tbb_machine.h b/dep/tbb/include/tbb/tbb_machine.h new file mode 100644 index 000000000..0673f2424 --- /dev/null +++ b/dep/tbb/include/tbb/tbb_machine.h @@ -0,0 +1,592 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#ifndef __TBB_machine_H +#define __TBB_machine_H + +#include "tbb_stddef.h" + +#if _WIN32||_WIN64 + +#ifdef _MANAGED +#pragma managed(push, off) +#endif + +#if __MINGW32__ +#include "machine/linux_ia32.h" +extern "C" __declspec(dllimport) int __stdcall SwitchToThread( void ); +#define __TBB_Yield() SwitchToThread() +#elif defined(_M_IX86) +#include "machine/windows_ia32.h" +#elif defined(_M_AMD64) +#include "machine/windows_intel64.h" +#else +#error Unsupported platform +#endif + +#ifdef _MANAGED +#pragma managed(pop) +#endif + +#elif __linux__ || __FreeBSD__ + +#if __i386__ +#include "machine/linux_ia32.h" +#elif __x86_64__ +#include "machine/linux_intel64.h" +#elif __ia64__ +#include "machine/linux_ia64.h" +#endif + +#elif __APPLE__ + +#if __i386__ +#include "machine/linux_ia32.h" +#elif __x86_64__ +#include "machine/linux_intel64.h" +#elif __POWERPC__ +#include "machine/mac_ppc.h" +#endif + +#elif _AIX + +#include "machine/ibm_aix51.h" + +#elif __sun || __SUNPRO_CC + +#define __asm__ asm +#define __volatile__ volatile +#if __i386 || __i386__ +#include "machine/linux_ia32.h" +#elif __x86_64__ +#include "machine/linux_intel64.h" +#endif + +#endif + +#if !defined(__TBB_CompareAndSwap4) \ + || !defined(__TBB_CompareAndSwap8) \ + || !defined(__TBB_Yield) \ + || !defined(__TBB_release_consistency_helper) +#error Minimal requirements for tbb_machine.h not satisfied +#endif + +#ifndef __TBB_load_with_acquire + //! Load with acquire semantics; i.e., no following memory operation can move above the load. + template + inline T __TBB_load_with_acquire(const volatile T& location) { + T temp = location; + __TBB_release_consistency_helper(); + return temp; + } +#endif + +#ifndef __TBB_store_with_release + //! Store with release semantics; i.e., no prior memory operation can move below the store. + template + inline void __TBB_store_with_release(volatile T& location, V value) { + __TBB_release_consistency_helper(); + location = T(value); + } +#endif + +#ifndef __TBB_Pause + inline void __TBB_Pause(int32_t) { + __TBB_Yield(); + } +#endif + +namespace tbb { +namespace internal { + +//! Class that implements exponential backoff. +/** See implementation of spin_wait_while_eq for an example. */ +class atomic_backoff { + //! Time delay, in units of "pause" instructions. + /** Should be equal to approximately the number of "pause" instructions + that take the same time as an context switch. */ + static const int32_t LOOPS_BEFORE_YIELD = 16; + int32_t count; +public: + atomic_backoff() : count(1) {} + + //! Pause for a while. + void pause() { + if( count<=LOOPS_BEFORE_YIELD ) { + __TBB_Pause(count); + // Pause twice as long the next time. + count*=2; + } else { + // Pause is so long that we might as well yield CPU to scheduler. + __TBB_Yield(); + } + } + + // pause for a few times and then return false immediately. + bool bounded_pause() { + if( count<=LOOPS_BEFORE_YIELD ) { + __TBB_Pause(count); + // Pause twice as long the next time. + count*=2; + return true; + } else { + return false; + } + } + + void reset() { + count = 1; + } +}; + +//! Spin WHILE the value of the variable is equal to a given value +/** T and U should be comparable types. */ +template +void spin_wait_while_eq( const volatile T& location, U value ) { + atomic_backoff backoff; + while( location==value ) backoff.pause(); +} + +//! Spin UNTIL the value of the variable is equal to a given value +/** T and U should be comparable types. 
*/
+template<typename T, typename U>
+void spin_wait_until_eq( const volatile T& location, const U value ) {
+    atomic_backoff backoff;
+    while( location!=value ) backoff.pause();
+}
+
+// T should be unsigned, otherwise sign propagation will break correctness of bit manipulations.
+// S should be either 1 or 2, for the mask calculation to work correctly.
+// Together, these rules limit applicability of Masked CAS to unsigned char and unsigned short.
+template<size_t S, typename T>
+inline T __TBB_MaskedCompareAndSwap (volatile T *ptr, T value, T comparand ) {
+    volatile uint32_t * base = (uint32_t*)( (uintptr_t)ptr & ~(uintptr_t)0x3 );
+#if __TBB_BIG_ENDIAN
+    const uint8_t bitoffset = uint8_t( 8*( 4-S - (uintptr_t(ptr) & 0x3) ) );
+#else
+    const uint8_t bitoffset = uint8_t( 8*((uintptr_t)ptr & 0x3) );
+#endif
+    const uint32_t mask = ( (1<<(S*8)) - 1 )<<bitoffset;
+    atomic_backoff b;
+    uint32_t result;
+    for(;;) {
+        result = *base; // reload the base value which might change during the pause
+        uint32_t old_value = ( result & ~mask ) | ( comparand << bitoffset );
+        uint32_t new_value = ( result & ~mask ) | ( value << bitoffset );
+        // __TBB_CompareAndSwap4 presumed to have full fence.
+        result = __TBB_CompareAndSwap4( base, new_value, old_value );
+        if(  result==old_value               // CAS succeeded
+          || ((result^old_value)&mask)!=0 )  // CAS failed and the bits of interest have changed
+            break;
+        else                                 // CAS failed but the bits of interest left unchanged
+            b.pause();
+    }
+    return T((result & mask) >> bitoffset);
+}
+
+template<size_t S, typename T>
+inline T __TBB_CompareAndSwapGeneric (volatile void *ptr, T value, T comparand ) {
+    return __TBB_CompareAndSwapW((T *)ptr,value,comparand);
+}
+
+template<>
+inline uint8_t __TBB_CompareAndSwapGeneric <1,uint8_t> (volatile void *ptr, uint8_t value, uint8_t comparand ) {
+#ifdef __TBB_CompareAndSwap1
+    return __TBB_CompareAndSwap1(ptr,value,comparand);
+#else
+    return __TBB_MaskedCompareAndSwap<1,uint8_t>((volatile uint8_t *)ptr,value,comparand);
+#endif
+}
+
+template<>
+inline uint16_t __TBB_CompareAndSwapGeneric <2,uint16_t> (volatile void *ptr, uint16_t value, uint16_t comparand ) {
+#ifdef __TBB_CompareAndSwap2
+    return __TBB_CompareAndSwap2(ptr,value,comparand);
+#else
+    return __TBB_MaskedCompareAndSwap<2,uint16_t>((volatile uint16_t *)ptr,value,comparand);
+#endif
+}
+
+template<>
+inline uint32_t __TBB_CompareAndSwapGeneric <4,uint32_t> (volatile void *ptr, uint32_t value, uint32_t comparand ) {
+    return __TBB_CompareAndSwap4(ptr,value,comparand);
+}
+
+template<>
+inline uint64_t __TBB_CompareAndSwapGeneric <8,uint64_t> (volatile void *ptr, uint64_t value, uint64_t comparand ) {
+    return __TBB_CompareAndSwap8(ptr,value,comparand);
+}
+
+template<size_t S, typename T>
+inline T __TBB_FetchAndAddGeneric (volatile void *ptr, T addend) {
+    atomic_backoff b;
+    T result;
+    for(;;) {
+        result = *reinterpret_cast<volatile T *>(ptr);
+        // __TBB_CompareAndSwapGeneric presumed to have full fence.
+        if( __TBB_CompareAndSwapGeneric<S,T> ( ptr, result+addend, result )==result )
+            break;
+        b.pause();
+    }
+    return result;
+}
+
+template<size_t S, typename T>
+inline T __TBB_FetchAndStoreGeneric (volatile void *ptr, T value) {
+    atomic_backoff b;
+    T result;
+    for(;;) {
+        result = *reinterpret_cast<volatile T *>(ptr);
+        // __TBB_CompareAndSwapGeneric presumed to have full fence.
+        if( __TBB_CompareAndSwapGeneric<S,T> ( ptr, value, result )==result )
+            break;
+        b.pause();
+    }
+    return result;
+}
+
+// Macro __TBB_TypeWithAlignmentAtLeastAsStrict(T) should be a type with alignment at least as
+// strict as type T. The type should have a trivial default constructor and destructor, so that
+// arrays of that type can be declared without initializers.
+// It is correct (but perhaps a waste of space) if __TBB_TypeWithAlignmentAtLeastAsStrict(T) expands
+// to a type bigger than T.
+// The default definition here works on machines where integers are naturally aligned and the
+// strictest alignment is 16.
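[Editorial illustration before the alignment helpers below, not part of tbb_machine.h: the same backoff-plus-CAS retry pattern that __TBB_FetchAndAddGeneric and __TBB_FetchAndStoreGeneric use above, applied to an operation the header does not provide. example_fetch_and_max is a made-up name; __TBB_CompareAndSwap4 and atomic_backoff are the primitives defined above.]

    // Hypothetical helper built on the primitives above; not part of TBB.
    inline uint32_t example_fetch_and_max( volatile uint32_t* location, uint32_t candidate ) {
        tbb::internal::atomic_backoff b;
        for(;;) {
            uint32_t snapshot = *location;
            if( snapshot >= candidate )
                return snapshot;                  // already large enough, nothing to publish
            // __TBB_CompareAndSwap4 returns the previous value; success iff it equals snapshot.
            if( uint32_t(__TBB_CompareAndSwap4( location, candidate, snapshot )) == snapshot )
                return snapshot;
            b.pause();                            // lost the race: back off and retry
        }
    }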
+#ifndef __TBB_TypeWithAlignmentAtLeastAsStrict
+
+#if __GNUC__ || __SUNPRO_CC
+struct __TBB_machine_type_with_strictest_alignment {
+    int member[4];
+} __attribute__((aligned(16)));
+#elif _MSC_VER
+__declspec(align(16)) struct __TBB_machine_type_with_strictest_alignment {
+    int member[4];
+};
+#else
+#error Must define __TBB_TypeWithAlignmentAtLeastAsStrict(T) or __TBB_machine_type_with_strictest_alignment
+#endif
+
+template<size_t N> struct type_with_alignment {__TBB_machine_type_with_strictest_alignment member;};
+template<> struct type_with_alignment<1> { char member; };
+template<> struct type_with_alignment<2> { uint16_t member; };
+template<> struct type_with_alignment<4> { uint32_t member; };
+template<> struct type_with_alignment<8> { uint64_t member; };
+
+#if _MSC_VER||defined(__GNUC__)&&__GNUC__==3 && __GNUC_MINOR__<=2
+//! Work around for bug in GNU 3.2 and MSVC compilers.
+/** Bug is that compiler sometimes returns 0 for __alignof(T) when T has not yet been instantiated.
+    The work-around forces instantiation by forcing computation of sizeof(T) before __alignof(T). */
+template<size_t Size, typename T>
+struct work_around_alignment_bug {
+#if _MSC_VER
+    static const size_t alignment = __alignof(T);
+#else
+    static const size_t alignment = __alignof__(T);
+#endif
+};
+#define __TBB_TypeWithAlignmentAtLeastAsStrict(T) tbb::internal::type_with_alignment<tbb::internal::work_around_alignment_bug<sizeof(T),T>::alignment>
+#elif __GNUC__ || __SUNPRO_CC
+#define __TBB_TypeWithAlignmentAtLeastAsStrict(T) tbb::internal::type_with_alignment<__alignof__(T)>
+#else
+#define __TBB_TypeWithAlignmentAtLeastAsStrict(T) __TBB_machine_type_with_strictest_alignment
+#endif
+#endif  /* ____TBB_TypeWithAlignmentAtLeastAsStrict */
+
+} // namespace internal
+} // namespace tbb
+
+#ifndef __TBB_CompareAndSwap1
+#define __TBB_CompareAndSwap1 tbb::internal::__TBB_CompareAndSwapGeneric<1,uint8_t>
+#endif
+
+#ifndef __TBB_CompareAndSwap2
+#define __TBB_CompareAndSwap2 tbb::internal::__TBB_CompareAndSwapGeneric<2,uint16_t>
+#endif
+
+#ifndef __TBB_CompareAndSwapW
+#define __TBB_CompareAndSwapW tbb::internal::__TBB_CompareAndSwapGeneric<sizeof(ptrdiff_t),ptrdiff_t>
+#endif
+
+#ifndef __TBB_FetchAndAdd1
+#define __TBB_FetchAndAdd1 tbb::internal::__TBB_FetchAndAddGeneric<1,uint8_t>
+#endif
+
+#ifndef __TBB_FetchAndAdd2
+#define __TBB_FetchAndAdd2 tbb::internal::__TBB_FetchAndAddGeneric<2,uint16_t>
+#endif
+
+#ifndef __TBB_FetchAndAdd4
+#define __TBB_FetchAndAdd4 tbb::internal::__TBB_FetchAndAddGeneric<4,uint32_t>
+#endif
+
+#ifndef __TBB_FetchAndAdd8
+#define __TBB_FetchAndAdd8 tbb::internal::__TBB_FetchAndAddGeneric<8,uint64_t>
+#endif
+
+#ifndef __TBB_FetchAndAddW
+#define __TBB_FetchAndAddW tbb::internal::__TBB_FetchAndAddGeneric<sizeof(ptrdiff_t),ptrdiff_t>
+#endif
+
+#ifndef __TBB_FetchAndStore1
+#define __TBB_FetchAndStore1 tbb::internal::__TBB_FetchAndStoreGeneric<1,uint8_t>
+#endif
+
+#ifndef __TBB_FetchAndStore2
+#define __TBB_FetchAndStore2 tbb::internal::__TBB_FetchAndStoreGeneric<2,uint16_t>
+#endif
+
+#ifndef __TBB_FetchAndStore4
+#define __TBB_FetchAndStore4 tbb::internal::__TBB_FetchAndStoreGeneric<4,uint32_t>
+#endif
+
+#ifndef __TBB_FetchAndStore8
+#define __TBB_FetchAndStore8 tbb::internal::__TBB_FetchAndStoreGeneric<8,uint64_t>
+#endif
+
+#ifndef __TBB_FetchAndStoreW
+#define __TBB_FetchAndStoreW tbb::internal::__TBB_FetchAndStoreGeneric<sizeof(ptrdiff_t),ptrdiff_t>
+#endif
+
+#if __TBB_DECL_FENCED_ATOMICS
+
+#ifndef __TBB_CompareAndSwap1__TBB_full_fence
+#define __TBB_CompareAndSwap1__TBB_full_fence __TBB_CompareAndSwap1
+#endif
+#ifndef __TBB_CompareAndSwap1acquire
+#define __TBB_CompareAndSwap1acquire __TBB_CompareAndSwap1__TBB_full_fence
+#endif
+#ifndef
__TBB_CompareAndSwap1release +#define __TBB_CompareAndSwap1release __TBB_CompareAndSwap1__TBB_full_fence +#endif + +#ifndef __TBB_CompareAndSwap2__TBB_full_fence +#define __TBB_CompareAndSwap2__TBB_full_fence __TBB_CompareAndSwap2 +#endif +#ifndef __TBB_CompareAndSwap2acquire +#define __TBB_CompareAndSwap2acquire __TBB_CompareAndSwap2__TBB_full_fence +#endif +#ifndef __TBB_CompareAndSwap2release +#define __TBB_CompareAndSwap2release __TBB_CompareAndSwap2__TBB_full_fence +#endif + +#ifndef __TBB_CompareAndSwap4__TBB_full_fence +#define __TBB_CompareAndSwap4__TBB_full_fence __TBB_CompareAndSwap4 +#endif +#ifndef __TBB_CompareAndSwap4acquire +#define __TBB_CompareAndSwap4acquire __TBB_CompareAndSwap4__TBB_full_fence +#endif +#ifndef __TBB_CompareAndSwap4release +#define __TBB_CompareAndSwap4release __TBB_CompareAndSwap4__TBB_full_fence +#endif + +#ifndef __TBB_CompareAndSwap8__TBB_full_fence +#define __TBB_CompareAndSwap8__TBB_full_fence __TBB_CompareAndSwap8 +#endif +#ifndef __TBB_CompareAndSwap8acquire +#define __TBB_CompareAndSwap8acquire __TBB_CompareAndSwap8__TBB_full_fence +#endif +#ifndef __TBB_CompareAndSwap8release +#define __TBB_CompareAndSwap8release __TBB_CompareAndSwap8__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndAdd1__TBB_full_fence +#define __TBB_FetchAndAdd1__TBB_full_fence __TBB_FetchAndAdd1 +#endif +#ifndef __TBB_FetchAndAdd1acquire +#define __TBB_FetchAndAdd1acquire __TBB_FetchAndAdd1__TBB_full_fence +#endif +#ifndef __TBB_FetchAndAdd1release +#define __TBB_FetchAndAdd1release __TBB_FetchAndAdd1__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndAdd2__TBB_full_fence +#define __TBB_FetchAndAdd2__TBB_full_fence __TBB_FetchAndAdd2 +#endif +#ifndef __TBB_FetchAndAdd2acquire +#define __TBB_FetchAndAdd2acquire __TBB_FetchAndAdd2__TBB_full_fence +#endif +#ifndef __TBB_FetchAndAdd2release +#define __TBB_FetchAndAdd2release __TBB_FetchAndAdd2__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndAdd4__TBB_full_fence +#define __TBB_FetchAndAdd4__TBB_full_fence __TBB_FetchAndAdd4 +#endif +#ifndef __TBB_FetchAndAdd4acquire +#define __TBB_FetchAndAdd4acquire __TBB_FetchAndAdd4__TBB_full_fence +#endif +#ifndef __TBB_FetchAndAdd4release +#define __TBB_FetchAndAdd4release __TBB_FetchAndAdd4__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndAdd8__TBB_full_fence +#define __TBB_FetchAndAdd8__TBB_full_fence __TBB_FetchAndAdd8 +#endif +#ifndef __TBB_FetchAndAdd8acquire +#define __TBB_FetchAndAdd8acquire __TBB_FetchAndAdd8__TBB_full_fence +#endif +#ifndef __TBB_FetchAndAdd8release +#define __TBB_FetchAndAdd8release __TBB_FetchAndAdd8__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndStore1__TBB_full_fence +#define __TBB_FetchAndStore1__TBB_full_fence __TBB_FetchAndStore1 +#endif +#ifndef __TBB_FetchAndStore1acquire +#define __TBB_FetchAndStore1acquire __TBB_FetchAndStore1__TBB_full_fence +#endif +#ifndef __TBB_FetchAndStore1release +#define __TBB_FetchAndStore1release __TBB_FetchAndStore1__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndStore2__TBB_full_fence +#define __TBB_FetchAndStore2__TBB_full_fence __TBB_FetchAndStore2 +#endif +#ifndef __TBB_FetchAndStore2acquire +#define __TBB_FetchAndStore2acquire __TBB_FetchAndStore2__TBB_full_fence +#endif +#ifndef __TBB_FetchAndStore2release +#define __TBB_FetchAndStore2release __TBB_FetchAndStore2__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndStore4__TBB_full_fence +#define __TBB_FetchAndStore4__TBB_full_fence __TBB_FetchAndStore4 +#endif +#ifndef __TBB_FetchAndStore4acquire +#define __TBB_FetchAndStore4acquire __TBB_FetchAndStore4__TBB_full_fence 
+#endif +#ifndef __TBB_FetchAndStore4release +#define __TBB_FetchAndStore4release __TBB_FetchAndStore4__TBB_full_fence +#endif + +#ifndef __TBB_FetchAndStore8__TBB_full_fence +#define __TBB_FetchAndStore8__TBB_full_fence __TBB_FetchAndStore8 +#endif +#ifndef __TBB_FetchAndStore8acquire +#define __TBB_FetchAndStore8acquire __TBB_FetchAndStore8__TBB_full_fence +#endif +#ifndef __TBB_FetchAndStore8release +#define __TBB_FetchAndStore8release __TBB_FetchAndStore8__TBB_full_fence +#endif + +#endif // __TBB_DECL_FENCED_ATOMICS + +// Special atomic functions +#ifndef __TBB_FetchAndAddWrelease +#define __TBB_FetchAndAddWrelease __TBB_FetchAndAddW +#endif + +#ifndef __TBB_FetchAndIncrementWacquire +#define __TBB_FetchAndIncrementWacquire(P) __TBB_FetchAndAddW(P,1) +#endif + +#ifndef __TBB_FetchAndDecrementWrelease +#define __TBB_FetchAndDecrementWrelease(P) __TBB_FetchAndAddW(P,(-1)) +#endif + +#if __TBB_WORDSIZE==4 +// On 32-bit platforms, "atomic.h" requires definition of __TBB_Store8 and __TBB_Load8 +#ifndef __TBB_Store8 +inline void __TBB_Store8 (volatile void *ptr, int64_t value) { + tbb::internal::atomic_backoff b; + for(;;) { + int64_t result = *(int64_t *)ptr; + if( __TBB_CompareAndSwap8(ptr,value,result)==result ) break; + b.pause(); + } +} +#endif + +#ifndef __TBB_Load8 +inline int64_t __TBB_Load8 (const volatile void *ptr) { + int64_t result = *(int64_t *)ptr; + result = __TBB_CompareAndSwap8((volatile void *)ptr,result,result); + return result; +} +#endif +#endif /* __TBB_WORDSIZE==4 */ + +#ifndef __TBB_Log2 +inline intptr_t __TBB_Log2( uintptr_t x ) { + if( x==0 ) return -1; + intptr_t result = 0; + uintptr_t tmp; +#if __TBB_WORDSIZE>=8 + if( (tmp = x>>32) ) { x=tmp; result += 32; } +#endif + if( (tmp = x>>16) ) { x=tmp; result += 16; } + if( (tmp = x>>8) ) { x=tmp; result += 8; } + if( (tmp = x>>4) ) { x=tmp; result += 4; } + if( (tmp = x>>2) ) { x=tmp; result += 2; } + return (x&2)? result+1: result; +} +#endif + +#ifndef __TBB_AtomicOR +inline void __TBB_AtomicOR( volatile void *operand, uintptr_t addend ) { + tbb::internal::atomic_backoff b; + for(;;) { + uintptr_t tmp = *(volatile uintptr_t *)operand; + uintptr_t result = __TBB_CompareAndSwapW(operand, tmp|addend, tmp); + if( result==tmp ) break; + b.pause(); + } +} +#endif + +#ifndef __TBB_AtomicAND +inline void __TBB_AtomicAND( volatile void *operand, uintptr_t addend ) { + tbb::internal::atomic_backoff b; + for(;;) { + uintptr_t tmp = *(volatile uintptr_t *)operand; + uintptr_t result = __TBB_CompareAndSwapW(operand, tmp&addend, tmp); + if( result==tmp ) break; + b.pause(); + } +} +#endif + +#ifndef __TBB_TryLockByte +inline bool __TBB_TryLockByte( unsigned char &flag ) { + return __TBB_CompareAndSwap1(&flag,1,0)==0; +} +#endif + +#ifndef __TBB_LockByte +inline uintptr_t __TBB_LockByte( unsigned char& flag ) { + if ( !__TBB_TryLockByte(flag) ) { + tbb::internal::atomic_backoff b; + do { + b.pause(); + } while ( !__TBB_TryLockByte(flag) ); + } + return 0; +} +#endif + +#endif /* __TBB_machine_H */ diff --git a/dep/tbb/include/tbb/tbb_profiling.h b/dep/tbb/include/tbb/tbb_profiling.h new file mode 100644 index 000000000..f9c686d1d --- /dev/null +++ b/dep/tbb/include/tbb/tbb_profiling.h @@ -0,0 +1,105 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. 
+ + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_profiling_H +#define __TBB_profiling_H + +// Check if the tools support is enabled +#if (_WIN32||_WIN64||__linux__) && TBB_USE_THREADING_TOOLS + +#if _WIN32||_WIN64 +#include /* mbstowcs_s */ +#endif +#include "tbb_stddef.h" + +namespace tbb { + namespace internal { +#if _WIN32||_WIN64 + void __TBB_EXPORTED_FUNC itt_set_sync_name_v3( void *obj, const wchar_t* name ); + inline size_t multibyte_to_widechar( wchar_t* wcs, const char* mbs, size_t bufsize) { +#if _MSC_VER>=1400 + size_t len; + mbstowcs_s( &len, wcs, bufsize, mbs, _TRUNCATE ); + return len; // mbstowcs_s counts null terminator +#else + size_t len = mbstowcs( wcs, mbs, bufsize ); + if(wcs && len!=size_t(-1) ) + wcs[lenModules (groups of functionality) implemented by the library + * - Classes provided by the library + * - Files constituting the library. + * . + * Please note that significant part of TBB functionality is implemented in the form of + * template functions, descriptions of which are not accessible on the Classes + * tab. Use Modules or Namespace/Namespace Members + * tabs to find them. + * + * Additional pieces of information can be found here + * - \subpage concepts + * . + */ + +/** \page concepts TBB concepts + + A concept is a set of requirements to a type, which are necessary and sufficient + for the type to model a particular behavior or a set of behaviors. Some concepts + are specific to a particular algorithm (e.g. algorithm body), while other ones + are common to several algorithms (e.g. range concept). + + All TBB algorithms make use of different classes implementing various concepts. + Implementation classes are supplied by the user as type arguments of template + parameters and/or as objects passed as function call arguments. The library + provides predefined implementations of some concepts (e.g. several kinds of + \ref range_req "ranges"), while other ones must always be implemented by the user. + + TBB defines a set of minimal requirements each concept must conform to. 
Here is + the list of different concepts hyperlinked to the corresponding requirements specifications: + - \subpage range_req + - \subpage parallel_do_body_req + - \subpage parallel_for_body_req + - \subpage parallel_reduce_body_req + - \subpage parallel_scan_body_req + - \subpage parallel_sort_iter_req +**/ + +// Define preprocessor symbols used to determine architecture +#if _WIN32||_WIN64 +# if defined(_M_AMD64) +# define __TBB_x86_64 1 +# elif defined(_M_IA64) +# define __TBB_ipf 1 +# elif defined(_M_IX86)||defined(__i386__) // the latter for MinGW support +# define __TBB_x86_32 1 +# endif +#else /* Assume generic Unix */ +# if !__linux__ && !__APPLE__ +# define __TBB_generic_os 1 +# endif +# if __x86_64__ +# define __TBB_x86_64 1 +# elif __ia64__ +# define __TBB_ipf 1 +# elif __i386__||__i386 // __i386 is for Sun OS +# define __TBB_x86_32 1 +# else +# define __TBB_generic_arch 1 +# endif +#endif + +#if _MSC_VER +// define the parts of stdint.h that are needed, but put them inside tbb::internal +namespace tbb { +namespace internal { + typedef __int8 int8_t; + typedef __int16 int16_t; + typedef __int32 int32_t; + typedef __int64 int64_t; + typedef unsigned __int8 uint8_t; + typedef unsigned __int16 uint16_t; + typedef unsigned __int32 uint32_t; + typedef unsigned __int64 uint64_t; +} // namespace internal +} // namespace tbb +#else +#include +#endif /* _MSC_VER */ + +#if _MSC_VER >=1400 +#define __TBB_EXPORTED_FUNC __cdecl +#define __TBB_EXPORTED_METHOD __thiscall +#else +#define __TBB_EXPORTED_FUNC +#define __TBB_EXPORTED_METHOD +#endif + +#include /* Need size_t and ptrdiff_t (the latter on Windows only) from here. */ + +#if _MSC_VER +#define __TBB_tbb_windef_H +#include "_tbb_windef.h" +#undef __TBB_tbb_windef_H +#endif + +#include "tbb_config.h" + +namespace tbb { + //! Type for an assertion handler + typedef void(*assertion_handler_type)( const char* filename, int line, const char* expression, const char * comment ); +} + +#if TBB_USE_ASSERT + +//! Assert that x is true. +/** If x is false, print assertion failure message. + If the comment argument is not NULL, it is printed as part of the failure message. + The comment argument has no other effect. */ +#define __TBB_ASSERT(predicate,message) ((predicate)?((void)0):tbb::assertion_failure(__FILE__,__LINE__,#predicate,message)) +#define __TBB_ASSERT_EX __TBB_ASSERT + +namespace tbb { + //! Set assertion handler and return previous value of it. + assertion_handler_type __TBB_EXPORTED_FUNC set_assertion_handler( assertion_handler_type new_handler ); + + //! Process an assertion failure. + /** Normally called from __TBB_ASSERT macro. + If assertion handler is null, print message for assertion failure and abort. + Otherwise call the assertion handler. */ + void __TBB_EXPORTED_FUNC assertion_failure( const char* filename, int line, const char* expression, const char* comment ); +} // namespace tbb + +#else + +//! No-op version of __TBB_ASSERT. +#define __TBB_ASSERT(predicate,comment) ((void)0) +//! "Extended" version is useful to suppress warnings if a variable is only used with an assert +#define __TBB_ASSERT_EX(predicate,comment) ((void)(1 && (predicate))) + +#endif /* TBB_USE_ASSERT */ + +//! The namespace tbb contains all components of the library. +namespace tbb { + +//! The function returns the interface version of the TBB shared library being used. +/** + * The version it returns is determined at runtime, not at compile/link time. + * So it can be different than the value of TBB_INTERFACE_VERSION obtained at compile time. 
+ */ +extern "C" int __TBB_EXPORTED_FUNC TBB_runtime_interface_version(); + +//! Dummy type that distinguishes splitting constructor from copy constructor. +/** + * See description of parallel_for and parallel_reduce for example usages. + * @ingroup algorithms + */ +class split { +}; + +/** + * @cond INTERNAL + * @brief Identifiers declared inside namespace internal should never be used directly by client code. + */ +namespace internal { + +using std::size_t; + +//! An unsigned integral type big enough to hold a pointer. +/** There's no guarantee by the C++ standard that a size_t is really big enough, + but it happens to be for all platforms of interest. */ +typedef size_t uintptr; + +//! A signed integral type big enough to hold a pointer. +/** There's no guarantee by the C++ standard that a ptrdiff_t is really big enough, + but it happens to be for all platforms of interest. */ +typedef std::ptrdiff_t intptr; + +//! Compile-time constant that is upper bound on cache line/sector size. +/** It should be used only in situations where having a compile-time upper + bound is more useful than a run-time exact answer. + @ingroup memory_allocation */ +const size_t NFS_MaxLineSize = 128; + +//! Report a runtime warning. +void __TBB_EXPORTED_FUNC runtime_warning( const char* format, ... ); + +#if TBB_USE_ASSERT +//! Set p to invalid pointer value. +template +inline void poison_pointer( T* & p ) { + p = reinterpret_cast(-1); +} +#else +template +inline void poison_pointer( T* ) {/*do nothing*/} +#endif /* TBB_USE_ASSERT */ + +//! Base class for types that should not be assigned. +class no_assign { + // Deny assignment + void operator=( const no_assign& ); +public: +#if __GNUC__ + //! Explicitly define default construction, because otherwise gcc issues gratuitous warning. + no_assign() {} +#endif /* __GNUC__ */ +}; + +//! Base class for types that should not be copied or assigned. +class no_copy: no_assign { + //! Deny copy construction + no_copy( const no_copy& ); +public: + //! Allow default construction + no_copy() {} +}; + +//! Class for determining type of std::allocator::value_type. +template +struct allocator_type { + typedef T value_type; +}; + +#if _MSC_VER +//! Microsoft std::allocator has non-standard extension that strips const from a type. +template +struct allocator_type { + typedef T value_type; +}; +#endif + +// Struct to be used as a version tag for inline functions. +/** Version tag can be necessary to prevent loader on Linux from using the wrong + symbol in debug builds (when inline functions are compiled as out-of-line). **/ +struct version_tag_v3 {}; + +typedef version_tag_v3 version_tag; + +} // internal +//! @endcond + +} // tbb + +#endif /* RC_INVOKED */ +#endif /* __TBB_tbb_stddef_H */ diff --git a/dep/tbb/include/tbb/tbb_thread.h b/dep/tbb/include/tbb/tbb_thread.h new file mode 100644 index 000000000..6b40a9c04 --- /dev/null +++ b/dep/tbb/include/tbb/tbb_thread.h @@ -0,0 +1,294 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_tbb_thread_H +#define __TBB_tbb_thread_H + +#if _WIN32||_WIN64 +#include +#define __TBB_NATIVE_THREAD_ROUTINE unsigned WINAPI +#define __TBB_NATIVE_THREAD_ROUTINE_PTR(r) unsigned (WINAPI* r)( void* ) +#else +#define __TBB_NATIVE_THREAD_ROUTINE void* +#define __TBB_NATIVE_THREAD_ROUTINE_PTR(r) void* (*r)( void* ) +#include +#endif // _WIN32||_WIN64 + +#include +#include // Need std::terminate from here. +#include "tbb_stddef.h" +#include "tick_count.h" + +namespace tbb { + +//! @cond INTERNAL +namespace internal { + + class tbb_thread_v3; + +} // namespace internal + +void swap( internal::tbb_thread_v3& t1, internal::tbb_thread_v3& t2 ); + +namespace internal { + + //! Allocate a closure + void* __TBB_EXPORTED_FUNC allocate_closure_v3( size_t size ); + //! Free a closure allocated by allocate_closure_v3 + void __TBB_EXPORTED_FUNC free_closure_v3( void* ); + + struct thread_closure_base { + void* operator new( size_t size ) {return allocate_closure_v3(size);} + void operator delete( void* ptr ) {free_closure_v3(ptr);} + }; + + template struct thread_closure_0: thread_closure_base { + F function; + + static __TBB_NATIVE_THREAD_ROUTINE start_routine( void* c ) { + thread_closure_0 *self = static_cast(c); + try { + self->function(); + } catch ( ... ) { + std::terminate(); + } + delete self; + return 0; + } + thread_closure_0( const F& f ) : function(f) {} + }; + //! Structure used to pass user function with 1 argument to thread. + template struct thread_closure_1: thread_closure_base { + F function; + X arg1; + //! Routine passed to Windows's _beginthreadex by thread::internal_start() inside tbb.dll + static __TBB_NATIVE_THREAD_ROUTINE start_routine( void* c ) { + thread_closure_1 *self = static_cast(c); + try { + self->function(self->arg1); + } catch ( ... ) { + std::terminate(); + } + delete self; + return 0; + } + thread_closure_1( const F& f, const X& x ) : function(f), arg1(x) {} + }; + template struct thread_closure_2: thread_closure_base { + F function; + X arg1; + Y arg2; + //! Routine passed to Windows's _beginthreadex by thread::internal_start() inside tbb.dll + static __TBB_NATIVE_THREAD_ROUTINE start_routine( void* c ) { + thread_closure_2 *self = static_cast(c); + try { + self->function(self->arg1, self->arg2); + } catch ( ... ) { + std::terminate(); + } + delete self; + return 0; + } + thread_closure_2( const F& f, const X& x, const Y& y ) : function(f), arg1(x), arg2(y) {} + }; + + //! Versioned thread class. + class tbb_thread_v3 { + tbb_thread_v3(const tbb_thread_v3&); // = delete; // Deny access + public: +#if _WIN32||_WIN64 + typedef HANDLE native_handle_type; +#else + typedef pthread_t native_handle_type; +#endif // _WIN32||_WIN64 + + class id; + //! 
Constructs a thread object that does not represent a thread of execution. + tbb_thread_v3() : my_handle(0) +#if _WIN32||_WIN64 + , my_thread_id(0) +#endif // _WIN32||_WIN64 + {} + + //! Constructs an object and executes f() in a new thread + template explicit tbb_thread_v3(F f) { + typedef internal::thread_closure_0 closure_type; + internal_start(closure_type::start_routine, new closure_type(f)); + } + //! Constructs an object and executes f(x) in a new thread + template tbb_thread_v3(F f, X x) { + typedef internal::thread_closure_1 closure_type; + internal_start(closure_type::start_routine, new closure_type(f,x)); + } + //! Constructs an object and executes f(x,y) in a new thread + template tbb_thread_v3(F f, X x, Y y) { + typedef internal::thread_closure_2 closure_type; + internal_start(closure_type::start_routine, new closure_type(f,x,y)); + } + + tbb_thread_v3& operator=(tbb_thread_v3& x) { + if (joinable()) detach(); + my_handle = x.my_handle; + x.my_handle = 0; +#if _WIN32||_WIN64 + my_thread_id = x.my_thread_id; + x.my_thread_id = 0; +#endif // _WIN32||_WIN64 + return *this; + } + bool joinable() const {return my_handle!=0; } + //! The completion of the thread represented by *this happens before join() returns. + void __TBB_EXPORTED_METHOD join(); + //! When detach() returns, *this no longer represents the possibly continuing thread of execution. + void __TBB_EXPORTED_METHOD detach(); + ~tbb_thread_v3() {if( joinable() ) detach();} + inline id get_id() const; + native_handle_type native_handle() { return my_handle; } + + //! The number of hardware thread contexts. + static unsigned __TBB_EXPORTED_FUNC hardware_concurrency(); + private: + native_handle_type my_handle; +#if _WIN32||_WIN64 + DWORD my_thread_id; +#endif // _WIN32||_WIN64 + + /** Runs start_routine(closure) on another thread and sets my_handle to the handle of the created thread. 
*/ + void __TBB_EXPORTED_METHOD internal_start( __TBB_NATIVE_THREAD_ROUTINE_PTR(start_routine), + void* closure ); + friend void __TBB_EXPORTED_FUNC move_v3( tbb_thread_v3& t1, tbb_thread_v3& t2 ); + friend void tbb::swap( tbb_thread_v3& t1, tbb_thread_v3& t2 ); + }; + + class tbb_thread_v3::id { +#if _WIN32||_WIN64 + DWORD my_id; + id( DWORD my_id ) : my_id(my_id) {} +#else + pthread_t my_id; + id( pthread_t my_id ) : my_id(my_id) {} +#endif // _WIN32||_WIN64 + friend class tbb_thread_v3; + public: + id() : my_id(0) {} + + friend bool operator==( tbb_thread_v3::id x, tbb_thread_v3::id y ); + friend bool operator!=( tbb_thread_v3::id x, tbb_thread_v3::id y ); + friend bool operator<( tbb_thread_v3::id x, tbb_thread_v3::id y ); + friend bool operator<=( tbb_thread_v3::id x, tbb_thread_v3::id y ); + friend bool operator>( tbb_thread_v3::id x, tbb_thread_v3::id y ); + friend bool operator>=( tbb_thread_v3::id x, tbb_thread_v3::id y ); + + template + friend std::basic_ostream& + operator<< (std::basic_ostream &out, + tbb_thread_v3::id id) + { + out << id.my_id; + return out; + } + friend tbb_thread_v3::id __TBB_EXPORTED_FUNC thread_get_id_v3(); + }; // tbb_thread_v3::id + + tbb_thread_v3::id tbb_thread_v3::get_id() const { +#if _WIN32||_WIN64 + return id(my_thread_id); +#else + return id(my_handle); +#endif // _WIN32||_WIN64 + } + void __TBB_EXPORTED_FUNC move_v3( tbb_thread_v3& t1, tbb_thread_v3& t2 ); + tbb_thread_v3::id __TBB_EXPORTED_FUNC thread_get_id_v3(); + void __TBB_EXPORTED_FUNC thread_yield_v3(); + void __TBB_EXPORTED_FUNC thread_sleep_v3(const tick_count::interval_t &i); + + inline bool operator==(tbb_thread_v3::id x, tbb_thread_v3::id y) + { + return x.my_id == y.my_id; + } + inline bool operator!=(tbb_thread_v3::id x, tbb_thread_v3::id y) + { + return x.my_id != y.my_id; + } + inline bool operator<(tbb_thread_v3::id x, tbb_thread_v3::id y) + { + return x.my_id < y.my_id; + } + inline bool operator<=(tbb_thread_v3::id x, tbb_thread_v3::id y) + { + return x.my_id <= y.my_id; + } + inline bool operator>(tbb_thread_v3::id x, tbb_thread_v3::id y) + { + return x.my_id > y.my_id; + } + inline bool operator>=(tbb_thread_v3::id x, tbb_thread_v3::id y) + { + return x.my_id >= y.my_id; + } + +} // namespace internal; + +//! Users reference thread class by name tbb_thread +typedef internal::tbb_thread_v3 tbb_thread; + +using internal::operator==; +using internal::operator!=; +using internal::operator<; +using internal::operator>; +using internal::operator<=; +using internal::operator>=; + +inline void move( tbb_thread& t1, tbb_thread& t2 ) { + internal::move_v3(t1, t2); +} + +inline void swap( internal::tbb_thread_v3& t1, internal::tbb_thread_v3& t2 ) { + tbb::tbb_thread::native_handle_type h = t1.my_handle; + t1.my_handle = t2.my_handle; + t2.my_handle = h; +#if _WIN32||_WIN64 + DWORD i = t1.my_thread_id; + t1.my_thread_id = t2.my_thread_id; + t2.my_thread_id = i; +#endif /* _WIN32||_WIN64 */ +} + +namespace this_tbb_thread { + inline tbb_thread::id get_id() { return internal::thread_get_id_v3(); } + //! Offers the operating system the opportunity to schedule another thread. + inline void yield() { internal::thread_yield_v3(); } + //! The current thread blocks at least until the time specified. 
+ inline void sleep(const tick_count::interval_t &i) { + internal::thread_sleep_v3(i); + } +} // namespace this_tbb_thread + +} // namespace tbb + +#endif /* __TBB_tbb_thread_H */ diff --git a/dep/tbb/include/tbb/tbbmalloc_proxy.h b/dep/tbb/include/tbb/tbbmalloc_proxy.h new file mode 100644 index 000000000..ebde35886 --- /dev/null +++ b/dep/tbb/include/tbb/tbbmalloc_proxy.h @@ -0,0 +1,74 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +/* +Replacing the standard memory allocation routines in Microsoft* C/C++ RTL +(malloc/free, global new/delete, etc.) with the TBB memory allocator. + +Include the following header to a source of any binary which is loaded during +application startup + +#include "tbb/tbbmalloc_proxy.h" + +or add following parameters to the linker options for the binary which is +loaded during application startup. It can be either exe-file or dll. + +For win32 +tbbmalloc_proxy.lib /INCLUDE:"___TBB_malloc_proxy" +win64 +tbbmalloc_proxy.lib /INCLUDE:"__TBB_malloc_proxy" +*/ + +#ifndef __TBB_tbbmalloc_proxy_H +#define __TBB_tbbmalloc_proxy_H + +#if _MSC_VER + +#ifdef _DEBUG + #pragma comment(lib, "tbbmalloc_proxy_debug.lib") +#else + #pragma comment(lib, "tbbmalloc_proxy.lib") +#endif + +#if defined(_WIN64) + #pragma comment(linker, "/include:__TBB_malloc_proxy") +#else + #pragma comment(linker, "/include:___TBB_malloc_proxy") +#endif + +#else +/* Primarily to support MinGW */ + +extern "C" void __TBB_malloc_proxy(); +struct __TBB_malloc_proxy_caller { + __TBB_malloc_proxy_caller() { __TBB_malloc_proxy(); } +} volatile __TBB_malloc_proxy_helper_object; + +#endif // _MSC_VER + +#endif //__TBB_tbbmalloc_proxy_H diff --git a/dep/tbb/include/tbb/tick_count.h b/dep/tbb/include/tbb/tick_count.h new file mode 100644 index 000000000..495618278 --- /dev/null +++ b/dep/tbb/include/tbb/tick_count.h @@ -0,0 +1,155 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. 
+ + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_tick_count_H +#define __TBB_tick_count_H + +#include "tbb_stddef.h" + +#if _WIN32||_WIN64 +#include +#elif __linux__ +#include +#else /* generic Unix */ +#include +#endif /* (choice of OS) */ + +namespace tbb { + +//! Absolute timestamp +/** @ingroup timing */ +class tick_count { +public: + //! Relative time interval. + class interval_t { + long long value; + explicit interval_t( long long value_ ) : value(value_) {} + public: + //! Construct a time interval representing zero time duration + interval_t() : value(0) {}; + + //! Construct a time interval representing sec seconds time duration + explicit interval_t( double sec ); + + //! Return the length of a time interval in seconds + double seconds() const; + + friend class tbb::tick_count; + + //! Extract the intervals from the tick_counts and subtract them. + friend interval_t operator-( const tick_count& t1, const tick_count& t0 ); + + //! Add two intervals. + friend interval_t operator+( const interval_t& i, const interval_t& j ) { + return interval_t(i.value+j.value); + } + + //! Subtract two intervals. + friend interval_t operator-( const interval_t& i, const interval_t& j ) { + return interval_t(i.value-j.value); + } + + //! Accumulation operator + interval_t& operator+=( const interval_t& i ) {value += i.value; return *this;} + + //! Subtraction operator + interval_t& operator-=( const interval_t& i ) {value -= i.value; return *this;} + }; + + //! Construct an absolute timestamp initialized to zero. + tick_count() : my_count(0) {}; + + //! Return current time. + static tick_count now(); + + //! 
Subtract two timestamps to get the time interval between + friend interval_t operator-( const tick_count& t1, const tick_count& t0 ); + +private: + long long my_count; +}; + +inline tick_count tick_count::now() { + tick_count result; +#if _WIN32||_WIN64 + LARGE_INTEGER qpcnt; + QueryPerformanceCounter(&qpcnt); + result.my_count = qpcnt.QuadPart; +#elif __linux__ + struct timespec ts; +#if TBB_USE_ASSERT + int status = +#endif /* TBB_USE_ASSERT */ + clock_gettime( CLOCK_REALTIME, &ts ); + __TBB_ASSERT( status==0, "CLOCK_REALTIME not supported" ); + result.my_count = static_cast(1000000000UL)*static_cast(ts.tv_sec) + static_cast(ts.tv_nsec); +#else /* generic Unix */ + struct timeval tv; +#if TBB_USE_ASSERT + int status = +#endif /* TBB_USE_ASSERT */ + gettimeofday(&tv, NULL); + __TBB_ASSERT( status==0, "gettimeofday failed" ); + result.my_count = static_cast(1000000)*static_cast(tv.tv_sec) + static_cast(tv.tv_usec); +#endif /*(choice of OS) */ + return result; +} + +inline tick_count::interval_t::interval_t( double sec ) +{ +#if _WIN32||_WIN64 + LARGE_INTEGER qpfreq; + QueryPerformanceFrequency(&qpfreq); + value = static_cast(sec*qpfreq.QuadPart); +#elif __linux__ + value = static_cast(sec*1E9); +#else /* generic Unix */ + value = static_cast(sec*1E6); +#endif /* (choice of OS) */ +} + +inline tick_count::interval_t operator-( const tick_count& t1, const tick_count& t0 ) { + return tick_count::interval_t( t1.my_count-t0.my_count ); +} + +inline double tick_count::interval_t::seconds() const { +#if _WIN32||_WIN64 + LARGE_INTEGER qpfreq; + QueryPerformanceFrequency(&qpfreq); + return value/(double)qpfreq.QuadPart; +#elif __linux__ + return value*1E-9; +#else /* generic Unix */ + return value*1E-6; +#endif /* (choice of OS) */ +} + +} // namespace tbb + +#endif /* __TBB_tick_count_H */ + diff --git a/dep/tbb/index.html b/dep/tbb/index.html new file mode 100644 index 000000000..d35f39238 --- /dev/null +++ b/dep/tbb/index.html @@ -0,0 +1,44 @@ + + + +
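[Editorial illustration before the top-level index.html below, not part of the library sources: a usage sketch for the tick_count facility just added. timed_section is a made-up name; tick_count, now() and interval_t::seconds() are the TBB entities defined above.]

    #include "tbb/tick_count.h"
    #include <iostream>

    void timed_section() {
        tbb::tick_count t0 = tbb::tick_count::now();
        // ... code being measured ...
        tbb::tick_count t1 = tbb::tick_count::now();
        // operator- yields an interval_t; seconds() converts it to wall-clock seconds.
        std::cout << "elapsed: " << (t1 - t0).seconds() << " s" << std::endl;
    }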

Overview

+Top level directory for Threading Building Blocks (TBB). +

+To build TBB, use the top-level Makefile; see also the build directions. +To port TBB to a new platform, operating system or architecture, see the porting directions. +

+ +

Files

+
+
Makefile +
Top-level Makefile for TBB. See also the build directions. +
+ +

Directories

+
+
doc +
Documentation for the library. +
include +
Include files required for compiling code that uses the library. +
examples +
Examples of how to use the library. +
src +
Source code for the library. +
build +
Internal Makefile infrastructure for TBB. Do not use directly; see the build directions. +
ia32, intel64, ia64 +
Platform-specific binary files for the library. +
+ +
+

+Copyright © 2005-2009 Intel Corporation. All Rights Reserved. +

+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are +registered trademarks or trademarks of Intel Corporation or its +subsidiaries in the United States and other countries. +

+* Other names and brands may be claimed as the property of others. + + + diff --git a/dep/tbb/src/Makefile b/dep/tbb/src/Makefile new file mode 100644 index 000000000..c4ff8da30 --- /dev/null +++ b/dep/tbb/src/Makefile @@ -0,0 +1,219 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +tbb_root?=.. +examples_root:=$(tbb_root)/examples +include $(tbb_root)/build/common.inc +.PHONY: all tbb tbbmalloc test test_no_depends release debug examples clean + +all: release debug examples + +tbb: tbb_release tbb_debug + +tbbmalloc: tbbmalloc_release tbbmalloc_debug + +rml: rml_release rml_debug + +test: tbbmalloc_test_release test_release tbbmalloc_test_debug test_debug + +# Suffix _ni stands for "no ingnore", meaning that the first error during the test session will stop it +test_ni: tbbmalloc_test_release_ni test_release_ni tbbmalloc_test_debug_ni test_debug_ni + +test_no_depends: tbbmalloc_test_release_no_depends test_release_no_depends tbbmalloc_test_debug_no_depends test_debug_no_depends + @echo done + +release: tbb_release tbbmalloc_release +release: $(call cross_cfg,tbbmalloc_test_release) $(call cross_cfg,test_release) + +debug: tbb_debug tbbmalloc_debug +debug: $(call cross_cfg,tbbmalloc_test_debug) $(call cross_cfg, test_debug) + +examples: tbb tbbmalloc examples_debug clean_examples examples_release + +clean: clean_release clean_debug clean_examples + @echo clean done + +.PHONY: full +full: + $(MAKE) -s -i -r --no-print-directory -f Makefile tbb_root=. clean all +ifeq ($(tbb_os),windows) + $(MAKE) -s -i -r --no-print-directory -f Makefile tbb_root=. compiler=icl clean all native_examples +else + $(MAKE) -s -i -r --no-print-directory -f Makefile tbb_root=. compiler=icc clean all native_examples +endif +ifeq ($(arch),intel64) + $(MAKE) -s -i -r --no-print-directory -f Makefile tbb_root=. arch=ia32 clean all +endif +# it doesn't test compiler=icc arch=ia32 on intel64 systems due to enviroment settings of icc + +native_examples: tbb tbbmalloc + $(MAKE) -C $(examples_root) -r -f Makefile tbb_root=.. compiler=$(native_compiler) tbb_build_prefix=$(tbb_build_prefix) debug test + $(MAKE) -C $(examples_root) -r -f Makefile tbb_root=.. 
compiler=$(native_compiler) tbb_build_prefix=$(tbb_build_prefix) clean release test + +../examples/% examples/%:: + $(MAKE) -C $(examples_root) -r -f Makefile tbb_root=.. $(subst examples/,,$(subst ../,,$@)) + +debug_%:: cfg?=debug +debug_%:: run_cmd=$(debugger) +test_% stress_% time_%:: cfg?=release +debug_% test_% stress_% time_%:: + $(MAKE) -C "$(work_dir)_$(cfg)" -r -f $(tbb_root)/build/Makefile.test cfg=$(cfg) run_cmd="$(run_cmd)" tbb_root=$(tbb_root) $@ + +clean_%:: +ifeq ($(cfg),) + @$(MAKE) -C "$(work_dir)_release" -r -f $(tbb_root)/build/Makefile.test cfg=release tbb_root=$(tbb_root) $@ + @$(MAKE) -C "$(work_dir)_debug" -r -f $(tbb_root)/build/Makefile.test cfg=debug tbb_root=$(tbb_root) $@ +else + @$(MAKE) -C "$(work_dir)_$(cfg)" -r -f $(tbb_root)/build/Makefile.test cfg=$(cfg) tbb_root=$(tbb_root) $@ +endif + +.PHONY: tbb_release tbb_debug test_release test_debug test_release_no_depends test_debug_no_depends + +# do not delete double-space after -C option +tbb_release: mkdir_release + $(MAKE) -C "$(work_dir)_release" -r -f $(tbb_root)/build/Makefile.tbb cfg=release tbb_root=$(tbb_root) + +tbb_debug: mkdir_debug + $(MAKE) -C "$(work_dir)_debug" -r -f $(tbb_root)/build/Makefile.tbb cfg=debug tbb_root=$(tbb_root) + +test_release: $(call cross_cfg,mkdir_release) $(call cross_cfg,tbb_release) test_release_no_depends +test_release_no_depends: + -$(MAKE) -C "$(call cross_cfg,$(work_dir)_release)" -r -f $(tbb_root)/build/Makefile.test cfg=release tbb_root=$(tbb_root) + +test_debug: $(call cross_cfg,mkdir_debug) $(call cross_cfg,tbb_debug) test_debug_no_depends +test_debug_no_depends: + -$(MAKE) -C "$(call cross_cfg,$(work_dir)_debug)" -r -f $(tbb_root)/build/Makefile.test cfg=debug tbb_root=$(tbb_root) + +test_release_ni: + $(MAKE) -C "$(call cross_cfg,$(work_dir)_release)" -r -f $(tbb_root)/build/Makefile.test cfg=release tbb_root=$(tbb_root) + +test_debug_ni: + $(MAKE) -C "$(call cross_cfg,$(work_dir)_debug)" -r -f $(tbb_root)/build/Makefile.test cfg=debug tbb_root=$(tbb_root) + +.PHONY: tbbmalloc_release tbbmalloc_debug +.PHONY: tbbmalloc_dll_release tbbmalloc_dll_debug tbbmalloc_proxy_dll_release tbbmalloc_proxy_dll_debug +.PHONY: tbbmalloc_test_release tbbmalloc_test_debug tbbmalloc_test_release_no_depends tbbmalloc_test_debug_no_depends + +tbbmalloc_release: mkdir_release + $(MAKE) -C "$(work_dir)_release" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=release malloc tbb_root=$(tbb_root) + +tbbmalloc_debug: mkdir_debug + $(MAKE) -C "$(work_dir)_debug" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=debug malloc tbb_root=$(tbb_root) + +tbbmalloc_dll_release: mkdir_release + $(MAKE) -C "$(work_dir)_release" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=release malloc_dll tbb_root=$(tbb_root) + +tbbmalloc_proxy_dll_release: mkdir_release + $(MAKE) -C "$(work_dir)_release" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=release malloc_proxy_dll tbb_root=$(tbb_root) + +tbbmalloc_dll_debug: mkdir_debug + $(MAKE) -C "$(work_dir)_debug" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=debug malloc_dll tbb_root=$(tbb_root) + +tbbmalloc_proxy_dll_debug: mkdir_debug + $(MAKE) -C "$(work_dir)_debug" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=debug malloc_proxy_dll tbb_root=$(tbb_root) + +tbbmalloc_test_release: $(call cross_cfg,mkdir_release) $(call cross_cfg,tbbmalloc_release) tbbmalloc_test_release_no_depends +tbbmalloc_test_release_no_depends: + -$(MAKE) -C "$(call cross_cfg,$(work_dir)_release)" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=release malloc_test tbb_root=$(tbb_root) + 
+tbbmalloc_test_debug: $(call cross_cfg,mkdir_debug) $(call cross_cfg,tbbmalloc_debug) tbbmalloc_test_debug_no_depends +tbbmalloc_test_debug_no_depends: + -$(MAKE) -C "$(call cross_cfg,$(work_dir)_debug)" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=debug malloc_test tbb_root=$(tbb_root) + +tbbmalloc_test_release_ni: $(call cross_cfg,mkdir_release) $(call cross_cfg,tbbmalloc_release) tbbmalloc_test_release_no_depends + $(MAKE) -C "$(call cross_cfg,$(work_dir)_release)" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=release malloc_test tbb_root=$(tbb_root) + +tbbmalloc_test_debug_ni: $(call cross_cfg,mkdir_debug) $(call cross_cfg,tbbmalloc_debug) tbbmalloc_test_debug_no_depends + $(MAKE) -C "$(call cross_cfg,$(work_dir)_debug)" -r -f $(tbb_root)/build/Makefile.tbbmalloc cfg=debug malloc_test tbb_root=$(tbb_root) + +.PHONY: rml_release rml_debug rml_test_release rml_test_debug +.PHONY: rml_test_release_no_depends rml_test_debug_no_depends + +rml_release: mkdir_release + $(MAKE) -C "$(work_dir)_release" -r -f $(tbb_root)/build/Makefile.rml cfg=release tbb_root=$(tbb_root) rml + +rml_debug: mkdir_debug + $(MAKE) -C "$(work_dir)_debug" -r -f $(tbb_root)/build/Makefile.rml cfg=debug tbb_root=$(tbb_root) rml + +rml_test_release: $(call cross_cfg,mkdir_release) $(call cross_cfg,rml_release) rml_test_release_no_depends +rml_test_release_no_depends: + -$(MAKE) -C "$(call cross_cfg,$(work_dir)_release)" -r -f $(tbb_root)/build/Makefile.rml cfg=release rml_test tbb_root=$(tbb_root) + +rml_test_debug: $(call cross_cfg,mkdir_debug) $(call cross_cfg,rml_debug) rml_test_debug_no_depends +rml_test_debug_no_depends: + -$(MAKE) -C "$(call cross_cfg,$(work_dir)_debug)" -r -f $(tbb_root)/build/Makefile.rml cfg=debug rml_test tbb_root=$(tbb_root) + +.PHONY: examples_release examples_debug + +examples_release: tbb_release tbbmalloc_release + $(MAKE) -C $(examples_root) -r -f Makefile tbb_root=.. release test + +examples_debug: tbb_debug tbbmalloc_debug + $(MAKE) -C $(examples_root) -r -f Makefile tbb_root=.. debug test + +.PHONY: clean_release clean_debug clean_examples + +clean_release: + $(shell $(RM) $(work_dir)_release$(SLASH)*.* >$(NUL) 2>$(NUL)) + $(shell $(RD) $(work_dir)_release >$(NUL) 2>$(NUL)) + +clean_debug: + $(shell $(RM) $(work_dir)_debug$(SLASH)*.* >$(NUL) 2>$(NUL)) + $(shell $(RD) $(work_dir)_debug >$(NUL) 2>$(NUL)) + +clean_examples: + $(shell $(MAKE) -s -i -r -C $(examples_root) -f Makefile tbb_root=.. clean >$(NUL) 2>$(NUL)) + +.PHONY: mkdir_release mkdir_debug codecov do_codecov info + +mkdir_release: + $(shell $(MD) "$(work_dir)_release" >$(NUL) 2>$(NUL)) + $(if $(subst undefined,,$(origin_build_dir)),,cd "$(work_dir)_release" && $(MAKE_TBBVARS) $(tbb_build_prefix)_release) + +mkdir_debug: + $(shell $(MD) "$(work_dir)_debug" >$(NUL) 2>$(NUL)) + $(if $(subst undefined,,$(origin_build_dir)),,cd "$(work_dir)_debug" && $(MAKE_TBBVARS) $(tbb_build_prefix)_debug) + +codecov: compiler=$(if $(findstring windows,$(tbb_os)),icl,icc) +codecov: + $(MAKE) tbb_root=.. 
codecov=yes do_codecov + +do_codecov: + $(MAKE) RML=yes tbbmalloc_test_release test_release + $(MAKE) clean_test_* cfg=release + $(MAKE) RML=yes crosstest=yes tbbmalloc_test_debug test_debug + $(MAKE) clean_test_* cfg=release + $(MAKE) rml_test_release + $(MAKE) clean_test_* cfg=release + $(MAKE) crosstest=yes rml_test_debug + $(MAKE) -C "$(work_dir)_release" -r -f $(tbb_root)/build/Makefile.test tbb_root=$(tbb_root) cfg=release codecov=yes codecov_gen + +info: + @echo OS: $(tbb_os) + @echo arch=$(arch) + @echo compiler=$(compiler) + @echo runtime=$(runtime) + @echo tbb_build_prefix=$(tbb_build_prefix) diff --git a/dep/tbb/src/index.html b/dep/tbb/src/index.html new file mode 100644 index 000000000..5e53ce787 --- /dev/null +++ b/dep/tbb/src/index.html @@ -0,0 +1,32 @@ + + + +

+Overview
+
+This directory contains the source code and unit tests for Threading Building Blocks.
+
+Directories
+
+    tbb        - Source code of the TBB library core.
+    tbbmalloc  - Source code of the TBB scalable memory allocator.
+    test       - Source code of the TBB unit tests.
+    old        - Source code of deprecated TBB entities that are still shipped as part of
+                 the TBB library for the sake of backward compatibility.
+    rml        - Source code of the Resource Management Layer (RML).
+
+Up to parent directory
+
+Copyright © 2005-2009 Intel Corporation. All Rights Reserved.
+
+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
+registered trademarks or trademarks of Intel Corporation or its
+subsidiaries in the United States and other countries.

+* Other names and brands may be claimed as the property of others. + + diff --git a/dep/tbb/src/old/concurrent_queue_v2.cpp b/dep/tbb/src/old/concurrent_queue_v2.cpp new file mode 100644 index 000000000..a6d0d6f4f --- /dev/null +++ b/dep/tbb/src/old/concurrent_queue_v2.cpp @@ -0,0 +1,382 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "concurrent_queue_v2.h" +#include "tbb/cache_aligned_allocator.h" +#include "tbb/spin_mutex.h" +#include "tbb/atomic.h" +#include +#include + +#if defined(_MSC_VER) && defined(_Wp64) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (disable: 4267) +#endif + +#define RECORD_EVENTS 0 + +using namespace std; + +namespace tbb { + +namespace internal { + +class concurrent_queue_rep; + +//! A queue using simple locking. +/** For efficient, this class has no constructor. + The caller is expected to zero-initialize it. */ +struct micro_queue { + typedef concurrent_queue_base::page page; + typedef size_t ticket; + + atomic head_page; + atomic head_counter; + + atomic tail_page; + atomic tail_counter; + + spin_mutex page_mutex; + + class push_finalizer: no_copy { + ticket my_ticket; + micro_queue& my_queue; + public: + push_finalizer( micro_queue& queue, ticket k ) : + my_ticket(k), my_queue(queue) + {} + ~push_finalizer() { + my_queue.tail_counter = my_ticket; + } + }; + + void push( const void* item, ticket k, concurrent_queue_base& base ); + + class pop_finalizer: no_copy { + ticket my_ticket; + micro_queue& my_queue; + page* my_page; + public: + pop_finalizer( micro_queue& queue, ticket k, page* p ) : + my_ticket(k), my_queue(queue), my_page(p) + {} + ~pop_finalizer() { + page* p = my_page; + if( p ) { + spin_mutex::scoped_lock lock( my_queue.page_mutex ); + page* q = p->next; + my_queue.head_page = q; + if( !q ) { + my_queue.tail_page = NULL; + } + } + my_queue.head_counter = my_ticket; + if( p ) + operator delete(p); + } + }; + + bool pop( void* dst, ticket k, concurrent_queue_base& base ); +}; + +//! Internal representation of a ConcurrentQueue. +/** For efficient, this class has no constructor. + The caller is expected to zero-initialize it. 
*/ +class concurrent_queue_rep { +public: + typedef size_t ticket; + +private: + friend struct micro_queue; + + //! Approximately n_queue/golden ratio + static const size_t phi = 3; + +public: + //! Must be power of 2 + static const size_t n_queue = 8; + + //! Map ticket to an array index + static size_t index( ticket k ) { + return k*phi%n_queue; + } + + atomic head_counter; + char pad1[NFS_MaxLineSize-sizeof(size_t)]; + + atomic tail_counter; + char pad2[NFS_MaxLineSize-sizeof(ticket)]; + micro_queue array[n_queue]; + + micro_queue& choose( ticket k ) { + // The formula here approximates LRU in a cache-oblivious way. + return array[index(k)]; + } + + //! Value for effective_capacity that denotes unbounded queue. + static const ptrdiff_t infinite_capacity = ptrdiff_t(~size_t(0)/2); +}; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // unary minus operator applied to unsigned type, result still unsigned + #pragma warning( push ) + #pragma warning( disable: 4146 ) +#endif + +//------------------------------------------------------------------------ +// micro_queue +//------------------------------------------------------------------------ +void micro_queue::push( const void* item, ticket k, concurrent_queue_base& base ) { + k &= -concurrent_queue_rep::n_queue; + page* p = NULL; + size_t index = (k/concurrent_queue_rep::n_queue & base.items_per_page-1); + if( !index ) { + size_t n = sizeof(page) + base.items_per_page*base.item_size; + p = static_cast(operator new( n )); + p->mask = 0; + p->next = NULL; + } + { + push_finalizer finalizer( *this, k+concurrent_queue_rep::n_queue ); + spin_wait_until_eq( tail_counter, k ); + if( p ) { + spin_mutex::scoped_lock lock( page_mutex ); + if( page* q = tail_page ) + q->next = p; + else + head_page = p; + tail_page = p; + } else { + p = tail_page; + } + base.copy_item( *p, index, item ); + // If no exception was thrown, mark item as present. + p->mask |= uintptr(1)<1 ? item_size : 2); + my_rep = cache_aligned_allocator().allocate(1); + __TBB_ASSERT( (size_t)my_rep % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->head_counter % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->tail_counter % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->array % NFS_GetLineSize()==0, "alignment error" ); + memset(my_rep,0,sizeof(concurrent_queue_rep)); + this->item_size = item_size; +} + +concurrent_queue_base::~concurrent_queue_base() { + size_t nq = my_rep->n_queue; + for( size_t i=0; iarray[i].tail_page; + __TBB_ASSERT( my_rep->array[i].head_page==tp, "at most one page should remain" ); + if( tp!=NULL ) + delete tp; + } + cache_aligned_allocator().deallocate(my_rep,1); +} + +void concurrent_queue_base::internal_push( const void* src ) { + concurrent_queue_rep& r = *my_rep; + concurrent_queue_rep::ticket k = r.tail_counter++; + ptrdiff_t e = my_capacity; + if( e(my_capacity); + } + } + r.choose(k).push(src,k,*this); +} + +void concurrent_queue_base::internal_pop( void* dst ) { + concurrent_queue_rep& r = *my_rep; + concurrent_queue_rep::ticket k; + do { + k = r.head_counter++; + } while( !r.choose(k).pop(dst,k,*this) ); +} + +bool concurrent_queue_base::internal_pop_if_present( void* dst ) { + concurrent_queue_rep& r = *my_rep; + concurrent_queue_rep::ticket k; + do { + atomic_backoff backoff; + for(;;) { + k = r.head_counter; + if( r.tail_counter<=k ) { + // Queue is empty + return false; + } + // Queue had item with ticket k when we looked. Attempt to get that item. 
+ if( r.head_counter.compare_and_swap(k+1,k)==k ) { + break; + } + // Another thread snatched the item, so pause and retry. + backoff.pause(); + } + } while( !r.choose(k).pop(dst,k,*this) ); + return true; +} + +bool concurrent_queue_base::internal_push_if_not_full( const void* src ) { + concurrent_queue_rep& r = *my_rep; + atomic_backoff backoff; + concurrent_queue_rep::ticket k; + for(;;) { + k = r.tail_counter; + if( (ptrdiff_t)(k-r.head_counter)>=my_capacity ) { + // Queue is full + return false; + } + // Queue had empty slot with ticket k when we looked. Attempt to claim that slot. + if( r.tail_counter.compare_and_swap(k+1,k)==k ) + break; + // Another thread claimed the slot, so pause and retry. + backoff.pause(); + } + r.choose(k).push(src,k,*this); + return true; +} + +ptrdiff_t concurrent_queue_base::internal_size() const { + __TBB_ASSERT( sizeof(ptrdiff_t)<=sizeof(size_t), NULL ); + return ptrdiff_t(my_rep->tail_counter-my_rep->head_counter); +} + +void concurrent_queue_base::internal_set_capacity( ptrdiff_t capacity, size_t /*item_size*/ ) { + my_capacity = capacity<0 ? concurrent_queue_rep::infinite_capacity : capacity; +} + +//------------------------------------------------------------------------ +// concurrent_queue_iterator_rep +//------------------------------------------------------------------------ +class concurrent_queue_iterator_rep: no_assign { +public: + typedef concurrent_queue_rep::ticket ticket; + ticket head_counter; + const concurrent_queue_base& my_queue; + concurrent_queue_base::page* array[concurrent_queue_rep::n_queue]; + concurrent_queue_iterator_rep( const concurrent_queue_base& queue ) : + head_counter(queue.my_rep->head_counter), + my_queue(queue) + { + const concurrent_queue_rep& rep = *queue.my_rep; + for( size_t k=0; ktail_counter ) + return NULL; + else { + concurrent_queue_base::page* p = array[concurrent_queue_rep::index(k)]; + __TBB_ASSERT(p,NULL); + size_t i = k/concurrent_queue_rep::n_queue & my_queue.items_per_page-1; + return static_cast(static_cast(p+1)) + my_queue.item_size*i; + } + } +}; + +//------------------------------------------------------------------------ +// concurrent_queue_iterator_base +//------------------------------------------------------------------------ +concurrent_queue_iterator_base::concurrent_queue_iterator_base( const concurrent_queue_base& queue ) { + my_rep = new concurrent_queue_iterator_rep(queue); + my_item = my_rep->choose(my_rep->head_counter); +} + +void concurrent_queue_iterator_base::assign( const concurrent_queue_iterator_base& other ) { + if( my_rep!=other.my_rep ) { + if( my_rep ) { + delete my_rep; + my_rep = NULL; + } + if( other.my_rep ) { + my_rep = new concurrent_queue_iterator_rep( *other.my_rep ); + } + } + my_item = other.my_item; +} + +void concurrent_queue_iterator_base::advance() { + __TBB_ASSERT( my_item, "attempt to increment iterator past end of queue" ); + size_t k = my_rep->head_counter; + const concurrent_queue_base& queue = my_rep->my_queue; + __TBB_ASSERT( my_item==my_rep->choose(k), NULL ); + size_t i = k/concurrent_queue_rep::n_queue & queue.items_per_page-1; + if( i==queue.items_per_page-1 ) { + concurrent_queue_base::page*& root = my_rep->array[concurrent_queue_rep::index(k)]; + root = root->next; + } + my_rep->head_counter = k+1; + my_item = my_rep->choose(k+1); +} + +concurrent_queue_iterator_base::~concurrent_queue_iterator_base() { + delete my_rep; + my_rep = NULL; +} + +} // namespace internal + +} // namespace tbb diff --git a/dep/tbb/src/old/concurrent_queue_v2.h 
b/dep/tbb/src/old/concurrent_queue_v2.h new file mode 100644 index 000000000..862384e32 --- /dev/null +++ b/dep/tbb/src/old/concurrent_queue_v2.h @@ -0,0 +1,328 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_concurrent_queue_H +#define __TBB_concurrent_queue_H + +#include "tbb/tbb_stddef.h" +#include + +namespace tbb { + +template class concurrent_queue; + +//! @cond INTERNAL +namespace internal { + +class concurrent_queue_rep; +class concurrent_queue_iterator_rep; +class concurrent_queue_iterator_base; +template class concurrent_queue_iterator; + +//! For internal use only. +/** Type-independent portion of concurrent_queue. + @ingroup containers */ +class concurrent_queue_base: no_copy { + //! Internal representation + concurrent_queue_rep* my_rep; + + friend class concurrent_queue_rep; + friend struct micro_queue; + friend class concurrent_queue_iterator_rep; + friend class concurrent_queue_iterator_base; +protected: + //! Prefix on a page + struct page { + page* next; + uintptr mask; + }; + + //! Capacity of the queue + ptrdiff_t my_capacity; + + //! Always a power of 2 + size_t items_per_page; + + //! Size of an item + size_t item_size; +private: + virtual void copy_item( page& dst, size_t index, const void* src ) = 0; + virtual void assign_and_destroy_item( void* dst, page& src, size_t index ) = 0; +protected: + __TBB_EXPORTED_METHOD concurrent_queue_base( size_t item_size ); + virtual __TBB_EXPORTED_METHOD ~concurrent_queue_base(); + + //! Enqueue item at tail of queue + void __TBB_EXPORTED_METHOD internal_push( const void* src ); + + //! Dequeue item from head of queue + void __TBB_EXPORTED_METHOD internal_pop( void* dst ); + + //! Attempt to enqueue item onto queue. + bool __TBB_EXPORTED_METHOD internal_push_if_not_full( const void* src ); + + //! Attempt to dequeue item from queue. + /** NULL if there was no item to dequeue. */ + bool __TBB_EXPORTED_METHOD internal_pop_if_present( void* dst ); + + //! Get size of queue + ptrdiff_t __TBB_EXPORTED_METHOD internal_size() const; + + void __TBB_EXPORTED_METHOD internal_set_capacity( ptrdiff_t capacity, size_t element_size ); +}; + +//! Type-independent portion of concurrent_queue_iterator. 
+/** @ingroup containers */ +class concurrent_queue_iterator_base { + //! Concurrentconcurrent_queue over which we are iterating. + /** NULL if one past last element in queue. */ + concurrent_queue_iterator_rep* my_rep; + + template + friend bool operator==( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ); + + template + friend bool operator!=( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ); +protected: + //! Pointer to current item + mutable void* my_item; + + //! Default constructor + __TBB_EXPORTED_METHOD concurrent_queue_iterator_base() : my_rep(NULL), my_item(NULL) {} + + //! Copy constructor + concurrent_queue_iterator_base( const concurrent_queue_iterator_base& i ) : my_rep(NULL), my_item(NULL) { + assign(i); + } + + //! Construct iterator pointing to head of queue. + concurrent_queue_iterator_base( const concurrent_queue_base& queue ); + + //! Assignment + void __TBB_EXPORTED_METHOD assign( const concurrent_queue_iterator_base& i ); + + //! Advance iterator one step towards tail of queue. + void __TBB_EXPORTED_METHOD advance(); + + //! Destructor + __TBB_EXPORTED_METHOD ~concurrent_queue_iterator_base(); +}; + +//! Meets requirements of a forward iterator for STL. +/** Value is either the T or const T type of the container. + @ingroup containers */ +template +class concurrent_queue_iterator: public concurrent_queue_iterator_base { +#if !defined(_MSC_VER) || defined(__INTEL_COMPILER) + template + friend class ::tbb::concurrent_queue; +#else +public: // workaround for MSVC +#endif + //! Construct iterator pointing to head of queue. + concurrent_queue_iterator( const concurrent_queue_base& queue ) : + concurrent_queue_iterator_base(queue) + { + } +public: + concurrent_queue_iterator() {} + + /** If Value==Container::value_type, then this routine is the copy constructor. + If Value==const Container::value_type, then this routine is a conversion constructor. */ + concurrent_queue_iterator( const concurrent_queue_iterator& other ) : + concurrent_queue_iterator_base(other) + {} + + //! Iterator assignment + concurrent_queue_iterator& operator=( const concurrent_queue_iterator& other ) { + assign(other); + return *this; + } + + //! Reference to current item + Value& operator*() const { + return *static_cast(my_item); + } + + Value* operator->() const {return &operator*();} + + //! Advance to next item in queue + concurrent_queue_iterator& operator++() { + advance(); + return *this; + } + + //! Post increment + Value* operator++(int) { + Value* result = &operator*(); + operator++(); + return result; + } +}; // concurrent_queue_iterator + +template +bool operator==( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ) { + return i.my_item==j.my_item; +} + +template +bool operator!=( const concurrent_queue_iterator& i, const concurrent_queue_iterator& j ) { + return i.my_item!=j.my_item; +} + +} // namespace internal; +//! @endcond + +//! A high-performance thread-safe queue. +/** Multiple threads may each push and pop concurrently. + Assignment and copy construction are not allowed. + @ingroup containers */ +template +class concurrent_queue: public internal::concurrent_queue_base { + template friend class internal::concurrent_queue_iterator; + + //! 
Class used to ensure exception-safety of method "pop" + class destroyer { + T& my_value; + public: + destroyer( T& value ) : my_value(value) {} + ~destroyer() {my_value.~T();} + }; + + T& get_ref( page& page, size_t index ) { + __TBB_ASSERT( index(static_cast(&page+1))[index]; + } + + /*override*/ virtual void copy_item( page& dst, size_t index, const void* src ) { + new( &get_ref(dst,index) ) T(*static_cast(src)); + } + + /*override*/ virtual void assign_and_destroy_item( void* dst, page& src, size_t index ) { + T& from = get_ref(src,index); + destroyer d(from); + *static_cast(dst) = from; + } + +public: + //! Element type in the queue. + typedef T value_type; + + //! Reference type + typedef T& reference; + + //! Const reference type + typedef const T& const_reference; + + //! Integral type for representing size of the queue. + /** Notice that the size_type is a signed integral type. + This is because the size can be negative if there are pending pops without corresponding pushes. */ + typedef std::ptrdiff_t size_type; + + //! Difference type for iterator + typedef std::ptrdiff_t difference_type; + + //! Construct empty queue + concurrent_queue() : + concurrent_queue_base( sizeof(T) ) + { + } + + //! Destroy queue + ~concurrent_queue(); + + //! Enqueue an item at tail of queue. + void push( const T& source ) { + internal_push( &source ); + } + + //! Dequeue item from head of queue. + /** Block until an item becomes available, and then dequeue it. */ + void pop( T& destination ) { + internal_pop( &destination ); + } + + //! Enqueue an item at tail of queue if queue is not already full. + /** Does not wait for queue to become not full. + Returns true if item is pushed; false if queue was already full. */ + bool push_if_not_full( const T& source ) { + return internal_push_if_not_full( &source ); + } + + //! Attempt to dequeue an item from head of queue. + /** Does not wait for item to become available. + Returns true if successful; false otherwise. */ + bool pop_if_present( T& destination ) { + return internal_pop_if_present( &destination ); + } + + //! Return number of pushes minus number of pops. + /** Note that the result can be negative if there are pops waiting for the + corresponding pushes. The result can also exceed capacity() if there + are push operations in flight. */ + size_type size() const {return internal_size();} + + //! Equivalent to size()<=0. + bool empty() const {return size()<=0;} + + //! Maximum number of allowed elements + size_type capacity() const { + return my_capacity; + } + + //! Set the capacity + /** Setting the capacity to 0 causes subsequent push_if_not_full operations to always fail, + and subsequent push operations to block forever. */ + void set_capacity( size_type capacity ) { + internal_set_capacity( capacity, sizeof(T) ); + } + + typedef internal::concurrent_queue_iterator iterator; + typedef internal::concurrent_queue_iterator const_iterator; + + //------------------------------------------------------------------------ + // The iterators are intended only for debugging. They are slow and not thread safe. 
+ //------------------------------------------------------------------------ + iterator begin() {return iterator(*this);} + iterator end() {return iterator();} + const_iterator begin() const {return const_iterator(*this);} + const_iterator end() const {return const_iterator();} + +}; + +template +concurrent_queue::~concurrent_queue() { + while( !empty() ) { + T value; + internal_pop(&value); + } +} + +} // namespace tbb + +#endif /* __TBB_concurrent_queue_H */ diff --git a/dep/tbb/src/old/concurrent_vector_v2.cpp b/dep/tbb/src/old/concurrent_vector_v2.cpp new file mode 100644 index 000000000..36186ea9a --- /dev/null +++ b/dep/tbb/src/old/concurrent_vector_v2.cpp @@ -0,0 +1,266 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "concurrent_vector_v2.h" +#include "tbb/tbb_machine.h" +#include +#include "../tbb/itt_notify.h" +#include "tbb/task.h" +#include + + +#if defined(_MSC_VER) && defined(_Wp64) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (disable: 4267) +#endif + +namespace tbb { + +namespace internal { + +void concurrent_vector_base::internal_grow_to_at_least( size_type new_size, size_type element_size, internal_array_op1 init ) { + size_type e = my_early_size; + while( e=pointers_per_short_segment && v.my_segment==v.my_storage ) { + extend_segment(v); + } + } +}; + +void concurrent_vector_base::helper::extend_segment( concurrent_vector_base& v ) { + const size_t pointers_per_long_segment = sizeof(void*)==4 ? 32 : 64; + segment_t* s = (segment_t*)NFS_Allocate( pointers_per_long_segment, sizeof(segment_t), NULL ); + std::memset( s, 0, pointers_per_long_segment*sizeof(segment_t) ); + // If other threads are trying to set pointers in the short segment, wait for them to finish their + // assigments before we copy the short segment to the long segment. 
+ atomic_backoff backoff; + while( !v.my_storage[0].array || !v.my_storage[1].array ) { + backoff.pause(); + } + s[0] = v.my_storage[0]; + s[1] = v.my_storage[1]; + if( v.my_segment.compare_and_swap( s, v.my_storage )!=v.my_storage ) + NFS_Free(s); +} + +concurrent_vector_base::size_type concurrent_vector_base::internal_capacity() const { + return segment_base( helper::find_segment_end(*this) ); +} + +void concurrent_vector_base::internal_reserve( size_type n, size_type element_size, size_type max_size ) { + if( n>max_size ) { + throw std::length_error("argument to ConcurrentVector::reserve exceeds ConcurrentVector::max_size()"); + } + for( segment_index_t k = helper::find_segment_end(*this); segment_base(k)n-b ) m = n-b; + copy( my_segment[k].array, src.my_segment[k].array, m ); + } + } +} + +void concurrent_vector_base::internal_assign( const concurrent_vector_base& src, size_type element_size, internal_array_op1 destroy, internal_array_op2 assign, internal_array_op2 copy ) { + size_type n = src.my_early_size; + while( my_early_size>n ) { + segment_index_t k = segment_index_of( my_early_size-1 ); + size_type b=segment_base(k); + size_type new_end = b>=n ? b : n; + __TBB_ASSERT( my_early_size>new_end, NULL ); + destroy( (char*)my_segment[k].array+element_size*(new_end-b), my_early_size-new_end ); + my_early_size = new_end; + } + size_type dst_initialized_size = my_early_size; + my_early_size = n; + size_type b; + for( segment_index_t k=0; (b=segment_base(k))n-b ) m = n-b; + size_type a = 0; + if( dst_initialized_size>b ) { + a = dst_initialized_size-b; + if( a>m ) a = m; + assign( my_segment[k].array, src.my_segment[k].array, a ); + m -= a; + a *= element_size; + } + if( m>0 ) + copy( (char*)my_segment[k].array+a, (char*)src.my_segment[k].array+a, m ); + } + __TBB_ASSERT( src.my_early_size==n, "detected use of ConcurrentVector::operator= with right side that was concurrently modified" ); +} + +void* concurrent_vector_base::internal_push_back( size_type element_size, size_type& index ) { + __TBB_ASSERT( sizeof(my_early_size)==sizeof(reference_count), NULL ); + //size_t tmp = __TBB_FetchAndIncrementWacquire(*(tbb::internal::reference_count*)&my_early_size); + size_t tmp = __TBB_FetchAndIncrementWacquire((tbb::internal::reference_count*)&my_early_size); + index = tmp; + segment_index_t k_old = segment_index_of( tmp ); + size_type base = segment_base(k_old); + helper::extend_segment_if_necessary(*this,k_old); + segment_t& s = my_segment[k_old]; + void* array = s.array; + if( !array ) { + // FIXME - consider factoring this out and share with internal_grow_by + if( base==tmp ) { + __TBB_ASSERT( !s.array, NULL ); + size_t n = segment_size(k_old); + array = NFS_Allocate( n, element_size, NULL ); + ITT_NOTIFY( sync_releasing, &s.array ); + s.array = array; + } else { + ITT_NOTIFY(sync_prepare, &s.array); + spin_wait_while_eq( s.array, (void*)0 ); + ITT_NOTIFY(sync_acquired, &s.array); + array = s.array; + } + } + size_type j_begin = tmp-base; + return (void*)((char*)array+element_size*j_begin); +} + +concurrent_vector_base::size_type concurrent_vector_base::internal_grow_by( size_type delta, size_type element_size, internal_array_op1 init ) { + size_type result = my_early_size.fetch_and_add(delta); + internal_grow( result, result+delta, element_size, init ); + return result; +} + +void concurrent_vector_base::internal_grow( const size_type start, size_type finish, size_type element_size, internal_array_op1 init ) { + __TBB_ASSERT( start finish-base ? 
finish-base : n; + (*init)( (void*)((char*)array+element_size*j_begin), j_end-j_begin ); + tmp = base+j_end; + } while( tmp0 ) { + segment_index_t k_old = segment_index_of(finish-1); + segment_t& s = my_segment[k_old]; + __TBB_ASSERT( s.array, NULL ); + size_type base = segment_base(k_old); + size_type j_end = finish-base; + __TBB_ASSERT( j_end, NULL ); + (*destroy)( s.array, j_end ); + finish = base; + } + + // Free the arrays + if( reclaim_storage ) { + size_t k = helper::find_segment_end(*this); + while( k>0 ) { + --k; + segment_t& s = my_segment[k]; + void* array = s.array; + s.array = NULL; + NFS_Free( array ); + } + // Clear short segment. + my_storage[0].array = NULL; + my_storage[1].array = NULL; + segment_t* s = my_segment; + if( s!=my_storage ) { + my_segment = my_storage; + NFS_Free( s ); + } + } +} + +} // namespace internal + +} // tbb diff --git a/dep/tbb/src/old/concurrent_vector_v2.h b/dep/tbb/src/old/concurrent_vector_v2.h new file mode 100644 index 000000000..a9c3a3be4 --- /dev/null +++ b/dep/tbb/src/old/concurrent_vector_v2.h @@ -0,0 +1,512 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_concurrent_vector_H +#define __TBB_concurrent_vector_H + +#include "tbb/tbb_stddef.h" +#include +#include +#include "tbb/atomic.h" +#include "tbb/cache_aligned_allocator.h" +#include "tbb/blocked_range.h" + +#include "tbb/tbb_machine.h" + +namespace tbb { + +template +class concurrent_vector; + +//! @cond INTERNAL +namespace internal { + + //! Base class of concurrent vector implementation. + /** @ingroup containers */ + class concurrent_vector_base { + protected: + typedef unsigned long segment_index_t; + + //! Log2 of "min_segment_size". + static const int lg_min_segment_size = 4; + + //! Minimum size (in physical items) of a segment. + static const int min_segment_size = segment_index_t(1)<>1< my_early_size; + + /** Can be zero-initialized. 
*/ + struct segment_t { + /** Declared volatile because in weak memory model, must have ld.acq/st.rel */ + void* volatile array; +#if TBB_DO_ASSERT + ~segment_t() { + __TBB_ASSERT( !array, "should have been set to NULL by clear" ); + } +#endif /* TBB_DO_ASSERT */ + }; + + atomic my_segment; + + segment_t my_storage[2]; + + concurrent_vector_base() { + my_early_size = 0; + my_storage[0].array = NULL; + my_storage[1].array = NULL; + my_segment = my_storage; + } + + //! An operation on an n-lement array starting at begin. + typedef void(__TBB_EXPORTED_FUNC *internal_array_op1)(void* begin, size_type n ); + + //! An operation on n-element destination array and n-element source array. + typedef void(__TBB_EXPORTED_FUNC *internal_array_op2)(void* dst, const void* src, size_type n ); + + void __TBB_EXPORTED_METHOD internal_grow_to_at_least( size_type new_size, size_type element_size, internal_array_op1 init ); + void internal_grow( size_type start, size_type finish, size_type element_size, internal_array_op1 init ); + size_type __TBB_EXPORTED_METHOD internal_grow_by( size_type delta, size_type element_size, internal_array_op1 init ); + void* __TBB_EXPORTED_METHOD internal_push_back( size_type element_size, size_type& index ); + void __TBB_EXPORTED_METHOD internal_clear( internal_array_op1 destroy, bool reclaim_storage ); + void __TBB_EXPORTED_METHOD internal_copy( const concurrent_vector_base& src, size_type element_size, internal_array_op2 copy ); + void __TBB_EXPORTED_METHOD internal_assign( const concurrent_vector_base& src, size_type element_size, + internal_array_op1 destroy, internal_array_op2 assign, internal_array_op2 copy ); +private: + //! Private functionality that does not cross DLL boundary. + class helper; + + friend class helper; + }; + + //! Meets requirements of a forward iterator for STL and a Value for a blocked_range.*/ + /** Value is either the T or const T type of the container. + @ingroup containers */ + template + class vector_iterator +#if defined(_WIN64) && defined(_MSC_VER) + // Ensure that Microsoft's internal template function _Val_type works correctly. + : public std::iterator +#endif /* defined(_WIN64) && defined(_MSC_VER) */ + { + //! concurrent_vector over which we are iterating. + Container* my_vector; + + //! Index into the vector + size_t my_index; + + //! Caches my_vector->internal_subscript(my_index) + /** NULL if cached value is not available */ + mutable Value* my_item; + + template + friend bool operator==( const vector_iterator& i, const vector_iterator& j ); + + template + friend bool operator<( const vector_iterator& i, const vector_iterator& j ); + + template + friend ptrdiff_t operator-( const vector_iterator& i, const vector_iterator& j ); + + template + friend class internal::vector_iterator; + +#if !defined(_MSC_VER) || defined(__INTEL_COMPILER) + template + friend class tbb::concurrent_vector; +#else +public: // workaround for MSVC +#endif + + vector_iterator( const Container& vector, size_t index ) : + my_vector(const_cast(&vector)), + my_index(index), + my_item(NULL) + {} + + public: + //! 
Default constructor + vector_iterator() : my_vector(NULL), my_index(~size_t(0)), my_item(NULL) {} + + vector_iterator( const vector_iterator& other ) : + my_vector(other.my_vector), + my_index(other.my_index), + my_item(other.my_item) + {} + + vector_iterator operator+( ptrdiff_t offset ) const { + return vector_iterator( *my_vector, my_index+offset ); + } + friend vector_iterator operator+( ptrdiff_t offset, const vector_iterator& v ) { + return vector_iterator( *v.my_vector, v.my_index+offset ); + } + vector_iterator operator+=( ptrdiff_t offset ) { + my_index+=offset; + my_item = NULL; + return *this; + } + vector_iterator operator-( ptrdiff_t offset ) const { + return vector_iterator( *my_vector, my_index-offset ); + } + vector_iterator operator-=( ptrdiff_t offset ) { + my_index-=offset; + my_item = NULL; + return *this; + } + Value& operator*() const { + Value* item = my_item; + if( !item ) { + item = my_item = &my_vector->internal_subscript(my_index); + } + __TBB_ASSERT( item==&my_vector->internal_subscript(my_index), "corrupt cache" ); + return *item; + } + Value& operator[]( ptrdiff_t k ) const { + return my_vector->internal_subscript(my_index+k); + } + Value* operator->() const {return &operator*();} + + //! Pre increment + vector_iterator& operator++() { + size_t k = ++my_index; + if( my_item ) { + // Following test uses 2's-complement wizardry and fact that + // min_segment_size is a power of 2. + if( (k& k-concurrent_vector::min_segment_size)==0 ) { + // k is a power of two that is at least k-min_segment_size + my_item= NULL; + } else { + ++my_item; + } + } + return *this; + } + + //! Pre decrement + vector_iterator& operator--() { + __TBB_ASSERT( my_index>0, "operator--() applied to iterator already at beginning of concurrent_vector" ); + size_t k = my_index--; + if( my_item ) { + // Following test uses 2's-complement wizardry and fact that + // min_segment_size is a power of 2. + if( (k& k-concurrent_vector::min_segment_size)==0 ) { + // k is a power of two that is at least k-min_segment_size + my_item= NULL; + } else { + --my_item; + } + } + return *this; + } + + //! Post increment + vector_iterator operator++(int) { + vector_iterator result = *this; + operator++(); + return result; + } + + //! Post decrement + vector_iterator operator--(int) { + vector_iterator result = *this; + operator--(); + return result; + } + + // STL support + + typedef ptrdiff_t difference_type; + typedef Value value_type; + typedef Value* pointer; + typedef Value& reference; + typedef std::random_access_iterator_tag iterator_category; + }; + + template + bool operator==( const vector_iterator& i, const vector_iterator& j ) { + return i.my_index==j.my_index; + } + + template + bool operator!=( const vector_iterator& i, const vector_iterator& j ) { + return !(i==j); + } + + template + bool operator<( const vector_iterator& i, const vector_iterator& j ) { + return i.my_index + bool operator>( const vector_iterator& i, const vector_iterator& j ) { + return j + bool operator>=( const vector_iterator& i, const vector_iterator& j ) { + return !(i + bool operator<=( const vector_iterator& i, const vector_iterator& j ) { + return !(j + ptrdiff_t operator-( const vector_iterator& i, const vector_iterator& j ) { + return ptrdiff_t(i.my_index)-ptrdiff_t(j.my_index); + } + +} // namespace internal +//! @endcond + +//! 
Concurrent vector +/** @ingroup containers */ +template +class concurrent_vector: private internal::concurrent_vector_base { +public: + using internal::concurrent_vector_base::size_type; +private: + template + class generic_range_type: public blocked_range { + public: + typedef T value_type; + typedef T& reference; + typedef const T& const_reference; + typedef I iterator; + typedef ptrdiff_t difference_type; + generic_range_type( I begin_, I end_, size_t grainsize ) : blocked_range(begin_,end_,grainsize) {} + generic_range_type( generic_range_type& r, split ) : blocked_range(r,split()) {} + }; + + template + friend class internal::vector_iterator; +public: + typedef T& reference; + typedef const T& const_reference; + + //! Construct empty vector. + concurrent_vector() {} + + //! Copy a vector. + concurrent_vector( const concurrent_vector& vector ) {internal_copy(vector,sizeof(T),©_array);} + + //! Assignment + concurrent_vector& operator=( const concurrent_vector& vector ) { + if( this!=&vector ) + internal_assign(vector,sizeof(T),&destroy_array,&assign_array,©_array); + return *this; + } + + //! Clear and destroy vector. + ~concurrent_vector() {internal_clear(&destroy_array,/*reclaim_storage=*/true);} + + //------------------------------------------------------------------------ + // Concurrent operations + //------------------------------------------------------------------------ + //! Grow by "delta" elements. + /** Returns old size. */ + size_type grow_by( size_type delta ) { + return delta ? internal_grow_by( delta, sizeof(T), &initialize_array ) : my_early_size; + } + + //! Grow array until it has at least n elements. + void grow_to_at_least( size_type n ) { + if( my_early_size iterator; + typedef internal::vector_iterator const_iterator; + +#if !defined(_MSC_VER) || _CPPLIB_VER>=300 + // Assume ISO standard definition of std::reverse_iterator + typedef std::reverse_iterator reverse_iterator; + typedef std::reverse_iterator const_reverse_iterator; +#else + // Use non-standard std::reverse_iterator + typedef std::reverse_iterator reverse_iterator; + typedef std::reverse_iterator const_reverse_iterator; +#endif /* defined(_MSC_VER) && (_MSC_VER<1300) */ + + typedef generic_range_type range_type; + typedef generic_range_type const_range_type; + + range_type range( size_t grainsize = 1 ) { + return range_type( begin(), end(), grainsize ); + } + + const_range_type range( size_t grainsize = 1 ) const { + return const_range_type( begin(), end(), grainsize ); + } + + //------------------------------------------------------------------------ + // Capacity + //------------------------------------------------------------------------ + //! Return size of vector. + size_type size() const {return my_early_size;} + + //! Return size of vector. + bool empty() const {return !my_early_size;} + + //! Maximum size to which array can grow without allocating more memory. + size_type capacity() const {return internal_capacity();} + + //! Allocate enough space to grow to size n without having to allocate more memory later. + /** Like most of the methods provided for STL compatibility, this method is *not* thread safe. + The capacity afterwards may be bigger than the requested reservation. */ + void reserve( size_type n ) { + if( n ) + internal_reserve(n, sizeof(T), max_size()); + } + + //! Upper bound on argument to reserve. 
+ size_type max_size() const {return (~size_t(0))/sizeof(T);} + + //------------------------------------------------------------------------ + // STL support + //------------------------------------------------------------------------ + + typedef T value_type; + typedef ptrdiff_t difference_type; + + iterator begin() {return iterator(*this,0);} + iterator end() {return iterator(*this,size());} + const_iterator begin() const {return const_iterator(*this,0);} + const_iterator end() const {return const_iterator(*this,size());} + + reverse_iterator rbegin() {return reverse_iterator(end());} + reverse_iterator rend() {return reverse_iterator(begin());} + const_reverse_iterator rbegin() const {return const_reverse_iterator(end());} + const_reverse_iterator rend() const {return const_reverse_iterator(begin());} + + //! Not thread safe + /** Does not change capacity. */ + void clear() {internal_clear(&destroy_array,/*reclaim_storage=*/false);} +private: + //! Get reference to element at given index. + T& internal_subscript( size_type index ) const; + + //! Construct n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC initialize_array( void* begin, size_type n ); + + //! Construct n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC copy_array( void* dst, const void* src, size_type n ); + + //! Assign n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC assign_array( void* dst, const void* src, size_type n ); + + //! Destroy n instances of T, starting at "begin". + static void __TBB_EXPORTED_FUNC destroy_array( void* begin, size_type n ); +}; + +template +T& concurrent_vector::internal_subscript( size_type index ) const { + __TBB_ASSERT( index(my_segment[k].array)[j]; +} + +template +void concurrent_vector::initialize_array( void* begin, size_type n ) { + T* array = static_cast(begin); + for( size_type j=0; j +void concurrent_vector::copy_array( void* dst, const void* src, size_type n ) { + T* d = static_cast(dst); + const T* s = static_cast(src); + for( size_type j=0; j +void concurrent_vector::assign_array( void* dst, const void* src, size_type n ) { + T* d = static_cast(dst); + const T* s = static_cast(src); + for( size_type j=0; j +void concurrent_vector::destroy_array( void* begin, size_type n ) { + T* array = static_cast(begin); + for( size_type j=n; j>0; --j ) + array[j-1].~T(); +} + +} // namespace tbb + +#endif /* __TBB_concurrent_vector_H */ diff --git a/dep/tbb/src/old/spin_rw_mutex_v2.cpp b/dep/tbb/src/old/spin_rw_mutex_v2.cpp new file mode 100644 index 000000000..9067b0957 --- /dev/null +++ b/dep/tbb/src/old/spin_rw_mutex_v2.cpp @@ -0,0 +1,166 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "spin_rw_mutex_v2.h" +#include "tbb/tbb_machine.h" +#include "../tbb/itt_notify.h" + +namespace tbb { + +using namespace internal; + +static inline bool CAS(volatile uintptr &addr, uintptr newv, uintptr oldv) { + return __TBB_CompareAndSwapW((volatile void *)&addr, (intptr)newv, (intptr)oldv) == (intptr)oldv; +} + +//! Signal that write lock is released +void spin_rw_mutex::internal_itt_releasing(spin_rw_mutex *mutex) { + ITT_NOTIFY(sync_releasing, mutex); +#if !DO_ITT_NOTIFY + (void)mutex; +#endif +} + +bool spin_rw_mutex::internal_acquire_writer(spin_rw_mutex *mutex) +{ + ITT_NOTIFY(sync_prepare, mutex); + atomic_backoff backoff; + for(;;) { + state_t s = mutex->state; + if( !(s & BUSY) ) { // no readers, no writers + if( CAS(mutex->state, WRITER, s) ) + break; // successfully stored writer flag + backoff.reset(); // we could be very close to complete op. + } else if( !(s & WRITER_PENDING) ) { // no pending writers + __TBB_AtomicOR(&mutex->state, WRITER_PENDING); + } + backoff.pause(); + } + ITT_NOTIFY(sync_acquired, mutex); + __TBB_ASSERT( (mutex->state & BUSY)==WRITER, "invalid state of a write lock" ); + return false; +} + +//! Signal that write lock is released +void spin_rw_mutex::internal_release_writer(spin_rw_mutex *mutex) { + __TBB_ASSERT( (mutex->state & BUSY)==WRITER, "invalid state of a write lock" ); + ITT_NOTIFY(sync_releasing, mutex); + mutex->state = 0; +} + +//! Acquire lock on given mutex. +void spin_rw_mutex::internal_acquire_reader(spin_rw_mutex *mutex) { + ITT_NOTIFY(sync_prepare, mutex); + atomic_backoff backoff; + for(;;) { + state_t s = mutex->state; + if( !(s & (WRITER|WRITER_PENDING)) ) { // no writer or write requests + if( CAS(mutex->state, s+ONE_READER, s) ) + break; // successfully stored increased number of readers + backoff.reset(); // we could be very close to complete op. + } + backoff.pause(); + } + ITT_NOTIFY(sync_acquired, mutex); + __TBB_ASSERT( mutex->state & READERS, "invalid state of a read lock: no readers" ); + __TBB_ASSERT( !(mutex->state & WRITER), "invalid state of a read lock: active writer" ); +} + +//! Upgrade reader to become a writer. 
+/** Returns true if the upgrade happened without re-acquiring the lock and false if opposite */ +bool spin_rw_mutex::internal_upgrade(spin_rw_mutex *mutex) { + state_t s = mutex->state; + __TBB_ASSERT( s & READERS, "invalid state before upgrade: no readers " ); + __TBB_ASSERT( !(s & WRITER), "invalid state before upgrade: active writer " ); + // check and set writer-pending flag + // required conditions: either no pending writers, or we are the only reader + // (with multiple readers and pending writer, another upgrade could have been requested) + while( (s & READERS)==ONE_READER || !(s & WRITER_PENDING) ) { + if( CAS(mutex->state, s | WRITER_PENDING, s) ) + { + atomic_backoff backoff; + ITT_NOTIFY(sync_prepare, mutex); + while( (mutex->state & READERS) != ONE_READER ) // more than 1 reader + backoff.pause(); + // the state should be 0...0110, i.e. 1 reader and waiting writer; + // both new readers and writers are blocked + __TBB_ASSERT(mutex->state == (ONE_READER | WRITER_PENDING),"invalid state when upgrading to writer"); + mutex->state = WRITER; + ITT_NOTIFY(sync_acquired, mutex); + __TBB_ASSERT( (mutex->state & BUSY) == WRITER, "invalid state after upgrade" ); + return true; // successfully upgraded + } else { + s = mutex->state; // re-read + } + } + // slow reacquire + internal_release_reader(mutex); + return internal_acquire_writer(mutex); // always returns false +} + +void spin_rw_mutex::internal_downgrade(spin_rw_mutex *mutex) { + __TBB_ASSERT( (mutex->state & BUSY) == WRITER, "invalid state before downgrade" ); + ITT_NOTIFY(sync_releasing, mutex); + mutex->state = ONE_READER; + __TBB_ASSERT( mutex->state & READERS, "invalid state after downgrade: no readers" ); + __TBB_ASSERT( !(mutex->state & WRITER), "invalid state after downgrade: active writer" ); +} + +void spin_rw_mutex::internal_release_reader(spin_rw_mutex *mutex) +{ + __TBB_ASSERT( mutex->state & READERS, "invalid state of a read lock: no readers" ); + __TBB_ASSERT( !(mutex->state & WRITER), "invalid state of a read lock: active writer" ); + ITT_NOTIFY(sync_releasing, mutex); // release reader + __TBB_FetchAndAddWrelease((volatile void *)&(mutex->state),-(intptr)ONE_READER); +} + +bool spin_rw_mutex::internal_try_acquire_writer( spin_rw_mutex * mutex ) +{ +// for a writer: only possible to acquire if no active readers or writers + state_t s = mutex->state; // on Itanium, this volatile load has acquire semantic + if( !(s & BUSY) ) // no readers, no writers; mask is 1..1101 + if( CAS(mutex->state, WRITER, s) ) { + ITT_NOTIFY(sync_acquired, mutex); + return true; // successfully stored writer flag + } + return false; +} + +bool spin_rw_mutex::internal_try_acquire_reader( spin_rw_mutex * mutex ) +{ +// for a reader: acquire if no active or waiting writers + state_t s = mutex->state; // on Itanium, a load of volatile variable has acquire semantic + while( !(s & (WRITER|WRITER_PENDING)) ) // no writers + if( CAS(mutex->state, s+ONE_READER, s) ) { + ITT_NOTIFY(sync_acquired, mutex); + return true; // successfully stored increased number of readers + } + return false; +} + +} // namespace tbb diff --git a/dep/tbb/src/old/spin_rw_mutex_v2.h b/dep/tbb/src/old/spin_rw_mutex_v2.h new file mode 100644 index 000000000..3285e8ee5 --- /dev/null +++ b/dep/tbb/src/old/spin_rw_mutex_v2.h @@ -0,0 +1,185 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. 
+ + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_spin_rw_mutex_H +#define __TBB_spin_rw_mutex_H + +#include "tbb/tbb_stddef.h" + +namespace tbb { + +//! Fast, unfair, spinning reader-writer lock with backoff and writer-preference +/** @ingroup synchronization */ +class spin_rw_mutex { + //! @cond INTERNAL + + //! Present so that 1.0 headers work with 1.1 dynamic library. + static void __TBB_EXPORTED_FUNC internal_itt_releasing(spin_rw_mutex *); + + //! Internal acquire write lock. + static bool __TBB_EXPORTED_FUNC internal_acquire_writer(spin_rw_mutex *); + + //! Out of line code for releasing a write lock. + /** This code is has debug checking and instrumentation for Intel(R) Thread Checker and Intel(R) Thread Profiler. */ + static void __TBB_EXPORTED_FUNC internal_release_writer(spin_rw_mutex *); + + //! Internal acquire read lock. + static void __TBB_EXPORTED_FUNC internal_acquire_reader(spin_rw_mutex *); + + //! Internal upgrade reader to become a writer. + static bool __TBB_EXPORTED_FUNC internal_upgrade(spin_rw_mutex *); + + //! Out of line code for downgrading a writer to a reader. + /** This code is has debug checking and instrumentation for Intel(R) Thread Checker and Intel(R) Thread Profiler. */ + static void __TBB_EXPORTED_FUNC internal_downgrade(spin_rw_mutex *); + + //! Internal release read lock. + static void __TBB_EXPORTED_FUNC internal_release_reader(spin_rw_mutex *); + + //! Internal try_acquire write lock. + static bool __TBB_EXPORTED_FUNC internal_try_acquire_writer(spin_rw_mutex *); + + //! Internal try_acquire read lock. + static bool __TBB_EXPORTED_FUNC internal_try_acquire_reader(spin_rw_mutex *); + + //! @endcond +public: + //! Construct unacquired mutex. + spin_rw_mutex() : state(0) {} + +#if TBB_DO_ASSERT + //! Destructor asserts if the mutex is acquired, i.e. state is zero. + ~spin_rw_mutex() { + __TBB_ASSERT( !state, "destruction of an acquired mutex"); + }; +#endif /* TBB_DO_ASSERT */ + + //! The scoped locking pattern + /** It helps to avoid the common problem of forgetting to release lock. + It also nicely provides the "node" for queuing locks. */ + class scoped_lock : private internal::no_copy { + public: + //! Construct lock that has not acquired a mutex. + /** Equivalent to zero-initialization of *this. */ + scoped_lock() : mutex(NULL) {} + + //! Acquire lock on given mutex. 
+ /** Upon entry, *this should not be in the "have acquired a mutex" state. */ + scoped_lock( spin_rw_mutex& m, bool write = true ) : mutex(NULL) { + acquire(m, write); + } + + //! Release lock (if lock is held). + ~scoped_lock() { + if( mutex ) release(); + } + + //! Acquire lock on given mutex. + void acquire( spin_rw_mutex& m, bool write = true ) { + __TBB_ASSERT( !mutex, "holding mutex already" ); + is_writer = write; + mutex = &m; + if( write ) internal_acquire_writer(mutex); + else internal_acquire_reader(mutex); + } + + //! Upgrade reader to become a writer. + /** Returns true if the upgrade happened without re-acquiring the lock and false if opposite */ + bool upgrade_to_writer() { + __TBB_ASSERT( mutex, "lock is not acquired" ); + __TBB_ASSERT( !is_writer, "not a reader" ); + is_writer = true; + return internal_upgrade(mutex); + } + + //! Release lock. + void release() { + __TBB_ASSERT( mutex, "lock is not acquired" ); + spin_rw_mutex *m = mutex; + mutex = NULL; + if( is_writer ) { +#if TBB_DO_THREADING_TOOLS||TBB_DO_ASSERT + internal_release_writer(m); +#else + m->state = 0; +#endif /* TBB_DO_THREADING_TOOLS||TBB_DO_ASSERT */ + } else { + internal_release_reader(m); + } + }; + + //! Downgrade writer to become a reader. + bool downgrade_to_reader() { +#if TBB_DO_THREADING_TOOLS||TBB_DO_ASSERT + __TBB_ASSERT( mutex, "lock is not acquired" ); + __TBB_ASSERT( is_writer, "not a writer" ); + internal_downgrade(mutex); +#else + mutex->state = 4; // Bit 2 - reader, 00..00100 +#endif + is_writer = false; + + return true; + } + + //! Try acquire lock on given mutex. + bool try_acquire( spin_rw_mutex& m, bool write = true ) { + __TBB_ASSERT( !mutex, "holding mutex already" ); + bool result; + is_writer = write; + result = write? internal_try_acquire_writer(&m) + : internal_try_acquire_reader(&m); + if( result ) mutex = &m; + return result; + } + + private: + //! The pointer to the current mutex that is held, or NULL if no mutex is held. + spin_rw_mutex* mutex; + + //! True if holding a writer lock, false if holding a reader lock. + /** Not defined if not holding a lock. */ + bool is_writer; + }; + +private: + typedef internal::uintptr state_t; + static const state_t WRITER = 1; + static const state_t WRITER_PENDING = 2; + static const state_t READERS = ~(WRITER | WRITER_PENDING); + static const state_t ONE_READER = 4; + static const state_t BUSY = WRITER | READERS; + /** Bit 0 = writer is holding lock + Bit 1 = request by a writer to acquire lock (hint to readers to wait) + Bit 2..N = number of readers holding lock */ + volatile state_t state; +}; + +} // namespace ThreadingBuildingBlocks + +#endif /* __TBB_spin_rw_mutex_H */ diff --git a/dep/tbb/src/old/test_concurrent_queue_v2.cpp b/dep/tbb/src/old/test_concurrent_queue_v2.cpp new file mode 100644 index 000000000..4443b592c --- /dev/null +++ b/dep/tbb/src/old/test_concurrent_queue_v2.cpp @@ -0,0 +1,361 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/concurrent_queue.h" +#include "tbb/atomic.h" +#include "tbb/tick_count.h" + +#include "../test/harness_assert.h" +#include "../test/harness.h" + +static tbb::atomic FooConstructed; +static tbb::atomic FooDestroyed; + +class Foo { + enum state_t{ + LIVE=0x1234, + DEAD=0xDEAD + }; + state_t state; +public: + int thread_id; + int serial; + Foo() : state(LIVE) { + ++FooConstructed; + } + Foo( const Foo& item ) : state(LIVE) { + ASSERT( item.state==LIVE, NULL ); + ++FooConstructed; + thread_id = item.thread_id; + serial = item.serial; + } + ~Foo() { + ASSERT( state==LIVE, NULL ); + ++FooDestroyed; + state=DEAD; + thread_id=0xDEAD; + serial=0xDEAD; + } + void operator=( Foo& item ) { + ASSERT( item.state==LIVE, NULL ); + ASSERT( state==LIVE, NULL ); + thread_id = item.thread_id; + serial = item.serial; + } + bool is_const() {return false;} + bool is_const() const {return true;} +}; + +const size_t MAXTHREAD = 256; + +static int Sum[MAXTHREAD]; + +//! Count of various pop operations +/** [0] = pop_if_present that failed + [1] = pop_if_present that succeeded + [2] = pop */ +static tbb::atomic PopKind[3]; + +const int M = 10000; + +struct Body { + tbb::concurrent_queue* queue; + const int nthread; + Body( int nthread_ ) : nthread(nthread_) {} + void operator()( long thread_id ) const { + long pop_kind[3] = {0,0,0}; + int serial[MAXTHREAD+1]; + memset( serial, 0, nthread*sizeof(unsigned) ); + ASSERT( thread_idpop_if_present(f); + ++pop_kind[prepopped]; + } + Foo g; + g.thread_id = thread_id; + g.serial = j+1; + queue->push( g ); + if( !prepopped ) { + queue->pop(f); + ++pop_kind[2]; + } + ASSERT( f.thread_id<=nthread, NULL ); + ASSERT( f.thread_id==nthread || serial[f.thread_id]0, "nthread must be positive" ); + if( prefill+1>=capacity ) + return; + bool success = false; + for( int k=0; k<3; ++k ) + PopKind[k] = 0; + for( int trial=0; !success; ++trial ) { + FooConstructed = 0; + FooDestroyed = 0; + Body body(nthread); + tbb::concurrent_queue queue; + queue.set_capacity( capacity ); + body.queue = &queue; + for( int i=0; i=0; ) { + ASSERT( !queue.empty(), NULL ); + Foo f; + queue.pop(f); + ASSERT( queue.size()==i, NULL ); + sum += f.serial-1; + } + ASSERT( queue.empty(), NULL ); + ASSERT( queue.size()==0, NULL ); + if( sum!=expected ) + printf("sum=%d expected=%d\n",sum,expected); + ASSERT( FooConstructed==FooDestroyed, NULL ); + + success = true; + if( nthread>1 && prefill==0 ) { + // Check that pop_if_present got sufficient exercise + for( int k=0; k<2; ++k ) { +#if (_WIN32||_WIN64) + // The TBB library on Windows seems to have a tough time generating + // the desired interleavings for pop_if_present, so the code tries longer, and settles + // for fewer desired interleavings. 
+ const int max_trial = 100; + const int min_requirement = 20; +#else + const int min_requirement = 100; + const int max_trial = 20; +#endif /* _WIN32||_WIN64 */ + if( PopKind[k]=max_trial ) { + if( Verbose ) + printf("Warning: %d threads had only %ld pop_if_present operations %s after %d trials (expected at least %d). " + "This problem may merely be unlucky scheduling. " + "Investigate only if it happens repeatedly.\n", + nthread, long(PopKind[k]), k==0?"failed":"succeeded", max_trial, min_requirement); + else + printf("Warning: the number of %s pop_if_present operations is less than expected for %d threads. Investigate if it happens repeatedly.\n", + k==0?"failed":"succeeded", nthread ); + } else { + success = false; + } + } + } + } + } +} + +template +void TestIteratorAux( Iterator1 i, Iterator2 j, int size ) { + // Now test iteration + Iterator1 old_i; + for( int k=0; k" + ASSERT( k+2==i->serial, NULL ); + } + // Test assignment + old_i = i; + } + ASSERT( k+1==f.serial, NULL ); + } + ASSERT( !(i!=j), NULL ); + ASSERT( i==j, NULL ); +} + +template +void TestIteratorAssignment( Iterator2 j ) { + Iterator1 i(j); + ASSERT( i==j, NULL ); + ASSERT( !(i!=j), NULL ); + Iterator1 k; + k = j; + ASSERT( k==j, NULL ); + ASSERT( !(k!=j), NULL ); +} + +//! Test the iterators for concurrent_queue +void TestIterator() { + tbb::concurrent_queue queue; + tbb::concurrent_queue& const_queue = queue; + for( int j=0; j<500; ++j ) { + TestIteratorAux( queue.begin(), queue.end(), j ); + TestIteratorAux( const_queue.begin(), const_queue.end(), j ); + TestIteratorAux( const_queue.begin(), queue.end(), j ); + TestIteratorAux( queue.begin(), const_queue.end(), j ); + Foo f; + f.serial = j+1; + queue.push(f); + } + TestIteratorAssignment::const_iterator>( const_queue.begin() ); + TestIteratorAssignment::const_iterator>( queue.begin() ); + TestIteratorAssignment::iterator>( queue.begin() ); +} + +void TestConcurrenetQueueType() { + AssertSameType( tbb::concurrent_queue::value_type(), Foo() ); + Foo f; + const Foo g; + tbb::concurrent_queue::reference r = f; + ASSERT( &r==&f, NULL ); + ASSERT( !r.is_const(), NULL ); + tbb::concurrent_queue::const_reference cr = g; + ASSERT( &cr==&g, NULL ); + ASSERT( cr.is_const(), NULL ); +} + +template +void TestEmptyQueue() { + const tbb::concurrent_queue queue; + ASSERT( queue.size()==0, NULL ); + ASSERT( queue.capacity()>0, NULL ); + ASSERT( size_t(queue.capacity())>=size_t(-1)/(sizeof(void*)+sizeof(T)), NULL ); +} + +void TestFullQueue() { + for( int n=0; n<10; ++n ) { + FooConstructed = 0; + FooDestroyed = 0; + tbb::concurrent_queue queue; + queue.set_capacity(n); + for( int i=0; i<=n; ++i ) { + Foo f; + f.serial = i; + bool result = queue.push_if_not_full( f ); + ASSERT( result==(i +struct TestNegativeQueueBody { + tbb::concurrent_queue& queue; + const int nthread; + TestNegativeQueueBody( tbb::concurrent_queue& q, int n ) : queue(q), nthread(n) {} + void operator()( int k ) const { + if( k==0 ) { + int number_of_pops = nthread-1; + // Wait for all pops to pend. + while( queue.size()>-number_of_pops ) { + __TBB_Yield(); + } + for( int i=0; ; ++i ) { + ASSERT( queue.size()==i-number_of_pops, NULL ); + ASSERT( queue.empty()==(queue.size()<=0), NULL ); + if( i==number_of_pops ) break; + // Satisfy another pop + queue.push( T() ); + } + } else { + // Pop item from queue + T item; + queue.pop(item); + } + } +}; + +//! Test a queue with a negative size. 
+template +void TestNegativeQueue( int nthread ) { + tbb::concurrent_queue queue; + NativeParallelFor( nthread, TestNegativeQueueBody(queue,nthread) ); +} + +int main( int argc, char* argv[] ) { + // Set default for minimum number of threads. + MinThread = 1; + ParseCommandLine(argc,argv); + + TestEmptyQueue(); + TestEmptyQueue(); + TestFullQueue(); + TestConcurrenetQueueType(); + TestIterator(); + + // Test concurrent operations + for( int nthread=MinThread; nthread<=MaxThread; ++nthread ) { + TestNegativeQueue(nthread); + for( int prefill=0; prefill<64; prefill+=(1+prefill/3) ) { + TestPushPop(prefill,ptrdiff_t(-1),nthread); + TestPushPop(prefill,ptrdiff_t(1),nthread); + TestPushPop(prefill,ptrdiff_t(2),nthread); + TestPushPop(prefill,ptrdiff_t(10),nthread); + TestPushPop(prefill,ptrdiff_t(100),nthread); + } + } + printf("done\n"); + return 0; +} diff --git a/dep/tbb/src/old/test_concurrent_vector_v2.cpp b/dep/tbb/src/old/test_concurrent_vector_v2.cpp new file mode 100644 index 000000000..68a73115b --- /dev/null +++ b/dep/tbb/src/old/test_concurrent_vector_v2.cpp @@ -0,0 +1,570 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "concurrent_vector_v2.h" +#include +#include +#include "../test/harness_assert.h" + +tbb::atomic FooCount; + +//! 
Problem size +const size_t N = 500000; + +struct Foo { + int my_bar; +public: + enum State { + DefaultInitialized=0x1234, + CopyInitialized=0x89ab, + Destroyed=0x5678 + } state; + int& bar() { + ASSERT( state==DefaultInitialized||state==CopyInitialized, NULL ); + return my_bar; + } + int bar() const { + ASSERT( state==DefaultInitialized||state==CopyInitialized, NULL ); + return my_bar; + } + static const int initial_value_of_bar = 42; + Foo() { + state = DefaultInitialized; + ++FooCount; + my_bar = initial_value_of_bar; + } + Foo( const Foo& foo ) { + state = CopyInitialized; + ++FooCount; + my_bar = foo.my_bar; + } + ~Foo() { + ASSERT( state==DefaultInitialized||state==CopyInitialized, NULL ); + state = Destroyed; + my_bar = ~initial_value_of_bar; + --FooCount; + } + bool is_const() const {return true;} + bool is_const() {return false;} +}; + +class FooWithAssign: public Foo { +public: + void operator=( const FooWithAssign& x ) { + ASSERT( x.state==DefaultInitialized||x.state==CopyInitialized, NULL ); + ASSERT( state==DefaultInitialized||state==CopyInitialized, NULL ); + my_bar = x.my_bar; + } +}; + +inline void NextSize( int& s ) { + if( s<=32 ) ++s; + else s += s/10; +} + +static void CheckVector( const tbb::concurrent_vector& cv, size_t expected_size, size_t old_size ) { + ASSERT( cv.size()==expected_size, NULL ); + ASSERT( cv.empty()==(expected_size==0), NULL ); + for( int j=0; j vector_t; + for( int old_size=0; old_size<=128; NextSize( old_size ) ) { + for( int new_size=old_size; new_size<=128; NextSize( new_size ) ) { + long count = FooCount; + vector_t v; + ASSERT( count==FooCount, NULL ); + v.grow_by(old_size); + ASSERT( count+old_size==FooCount, NULL ); + for( int j=0; j vector_t; + vector_t v; + v.reserve( old_size ); + ASSERT( v.capacity()>=old_size, NULL ); + v.reserve( new_size ); + ASSERT( v.capacity()>=old_size, NULL ); + ASSERT( v.capacity()>=new_size, NULL ); + for( size_t i=0; i<2*new_size; ++i ) { + ASSERT( size_t(FooCount)==count+i, NULL ); + size_t j = v.grow_by(1); + ASSERT( j==i, NULL ); + } + } + ASSERT( FooCount==count, NULL ); + } + } +} + +struct AssignElement { + typedef tbb::concurrent_vector::range_type::iterator iterator; + iterator base; + void operator()( const tbb::concurrent_vector::range_type& range ) const { + for( iterator i=range.begin(); i!=range.end(); ++i ) { + if( *i!=0 ) + std::printf("ERROR for v[%ld]\n", long(i-base)); + *i = int(i-base); + } + } + AssignElement( iterator base_ ) : base(base_) {} +}; + +struct CheckElement { + typedef tbb::concurrent_vector::const_range_type::iterator iterator; + iterator base; + void operator()( const tbb::concurrent_vector::const_range_type& range ) const { + for( iterator i=range.begin(); i!=range.end(); ++i ) + if( *i != int(i-base) ) + std::printf("ERROR for v[%ld]\n", long(i-base)); + } + CheckElement( iterator base_ ) : base(base_) {} +}; + +#include "tbb/tick_count.h" +#include "tbb/parallel_for.h" +#include "../test/harness.h" + +void TestParallelFor( int nthread ) { + typedef tbb::concurrent_vector vector_t; + vector_t v; + v.grow_to_at_least(N); + tbb::tick_count t0 = tbb::tick_count::now(); + if( Verbose ) + std::printf("Calling parallel_for.h with %ld threads\n",long(nthread)); + tbb::parallel_for( v.range(10000), AssignElement(v.begin()) ); + tbb::tick_count t1 = tbb::tick_count::now(); + const vector_t& u = v; + tbb::parallel_for( u.range(10000), CheckElement(u.begin()) ); + tbb::tick_count t2 = tbb::tick_count::now(); + if( Verbose ) + std::printf("Time for parallel_for.h: assign time = %8.5f, 
check time = %8.5f\n", + (t1-t0).seconds(),(t2-t1).seconds()); + for( long i=0; size_t(i) +void TestIteratorAssignment( Iterator2 j ) { + Iterator1 i(j); + ASSERT( i==j, NULL ); + ASSERT( !(i!=j), NULL ); + Iterator1 k; + k = j; + ASSERT( k==j, NULL ); + ASSERT( !(k!=j), NULL ); +} + +template +void TestIteratorTraits() { + AssertSameType( static_cast(0), static_cast(0) ); + AssertSameType( static_cast(0), static_cast(0) ); + AssertSameType( static_cast(0), static_cast(0) ); + AssertSameType( static_cast(0), static_cast(0) ); + T x; + typename Iterator::reference xr = x; + typename Iterator::pointer xp = &x; + ASSERT( &xr==xp, NULL ); +} + +template +void CheckConstIterator( const Vector& u, int i, const Iterator& cp ) { + typename Vector::const_reference pref = *cp; + if( pref.bar()!=i ) + std::printf("ERROR for u[%ld] using const_iterator\n", long(i)); + typename Vector::difference_type delta = cp-u.begin(); + ASSERT( delta==i, NULL ); + if( u[i].bar()!=i ) + std::printf("ERROR for u[%ld] using subscripting\n", long(i)); + ASSERT( u.begin()[i].bar()==i, NULL ); +} + +template +void CheckIteratorComparison( V& u ) { + Iterator1 i = u.begin(); + for( int i_count=0; i_count<100; ++i_count ) { + Iterator2 j = u.begin(); + for( int j_count=0; j_count<100; ++j_count ) { + ASSERT( (i==j)==(i_count==j_count), NULL ); + ASSERT( (i!=j)==(i_count!=j_count), NULL ); + ASSERT( (i-j)==(i_count-j_count), NULL ); + ASSERT( (ij)==(i_count>j_count), NULL ); + ASSERT( (i<=j)==(i_count<=j_count), NULL ); + ASSERT( (i>=j)==(i_count>=j_count), NULL ); + ++j; + } + ++i; + } +} + +//! Test sequential iterators for vector type V. +/** Also does timing. */ +template +void TestSequentialFor() { + V v; + v.grow_by(N); + + // Check iterator + tbb::tick_count t0 = tbb::tick_count::now(); + typename V::iterator p = v.begin(); + ASSERT( !(*p).is_const(), NULL ); + ASSERT( !p->is_const(), NULL ); + for( int i=0; size_t(i)is_const(), NULL ); + for( int i=0; size_t(i)0; ) { + --i; + --cp; + if( i>0 ) { + typename V::const_iterator cp_old = cp--; + int here = (*cp_old).bar(); + ASSERT( here==u[i].bar(), NULL ); + typename V::const_iterator cp_new = cp++; + int prev = (*cp_new).bar(); + ASSERT( prev==u[i-1].bar(), NULL ); + } + CheckConstIterator(u,i,cp); + } + + // Now go forwards and backwards + cp = u.begin(); + ptrdiff_t j = 0; + for( size_t i=0; i(v); + CheckIteratorComparison(v); + CheckIteratorComparison(v); + CheckIteratorComparison(v); + + TestIteratorAssignment( u.begin() ); + TestIteratorAssignment( v.begin() ); + TestIteratorAssignment( v.begin() ); + + // Check reverse_iterator + typename V::reverse_iterator rp = v.rbegin(); + for( size_t i=v.size(); i>0; --i, ++rp ) { + typename V::reference pref = *rp; + ASSERT( size_t(pref.bar())==i-1, NULL ); + ASSERT( rp!=v.rend(), NULL ); + } + ASSERT( rp==v.rend(), NULL ); + + // Check const_reverse_iterator + typename V::const_reverse_iterator crp = u.rbegin(); + for( size_t i=v.size(); i>0; --i, ++crp ) { + typename V::const_reference cpref = *crp; + ASSERT( size_t(cpref.bar())==i-1, NULL ); + ASSERT( crp!=u.rend(), NULL ); + } + ASSERT( crp==u.rend(), NULL ); + + TestIteratorAssignment( u.rbegin() ); + TestIteratorAssignment( v.rbegin() ); +} + +static const size_t Modulus = 7; + +typedef tbb::concurrent_vector MyVector; + +class GrowToAtLeast { + MyVector& my_vector; +public: + void operator()( const tbb::blocked_range& range ) const { + for( size_t i=range.begin(); i!=range.end(); ++i ) { + size_t n = my_vector.size(); + size_t k = n==0 ? 
0 : i % (2*n+1); + my_vector.grow_to_at_least(k+1); + ASSERT( my_vector.size()>=k+1, NULL ); + } + } + GrowToAtLeast( MyVector& vector ) : my_vector(vector) {} +}; + +void TestConcurrentGrowToAtLeast() { + MyVector v; + for( size_t s=1; s<1000; s*=10 ) { + tbb::parallel_for( tbb::blocked_range(0,1000000,100), GrowToAtLeast(v) ); + } +} + +//! Test concurrent invocations of method concurrent_vector::grow_by +class GrowBy { + MyVector& my_vector; +public: + void operator()( const tbb::blocked_range& range ) const { + for( int i=range.begin(); i!=range.end(); ++i ) { + if( i%3 ) { + Foo& element = my_vector[my_vector.grow_by(1)]; + element.bar() = i; + } else { + Foo f; + f.bar() = i; + size_t k = my_vector.push_back( f ); + ASSERT( my_vector[k].bar()==i, NULL ); + } + } + } + GrowBy( MyVector& vector ) : my_vector(vector) {} +}; + +//! Test concurrent invocations of method concurrent_vector::grow_by +void TestConcurrentGrowBy( int nthread ) { + int m = 100000; + MyVector v; + tbb::parallel_for( tbb::blocked_range(0,m,1000), GrowBy(v) ); + ASSERT( v.size()==size_t(m), NULL ); + + // Verify that v is a permutation of 0..m + int inversions = 0; + bool* found = new bool[m]; + memset( found, 0, m ); + for( int i=0; i0 ) + inversions += v[i].bar()1 || v[i].bar()==i, "sequential execution is wrong" ); + } + delete[] found; + if( nthread>1 && inversions vector_t; + for( int dst_size=1; dst_size<=128; NextSize( dst_size ) ) { + for( int src_size=2; src_size<=128; NextSize( src_size ) ) { + vector_t u; + u.grow_to_at_least(src_size); + for( int i=0; i + +typedef unsigned long Number; + +static tbb::concurrent_vector Primes; + +class FindPrimes { + bool is_prime( Number val ) const { + int limit, factor = 3; + if( val<5u ) + return val==2; + else { + limit = long(sqrtf(float(val))+0.5f); + while( factor<=limit && val % factor ) + ++factor; + return factor>limit; + } + } +public: + void operator()( const tbb::blocked_range& r ) const { + for( Number i=r.begin(); i!=r.end(); ++i ) { + if( i%2 && is_prime(i) ) { + Primes[Primes.grow_by(1)] = i; + } + } + } +}; + +static double TimeFindPrimes( int nthread ) { + Primes.clear(); + tbb::task_scheduler_init init(nthread); + tbb::tick_count t0 = tbb::tick_count::now(); + tbb::parallel_for( tbb::blocked_range(0,1000000,500), FindPrimes() ); + tbb::tick_count t1 = tbb::tick_count::now(); + return (t1-t0).seconds(); +} + +static void TestFindPrimes() { + // Time fully subscribed run. + double t2 = TimeFindPrimes( tbb::task_scheduler_init::automatic ); + + // Time parallel run that is very likely oversubscribed. + double t128 = TimeFindPrimes(128); + + if( Verbose ) + std::printf("TestFindPrimes: t2==%g t128=%g\n", t2, t128 ); + + // We allow the 128-thread run a little extra time to allow for thread overhead. + // Theoretically, following test will fail on machine with >128 processors. + // But that situation is not going to come up in the near future, + // and the generalization to fix the issue is not worth the trouble. + if( t128>1.10*t2 ) { + std::printf("Warning: grow_by is pathetically slow: t2==%g t128=%g\n", t2, t128); + } +} + +//------------------------------------------------------------------------ +// Test compatibility with STL sort. 
+//------------------------------------------------------------------------ + +#include + +void TestSort() { + for( int n=1; n<100; n*=3 ) { + tbb::concurrent_vector array; + array.grow_by( n ); + for( int i=0; i::iterator,Foo>(); + TestIteratorTraits::const_iterator,const Foo>(); + TestSequentialFor > (); + TestResizeAndCopy(); + TestAssign(); + TestCapacity(); + for( int nthread=MinThread; nthread<=MaxThread; ++nthread ) { + tbb::task_scheduler_init init( nthread ); + TestParallelFor( nthread ); + TestConcurrentGrowToAtLeast(); + TestConcurrentGrowBy( nthread ); + } + TestFindPrimes(); + TestSort(); + std::printf("done\n"); + return 0; +} diff --git a/dep/tbb/src/old/test_mutex_v2.cpp b/dep/tbb/src/old/test_mutex_v2.cpp new file mode 100644 index 000000000..4e2a1ef72 --- /dev/null +++ b/dep/tbb/src/old/test_mutex_v2.cpp @@ -0,0 +1,270 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +//------------------------------------------------------------------------ +// Test TBB mutexes when used with parallel_for.h +// +// Usage: test_Mutex.exe [-v] nthread +// +// The -v option causes timing information to be printed. +// +// Compile with _OPENMP and -openmp +//------------------------------------------------------------------------ +#include "tbb/atomic.h" +#include "tbb/blocked_range.h" +#include "tbb/parallel_for.h" +#include "tbb/tick_count.h" +#include "../test/harness.h" +#include "spin_rw_mutex_v2.h" +#include +#include + +#if __linux__ +#define STD std +#else +#define STD /* Cater to broken Windows compilers that are missing "std". */ +#endif /* __linux__ */ + +// This test deliberately avoids a "using tbb" statement, +// so that the error of putting types in the wrong namespace will be caught. + +template +struct Counter { + typedef M mutex_type; + M mutex; + volatile long value; +}; + +//! Function object for use with parallel_for.h. +template +struct AddOne { + C& counter; + /** Increments counter once for each iteration in the iteration space. 
*/ + void operator()( tbb::blocked_range& range ) const { + for( size_t i=range.begin(); i!=range.end(); ++i ) { + if( i&1 ) { + // Try implicit acquire and explicit release + typename C::mutex_type::scoped_lock lock(counter.mutex); + counter.value = counter.value+1; + lock.release(); + } else { + // Try explicit acquire and implicit release + typename C::mutex_type::scoped_lock lock; + lock.acquire(counter.mutex); + counter.value = counter.value+1; + } + } + } + AddOne( C& counter_ ) : counter(counter_) {} +}; + +//! Generic test of a TBB mutex type M. +/** Does not test features specific to reader-writer locks. */ +template +void Test( const char * name ) { + if( Verbose ) { + printf("%s time = ",name); + fflush(stdout); + } + Counter counter; + counter.value = 0; + const int n = 100000; + tbb::tick_count t0 = tbb::tick_count::now(); + tbb::parallel_for(tbb::blocked_range(0,n,10000),AddOne >(counter)); + tbb::tick_count t1 = tbb::tick_count::now(); + if( Verbose ) + printf("%g usec\n",(t1-t0).seconds()); + if( counter.value!=n ) + STD::printf("ERROR for %s: counter.value=%ld\n",name,counter.value); +} + +template +struct Invariant { + typedef M mutex_type; + M mutex; + const char* mutex_name; + volatile long value[N]; + volatile long single_value; + Invariant( const char* mutex_name_ ) : + mutex_name(mutex_name_) + { + single_value = 0; + for( size_t k=0; k +struct TwiddleInvariant { + I& invariant; + /** Increments counter once for each iteration in the iteration space. */ + void operator()( tbb::blocked_range& range ) const { + for( size_t i=range.begin(); i!=range.end(); ++i ) { + //! Every 8th access is a write access + bool write = (i%8)==7; + bool okay = true; + bool lock_kept = true; + if( (i/8)&1 ) { + // Try implicit acquire and explicit release + typename I::mutex_type::scoped_lock lock(invariant.mutex,write); + if( write ) { + long my_value = invariant.value[0]; + invariant.update(); + if( i%16==7 ) { + lock_kept = lock.downgrade_to_reader(); + if( !lock_kept ) + my_value = invariant.value[0] - 1; + okay = invariant.value_is(my_value+1); + } + } else { + okay = invariant.is_okay(); + if( i%8==3 ) { + long my_value = invariant.value[0]; + lock_kept = lock.upgrade_to_writer(); + if( !lock_kept ) + my_value = invariant.value[0]; + invariant.update(); + okay = invariant.value_is(my_value+1); + } + } + lock.release(); + } else { + // Try explicit acquire and implicit release + typename I::mutex_type::scoped_lock lock; + lock.acquire(invariant.mutex,write); + if( write ) { + long my_value = invariant.value[0]; + invariant.update(); + if( i%16==7 ) { + lock_kept = lock.downgrade_to_reader(); + if( !lock_kept ) + my_value = invariant.value[0] - 1; + okay = invariant.value_is(my_value+1); + } + } else { + okay = invariant.is_okay(); + if( i%8==3 ) { + long my_value = invariant.value[0]; + lock_kept = lock.upgrade_to_writer(); + if( !lock_kept ) + my_value = invariant.value[0]; + invariant.update(); + okay = invariant.value_is(my_value+1); + } + } + } + if( !okay ) { + STD::printf( "ERROR for %s at %ld: %s %s %s %s\n",invariant.mutex_name, long(i), + write?"write,":"read,", write?(i%16==7?"downgrade,":""):(i%8==3?"upgrade,":""), + lock_kept?"lock kept,":"lock not kept,", (i/8)&1?"imp/exp":"exp/imp" ); + } + } + } + TwiddleInvariant( I& invariant_ ) : invariant(invariant_) {} +}; + +/** This test is generic so that we can test any other kinds of ReaderWriter locks we write later. 
*/ +template +void TestReaderWriterLock( const char * mutex_name ) { + if( Verbose ) { + printf("%s readers & writers time = ",mutex_name); + fflush(stdout); + } + Invariant invariant(mutex_name); + const size_t n = 500000; + tbb::tick_count t0 = tbb::tick_count::now(); + tbb::parallel_for(tbb::blocked_range(0,n,5000),TwiddleInvariant >(invariant)); + tbb::tick_count t1 = tbb::tick_count::now(); + // There is either a writer or a reader upgraded to a writer for each 4th iteration + long expected_value = n/4; + if( !invariant.value_is(expected_value) ) + STD::printf("ERROR for %s: final invariant value is wrong\n",mutex_name); + if( Verbose ) + printf("%g usec\n",(t1-t0).seconds()); +} + +/** Test try_acquire functionality of a non-reenterable mutex */ +template +void TestTryAcquire_OneThread( const char * mutex_name ) { + M tested_mutex; + typename M::scoped_lock lock1; + if( lock1.try_acquire(tested_mutex) ) + lock1.release(); + else + STD::printf("ERROR for %s: try_acquire failed though it should not\n", mutex_name); + { + typename M::scoped_lock lock2(tested_mutex); + if( lock1.try_acquire(tested_mutex) ) + STD::printf("ERROR for %s: try_acquire succeeded though it should not\n", mutex_name); + } + if( lock1.try_acquire(tested_mutex) ) + lock1.release(); + else + STD::printf("ERROR for %s: try_acquire failed though it should not\n", mutex_name); +} + +#include "tbb/task_scheduler_init.h" + +int main( int argc, char * argv[] ) { + ParseCommandLine( argc, argv ); + for( int p=MinThread; p<=MaxThread; ++p ) { + tbb::task_scheduler_init init( p ); + if( Verbose ) + printf( "testing with %d workers\n", static_cast(p) ); + // Run each test 3 times. + for( int i=0; i<3; ++i ) { + Test( "Spin RW Mutex" ); + + TestTryAcquire_OneThread("Spin RW Mutex"); // only tests try_acquire for writers + TestReaderWriterLock( "Spin RW Mutex" ); + if( Verbose ) + printf( "calling destructor for task_scheduler_init\n" ); + } + } + STD::printf("done\n"); + return 0; +} diff --git a/dep/tbb/src/rml/client/index.html b/dep/tbb/src/rml/client/index.html new file mode 100644 index 000000000..5c7bd50fc --- /dev/null +++ b/dep/tbb/src/rml/client/index.html @@ -0,0 +1,43 @@ + + +

+Overview
+
+This directory has source code that must be statically linked into an RML client.
+
+Files
+
+rml_factory.h
+    Text shared by rml_omp.cpp and rml_tbb.cpp. This is not an ordinary include
+    file, so it does not have an #ifndef guard; each client includes it after
+    defining a few macros (see the sketch after this list).
+
+Specific to client=OpenMP
+
+rml_omp.cpp
+    Source file for the OpenMP client.
+
+omp_dynamic_link.h
+omp_dynamic_link.cpp
+    Source files for dynamic linking support. The code comes from the TBB source
+    directory, adjusted so that it appears in namespace __kmp instead of
+    namespace tbb::internal.
+
+Specific to client=TBB
+
+rml_tbb.cpp
+    Source file for the TBB client. It uses the dynamic linking support from the
+    TBB source directory.
+
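For orientation: rml_omp.cpp and rml_tbb.cpp (further down in this patch) consume the guard-less rml_factory.h by defining the factory, server, and client names, plus the dynamic-link entry points, immediately before including it, so the same factory boilerplate is stamped out once per client kind. A minimal sketch of the pattern with a hypothetical "foo" client (all foo_*/__FOO_* names are invented for illustration; the real bindings are the __KMP_*/__TBB_* ones shown later in this patch) might look roughly like:

    // Hypothetical client translation unit -- mirrors rml_omp.cpp / rml_tbb.cpp.
    #include "rml_foo.h"            // would declare foo_factory, foo_server, foo_client
    #include "foo_dynamic_link.h"   // would provide DLD() and dynamic_link()

    namespace foo {
    namespace rml {

    #define MAKE_SERVER(x) DLD(__FOO_make_rml_server,x)
    #define GET_INFO(x)    DLD(__FOO_call_with_my_server_info,x)
    #define SERVER  foo_server
    #define CLIENT  foo_client
    #define FACTORY foo_factory
    #include "rml_factory.h"        // no #ifndef guard: the body expands against the macros above

    } // rml
    } // foo

The absence of an include guard is deliberate: rml_factory.h is meant to be expanded once per (FACTORY, SERVER, CLIENT) binding rather than included as an ordinary header.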
+Up to parent directory
+
+Copyright © 2005-2009 Intel Corporation. All Rights Reserved.
+
+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are registered
+trademarks or trademarks of Intel Corporation or its subsidiaries in the United
+States and other countries.
+
+* Other names and brands may be claimed as the property of others. + + + diff --git a/dep/tbb/src/rml/client/library_assert.h b/dep/tbb/src/rml/client/library_assert.h new file mode 100644 index 000000000..6d8300b94 --- /dev/null +++ b/dep/tbb/src/rml/client/library_assert.h @@ -0,0 +1,41 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef LIBRARY_ASSERT_H +#define LIBRARY_ASSERT_H + +#ifndef LIBRARY_ASSERT +#ifdef KMP_ASSERT2 +#define LIBRARY_ASSERT(x,y) KMP_ASSERT2((x),(y)) +#else +#include +#define LIBRARY_ASSERT(x,y) assert(x) +#endif +#endif /* LIBRARY_ASSERT */ + +#endif /* LIBRARY_ASSERT_H */ diff --git a/dep/tbb/src/rml/client/omp_dynamic_link.cpp b/dep/tbb/src/rml/client/omp_dynamic_link.cpp new file mode 100644 index 000000000..0f89a3ccb --- /dev/null +++ b/dep/tbb/src/rml/client/omp_dynamic_link.cpp @@ -0,0 +1,32 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#include "omp_dynamic_link.h" +#include "library_assert.h" +#include "tbb/dynamic_link.cpp" // Refers to src/tbb, not include/tbb + diff --git a/dep/tbb/src/rml/client/omp_dynamic_link.h b/dep/tbb/src/rml/client/omp_dynamic_link.h new file mode 100644 index 000000000..290b668fc --- /dev/null +++ b/dep/tbb/src/rml/client/omp_dynamic_link.h @@ -0,0 +1,37 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __KMP_omp_dynamic_link_H +#define __KMP_omp_dynamic_link_H + +#define OPEN_INTERNAL_NAMESPACE namespace __kmp { +#define CLOSE_INTERNAL_NAMESPACE } + +#include "tbb/dynamic_link.h" // Refers to src/tbb, not include/tbb + +#endif /* __KMP_omp_dynamic_link_H */ diff --git a/dep/tbb/src/rml/client/rml_factory.h b/dep/tbb/src/rml/client/rml_factory.h new file mode 100644 index 000000000..2f584b9cf --- /dev/null +++ b/dep/tbb/src/rml/client/rml_factory.h @@ -0,0 +1,100 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +// No ifndef guard because this file is not a normal include file. + +// FIXME - resolve whether _debug version of the RML should have different suffix. */ + +#if TBB_USE_DEBUG +#define DEBUG_SUFFIX "_debug" +#else +#define DEBUG_SUFFIX +#endif /* TBB_USE_DEBUG */ + +// RML_SERVER_NAME is the name of the RML server library. +#if _WIN32||_WIN64 +#define RML_SERVER_NAME "irml" DEBUG_SUFFIX ".dll" +#elif __APPLE__ +#define RML_SERVER_NAME "libirml" DEBUG_SUFFIX ".dylib" +#elif __linux__ +#define RML_SERVER_NAME "libirml" DEBUG_SUFFIX ".so.1" +#elif __FreeBSD__ || __sun +#define RML_SERVER_NAME "libirml" DEBUG_SUFFIX ".so" +#else +#error Unknown OS +#endif + +#include "library_assert.h" + +const ::rml::versioned_object::version_type CLIENT_VERSION = 1; + +::rml::factory::status_type FACTORY::open() { + // Failure of following assertion indicates that factory is already open, or not zero-inited. + LIBRARY_ASSERT( !library_handle, NULL ); + status_type (*open_factory_routine)( factory&, version_type&, version_type ); + dynamic_link_descriptor server_link_table[4] = { + DLD(__RML_open_factory,open_factory_routine), + MAKE_SERVER(my_make_server_routine), + DLD(__RML_close_factory,my_wait_to_close_routine), + GET_INFO(my_call_with_server_info_routine), + }; + status_type result; + dynamic_link_handle h; + if( dynamic_link( RML_SERVER_NAME, server_link_table, 4, 4, &h ) ) { + library_handle = h; + version_type server_version; + status_type result = (*open_factory_routine)( *this, server_version, CLIENT_VERSION ); + // server_version can be checked here for incompatibility here if necessary. + return result; + } else { + library_handle = NULL; + result = st_not_found; + } + return result; +} + +void FACTORY::close() { + if( library_handle ) { + (*my_wait_to_close_routine)(*this); + dynamic_link_handle h = library_handle; + dynamic_unlink(h); + library_handle = NULL; + } +} + +::rml::factory::status_type FACTORY::make_server( SERVER*& s, CLIENT& c) { + // Failure of following assertion means that factory was not successfully opened. + LIBRARY_ASSERT( my_make_server_routine, NULL ); + return (*my_make_server_routine)(*this,s,c); +} + +void FACTORY::call_with_server_info( ::rml::server_info_callback_t cb, void* arg ) const { + // Failure of following assertion means that factory was not successfully opened. + LIBRARY_ASSERT( my_call_with_server_info_routine, NULL ); + (*my_call_with_server_info_routine)( cb, arg ); +} diff --git a/dep/tbb/src/rml/client/rml_omp.cpp b/dep/tbb/src/rml/client/rml_omp.cpp new file mode 100644 index 000000000..38a5a5f63 --- /dev/null +++ b/dep/tbb/src/rml/client/rml_omp.cpp @@ -0,0 +1,44 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "rml_omp.h" +#include "omp_dynamic_link.h" +#include + +namespace __kmp { +namespace rml { + +#define MAKE_SERVER(x) DLD(__KMP_make_rml_server,x) +#define GET_INFO(x) DLD(__KMP_call_with_my_server_info,x) +#define SERVER omp_server +#define CLIENT omp_client +#define FACTORY omp_factory +#include "rml_factory.h" + +} // rml +} // __kmp diff --git a/dep/tbb/src/rml/client/rml_tbb.cpp b/dep/tbb/src/rml/client/rml_tbb.cpp new file mode 100644 index 000000000..7e1612e28 --- /dev/null +++ b/dep/tbb/src/rml/client/rml_tbb.cpp @@ -0,0 +1,46 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "../include/rml_tbb.h" +#include "tbb/dynamic_link.h" +#include + +namespace tbb { +namespace internal { +namespace rml { + +#define MAKE_SERVER(x) DLD(__TBB_make_rml_server,x) +#define GET_INFO(x) DLD(__TBB_call_with_my_server_info,x) +#define SERVER tbb_server +#define CLIENT tbb_client +#define FACTORY tbb_factory +#include "rml_factory.h" + +} // rml +} // internal +} // tbb diff --git a/dep/tbb/src/rml/include/index.html b/dep/tbb/src/rml/include/index.html new file mode 100644 index 000000000..aacad333b --- /dev/null +++ b/dep/tbb/src/rml/include/index.html @@ -0,0 +1,30 @@ + + +

+Overview
+
+This directory has the include files for the Resource Management Layer (RML).
+
+Files
+
+rml_base.h
+    Interfaces shared by TBB and OpenMP.
+
+rml_omp.h
+    Interface exclusive to OpenMP.
+
+rml_tbb.h
+    Interface exclusive to TBB (a usage sketch follows this list).
+
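Taken together, these headers define a small contract: a client derives from the TBB- or OpenMP-specific client class, implements the virtual callbacks inherited from ::rml::client, and obtains a server through the matching factory. As a hedged illustration, a minimal TBB-side client written only against the declarations in rml_base.h and rml_tbb.h (both appear later in this patch) might look like the sketch below; demo_client, the_factory, and the constants are hypothetical.

    #include <cstddef>
    #include "rml_tbb.h"   // declares tbb_client, tbb_server, tbb_factory

    // Hypothetical client: supplies the callbacks the RML server will invoke.
    class demo_client: public tbb::internal::rml::tbb_client {
    public:
        version_type version() const { return 1; }
        size_type max_job_count() const { return 4; }      // can use at most 4 workers profitably
        std::size_t min_stack_size() const { return 0; }   // 0 = default stack size
        policy_type policy() const { return turnaround; }
        job* create_one_job() { return new job; }
        void cleanup( job& j ) { delete &j; }
        void acknowledge_close_connection() { /* all jobs have been cleaned up */ }
        void process( job& ) { /* do TBB work until it is okay to yield */ }
    };

    // The factory must be zero-initialized, hence file scope (per rml_tbb.h).
    static tbb::internal::rml::tbb_factory the_factory;

    void demo_connect() {
        static demo_client client;
        tbb::internal::rml::tbb_server* server = 0;
        if( the_factory.open()==::rml::factory::st_success
            && the_factory.make_server( server, client )==::rml::factory::st_success ) {
            server->adjust_job_count_estimate( 2 );  // hint: two extra workers are useful now
            server->request_close_connection();      // cleanup()/acknowledge follow later
        }
        the_factory.close();
    }

Note that request_close_connection() is asynchronous: per rml_base.h, the server calls cleanup() for every job and only then acknowledge_close_connection(), so a real client must defer its own teardown until that final callback arrives.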
+Up to parent directory
+
+Copyright © 2005-2009 Intel Corporation. All Rights Reserved.
+
+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are registered
+trademarks or trademarks of Intel Corporation or its subsidiaries in the United
+States and other countries.
+
+* Other names and brands may be claimed as the property of others. + + + diff --git a/dep/tbb/src/rml/include/rml_base.h b/dep/tbb/src/rml/include/rml_base.h new file mode 100644 index 000000000..148edb28b --- /dev/null +++ b/dep/tbb/src/rml/include/rml_base.h @@ -0,0 +1,186 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +// Header guard and namespace names follow rml conventions. + +#ifndef __RML_rml_base_H +#define __RML_rml_base_H + +#include +#if _WIN32||_WIN64 +#include +#endif /* _WIN32||_WIN64 */ + +#ifdef RML_PURE_VIRTUAL_HANDLER +#define RML_PURE(T) {RML_PURE_VIRTUAL_HANDLER(); return (T)0;} +#else +#define RML_PURE(T) = 0; +#endif + +namespace rml { + +//! Base class for denying assignment and copy constructor. +class no_copy { + void operator=( no_copy& ); + no_copy( no_copy& ); +public: + no_copy() {} +}; + +class server; + +class versioned_object { +public: + //! A version number + typedef unsigned version_type; + + //! Get version of this object + /** The version number is incremented when a incompatible change is introduced. + The version number is invariant for the lifetime of the object. */ + virtual version_type version() const RML_PURE(version_type) +}; + +//! Represents a client's job for an execution context. +/** A job object is constructed by the client. + Not derived from versioned_object because version is same as for client. */ +class job { + friend class server; + + //! Word for use by server + /** Typically the server uses it to speed up internal lookup. + Clients must not modify the word. */ + void* scratch_ptr; +}; + +//! Information that client provides to server when asking for a server. +/** The instance must endure at least until acknowledge_close_connection is called. */ +class client: public versioned_object { +public: + //! Typedef for convenience of derived classes in other namespaces. + typedef ::rml::job job; + + //! Index of a job in a job pool + typedef unsigned size_type; + + //! Maximum number of threads that client can exploit profitably if nothing else is running on the machine. + /** The returned value should remain invariant for the lifetime of the connection. [idempotent] */ + virtual size_type max_job_count() const RML_PURE(size_type) + + //! 
Minimum stack size for each job. 0 means to use default stack size. [idempotent] + virtual std::size_t min_stack_size() const RML_PURE(std::size_t) + + //! Server calls this routine when it needs client to create a job object. + /** Value of index is guaranteed to be unique for each job and in the half-open + interval [0,max_job_count) */ + virtual job* create_one_job() RML_PURE(job*) + + //! Acknowledge that all jobs have been cleaned up. + /** Called by server in response to request_close_connection + after cleanup(job) has been called for each job. */ + virtual void acknowledge_close_connection() RML_PURE(void) + + enum policy_type {turnaround,throughput}; + + //! Inform server of desired policy. [idempotent] + virtual policy_type policy() const RML_PURE(policy_type) + + //! Inform client that server is done with *this. + /** Client should destroy the job. + Not necessarily called by execution context represented by *this. + Never called while any other thread is working on the job. */ + virtual void cleanup( job& ) RML_PURE(void) + + // In general, we should not add new virtual methods, because that would + // break derived classes. Think about reserving some vtable slots. +}; + +// Information that server provides to client. +// Virtual functions are routines provided by the server for the client to call. +class server: public versioned_object { +public: + //! Typedef for convenience of derived classes. + typedef ::rml::job job; + + //! Request that connection to server be closed. + /** Causes each job associated with the client to have its cleanup method called, + possibly by a thread different than the thread that created the job. + This method can return before all cleanup methods return. + Actions that have to wait after all cleanup methods return should be part of + client::acknowledge_close_connection. */ + virtual void request_close_connection() = 0; + + //! Called by client thread when it reaches a point where it cannot make progress until other threads do. + virtual void yield() = 0; + + //! Called by client to indicate a change in the number of non-RML threads that are running. + /** This is a performance hint to the RML to adjust how many many threads it should let run + concurrently. The delta is the change in the number of non-RML threads that are running. + For example, a value of 1 means the client has started running another thread, and a value + of -1 indicates that the client has blocked or terminated one of its threads. */ + virtual void independent_thread_number_changed( int delta ) = 0; + + //! Default level of concurrency for which RML strives when there are no non-RML threads running. + /** Normally, the value is the hardware concurrency minus one. + The "minus one" accounts for the thread created by main(). */ + virtual unsigned default_concurrency() const = 0; + +protected: + static void*& scratch_ptr( job& j ) {return j.scratch_ptr;} +}; + +class factory { +public: + //! status results + enum status_type { + st_success=0, + st_connection_exists, + st_not_found, + st_incompatible + }; + + //! Scratch pointer for use by RML. + void* scratch_ptr; + +protected: + //! Pointer to routine that waits for server to indicate when client can close itself. + status_type (*my_wait_to_close_routine)( factory& ); + +public: + //! Library handle for use by RML. 
+#if _WIN32||_WIN64 + HMODULE library_handle; +#else + void* library_handle; +#endif /* _WIN32||_WIN64 */ +}; + +typedef void (*server_info_callback_t)( void* arg, const char* server_info ); + +} // namespace rml + +#endif /* __RML_rml_base_H */ diff --git a/dep/tbb/src/rml/include/rml_omp.h b/dep/tbb/src/rml/include/rml_omp.h new file mode 100644 index 000000000..d664908d3 --- /dev/null +++ b/dep/tbb/src/rml/include/rml_omp.h @@ -0,0 +1,123 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +// Header guard and namespace names follow OpenMP runtime conventions. + +#ifndef KMP_RML_OMP_H +#define KMP_RML_OMP_H + +#include "rml_base.h" + +namespace __kmp { +namespace rml { + +class omp_client; + +//------------------------------------------------------------------------ +// Classes instantiated by the server +//------------------------------------------------------------------------ + +//! Represents a set of worker threads provided by the server. +class omp_server: public ::rml::server { +public: + //! A number of threads + typedef unsigned size_type; + + //! Return the number of coins in the bank. (negative if machine is oversubscribed). + virtual int current_balance() const RML_PURE(int); + + //! Request n coins. Returns number of coins granted. Oversubscription amount if negative. + /** Always granted if is_strict is true. + - Positive or zero result indicates that the number of coins was taken from the bank. + - Negative result indicates that no coins were taken, and that the bank has deficit + by that amount and the caller (if being a good citizen) should return that many coins. + */ + virtual int try_increase_load( size_type /*n*/, bool /*strict*/ ) RML_PURE(size_type) + + //! Return n coins into the bank. + virtual void decrease_load( size_type /*n*/ ) RML_PURE(void); + + //! Convert n coins into n threads. + /** When a thread returns, it is converted back into a coin and the coin is returned to the bank. */ + virtual void get_threads( size_type /*m*/, void* /*cookie*/, job* /*array*/[] ) RML_PURE(void); + + /** Putting a thread to sleep - convert a thread into a coin + Waking up a thread - convert a coin into a thread + + Note: conversion between a coin and a thread does not affect the accounting. 
+ */ +}; + + +//------------------------------------------------------------------------ +// Classes (or base classes thereof) instantiated by the client +//------------------------------------------------------------------------ + +class omp_client: public ::rml::client { +public: + //! Called by server thread when it runs its part of a parallel region. + /** The index argument is a 0-origin index of this thread within the array + returned by method get_threads. Server decreases the load by 1 after this method returns. */ + virtual void process( job&, void* /*cookie*/, size_type /*index*/ ) RML_PURE(void) +}; + +/** Client must ensure that instance is zero-inited, typically by being a file-scope object. */ +class omp_factory: public ::rml::factory { + + //! Pointer to routine that creates an RML server. + status_type (*my_make_server_routine)( omp_factory&, omp_server*&, omp_client& ); + + //! Pointer to routine that returns server version info. + void (*my_call_with_server_info_routine)( ::rml::server_info_callback_t cb, void* arg ); + +public: + typedef ::rml::versioned_object::version_type version_type; + typedef omp_client client_type; + typedef omp_server server_type; + + //! Open factory. + /** Dynamically links against RML library. + Returns st_success, st_incompatible, or st_not_found. */ + status_type open(); + + //! Factory method to be called by client to create a server object. + /** Factory must be open. + Returns st_success or st_incompatible . */ + status_type make_server( server_type*&, client_type& ); + + //! Close factory. + void close(); + + //! Call the callback with the server build info. + void call_with_server_info( ::rml::server_info_callback_t cb, void* arg ) const; +}; + +} // namespace rml +} // namespace __kmp + +#endif /* KMP_RML_OMP_H */ diff --git a/dep/tbb/src/rml/include/rml_tbb.h b/dep/tbb/src/rml/include/rml_tbb.h new file mode 100644 index 000000000..3c0d8a94c --- /dev/null +++ b/dep/tbb/src/rml/include/rml_tbb.h @@ -0,0 +1,98 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +// Header guard and namespace names follow TBB conventions. 
+ +#ifndef __TBB_rml_tbb_H +#define __TBB_rml_tbb_H + +#include "rml_base.h" + +namespace tbb { +namespace internal { +namespace rml { + +class tbb_client; + +//------------------------------------------------------------------------ +// Classes instantiated by the server +//------------------------------------------------------------------------ +class tbb_server: public ::rml::server { +public: + //! Inform server of adjustments in the number of workers that the client can profitably use. + virtual void adjust_job_count_estimate( int delta ) = 0; +}; + +//------------------------------------------------------------------------ +// Classes instantiated by the client +//------------------------------------------------------------------------ + +class tbb_client: public ::rml::client { +public: + //! Defined by TBB to steal a task and execute it. + /** Called by server when wants an execution context to do some TBB work. + The method should return when it is okay for the thread to yield indefinitely. */ + virtual void process( job& ) = 0; +}; + +/** Client must ensure that instance is zero-inited, typically by being a file-scope object. */ +class tbb_factory: public ::rml::factory { + + //! Pointer to routine that creates an RML server. + status_type (*my_make_server_routine)( tbb_factory&, tbb_server*&, tbb_client& ); + + //! Pointer to routine that returns server version info. + void (*my_call_with_server_info_routine)( ::rml::server_info_callback_t cb, void* arg ); + +public: + typedef ::rml::versioned_object::version_type version_type; + typedef tbb_client client_type; + typedef tbb_server server_type; + + //! Open factory. + /** Dynamically links against RML library. + Returns st_success, st_incompatible, or st_not_found. */ + status_type open(); + + //! Factory method to be called by client to create a server object. + /** Factory must be open. + Returns st_success, st_connection_exists, or st_incompatible . */ + status_type make_server( server_type*&, client_type& ); + + //! Close factory + void close(); + + //! Call the callback with the server build info + void call_with_server_info( ::rml::server_info_callback_t cb, void* arg ) const; +}; + +} // namespace rml +} // namespace internal +} // namespace tbb + +#endif /*__TBB_rml_tbb_H */ diff --git a/dep/tbb/src/rml/index.html b/dep/tbb/src/rml/index.html new file mode 100644 index 000000000..9c403afa5 --- /dev/null +++ b/dep/tbb/src/rml/index.html @@ -0,0 +1,32 @@ + + +
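The TBB-side binding declared in rml_tbb.h just above is driven the same way; a compressed, purely illustrative sketch follows. Here my_tbb_client is a placeholder for the scheduler's tbb_client implementation and the delta of 4 is arbitrary.

    // Sketch (not part of the library): the TBB scheduler side of the connection.
    #include "rml_tbb.h"

    using namespace tbb::internal::rml;

    static my_tbb_client the_client;   // hypothetical tbb_client implementation; zero-inited file-scope object
    static tbb_factory   the_factory;

    void example_announce_work( tbb_server*& server ) {
        if( the_factory.open()!=tbb_factory::st_success )
            return;
        // Only one TBB connection can exist at a time; otherwise st_connection_exists is returned.
        if( the_factory.make_server( server, the_client )!=tbb_factory::st_success )
            return;
        server->adjust_job_count_estimate( +4 );   // work arrived: up to 4 extra workers are useful
        // ... workers call the_client.process(job) while the estimate stays positive ...
        server->adjust_job_count_estimate( -4 );   // the estimate must net back to zero before disconnecting
    }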

+Overview
+
+The subdirectories pertain to the Resource Management Layer (RML).
+
+Directories
+
+include/ - Include files used by clients of RML.
+client/  - Source files for code that must be statically linked with a client.
+server/  - Source files for the RML server.
+test/    - Unit tests for RML server and its components.
+
+Up to parent directory
+
+Copyright © 2005-2009 Intel Corporation. All Rights Reserved.
+
+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
+registered trademarks or trademarks of Intel Corporation or its
+subsidiaries in the United States and other countries.
+* Other names and brands may be claimed as the property of others. + + + diff --git a/dep/tbb/src/rml/server/index.html b/dep/tbb/src/rml/server/index.html new file mode 100644 index 000000000..e2750c643 --- /dev/null +++ b/dep/tbb/src/rml/server/index.html @@ -0,0 +1,19 @@ + + +

+Overview
+
+This directory has source code internal to the server.
+
+Up to parent directory
+
+Copyright © 2005-2009 Intel Corporation. All Rights Reserved.
+
+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
+registered trademarks or trademarks of Intel Corporation or its
+subsidiaries in the United States and other countries.
+* Other names and brands may be claimed as the property of others. + + + diff --git a/dep/tbb/src/rml/server/irml.rc b/dep/tbb/src/rml/server/irml.rc new file mode 100644 index 000000000..35e5db81d --- /dev/null +++ b/dep/tbb/src/rml/server/irml.rc @@ -0,0 +1,126 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + +// Microsoft Visual C++ generated resource script. +// +#ifdef APSTUDIO_INVOKED +#ifndef APSTUDIO_READONLY_SYMBOLS +#define _APS_NO_MFC 1 +#define _APS_NEXT_RESOURCE_VALUE 102 +#define _APS_NEXT_COMMAND_VALUE 40001 +#define _APS_NEXT_CONTROL_VALUE 1001 +#define _APS_NEXT_SYMED_VALUE 101 +#endif +#endif + +#define APSTUDIO_READONLY_SYMBOLS +///////////////////////////////////////////////////////////////////////////// +// +// Generated from the TEXTINCLUDE 2 resource. +// +#include +#define ENDL "\r\n" +#include "tbb/tbb_version.h" + +///////////////////////////////////////////////////////////////////////////// +#undef APSTUDIO_READONLY_SYMBOLS + +///////////////////////////////////////////////////////////////////////////// +// Neutral resources + +#if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_NEU) +#ifdef _WIN32 +LANGUAGE LANG_NEUTRAL, SUBLANG_NEUTRAL +#pragma code_page(1252) +#endif //_WIN32 + +///////////////////////////////////////////////////////////////////////////// +// manifest integration +#ifdef TBB_MANIFEST +#include "winuser.h" +2 RT_MANIFEST tbbmanifest.exe.manifest +#endif + +///////////////////////////////////////////////////////////////////////////// +// +// Version +// + +VS_VERSION_INFO VERSIONINFO + FILEVERSION TBB_VERNUMBERS + PRODUCTVERSION TBB_VERNUMBERS + FILEFLAGSMASK 0x17L +#ifdef _DEBUG + FILEFLAGS 0x1L +#else + FILEFLAGS 0x0L +#endif + FILEOS 0x40004L + FILETYPE 0x2L + FILESUBTYPE 0x0L +BEGIN + BLOCK "StringFileInfo" + BEGIN + BLOCK "000004b0" + BEGIN + VALUE "CompanyName", "Intel Corporation\0" + VALUE "FileDescription", "Resource manager library\0" + VALUE "FileVersion", TBB_VERSION "\0" +//what is it? 
VALUE "InternalName", "irml\0" + VALUE "LegalCopyright", "Copyright (C) 2009\0" + VALUE "LegalTrademarks", "\0" +#ifndef TBB_USE_DEBUG + VALUE "OriginalFilename", "irml.dll\0" +#else + VALUE "OriginalFilename", "irml_debug.dll\0" +#endif + VALUE "ProductName", "Threading Building Blocks\0" + VALUE "ProductVersion", TBB_VERSION "\0" + VALUE "Comments", TBB_VERSION_STRINGS "\0" + VALUE "PrivateBuild", "\0" + VALUE "SpecialBuild", "\0" + END + END + BLOCK "VarFileInfo" + BEGIN + VALUE "Translation", 0x0, 1200 + END +END + +#endif // Neutral resources +///////////////////////////////////////////////////////////////////////////// + + +#ifndef APSTUDIO_INVOKED +///////////////////////////////////////////////////////////////////////////// +// +// Generated from the TEXTINCLUDE 3 resource. +// + + +///////////////////////////////////////////////////////////////////////////// +#endif // not APSTUDIO_INVOKED + diff --git a/dep/tbb/src/rml/server/job_automaton.h b/dep/tbb/src/rml/server/job_automaton.h new file mode 100644 index 000000000..7e3c4f354 --- /dev/null +++ b/dep/tbb/src/rml/server/job_automaton.h @@ -0,0 +1,157 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __RML_job_automaton_H +#define __RML_job_automaton_H + +#include "rml_base.h" +#include "tbb/atomic.h" + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warnings + #pragma warning (push) + #pragma warning (disable: 4244) +#endif + +namespace rml { + +namespace internal { + +//! Finite state machine. +/** /--------------\ + / V + 0 --> 1--> ptr --> -1 + ^ + | + | + V + ptr|1 + +"owner" = corresponding server_thread. +Odd states indicate that someone is executing code on the job. +Furthermore, odd states!=-1 indicate that owner will read its mailbox shortly. +Most transitions driven only by owner. +Transition 0-->-1 is driven by non-owner. +Transition ptr->-1 is driven by owner or non-owner. +*/ +class job_automaton: no_copy { +private: + tbb::atomic my_job; +public: + /** Created by non-owner */ + job_automaton() { + my_job = 0; + } + + ~job_automaton() { + __TBB_ASSERT( my_job==-1, "must plug before destroying" ); + } + + //! Try to transition 0-->1 or ptr-->ptr|1. + /** Should only be called by owner. 
*/ + bool try_acquire() { + intptr_t snapshot = my_job; + if( snapshot==-1 ) { + return false; + } else { + __TBB_ASSERT( (snapshot&1)==0, "already marked that way" ); + intptr_t old = my_job.compare_and_swap( snapshot|1, snapshot ); + __TBB_ASSERT( old==snapshot || old==-1, "unexpected interference" ); + return old==snapshot; + } + } + //! Transition ptr|1-->ptr + /** Should only be called by owner. */ + void release() { + intptr_t snapshot = my_job; + __TBB_ASSERT( snapshot&1, NULL ); + // Atomic store suffices here. + my_job = snapshot&~1; + } + + //! Transition 1-->ptr + /** Should only be called by owner. */ + void set_and_release( rml::job& job ) { + intptr_t value = reinterpret_cast(&job); + __TBB_ASSERT( (value&1)==0, "job misaligned" ); + __TBB_ASSERT( value!=0, "null job" ); + __TBB_ASSERT( my_job==1, "already set, or not marked busy?" ); + // Atomic store suffices here. + my_job = value; + } + + //! Transition 0-->-1 + /** If successful, return true. */ + bool try_plug_null() { + return my_job.compare_and_swap( -1, 0 )==0; + } + + //! Try to transition to -1. If successful, set j to contents and return true. + /** Called by owner or non-owner. */ + bool try_plug( rml::job*&j ) { + for(;;) { + intptr_t snapshot = my_job; + if( snapshot&1 ) { + // server_thread that owns job is executing a mailbox item for the job, + // and will thus read its mailbox afterwards, and see a terminate request + // for the job. + j = NULL; + return false; + } + // Not busy + if( my_job.compare_and_swap( -1, snapshot )==snapshot ) { + j = reinterpret_cast(snapshot); + return true; + } + // Need to retry, because current thread may be nonowner that read a 0, and owner might have + // caused transition 0->1->ptr after we took our snapshot. + } + } + + /** Called by non-owner to wait for transition to ptr. */ + rml::job& wait_for_job() const { + intptr_t snapshot; + for(;;) { + snapshot = my_job; + if( snapshot&~1 ) break; + __TBB_Yield(); + } + __TBB_ASSERT( snapshot!=-1, "wait on plugged job_automaton" ); + return *reinterpret_cast(snapshot&~1); + } +}; + +} // namespace internal +} // namespace rml + + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + #pragma warning (pop) +#endif // warning 4244 are back + +#endif /* __RML_job_automaton_H */ diff --git a/dep/tbb/src/rml/server/lin-rml-export.def b/dep/tbb/src/rml/server/lin-rml-export.def new file mode 100644 index 000000000..2c332aa0d --- /dev/null +++ b/dep/tbb/src/rml/server/lin-rml-export.def @@ -0,0 +1,38 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +{ +global: +__RML_open_factory; +__RML_close_factory; +__TBB_make_rml_server; +__KMP_make_rml_server; +__TBB_call_with_my_server_info; +__KMP_call_with_my_server_info; +local:*; +}; diff --git a/dep/tbb/src/rml/server/rml_server.cpp b/dep/tbb/src/rml/server/rml_server.cpp new file mode 100644 index 000000000..0ffdfe72c --- /dev/null +++ b/dep/tbb/src/rml/server/rml_server.cpp @@ -0,0 +1,1287 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "rml_tbb.h" +#define private public /* Sleazy trick to avoid publishing internal names in public header. */ +#include "rml_omp.h" +#undef private + +#include "tbb/tbb_allocator.h" +#include "tbb/cache_aligned_allocator.h" +#include "job_automaton.h" +#include "wait_counter.h" +#include "thread_monitor.h" +#include "tbb/aligned_space.h" +#include "tbb/atomic.h" +#include "tbb/tbb_misc.h" // Get DetectNumberOfWorkers() from here. +#if _MSC_VER==1500 && !defined(__INTEL_COMPILER) +// VS2008/VC9 seems to have an issue; +#pragma warning( push ) +#pragma warning( disable: 4985 ) +#endif +#include "tbb/concurrent_vector.h" +#if _MSC_VER==1500 && !defined(__INTEL_COMPILER) +#pragma warning( pop ) +#endif + +namespace rml { + +namespace internal { + +//! 
Number of hardware contexts +static inline unsigned hardware_concurrency() { + static unsigned DefaultNumberOfThreads = 0; + unsigned n = DefaultNumberOfThreads; + if( !n ) DefaultNumberOfThreads = n = tbb::internal::DetectNumberOfWorkers(); + return n; +} + +using tbb::internal::rml::tbb_client; +using tbb::internal::rml::tbb_server; + +using __kmp::rml::omp_client; +using __kmp::rml::omp_server; + +typedef versioned_object::version_type version_type; + +const version_type SERVER_VERSION = 1; + +static const size_t cache_line_size = tbb::internal::NFS_MaxLineSize; + +template class generic_connection; +class tbb_connection_v1; +class omp_connection_v1; + +enum request_kind { + rk_none, + rk_initialize_tbb_job, + rk_terminate_tbb_job, + rk_initialize_omp_job, + rk_terminate_omp_job +}; + +//! State of a server_thread +/** Below is a diagram of legal state transitions. + + OMP + ts_omp_busy + ^ ^ + / \ + / V + ts_asleep <-----------> ts_idle + + TBB + ts_tbb_busy + ^ ^ + / \ + / V + ts_asleep <-----------> ts_idle --> ts_done + + For TBB only. Extra state transition. + + ts_created -> ts_started -> ts_visited + */ +enum thread_state_t { + //! Thread not doing anything useful, but running and looking for work. + ts_idle, + //! Thread not doing anything useful and is asleep */ + ts_asleep, + //! Thread is enlisted into OpenMP team + ts_omp_busy, + //! Thread is busy doing TBB work. + ts_tbb_busy, + //! For tbb threads only + ts_done, + ts_created, + ts_started, + ts_visited +}; + +#if TBB_USE_ASSERT +#define PRODUCE_ARG(x) ,x +#else +#define PRODUCE_ARG(x) +#endif + +//! Synchronizes dispatch of OpenMP work. +class omp_dispatch_type { + typedef ::rml::job job_type; + omp_client* client; + void* cookie; + omp_client::size_type index; + tbb::atomic job; +#if TBB_USE_ASSERT + omp_connection_v1* server; +#endif /* TBB_USE_ASSERT */ +public: + omp_dispatch_type() {job=NULL;} + void consume(); + void produce( omp_client& c, job_type& j, void* cookie_, omp_client::size_type index_ PRODUCE_ARG( omp_connection_v1& s )) { + __TBB_ASSERT( &j, NULL ); + __TBB_ASSERT( !job, "job already set" ); + client = &c; +#if TBB_USE_ASSERT + server = &s; +#endif /* TBB_USE_ASSERT */ + cookie = cookie_; + index = index_; + // Must be last + job = &j; + } +}; + +//! A reference count. +/** No default constructor, because clients must be very careful about whether the + initial reference count is 0 or 1. */ +class ref_count: no_copy { + tbb::atomic my_ref_count; +public: + ref_count(int k ) {my_ref_count=k;} + ~ref_count() {__TBB_ASSERT( !my_ref_count, "premature destruction of refcounted object" );} + //! Add one and return new value. + int add_ref() { + int k = ++my_ref_count; + __TBB_ASSERT(k>=1,"reference count underflowed before add_ref"); + return k; + } + //! Subtract one and return new value. + int remove_ref() { + int k = --my_ref_count; + __TBB_ASSERT(k>=0,"reference count underflow"); + return k; + } +}; + +//! Forward declaration +class server_thread; +class thread_map; + +//! thread_map_base; we need to make the iterator type available to server_thread +struct thread_map_base { + //! 
A value in the map + class value_type { + public: + server_thread& thread() { + __TBB_ASSERT( my_thread, "thread_map::value_type::thread() called when !my_thread" ); + return *my_thread; + } + rml::job& job() { + __TBB_ASSERT( my_job, "thread_map::value_type::job() called when !my_job" ); + return *my_job; + } + value_type() : my_thread(NULL), my_job(NULL) {} + server_thread& wait_for_thread() const { + for(;;) { + server_thread* ptr=const_cast(my_thread); + if( ptr ) + return *ptr; + __TBB_Yield(); + } + } + /** Shortly after when a connection is established, it is possible for the server + to grab a server_thread that has not yet created a job object for that server. */ + rml::job& wait_for_job() const { + if( !my_job ) { + my_job = &my_automaton.wait_for_job(); + } + return *my_job; + } + private: + server_thread* my_thread; + /** Marked mutable because though it is physically modified, conceptually it is a duplicate of + the job held by job_automaton. */ + mutable rml::job* my_job; + job_automaton my_automaton; +// FIXME - pad out to cache line, because my_automaton is hit hard by thread() + friend class thread_map; + }; + typedef tbb::concurrent_vector > array_type; +}; + +template +class padded: public T { + char pad[cache_line_size - sizeof(T)%cache_line_size]; +}; + +// FIXME - should we pad out memory to avoid false sharing of our global variables? + +static tbb::atomic the_balance; +static tbb::atomic the_balance_inited; + +//! Per thread information +/** ref_count holds number of clients that are using this, + plus 1 if a host thread owns this instance. */ +class server_thread: public ref_count { + friend class thread_map; + template friend class generic_connection; + //! Integral type that can hold a thread_state_t + typedef int thread_state_rep_t; + tbb::atomic state; +public: + thread_monitor monitor; + // FIXME: make them private... + bool is_omp_thread; + tbb::atomic tbb_state; + server_thread* link; // FIXME: this is a temporary fix. Remove when all is done. + thread_map_base::array_type::iterator my_map_pos; +private: + rml::server *my_conn; + rml::job* my_job; + job_automaton* my_ja; + size_t my_index; + +#if TBB_USE_ASSERT + //! Flag used to check if thread is still using *this. + bool has_active_thread; +#endif /* TBB_USE_ASSERT */ + + //! Volunteer to sleep. + void sleep_perhaps( thread_state_t asleep ); + + //! Destroy job corresponding to given client + /** Return true if thread must quit. */ + template + bool destroy_job( Connection& c ); + + //! Process requests + /** Return true if thread must quit. */ + bool process_requests(); + + void loop(); + static __RML_DECL_THREAD_ROUTINE thread_routine( void* arg ); +public: + thread_state_t read_state() const { + thread_state_rep_t s = state; + __TBB_ASSERT( unsigned(s)<=unsigned(ts_done), "corrupted server thread?" ); + return thread_state_t(s); + } + + tbb::atomic request; + + omp_dispatch_type omp_dispatch; + + server_thread(); + ~server_thread(); + + //! Launch a thread that is bound to *this. + void launch( size_t stack_size ); + + //! Attempt to wakeup a thread + /** The value "to" is the new state for the thread, if it was woken up. + Returns true if thread was woken up, false otherwise. */ + bool wakeup( thread_state_t to, thread_state_t from ); + + //! Attempt to enslave a thread for OpenMP/TBB. + bool try_grab_for( thread_state_t s ); +}; + +//! Bag of threads that are private to a client. +class private_thread_bag { + struct list_thread: server_thread { + list_thread* next; + }; + //! 
Root of atomic linked list of list_thread + /** ABA problem is avoided because items are only atomically pushed, never popped. */ + tbb::atomic my_root; + tbb::cache_aligned_allocator > my_allocator; +public: + //! Construct empty bag + private_thread_bag() {my_root=NULL;} + + //! Create a fresh server_thread object. + server_thread& add_one_thread() { + list_thread* t = my_allocator.allocate(1); + new( t ) list_thread; + // Atomically add to list + list_thread* old_root; + do { + old_root = my_root; + t->next = old_root; + } while( my_root.compare_and_swap( t, old_root )!=old_root ); + return *t; + } + + //! Destroy the bag and threads in it. + ~private_thread_bag() { + while( my_root ) { + // Unlink thread from list. + list_thread* t = my_root; + my_root = t->next; + // Destroy and deallocate the thread. + t->~list_thread(); + my_allocator.deallocate(static_cast*>(t),1); + } + } +}; + +//! Forward declaration +void wakeup_some_tbb_threads(); + +//! Type-independent part of class generic_connection. * +/** One to one map from server threads to jobs, and associated reference counting. */ +class thread_map : public thread_map_base { +public: + typedef rml::client::size_type size_type; + //! ctor + thread_map( wait_counter& fc, ::rml::client& client ) : + all_visited_at_least_once(false), my_min_stack_size(0), my_server_ref_count(1), + my_client_ref_count(1), my_client(client), my_factory_counter(fc) + { my_unrealized_threads = 0; } + //! dtor + ~thread_map() {} + typedef array_type::iterator iterator; + iterator begin() {return my_array.begin();} + iterator end() {return my_array.end();} + void bind( /* rml::server& server, message_kind initialize */ ); + void unbind( request_kind terminate ); + void assist_cleanup( bool assist_null_only ); + + /** Returns number of unrealized threads to create. */ + size_type wakeup_tbb_threads( size_type n ); + bool wakeup_next_thread( iterator i, tbb_connection_v1& conn ); + void release_tbb_threads( server_thread* t ); + void adjust_balance( int delta ); + + //! Add a server_thread object to the map, but do not bind it. + /** Return NULL if out of unrealized threads. */ + value_type* add_one_thread( bool is_omp_thread_ ); + + void bind_one_thread( rml::server& server, request_kind initialize, value_type& x ); + + void remove_client_ref(); + int add_server_ref() {return my_server_ref_count.add_ref();} + int remove_server_ref() {return my_server_ref_count.remove_ref();} + + ::rml::client& client() const {return my_client;} + + size_type get_unrealized_threads() { return my_unrealized_threads; } + +private: + private_thread_bag my_private_threads; + bool all_visited_at_least_once; + array_type my_array; + size_t my_min_stack_size; + tbb::atomic my_unrealized_threads; + + //! Number of threads referencing *this, plus one extra. + /** When it becomes zero, the containing server object can be safely deleted. */ + ref_count my_server_ref_count; + + //! Number of jobs that need cleanup, plus one extra. + /** When it becomes zero, acknowledge_close_connection is called. */ + ref_count my_client_ref_count; + ::rml::client& my_client; + //! Counter owned by factory that produced this thread_map. + wait_counter& my_factory_counter; +}; + +void thread_map::bind_one_thread( rml::server& server, request_kind initialize, value_type& x ) { + // Add one to account for the thread referencing this map hereforth. 
+ server_thread& t = x.thread(); + my_server_ref_count.add_ref(); + my_client_ref_count.add_ref(); +#if TBB_USE_ASSERT + __TBB_ASSERT( t.add_ref()==1, NULL ); +#else + t.add_ref(); +#endif + // Have responsibility to start the thread. + t.my_conn = &server; + t.my_ja = &x.my_automaton; + t.request = initialize; + t.launch( my_min_stack_size ); + // Must wakeup thread so it can fill in its "my_job" field in *this. + // Otherwise deadlock can occur where wait_for_job spins on thread that is sleeping. + __TBB_ASSERT( t.state!=ts_tbb_busy, NULL ); + t.wakeup( ts_idle, ts_asleep ); +} + +thread_map::value_type* thread_map::add_one_thread( bool is_omp_thread_ ) { + size_type u; + do { + u = my_unrealized_threads; + if( !u ) return NULL; + } while( my_unrealized_threads.compare_and_swap(u-1,u)!=u ); + server_thread& t = my_private_threads.add_one_thread(); + t.is_omp_thread = is_omp_thread_; + __TBB_ASSERT( u>=1, NULL ); + t.my_index = u - 1; + __TBB_ASSERT( t.state!=ts_tbb_busy, NULL ); + if( !t.is_omp_thread ) + t.tbb_state = ts_created; + iterator i = t.my_map_pos = my_array.grow_by(1); + value_type& v = *i; + v.my_thread = &t; + return &v; +} + +void thread_map::bind( /* rml::server& server, request_kind initialize */ ) { + ++my_factory_counter; + my_min_stack_size = my_client.min_stack_size(); + __TBB_ASSERT( my_unrealized_threads==0, "already called bind?" ); + my_unrealized_threads = my_client.max_job_count(); +} + +void thread_map::unbind( request_kind terminate ) { + // Ask each server_thread to cleanup its job for this server. + for( iterator i=begin(); i!=end(); ++i ) { + server_thread& t = i->thread(); + // The last parameter of the message is not used by the recipient. + t.request = terminate; + t.wakeup( ts_idle, ts_asleep ); + } + // Remove extra ref to client. + remove_client_ref(); +} + +void thread_map::assist_cleanup( bool assist_null_only ) { + // To avoid deadlock, the current thread *must* help out with cleanups that have not started, + // becausd the thread that created the job may be busy for a long time. + for( iterator i = begin(); i!=end(); ++i ) { + rml::job* j=0; + job_automaton& ja = i->my_automaton; + if( assist_null_only ? ja.try_plug_null() : ja.try_plug(j) ) { + if( j ) { + my_client.cleanup(*j); + } else { + // server thread did not get a chance to create a job. + } + remove_client_ref(); + } + } +} + +thread_map::size_type thread_map::wakeup_tbb_threads( size_type n ) { + __TBB_ASSERT(n>0,"must specify positive number of threads to wake up"); + iterator e = end(); + for( iterator k=begin(); k!=e; ++k ) { + // If another thread added *k, there is a tiny timing window where thread() is invalid. + server_thread& t = k->wait_for_thread(); + if( t.tbb_state==ts_created || t.read_state()==ts_tbb_busy ) + continue; + if( --the_balance>=0 ) { // try to withdraw a coin from the deposit + while( !t.try_grab_for( ts_tbb_busy ) ) { + if( t.read_state()==ts_tbb_busy ) { + // we lost; move on to the next. + ++the_balance; + goto skip; + } + } + if( --n==0 ) + return 0; + } else { + // overdraft. 
+            ++the_balance;
+            break;
+        }
+skip:
+        ;
+    }
+    return n<my_unrealized_threads ? n : my_unrealized_threads;
+}
+
+template<typename Server, typename Client>
+struct connection_traits {};
+
+static tbb::atomic<tbb_connection_v1*> this_tbb_connection;
+
+template<typename Server, typename Client>
+class generic_connection: public Server, no_copy {
+    /*override*/ version_type version() const {return SERVER_VERSION;}
+    /*override*/ void yield() {thread_monitor::yield();}
+    /*override*/ void independent_thread_number_changed( int delta ) {my_thread_map.adjust_balance( -delta );}
+    /*override*/ unsigned default_concurrency() const {return hardware_concurrency()-1;}
+
+protected:
+    thread_map my_thread_map;
+    void do_open() {my_thread_map.bind();}
+    void request_close_connection();
+    //! Make destructor virtual
+    virtual ~generic_connection() {}
+    generic_connection( wait_counter& fc, Client& c ) : my_thread_map(fc,c) {}
+
+public:
+    Client& client() const {return static_cast<Client&>(my_thread_map.client());}
+    int add_server_ref () {return my_thread_map.add_server_ref();}
+    void remove_server_ref() {if( my_thread_map.remove_server_ref()==0 ) delete this;}
+    void remove_client_ref() {my_thread_map.remove_client_ref();}
+    void make_job( server_thread& t, job_automaton& ja );
+};
+
+//------------------------------------------------------------------------
+// TBB server
+//------------------------------------------------------------------------
+
+template<>
+struct connection_traits<tbb_server,tbb_client> {
+    static const request_kind initialize = rk_initialize_tbb_job;
+    static const request_kind terminate = rk_terminate_tbb_job;
+    static const bool assist_null_only = true;
+    static const bool is_tbb = true;
+};
+
+//! Represents a server and client binding.
+/** The internal representation uses inheritance for the server part and a pointer for the client part. */
+class tbb_connection_v1: public generic_connection<tbb_server,tbb_client> {
+    friend void wakeup_some_tbb_threads();
+    /*override*/ void adjust_job_count_estimate( int delta );
+    //! Estimate on number of jobs without threads working on them.
+    tbb::atomic<int> my_slack;
+    friend class dummy_class_to_shut_up_gratuitous_warning_from_gcc_3_2_3;
+#if TBB_USE_ASSERT
+    tbb::atomic<int> my_job_count_estimate;
+#endif /* TBB_USE_ASSERT */
+
+    // pad these? or use a single variable w/ atomic add/subtract?
+    tbb::atomic<int> n_adjust_job_count_requests;
+    ~tbb_connection_v1();
+
+public:
+    enum tbb_conn_t {
+        c_empty = 0,
+        c_init = -1,
+        c_locked = -2
+    };
+
+    //! True if there is slack that try_process can use.
+    bool has_slack() const {return my_slack>0;}
+
+    bool try_process( job& job ) {
+        bool visited = false;
+        // No check for my_slack>0 here because caller is expected to do that check.
+ int k = --my_slack; + if( k>=0 ) { + client().process(job); + visited = true; + } + ++my_slack; + return visited; + } + + tbb_connection_v1( wait_counter& fc, tbb_client& client ) : generic_connection(fc,client) { + my_slack = 0; +#if TBB_USE_ASSERT + my_job_count_estimate = 0; +#endif /* TBB_USE_ASSERT */ + __TBB_ASSERT( !my_slack, NULL ); + do_open(); + __TBB_ASSERT( this_tbb_connection==reinterpret_cast(tbb_connection_v1::c_init), NULL ); + n_adjust_job_count_requests = 0; + this_tbb_connection = this; + } + + void wakeup_tbb_threads( unsigned n ) {my_thread_map.wakeup_tbb_threads( n );} + bool wakeup_next_thread( thread_map::iterator i ) {return my_thread_map.wakeup_next_thread( i, *this );} + thread_map::size_type get_unrealized_threads () {return my_thread_map.get_unrealized_threads();} +}; + +/* to deal with cases where the machine is oversubscribed; we want each thread to trip to try_process() at least once */ +/* this should not involve computing the_balance */ +bool thread_map::wakeup_next_thread( thread_map::iterator this_thr, tbb_connection_v1& conn ) { + if( all_visited_at_least_once ) + return false; + + iterator e = end(); +retry: + bool exist = false; + iterator k=this_thr; + for( ++k; k!=e; ++k ) { + // If another thread added *k, there is a tiny timing window where thread() is invalid. + server_thread& t = k->wait_for_thread(); + if( t.tbb_state!=ts_visited ) + exist = true; + if( t.read_state()!=ts_tbb_busy && t.tbb_state==ts_started ) + if( t.try_grab_for( ts_tbb_busy ) ) + return true; + } + for( k=begin(); k!=this_thr; ++k ) { + server_thread& t = k->wait_for_thread(); + if( t.tbb_state!=ts_visited ) + exist = true; + if( t.read_state()!=ts_tbb_busy && t.tbb_state==ts_started ) + if( t.try_grab_for( ts_tbb_busy ) ) + return true; + } + + if( exist ) + if( conn.has_slack() ) + goto retry; + else + all_visited_at_least_once = true; + return false; +} + +void thread_map::release_tbb_threads( server_thread* t ) { + for( ; t; t = t->link ) { + while( t->read_state()!=ts_asleep ) + __TBB_Yield(); + t->tbb_state = ts_started; + } +} + +void thread_map::adjust_balance( int delta ) { + int new_balance = the_balance += delta; + if( new_balance>0 && 0>=new_balance-delta /*== old the_balance*/ ) + wakeup_some_tbb_threads(); +} + +//------------------------------------------------------------------------ +// OpenMP server +//------------------------------------------------------------------------ + +template<> +struct connection_traits { + static const request_kind initialize = rk_initialize_omp_job; + static const request_kind terminate = rk_terminate_omp_job; + static const bool assist_null_only = false; + static const bool is_tbb = false; +}; + +class omp_connection_v1: public generic_connection { + /*override*/ int current_balance() const {return the_balance;} + /*override*/ int try_increase_load( size_type n, bool strict ); + /*override*/ void decrease_load( size_type n ); + /*override*/ void get_threads( size_type request_size, void* cookie, job* array[] ); +public: +#if TBB_USE_ASSERT + //! Net change in delta caused by this connection. 
+ /** Should be zero when connection is broken */ + tbb::atomic net_delta; +#endif /* TBB_USE_ASSERT */ + + omp_connection_v1( wait_counter& fc, omp_client& client ) : generic_connection(fc,client) { +#if TBB_USE_ASSERT + net_delta = 0; +#endif /* TBB_USE_ASSERT */ + do_open(); + } + ~omp_connection_v1() {__TBB_ASSERT( net_delta==0, "net increase/decrease of load is nonzero" );} +}; + +template +void generic_connection::request_close_connection() { +#if _MSC_VER && !defined(__INTEL_COMPILER) +// Suppress "conditional expression is constant" warning. +#pragma warning( push ) +#pragma warning( disable: 4127 ) +#endif + if( connection_traits::is_tbb ) { + __TBB_ASSERT( this_tbb_connection==reinterpret_cast(this), NULL ); + tbb_connection_v1* conn; + do { + while( (conn=this_tbb_connection)==reinterpret_cast(tbb_connection_v1::c_locked) ) + __TBB_Yield(); + } while ( this_tbb_connection.compare_and_swap(0, conn)!=conn ); + } +#if _MSC_VER && !defined(__INTEL_COMPILER) +#pragma warning( pop ) +#endif + my_thread_map.unbind( connection_traits::terminate ); + my_thread_map.assist_cleanup( connection_traits::assist_null_only ); + // Remove extra reference + remove_server_ref(); +} + +template +void generic_connection::make_job( server_thread& t, job_automaton& ja ) { + if( ja.try_acquire() ) { + rml::job& j = *client().create_one_job(); + __TBB_ASSERT( &j!=NULL, "client:::create_one_job returned NULL" ); + __TBB_ASSERT( (intptr_t(&j)&1)==0, "client::create_one_job returned misaligned job" ); + ja.set_and_release( j ); + __TBB_ASSERT( t.my_conn && t.my_ja && t.my_job==NULL, NULL ); + t.my_job = &j; + } +} + +tbb_connection_v1::~tbb_connection_v1() { +#if TBB_USE_ASSERT + if( my_job_count_estimate!=0 ) { + fprintf(stderr, "TBB client tried to disconnect with non-zero net job count estimate of %d\n", int(my_job_count_estimate )); + abort(); + } + __TBB_ASSERT( !my_slack, "attempt to destroy tbb_server with nonzero slack" ); + __TBB_ASSERT( this!=this_tbb_connection, "request_close_connection() must be called" ); +#endif /* TBB_USE_ASSERT */ + // if the next connection has unstarted threads, start one of them. + wakeup_some_tbb_threads(); +} + +void tbb_connection_v1::adjust_job_count_estimate( int delta ) { +#if TBB_USE_ASSERT + my_job_count_estimate += delta; +#endif /* TBB_USE_ASSERT */ + // Atomically update slack. + int c = my_slack+=delta; + if( c>0 ) { + ++n_adjust_job_count_requests; + // The client has work to do and there are threads available + thread_map::size_type n = my_thread_map.wakeup_tbb_threads(c); + + server_thread* new_threads_anchor = NULL; + thread_map::size_type i; + for( i=0; ithread(); + __TBB_ASSERT( !t.link, NULL ); + t.link = new_threads_anchor; + new_threads_anchor = &t; + } + + thread_map::size_type j=0; + for( ; the_balance>0 && j=0 ) { + // withdraw a coin from the bank + __TBB_ASSERT( new_threads_anchor, NULL ); + + server_thread* t = new_threads_anchor; + new_threads_anchor = t->link; + while( !t->try_grab_for( ts_tbb_busy ) ) + __TBB_Yield(); + t->tbb_state = ts_started; + } else { + // overdraft. return it to the bank + ++the_balance; + break; + } + } + __TBB_ASSERT( i-j!=0||new_threads_anchor==NULL, NULL ); + // mark the ones that did not get started as eligible for being snatched. + if( new_threads_anchor ) + my_thread_map.release_tbb_threads( new_threads_anchor ); + + --n_adjust_job_count_requests; + } +} + +//! 
wake some available tbb threads
+/**
+ First, atomically grab the connection, then increase the server ref count to keep it from being released prematurely.
+ Second, check if the balance is available for TBB and the tbb connection has slack to exploit.
+ If the answer is true, go ahead and try to wake some up.
+ */
+void wakeup_some_tbb_threads()
+{
+    for( ;; ) {
+        tbb_connection_v1* conn = this_tbb_connection;
+        /*
+        if( conn==0 or conn==tbb_connection_v1::c_init )
+            the next connection will see my last change to the deposit; do nothing
+        if( conn==tbb_connection_v1::c_locked )
+            a thread is already in the region A-B below.
+            it will read the change made by threads of my connection to the_balance;
+            do nothing
+
+        0==c_empty, -1==c_init, -2==c_locked
+        */
+        if( ((-ptrdiff_t(conn))&~3 )==0 )
+            return;
+
+        // FIXME: place the_balance next to this_tbb_connection ? to save some cache moves ?
+        /* region A: this is the only place to set this_tbb_connection to c_locked */
+        tbb_connection_v1* old_ttc = this_tbb_connection.compare_and_swap( reinterpret_cast<tbb_connection_v1*>(tbb_connection_v1::c_locked), conn );
+        if( old_ttc==conn ) {
+#if TBB_USE_ASSERT
+            __TBB_ASSERT( conn->add_server_ref()>1, NULL );
+#else
+            conn->add_server_ref();
+#endif
+            /* region B: this is the only place to restore this_tbb_connection from c_locked */
+            this_tbb_connection = conn; // restoring it means releasing it
+
+            /* some threads are creating tbb server threads; they may not see my changes made to the_balance */
+            while( conn->n_adjust_job_count_requests>0 )
+                __TBB_Yield();
+
+            int bal = the_balance;
+            if( bal>0 && conn->has_slack() )
+                conn->wakeup_tbb_threads( bal );
+            conn->remove_server_ref();
+            break;
+        } else if( ((-ptrdiff_t(old_ttc))&~3)==0 ) {
+            return; /* see above */
+        } else {
+            __TBB_Yield();
+        }
+    }
+}
+
+int omp_connection_v1::try_increase_load( size_type n, bool strict ) {
+    __TBB_ASSERT(int(n)>=0,NULL);
+    if( strict ) {
+        the_balance-=int(n);
+    } else {
+        int avail, old;
+        do {
+            avail = the_balance;
+            if( avail<=0 ) {
+                // No atomic read-write-modify operation necessary.
+                return avail;
+            }
+            // don't read the_balance; if it changes, compare_and_swap will fail anyway.
+            old = the_balance.compare_and_swap( int(n)<avail ? avail-int(n) : 0, avail );
+        } while( old!=avail );
+        if( int(n)>avail )
+            n=avail;
+    }
+#if TBB_USE_ASSERT
+    net_delta += n;
+#endif /* TBB_USE_ASSERT */
+    return n;
+}
+
+void omp_connection_v1::decrease_load( size_type n ) {
+    __TBB_ASSERT(int(n)>=0,NULL);
+    my_thread_map.adjust_balance(int(n));
+#if TBB_USE_ASSERT
+    net_delta -= n;
+#endif /* TBB_USE_ASSERT */
+}
+
+void omp_connection_v1::get_threads( size_type request_size, void* cookie, job* array[] ) {
+
+    if( !request_size )
+        return;
+
+    unsigned index = 0;
+    for(;;) { // don't return until all request_size threads are grabbed.
+        // Need to grab some threads
+        thread_map::iterator k_end=my_thread_map.end();
+        // FIXME - this search is going to be *very* slow when there is a large number of threads and most are in use.
+        // Consider starting search at random point, or high water mark of sorts.
+        for( thread_map::iterator k=my_thread_map.begin(); k!=k_end; ++k ) {
+            // If another thread added *k, there is a tiny timing window where thread() is invalid.
+            server_thread& t = k->wait_for_thread();
+            if( t.try_grab_for( ts_omp_busy ) ) {
+                // The preincrement instead of post-increment of index is deliberate.
+ job& j = k->wait_for_job(); + array[index] = &j; + t.omp_dispatch.produce( client(), j, cookie, index PRODUCE_ARG(*this) ); + if( ++index==request_size ) + return; + } + } + // Need to allocate more threads + for( unsigned i=index; ithread(); + if( t.try_grab_for( ts_omp_busy ) ) { + job& j = k->wait_for_job(); + array[index] = &j; + // The preincrement instead of post-increment of index is deliberate. + t.omp_dispatch.produce( client(), j, cookie, index PRODUCE_ARG(*this) ); + if( ++index==request_size ) + return; + } // else someone else snatched it. + } + } +} + +//------------------------------------------------------------------------ +// Methods of omp_dispatch_type +//------------------------------------------------------------------------ +void omp_dispatch_type::consume() { + job_type* j; + // Wait for short window between when master sets state of this thread to ts_omp_busy + // and master thread calls produce. + // FIXME - this is a very short spin while the producer is setting fields of *this, + // but nonetheless the loop should probably use exponential backoff, or at least pause instructions. + do { + j = job; + } while( !j ); + job = static_cast(NULL); + client->process(*j,cookie,index); +#if TBB_USE_ASSERT + // Return of method process implies "decrease_load" from client's viewpoint, even though + // the actual adjustment of the_balance only happens when this thread really goes to sleep. + --server->net_delta; +#endif /* TBB_USE_ASSERT */ +} + +//------------------------------------------------------------------------ +// Methods of server_thread +//------------------------------------------------------------------------ + +server_thread::server_thread() : + ref_count(0), + link(NULL), // FIXME: remove when all fixes are done. + my_map_pos(), + my_conn(NULL), my_job(NULL), my_ja(NULL) +{ + state = ts_idle; +#if TBB_USE_ASSERT + has_active_thread = false; +#endif /* TBB_USE_ASSERT */ +} + +server_thread::~server_thread() { + __TBB_ASSERT( !has_active_thread, NULL ); +} + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Suppress overzealous compiler warnings about an initialized variable 'sink_for_alloca' not referenced + #pragma warning(push) + #pragma warning(disable:4189) +#endif +__RML_DECL_THREAD_ROUTINE server_thread::thread_routine( void* arg ) { + server_thread* self = static_cast(arg); + AVOID_64K_ALIASING( self->my_index ); +#if TBB_USE_ASSERT + __TBB_ASSERT( !self->has_active_thread, NULL ); + self->has_active_thread = true; +#endif /* TBB_USE_ASSERT */ + self->loop(); + return NULL; +} +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning(pop) +#endif + +void server_thread::launch( size_t stack_size ) { + thread_monitor::launch( thread_routine, this, stack_size ); +} + +void server_thread::sleep_perhaps( thread_state_t asleep ) { + __TBB_ASSERT( asleep==ts_asleep, NULL ); + thread_monitor::cookie c; + monitor.prepare_wait(c); + if( state.compare_and_swap( asleep, ts_idle )==ts_idle ) { + if( request==rk_none ) { + monitor.commit_wait(c); + // Someone else woke me up. The compare_and_swap further below deals with spurious wakeups. + } else { + monitor.cancel_wait(); + } + // Following compare-and-swap logic tries to transition from asleep to idle while both ignoring the + // preserving the reserved_flag bit in state, because some other thread may be asynchronously clearing + // the reserved_flag bit within state. 
+ thread_state_t s = read_state(); + if( s==ts_asleep ) { + state.compare_and_swap( ts_idle, ts_asleep ); + // I woke myself up, either because I cancelled the wait or suffered a spurious wakeup. + } else { + // Someone else woke me up; there the_balance is decremented by 1. -- tbb only + if( !is_omp_thread ) { + __TBB_ASSERT( state==ts_tbb_busy||state==ts_idle, NULL ); + } + } + } else { + // someone else made it busy ; see try_grab_for when state==ts_idle. + __TBB_ASSERT( state==ts_omp_busy||state==ts_tbb_busy, NULL ); + monitor.cancel_wait(); + } + __TBB_ASSERT( read_state()!=asleep, "a thread can only put itself to sleep" ); +} + +bool server_thread::wakeup( thread_state_t to, thread_state_t from ) { + bool success = false; + __TBB_ASSERT( from==ts_asleep && (to==ts_idle||to==ts_omp_busy||to==ts_tbb_busy), NULL ); + if( state.compare_and_swap( to, from )==from ) { + if( !is_omp_thread ) __TBB_ASSERT( to==ts_idle||to==ts_tbb_busy, NULL ); + // There is a small timing window that permits balance to become negative, + // but such occurrences are probably rare enough to not worry about, since + // at worst the result is slight temporary oversubscription. + monitor.notify(); + success = true; + } + return success; +} + +//! Attempt to change a thread's state to ts_omp_busy, and waking it up if necessary. +bool server_thread::try_grab_for( thread_state_t target_state ) { + bool success = false; + switch( read_state() ) { + case ts_asleep: + success = wakeup( target_state, ts_asleep ); + break; + case ts_idle: + success = state.compare_and_swap( target_state, ts_idle )==ts_idle; + break; + default: + // Thread is not available to be part of an OpenMP thread team. + break; + } + return success; +} + +template +bool server_thread::destroy_job( Connection& c ) { + __TBB_ASSERT( !is_omp_thread||state==ts_idle, NULL ); + __TBB_ASSERT( is_omp_thread||(state==ts_idle||state==ts_tbb_busy), NULL ); + if( !is_omp_thread ) { + __TBB_ASSERT( state==ts_idle||state==ts_tbb_busy, NULL ); + if( state==ts_idle ) + state.compare_and_swap( ts_done, ts_idle ); + // 'state' may be set to ts_tbb_busy by another thread.. + + if( state==ts_tbb_busy ) { // return the coin to the deposit + // need to deposit first to let the next connection see the change + ++the_balance; + state = ts_done; // no other thread changes the state when it is ts_*_busy + } + } + if( job_automaton* ja = my_ja ) { + rml::job* j; + if( ja->try_plug(j) ) { + __TBB_ASSERT( j, NULL ); + c.client().cleanup(*j); + c.remove_client_ref(); + } else { + // Some other thread took responsibility for cleaning up the job. + } + } + //! Must do remove client reference first, because execution of c.remove_ref() can cause *this to be destroyed. + int k = remove_ref(); + __TBB_ASSERT_EX( k==0, "more than one references?" 
); +#if TBB_USE_ASSERT + has_active_thread = false; +#endif /* TBB_USE_ASSERT */ + c.remove_server_ref(); + return true; +} + +bool server_thread::process_requests() { + __TBB_ASSERT( request!=rk_none, "should only be called when at least one request is present" ); + do { + request_kind my_req = request; + request.compare_and_swap( rk_none, my_req ); + switch( my_req ) { + case rk_initialize_tbb_job: + static_cast(my_conn)->make_job( *this, *my_ja ); + break; + + case rk_initialize_omp_job: + static_cast(my_conn)->make_job( *this, *my_ja ); + break; + + case rk_terminate_tbb_job: + if( destroy_job( *static_cast(my_conn) ) ) + return true; + break; + + case rk_terminate_omp_job: + if( destroy_job( *static_cast(my_conn) ) ) + return true; + break; + default: + break; + } + } while( request!=rk_none ); + return false; +} + +//! Loop that each thread executes +void server_thread::loop() { + for(;;) { + __TBB_Yield(); + if( state==ts_idle ) + sleep_perhaps( ts_asleep ); + + // Drain mailbox before reading the state. + if( request!=rk_none ) + if( process_requests() ) + return; + + // read the state after draining the mail box + thread_state_t s = read_state(); + __TBB_ASSERT( s==ts_idle||s==ts_omp_busy||s==ts_tbb_busy, NULL ); + + if( s==ts_omp_busy ) { + // Enslaved by OpenMP team. + omp_dispatch.consume(); + /* here wake a tbb thread up if feasible */ + int bal = ++the_balance; + if( bal>0 ) + wakeup_some_tbb_threads(); + state = ts_idle; + } else if( s==ts_tbb_busy ) { + // do some TBB work. + __TBB_ASSERT( my_conn && my_job, NULL ); + tbb_connection_v1& conn = *static_cast(my_conn); + // give openmp higher priority + bool has_coin = true; + while( has_coin && conn.has_slack() && the_balance>=0 ) { + if( conn.try_process(*my_job) ) { + tbb_state = ts_visited; + if( conn.has_slack() && the_balance>=0 ) + has_coin = !conn.wakeup_next_thread( my_map_pos ); + } + } + state = ts_idle; + if( has_coin ) { + ++the_balance; // return the coin back to the deposit + if( conn.has_slack() ) { // a new adjust_job_request_estimate() is in progress + // it may have missed my changes to state and/or the_balance + int bal = --the_balance; // try to grab the coin back + if( bal>=0 ) { // I got the coin + if( state.compare_and_swap( ts_tbb_busy, ts_idle )!=ts_idle ) + ++the_balance; // someone else enlisted me. + } else { + // overdraft. return the coin + ++the_balance; + } + } // else the new request will see my changes to state & the_balance. + } + } + } +} + +template +static factory::status_type connect( factory& f, Server*& server, Client& client ) { +#if _MSC_VER && !defined(__INTEL_COMPILER) +// Suppress "conditional expression is constant" warning. +#pragma warning( push ) +#pragma warning( disable: 4127 ) +#endif + if( connection_traits::is_tbb ) + if( this_tbb_connection.compare_and_swap(reinterpret_cast(-1), reinterpret_cast(0))!=0 ) + return factory::st_connection_exists; +#if _MSC_VER && !defined(__INTEL_COMPILER) +#pragma warning( pop ) +#endif + server = new Connection(*static_cast(f.scratch_ptr),client); + return factory::st_success; +} + +extern "C" factory::status_type __RML_open_factory( factory& f, version_type& server_version, version_type client_version ) { + // Hack to keep this library from being closed by causing the first client's dlopen to not have a corresponding dlclose. + // This code will be removed once we figure out how to do shutdown of the RML perfectly. 
+ static tbb::atomic one_time_flag; + if( one_time_flag.compare_and_swap(true,false)==false) { + f.library_handle = NULL; + } + // End of hack + + // initialize the_balance only once + if( the_balance_inited==0 ) { + if( the_balance_inited.compare_and_swap( 1, 0 )==0 ) { + the_balance = hardware_concurrency()-1; + the_balance_inited = 2; + } else { + tbb::internal::spin_wait_until_eq( the_balance_inited, 2 ); + } + } + + server_version = SERVER_VERSION; + f.scratch_ptr = 0; + if( client_version==0 ) { + return factory::st_incompatible; + } else { + f.scratch_ptr = new wait_counter; + return factory::st_success; + } +} + +extern "C" void __RML_close_factory( factory& f ) { + if( wait_counter* fc = static_cast(f.scratch_ptr) ) { + f.scratch_ptr = 0; + fc->wait(); + delete fc; + } +} + +void call_with_build_date_str( ::rml::server_info_callback_t cb, void* arg ); + +}} // rml::internal + +namespace tbb { +namespace internal { +namespace rml { + +extern "C" tbb_factory::status_type __TBB_make_rml_server( tbb_factory& f, tbb_server*& server, tbb_client& client ) { + return ::rml::internal::connect< ::rml::internal::tbb_connection_v1>(f,server,client); +} + +extern "C" void __TBB_call_with_my_server_info( ::rml::server_info_callback_t cb, void* arg ) { + return ::rml::internal::call_with_build_date_str( cb, arg ); +} + +}}} + +namespace __kmp { +namespace rml { + +extern "C" omp_factory::status_type __KMP_make_rml_server( omp_factory& f, omp_server*& server, omp_client& client ) { + return ::rml::internal::connect< ::rml::internal::omp_connection_v1>(f,server,client); +} + +extern "C" void __KMP_call_with_my_server_info( ::rml::server_info_callback_t cb, void* arg ) { + return ::rml::internal::call_with_build_date_str( cb, arg ); +} + +}} + +/* + * RML server info + */ +#include "version_string.tmp" + +#ifndef __TBB_VERSION_STRINGS +#pragma message("Warning: version_string.tmp isn't generated properly by version_info.sh script!") +#endif + +// We pass the build time as the RML server info. TBB is required to build RML, so we make it the same as the TBB build time. +#ifndef __TBB_DATETIME +#define __TBB_DATETIME __DATE__ " " __TIME__ +#endif +#define RML_SERVER_INFO "Intel(R) RML library built: " __TBB_DATETIME + +namespace rml { +namespace internal { +void call_with_build_date_str( ::rml::server_info_callback_t cb, void* arg ) +{ + (*cb)( arg, RML_SERVER_INFO ); +} +}} // rml::internal diff --git a/dep/tbb/src/rml/server/thread_monitor.h b/dep/tbb/src/rml/server/thread_monitor.h new file mode 100644 index 000000000..607188bfa --- /dev/null +++ b/dep/tbb/src/rml/server/thread_monitor.h @@ -0,0 +1,244 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +// All platform-specific threading support is encapsulated here. */ + +#ifndef __RML_thread_monitor_H +#define __RML_thread_monitor_H + +#if USE_WINTHREAD +#include +#include +#include //_alloca +#elif USE_PTHREAD +#include +#include +#include +#else +#error Unsupported platform +#endif +#include + +// All platform-specific threading support is in this header. + +#if (_WIN32||_WIN64)&&!__TBB_ipf +// Deal with 64K aliasing. The formula for "offset" is a Fibonacci hash function, +// which has the desirable feature of spreading out the offsets fairly evenly +// without knowing the total number of offsets, and furthermore unlikely to +// accidentally cancel out other 64K aliasing schemes that Microsoft might implement later. +// See Knuth Vol 3. "Theorem S" for details on Fibonacci hashing. +// The second statement is really does need "volatile", otherwise the compiler might remove the _alloca. +#define AVOID_64K_ALIASING(idx) \ + size_t offset = (idx+1) * 40503U % (1U<<16); \ + void* volatile sink_for_alloca = _alloca(offset); \ + __TBB_ASSERT_EX(sink_for_alloca, "_alloca failed"); +#else +// Linux thread allocators avoid 64K aliasing. +#define AVOID_64K_ALIASING(idx) +#endif /* _WIN32||_WIN64 */ + +namespace rml { + +namespace internal { + +//! Monitor with limited two-phase commit form of wait. +/** At most one thread should wait on an instance at a time. */ +class thread_monitor { +public: + class cookie { + friend class thread_monitor; + unsigned long long my_version; + }; + thread_monitor(); + ~thread_monitor(); + + //! If a thread is waiting or started a two-phase wait, notify it. + /** Can be called by any thread. */ + void notify(); + + //! Begin two-phase wait. + /** Should only be called by thread that owns the monitor. + The caller must either complete the wait or cancel it. */ + void prepare_wait( cookie& c ); + + //! Complete a two-phase wait and wait until notification occurs after the earlier prepare_wait. + void commit_wait( cookie& c ); + + //! Cancel a two-phase wait. + void cancel_wait(); + +#if USE_WINTHREAD +#define __RML_DECL_THREAD_ROUTINE unsigned WINAPI + typedef unsigned (WINAPI *thread_routine_type)(void*); +#endif /* USE_WINTHREAD */ + +#if USE_PTHREAD +#define __RML_DECL_THREAD_ROUTINE void* + typedef void*(*thread_routine_type)(void*); +#endif /* USE_PTHREAD */ + + //! 
Launch a thread + static void launch( thread_routine_type thread_routine, void* arg, size_t stack_size ); + static void yield(); + +private: + cookie my_cookie; +#if USE_WINTHREAD + CRITICAL_SECTION critical_section; + HANDLE event; +#endif /* USE_WINTHREAD */ +#if USE_PTHREAD + pthread_mutex_t my_mutex; + pthread_cond_t my_cond; + static void check( int error_code, const char* routine ); +#endif /* USE_PTHREAD */ +}; + + +#if USE_WINTHREAD +#ifndef STACK_SIZE_PARAM_IS_A_RESERVATION +#define STACK_SIZE_PARAM_IS_A_RESERVATION 0x00010000 +#endif +inline void thread_monitor::launch( thread_routine_type thread_routine, void* arg, size_t stack_size ) { + unsigned thread_id; + uintptr_t status = _beginthreadex( NULL, unsigned(stack_size), thread_routine, arg, STACK_SIZE_PARAM_IS_A_RESERVATION, &thread_id ); + if( status==0 ) { + fprintf(stderr,"thread_monitor::launch: _beginthreadex failed\n"); + exit(1); + } else { + CloseHandle((HANDLE)status); + } +} + +inline void thread_monitor::yield() { + SwitchToThread(); +} + +inline thread_monitor::thread_monitor() { + event = CreateEvent( NULL, /*manualReset=*/true, /*initialState=*/false, NULL ); + InitializeCriticalSection( &critical_section ); + my_cookie.my_version = 0; +} + +inline thread_monitor::~thread_monitor() { + CloseHandle( event ); + DeleteCriticalSection( &critical_section ); +} + +inline void thread_monitor::notify() { + EnterCriticalSection( &critical_section ); + ++my_cookie.my_version; + SetEvent( event ); + LeaveCriticalSection( &critical_section ); +} + +inline void thread_monitor::prepare_wait( cookie& c ) { + EnterCriticalSection( &critical_section ); + c = my_cookie; +} + +inline void thread_monitor::commit_wait( cookie& c ) { + ResetEvent( event ); + LeaveCriticalSection( &critical_section ); + while( my_cookie.my_version==c.my_version ) { + WaitForSingleObject( event, INFINITE ); + ResetEvent( event ); + } +} + +inline void thread_monitor::cancel_wait() { + LeaveCriticalSection( &critical_section ); +} +#endif /* USE_WINTHREAD */ + +#if USE_PTHREAD +inline void thread_monitor::check( int error_code, const char* routine ) { + if( error_code ) { + fprintf(stderr,"thread_monitor %s\n", strerror(error_code) ); + exit(1); + } +} + +inline void thread_monitor::launch( void* (*thread_routine)(void*), void* arg, size_t stack_size ) { + // FIXME - consider more graceful recovery than just exiting if a thread cannot be launched. + // Note that there are some tricky situations to deal with, such that the thread is already + // grabbed as part of an OpenMP team, or is being launched as a replacement for a thread with + // too small a stack. 
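// Worked example for the AVOID_64K_ALIASING macro defined near the top of this
// header: the Fibonacci-hash formula (idx+1)*40503 mod 2^16 spreads the stack
// offsets of successive worker threads across a 64K range so their hot stack
// tops do not collide under 64K aliasing. Standalone demo, not part of the
// TBB sources.
#include <cstdio>

int main() {
    for( unsigned idx = 0; idx < 8; ++idx ) {
        unsigned offset = (idx + 1) * 40503U % (1U << 16);
        std::printf("worker %u -> stack offset %u bytes\n", idx, offset);
    }
    return 0;
}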
+ pthread_attr_t s; + check(pthread_attr_init( &s ), "pthread_attr_init"); + if( stack_size>0 ) { + check(pthread_attr_setstacksize( &s, stack_size ),"pthread_attr_setstack_size"); + } + pthread_t handle; + check( pthread_create( &handle, &s, thread_routine, arg ), "pthread_create" ); + check( pthread_detach( handle ), "pthread_detach" ); +} + +inline void thread_monitor::yield() { + sched_yield(); +} + +inline thread_monitor::thread_monitor() { + check( pthread_mutex_init(&my_mutex,NULL), "pthread_mutex_init" ); + check( pthread_cond_init(&my_cond,NULL), "pthread_cond_init" ); + my_cookie.my_version = 0; +} + +inline thread_monitor::~thread_monitor() { + pthread_cond_destroy(&my_cond); + pthread_mutex_destroy(&my_mutex); +} + +inline void thread_monitor::notify() { + check( pthread_mutex_lock( &my_mutex ), "pthread_mutex_lock" ); + ++my_cookie.my_version; + check( pthread_mutex_unlock( &my_mutex ), "pthread_mutex_unlock" ); + check( pthread_cond_signal(&my_cond), "pthread_cond_signal" ); +} + +inline void thread_monitor::prepare_wait( cookie& c ) { + check( pthread_mutex_lock( &my_mutex ), "pthread_mutex_lock" ); + c = my_cookie; +} + +inline void thread_monitor::commit_wait( cookie& c ) { + while( my_cookie.my_version==c.my_version ) { + pthread_cond_wait( &my_cond, &my_mutex ); + } + check( pthread_mutex_unlock( &my_mutex ), "pthread_mutex_unlock" ); +} + +inline void thread_monitor::cancel_wait() { + check( pthread_mutex_unlock( &my_mutex ), "pthread_mutex_unlock" ); +} +#endif /* USE_PTHREAD */ + +} // namespace internal +} // namespace rml + +#endif /* __RML_thread_monitor_H */ diff --git a/dep/tbb/src/rml/server/wait_counter.h b/dep/tbb/src/rml/server/wait_counter.h new file mode 100644 index 000000000..0951f9797 --- /dev/null +++ b/dep/tbb/src/rml/server/wait_counter.h @@ -0,0 +1,81 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __RML_wait_counter_H +#define __RML_wait_counter_H + +#include "thread_monitor.h" +#include "tbb/atomic.h" + +namespace rml { +namespace internal { + +class wait_counter { + thread_monitor my_monitor; + tbb::atomic my_count; + tbb::atomic n_transients; +public: + wait_counter() { + // The "1" here is subtracted by the call to "wait". 
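// Sketch of the calling pattern for thread_monitor's two-phase wait (declared
// in thread_monitor.h above); wait_counter::wait() below follows the same
// protocol. After prepare_wait() the caller re-checks its condition and must
// either commit_wait() or cancel_wait(). work_available, worker_loop and
// producer are invented names for this example.
#include <atomic>
#include "thread_monitor.h"

static rml::internal::thread_monitor monitor;
static std::atomic<bool> work_available(false);

void worker_loop() {
    for(;;) {
        rml::internal::thread_monitor::cookie c;
        monitor.prepare_wait(c);            // phase 1: register interest
        if( work_available.load() )
            monitor.cancel_wait();          // condition already true: do not block
        else
            monitor.commit_wait(c);         // phase 2: sleep until a later notify()
        // ... consume the work here ...
    }
}

void producer() {
    work_available.store(true);
    monitor.notify();                       // wakes the single waiting thread, if any
}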
+ my_count=1; + n_transients=0; + } + + //! Wait for number of operator-- invocations to match number of operator++ invocations. + /** Exactly one thread should call this method. */ + void wait() { + int k = --my_count; + __TBB_ASSERT( k>=0, "counter underflow" ); + if( k>0 ) { + thread_monitor::cookie c; + my_monitor.prepare_wait(c); + if( my_count ) + my_monitor.commit_wait(c); + else + my_monitor.cancel_wait(); + } + while( n_transients>0 ) + __TBB_Yield(); + } + void operator++() { + ++my_count; + } + void operator--() { + ++n_transients; + int k = --my_count; + __TBB_ASSERT( k>=0, "counter underflow" ); + if( k==0 ) + my_monitor.notify(); + --n_transients; + } +}; + +} // namespace internal +} // namespace rml + +#endif /* __RML_wait_counter_H */ diff --git a/dep/tbb/src/rml/server/win32-rml-export.def b/dep/tbb/src/rml/server/win32-rml-export.def new file mode 100644 index 000000000..54be4b16e --- /dev/null +++ b/dep/tbb/src/rml/server/win32-rml-export.def @@ -0,0 +1,35 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. + +EXPORTS + +__RML_open_factory +__RML_close_factory +__TBB_make_rml_server +__KMP_make_rml_server +__TBB_call_with_my_server_info +__KMP_call_with_my_server_info + diff --git a/dep/tbb/src/rml/server/win64-rml-export.def b/dep/tbb/src/rml/server/win64-rml-export.def new file mode 100644 index 000000000..54be4b16e --- /dev/null +++ b/dep/tbb/src/rml/server/win64-rml-export.def @@ -0,0 +1,35 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. 
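// Usage sketch for the wait_counter class above, mirroring how
// __RML_open_factory/__RML_close_factory use it through factory::scratch_ptr:
// each in-flight operation is bracketed by ++/--, and exactly one thread calls
// wait(), which returns only after every ++ has been matched by a --.
// live_ops, transient_operation and shutdown are invented names.
#include "wait_counter.h"

static rml::internal::wait_counter live_ops;

void transient_operation() {
    ++live_ops;                 // announce an operation that must finish before shutdown
    // ... do the work ...
    --live_ops;                 // the last matching decrement wakes the waiter
}

void shutdown() {
    live_ops.wait();            // blocks until all bracketed operations have completed
}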
+; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. + +EXPORTS + +__RML_open_factory +__RML_close_factory +__TBB_make_rml_server +__KMP_make_rml_server +__TBB_call_with_my_server_info +__KMP_call_with_my_server_info + diff --git a/dep/tbb/src/tbb/cache_aligned_allocator.cpp b/dep/tbb/src/tbb/cache_aligned_allocator.cpp new file mode 100644 index 000000000..18e3d13cf --- /dev/null +++ b/dep/tbb/src/tbb/cache_aligned_allocator.cpp @@ -0,0 +1,329 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/cache_aligned_allocator.h" +#include "tbb/tbb_allocator.h" +#include "tbb_misc.h" +#include "dynamic_link.h" +#include + +#if _WIN32||_WIN64 +#include +#else +#include +#endif /* _WIN32||_WIN64 */ + +using namespace std; + +#if __TBB_WEAK_SYMBOLS + +#pragma weak scalable_malloc +#pragma weak scalable_free + +extern "C" { + void* scalable_malloc( size_t ); + void scalable_free( void* ); +} + +#endif /* __TBB_WEAK_SYMBOLS */ + +#define __TBB_IS_SCALABLE_MALLOC_FIX_READY 0 + +namespace tbb { + +namespace internal { + +//! Dummy routine used for first indirect call via MallocHandler. +static void* DummyMalloc( size_t size ); + +//! Dummy routine used for first indirect call via FreeHandler. +static void DummyFree( void * ptr ); + +//! Handler for memory allocation +static void* (*MallocHandler)( size_t size ) = &DummyMalloc; + +//! Handler for memory deallocation +static void (*FreeHandler)( void* pointer ) = &DummyFree; + +//! Table describing the how to link the handlers. 
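// Standalone sketch of the lazy-dispatch idiom used by MallocHandler and
// FreeHandler above: a function pointer initially targets a stub that performs
// one-time set-up (here simply choosing the real implementation) and then
// forwards the call, so later calls go straight to the real routine. Names are
// invented, and the thread-safe DoOneTimeInitializations step of the real code
// is omitted for brevity.
#include <cstdlib>

static void* first_call_alloc( std::size_t n );           // stub, defined below
static void* (*alloc_handler)( std::size_t ) = &first_call_alloc;

static void* real_alloc( std::size_t n ) { return std::malloc(n); }

static void* first_call_alloc( std::size_t n ) {
    alloc_handler = &real_alloc;    // one-time set-up: bind the real allocator
    return (*alloc_handler)(n);     // forward the first call
}

void* my_allocate( std::size_t n ) { return (*alloc_handler)(n); }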
+static const dynamic_link_descriptor MallocLinkTable[] = { + DLD(scalable_malloc, MallocHandler), + DLD(scalable_free, FreeHandler), +}; + +#if __TBB_IS_SCALABLE_MALLOC_FIX_READY +//! Dummy routine used for first indirect call via padded_allocate_handler. +static void* dummy_padded_allocate( size_t bytes, size_t alignment ); + +//! Dummy routine used for first indirect call via padded_free_handler. +static void dummy_padded_free( void * ptr ); + +// ! Allocates memory using standard malloc. It is used when scalable_allocator is not available +static void* padded_allocate( size_t bytes, size_t alignment ); + +// ! Allocates memory using scalable_malloc +static void* padded_allocate_via_scalable_malloc( size_t bytes, size_t alignment ); + +// ! Allocates memory using standard free. It is used when scalable_allocator is not available +static void padded_free( void* p ); + +//! Handler for padded memory allocation +static void* (*padded_allocate_handler)( size_t bytes, size_t alignment ) = &dummy_padded_allocate; + +//! Handler for padded memory deallocation +static void (*padded_free_handler)( void* p ) = &dummy_padded_free; + +#endif // #if __TBB_IS_SCALABLE_MALLOC_FIX_READY + + +#if TBB_USE_DEBUG +#define DEBUG_SUFFIX "_debug" +#else +#define DEBUG_SUFFIX +#endif /* TBB_USE_DEBUG */ + +// MALLOCLIB_NAME is the name of the TBB memory allocator library. +#if _WIN32||_WIN64 +#define MALLOCLIB_NAME "tbbmalloc" DEBUG_SUFFIX ".dll" +#elif __APPLE__ +#define MALLOCLIB_NAME "libtbbmalloc" DEBUG_SUFFIX ".dylib" +#elif __linux__ +#define MALLOCLIB_NAME "libtbbmalloc" DEBUG_SUFFIX __TBB_STRING(.so.TBB_COMPATIBLE_INTERFACE_VERSION) +#elif __FreeBSD__ || __sun +#define MALLOCLIB_NAME "libtbbmalloc" DEBUG_SUFFIX ".so" +#else +#error Unknown OS +#endif + +//! Initialize the allocation/free handler pointers. +/** Caller is responsible for ensuring this routine is called exactly once. + The routine attempts to dynamically link with the TBB memory allocator. + If that allocator is not found, it links to malloc and free. */ +void initialize_cache_aligned_allocator() { + __TBB_ASSERT( MallocHandler==&DummyMalloc, NULL ); + bool success = dynamic_link( MALLOCLIB_NAME, MallocLinkTable, 2 ); + if( !success ) { + // If unsuccessful, set the handlers to the default routines. + // This must be done now, and not before FillDynanmicLinks runs, because if other + // threads call the handlers, we want them to go through the DoOneTimeInitializations logic, + // which forces them to wait. + FreeHandler = &free; + MallocHandler = &malloc; +#if __TBB_IS_SCALABLE_MALLOC_FIX_READY + padded_allocate_handler = &padded_allocate; + padded_free_handler = &padded_free; + }else{ + padded_allocate_handler = &padded_allocate_via_scalable_malloc; + __TBB_ASSERT(FreeHandler != &free && FreeHandler != &DummyFree, NULL); + padded_free_handler = FreeHandler; +#endif // __TBB_IS_SCALABLE_MALLOC_FIX_READY + } +#if !__TBB_RML_STATIC + PrintExtraVersionInfo( "ALLOCATOR", success?"scalable_malloc":"malloc" ); +#endif +} + +//! Defined in task.cpp +extern void DoOneTimeInitializations(); + +//! Executed on very first call through MallocHandler +static void* DummyMalloc( size_t size ) { + DoOneTimeInitializations(); + __TBB_ASSERT( MallocHandler!=&DummyMalloc, NULL ); + return (*MallocHandler)( size ); +} + +//! 
Executed on very first call throught FreeHandler +static void DummyFree( void * ptr ) { + DoOneTimeInitializations(); + __TBB_ASSERT( FreeHandler!=&DummyFree, NULL ); + (*FreeHandler)( ptr ); +} + +#if __TBB_IS_SCALABLE_MALLOC_FIX_READY +//! Executed on very first call through padded_allocate_handler +static void* dummy_padded_allocate( size_t bytes, size_t alignment ) { + DoOneTimeInitializations(); + __TBB_ASSERT( padded_allocate_handler!=&dummy_padded_allocate, NULL ); + return (*padded_allocate_handler)(bytes, alignment); +} + +//! Executed on very first call throught padded_free_handler +static void dummy_padded_free( void * ptr ) { + DoOneTimeInitializations(); + __TBB_ASSERT( padded_free_handler!=&dummy_padded_free, NULL ); + (*padded_free_handler)( ptr ); +} +#endif // __TBB_IS_SCALABLE_MALLOC_FIX_READY + +static size_t NFS_LineSize = 128; + +size_t NFS_GetLineSize() { + return NFS_LineSize; +} + +//! Requests for blocks this size and higher are handled via malloc/free, +const size_t BigSize = 4096; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // unary minus operator applied to unsigned type, result still unsigned + #pragma warning( disable: 4146 4706 ) +#endif + +void* NFS_Allocate( size_t n, size_t element_size, void* /*hint*/ ) { + size_t m = NFS_LineSize; + __TBB_ASSERT( m<=NFS_MaxLineSize, "illegal value for NFS_LineSize" ); + __TBB_ASSERT( (m & m-1)==0, "must be power of two" ); + size_t bytes = n*element_size; +#if __TBB_IS_SCALABLE_MALLOC_FIX_READY + + if (bytes=BigSize?malloc(m+bytes):(*MallocHandler)(m+bytes))) ) { + // Overflow + throw bad_alloc(); + } + // Round up to next line + unsigned char* result = (unsigned char*)((uintptr)(base+m)&-m); + // Record where block actually starts. Use low order bit to record whether we used malloc or MallocHandler. + ((uintptr*)result)[-1] = uintptr(base)|(bytes>=BigSize); +#endif // __TBB_IS_SCALABLE_MALLOC_FIX_READY + /** The test may fail with TBB_IS_SCALABLE_MALLOC_FIX_READY = 1 + because scalable_malloc returns addresses aligned to 64 when large block is allocated */ + __TBB_ASSERT( ((size_t)result&(m-1)) == 0, "The address returned isn't aligned to cache line size" ); + return result; +} + +void NFS_Free( void* p ) { +#if __TBB_IS_SCALABLE_MALLOC_FIX_READY + (*padded_free_handler)( p ); +#else + if( p ) { + __TBB_ASSERT( (uintptr)p>=0x4096, "attempt to free block not obtained from cache_aligned_allocator" ); + // Recover where block actually starts + unsigned char* base = ((unsigned char**)p)[-1]; + __TBB_ASSERT( (void*)((uintptr)(base+NFS_LineSize)&-NFS_LineSize)==p, "not allocated by NFS_Allocate?" ); + if( uintptr(base)&1 ) { + // Is a big block - use free + free(base-1); + } else { + // Is a small block - use scalable allocator + (*FreeHandler)( base ); + } + } +#endif // __TBB_IS_SCALABLE_MALLOC_FIX_READY +} + +#if __TBB_IS_SCALABLE_MALLOC_FIX_READY +static void* padded_allocate_via_scalable_malloc( size_t bytes, size_t alignment ) { + unsigned char* base; + if( !(base=(unsigned char*)(*MallocHandler)((bytes+alignment)&-alignment))) { + throw bad_alloc(); + } + return base; // scalable_malloc returns aligned pointer +} + +static void* padded_allocate( size_t bytes, size_t alignment ) { + unsigned char* base; + if( !(base=(unsigned char*)malloc(alignment+bytes)) ) { + throw bad_alloc(); + } + // Round up to the next line + unsigned char* result = (unsigned char*)((uintptr)(base+alignment)&-alignment); + // Record where block actually starts. 
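// Standalone sketch of the padding trick used by NFS_Allocate and
// padded_allocate: over-allocate by one alignment unit, round the pointer up
// to the requested power-of-two boundary, and stash the original base just
// below the returned address so it can be recovered on free. Because malloc
// already returns pointer-aligned storage, the gap below the rounded-up
// address is large enough for the stashed pointer whenever the requested
// alignment is a larger power of two (e.g. a cache line).
// aligned_alloc_pad/aligned_free_pad are invented names.
#include <cstdlib>
#include <cstdint>

void* aligned_alloc_pad( std::size_t bytes, std::size_t alignment ) {
    unsigned char* base = static_cast<unsigned char*>( std::malloc(bytes + alignment) );
    if( !base ) return NULL;
    std::uintptr_t up = (reinterpret_cast<std::uintptr_t>(base) + alignment)
                        & ~(std::uintptr_t(alignment) - 1);       // round up to boundary
    unsigned char* result = reinterpret_cast<unsigned char*>(up);
    reinterpret_cast<void**>(result)[-1] = base;                  // record the real start
    return result;
}

void aligned_free_pad( void* p ) {
    if( p ) std::free( reinterpret_cast<void**>(p)[-1] );
}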
+ ((uintptr*)result)[-1] = uintptr(base); + return result; +} + +static void padded_free( void* p ) { + if( p ) { + __TBB_ASSERT( (uintptr)p>=0x4096, "attempt to free block not obtained from cache_aligned_allocator" ); + // Recover where block actually starts + unsigned char* base = ((unsigned char**)p)[-1]; + __TBB_ASSERT( (void*)((uintptr)(base+NFS_LineSize)&-NFS_LineSize)==p, "not allocated by NFS_Allocate?" ); + free(base); + } +} +#endif // #if __TBB_IS_SCALABLE_MALLOC_FIX_READY + +void* __TBB_EXPORTED_FUNC allocate_via_handler_v3( size_t n ) { + void* result; + result = (*MallocHandler) (n); + if (!result) { + // Overflow + throw bad_alloc(); + } + return result; +} + +void __TBB_EXPORTED_FUNC deallocate_via_handler_v3( void *p ) { + if( p ) { + (*FreeHandler)( p ); + } +} + +bool __TBB_EXPORTED_FUNC is_malloc_used_v3() { + if (MallocHandler == &DummyMalloc) { + void* void_ptr = (*MallocHandler)(1); + (*FreeHandler)(void_ptr); + } + __TBB_ASSERT( MallocHandler!=&DummyMalloc && FreeHandler!=&DummyFree, NULL ); + __TBB_ASSERT(MallocHandler==&malloc && FreeHandler==&free || + MallocHandler!=&malloc && FreeHandler!=&free, NULL ); + return MallocHandler == &malloc; +} + +} // namespace internal + +} // namespace tbb + +#if __TBB_RML_STATIC +#include "tbb/atomic.h" +static tbb::atomic module_inited; +namespace tbb { +namespace internal { +void DoOneTimeInitializations() { + if( module_inited!=2 ) { + if( module_inited.compare_and_swap(1, 0)==0 ) { + initialize_cache_aligned_allocator(); + module_inited = 2; + } else { + do { + __TBB_Yield(); + } while( module_inited!=2 ); + } + } +} +}} //namespace tbb::internal +#endif diff --git a/dep/tbb/src/tbb/concurrent_hash_map.cpp b/dep/tbb/src/tbb/concurrent_hash_map.cpp new file mode 100644 index 000000000..d3937102c --- /dev/null +++ b/dep/tbb/src/tbb/concurrent_hash_map.cpp @@ -0,0 +1,66 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/concurrent_hash_map.h" + +namespace tbb { + +namespace internal { +#if !TBB_NO_LEGACY +struct hash_map_segment_base { + typedef spin_rw_mutex segment_mutex_t; + //! Type of a hash code. + typedef size_t hashcode_t; + //! Log2 of n_segment + static const size_t n_segment_bits = 6; + //! 
Maximum size of array of chains + static const size_t max_physical_size = size_t(1)<<(8*sizeof(hashcode_t)-n_segment_bits); + //! Mutex that protects this segment + segment_mutex_t my_mutex; + // Number of nodes + atomic my_logical_size; + // Size of chains + /** Always zero or a power of two */ + size_t my_physical_size; + //! True if my_logical_size>=my_physical_size. + /** Used to support Intel(R) Thread Checker. */ + bool __TBB_EXPORTED_METHOD internal_grow_predicate() const; +}; + +bool hash_map_segment_base::internal_grow_predicate() const { + // Intel(R) Thread Checker considers the following reads to be races, so we hide them in the + // library so that Intel(R) Thread Checker will ignore them. The reads are used in a double-check + // context, so the program is nonetheless correct despite the race. + return my_logical_size >= my_physical_size && my_physical_size < max_physical_size; +} +#endif//!TBB_NO_LEGACY + +} // namespace internal + +} // namespace tbb + diff --git a/dep/tbb/src/tbb/concurrent_queue.cpp b/dep/tbb/src/tbb/concurrent_queue.cpp new file mode 100644 index 000000000..33ce5910b --- /dev/null +++ b/dep/tbb/src/tbb/concurrent_queue.cpp @@ -0,0 +1,841 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include // for memset() +#include "tbb/tbb_stddef.h" +#include "tbb/tbb_machine.h" +#include "tbb/_concurrent_queue_internal.h" +#include "itt_notify.h" +#include +#if _WIN32||_WIN64 +#include +#endif +using namespace std; + +// enable sleep support +#define __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE 1 + +#if defined(_MSC_VER) && defined(_Wp64) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (disable: 4267) +#endif + +#define RECORD_EVENTS 0 + + +#if __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE +#if !_WIN32&&!_WIN64 +#include +#endif +#endif + +namespace tbb { + +namespace internal { + +typedef concurrent_queue_base_v3 concurrent_queue_base; + +typedef size_t ticket; + +//! A queue using simple locking. +/** For efficient, this class has no constructor. + The caller is expected to zero-initialize it. 
*/ +struct micro_queue { + typedef concurrent_queue_base::page page; + + friend class micro_queue_pop_finalizer; + + atomic head_page; + atomic head_counter; + + atomic tail_page; + atomic tail_counter; + + spin_mutex page_mutex; + + void push( const void* item, ticket k, concurrent_queue_base& base ); + + bool pop( void* dst, ticket k, concurrent_queue_base& base ); + + micro_queue& assign( const micro_queue& src, concurrent_queue_base& base ); + + page* make_copy ( concurrent_queue_base& base, const page* src_page, size_t begin_in_page, size_t end_in_page, ticket& g_index ) ; + + void make_invalid( ticket k ); +}; + +// we need to yank it out of micro_queue because of concurrent_queue_base::deallocate_page being virtual. +class micro_queue_pop_finalizer: no_copy { + typedef concurrent_queue_base::page page; + ticket my_ticket; + micro_queue& my_queue; + page* my_page; + concurrent_queue_base &base; +public: + micro_queue_pop_finalizer( micro_queue& queue, concurrent_queue_base& b, ticket k, page* p ) : + my_ticket(k), my_queue(queue), my_page(p), base(b) + {} + ~micro_queue_pop_finalizer() { + page* p = my_page; + if( p ) { + spin_mutex::scoped_lock lock( my_queue.page_mutex ); + page* q = p->next; + my_queue.head_page = q; + if( !q ) { + my_queue.tail_page = NULL; + } + } + my_queue.head_counter = my_ticket; + if( p ) + base.deallocate_page( p ); + } +}; + +//! Internal representation of a ConcurrentQueue. +/** For efficient, this class has no constructor. + The caller is expected to zero-initialize it. */ +class concurrent_queue_rep { +public: +#if __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE +# if _WIN32||_WIN64 + typedef HANDLE waitvar_t; + typedef CRITICAL_SECTION mutexvar_t; +# else + typedef pthread_cond_t waitvar_t; + typedef pthread_mutex_t mutexvar_t; +# endif +#endif /* __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ +private: + friend struct micro_queue; + + //! Approximately n_queue/golden ratio + static const size_t phi = 3; + +public: + //! Must be power of 2 + static const size_t n_queue = 8; + + //! 
Map ticket to an array index + static size_t index( ticket k ) { + return k*phi%n_queue; + } + +#if __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE + atomic head_counter; + waitvar_t var_wait_for_items; + mutexvar_t mtx_items_avail; + atomic n_invalid_entries; + atomic n_waiting_consumers; +#if _WIN32||_WIN64 + uint32_t consumer_wait_generation; + uint32_t n_consumers_to_wakeup; + char pad1[NFS_MaxLineSize-((sizeof(atomic)+sizeof(waitvar_t)+sizeof(mutexvar_t)+sizeof(atomic)+sizeof(atomic)+sizeof(uint32_t)+sizeof(uint32_t))&(NFS_MaxLineSize-1))]; +#else + char pad1[NFS_MaxLineSize-((sizeof(atomic)+sizeof(waitvar_t)+sizeof(mutexvar_t)+sizeof(atomic)+sizeof(atomic))&(NFS_MaxLineSize-1))]; +#endif + + atomic tail_counter; + waitvar_t var_wait_for_slots; + mutexvar_t mtx_slots_avail; + atomic n_waiting_producers; +#if _WIN32||_WIN64 + uint32_t producer_wait_generation; + uint32_t n_producers_to_wakeup; + char pad2[NFS_MaxLineSize-((sizeof(atomic)+sizeof(waitvar_t)+sizeof(mutexvar_t)+sizeof(atomic)+sizeof(uint32_t)+sizeof(uint32_t))&(NFS_MaxLineSize-1))]; +#else + char pad2[NFS_MaxLineSize-((sizeof(atomic)+sizeof(waitvar_t)+sizeof(mutexvar_t)+sizeof(atomic))&(NFS_MaxLineSize-1))]; +#endif +#else /* !__TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ + atomic head_counter; + atomic n_invalid_entries; + char pad1[NFS_MaxLineSize-sizeof(atomic)-sizeof(atomic)]; + atomic tail_counter; + char pad2[NFS_MaxLineSize-sizeof(atomic)]; +#endif /* __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ + micro_queue array[n_queue]; + + micro_queue& choose( ticket k ) { + // The formula here approximates LRU in a cache-oblivious way. + return array[index(k)]; + } + + //! Value for effective_capacity that denotes unbounded queue. + static const ptrdiff_t infinite_capacity = ptrdiff_t(~size_t(0)/2); +}; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // unary minus operator applied to unsigned type, result still unsigned + #pragma warning( push ) + #pragma warning( disable: 4146 ) +#endif + +static void* invalid_page; + +//------------------------------------------------------------------------ +// micro_queue +//------------------------------------------------------------------------ +void micro_queue::push( const void* item, ticket k, concurrent_queue_base& base ) { + k &= -concurrent_queue_rep::n_queue; + page* p = NULL; + size_t index = k/concurrent_queue_rep::n_queue & (base.items_per_page-1); + if( !index ) { + try { + p = base.allocate_page(); + } catch (...) { + ++base.my_rep->n_invalid_entries; + make_invalid( k ); + } + p->mask = 0; + p->next = NULL; + } + + if( tail_counter!=k ) { + atomic_backoff backoff; + do { + backoff.pause(); + // no memory. throws an exception; assumes concurrent_queue_rep::n_queue>1 + if( tail_counter&0x1 ) { + ++base.my_rep->n_invalid_entries; + throw bad_last_alloc(); + } + } while( tail_counter!=k ) ; + } + + if( p ) { + spin_mutex::scoped_lock lock( page_mutex ); + if( page* q = tail_page ) + q->next = p; + else + head_page = p; + tail_page = p; + } else { + p = tail_page; + } + ITT_NOTIFY( sync_acquired, p ); + + try { + base.copy_item( *p, index, item ); + ITT_NOTIFY( sync_releasing, p ); + // If no exception was thrown, mark item as present. 
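// Worked example of the ticket-to-micro_queue mapping used by
// concurrent_queue_rep::index()/choose() above: with n_queue == 8 and phi == 3
// (coprime with 8), consecutive tickets visit all eight internal queues before
// repeating, spreading producers and consumers across queues. Standalone demo.
#include <cstddef>
#include <cstdio>

int main() {
    const std::size_t phi = 3, n_queue = 8;
    for( std::size_t k = 0; k < n_queue; ++k )
        std::printf("ticket %u -> micro_queue %u\n",
                    unsigned(k), unsigned(k * phi % n_queue));
    // Prints the permutation 0 3 6 1 4 7 2 5.
    return 0;
}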
+ p->mask |= uintptr(1)<n_invalid_entries; + tail_counter += concurrent_queue_rep::n_queue; + throw; + } +} + +bool micro_queue::pop( void* dst, ticket k, concurrent_queue_base& base ) { + k &= -concurrent_queue_rep::n_queue; + spin_wait_until_eq( head_counter, k ); + spin_wait_while_eq( tail_counter, k ); + page& p = *head_page; + __TBB_ASSERT( &p, NULL ); + size_t index = k/concurrent_queue_rep::n_queue & (base.items_per_page-1); + bool success = false; + { + micro_queue_pop_finalizer finalizer( *this, base, k+concurrent_queue_rep::n_queue, index==base.items_per_page-1 ? &p : NULL ); + if( p.mask & uintptr(1)<n_invalid_entries; + } + } + return success; +} + +micro_queue& micro_queue::assign( const micro_queue& src, concurrent_queue_base& base ) +{ + head_counter = src.head_counter; + tail_counter = src.tail_counter; + page_mutex = src.page_mutex; + + const page* srcp = src.head_page; + if( srcp ) { + ticket g_index = head_counter; + try { + size_t n_items = (tail_counter-head_counter)/concurrent_queue_rep::n_queue; + size_t index = head_counter/concurrent_queue_rep::n_queue & (base.items_per_page-1); + size_t end_in_first_page = (index+n_itemsnext; srcp!=src.tail_page; srcp=srcp->next ) { + cur_page->next = make_copy( base, srcp, 0, base.items_per_page, g_index ); + cur_page = cur_page->next; + } + + __TBB_ASSERT( srcp==src.tail_page, NULL ); + + size_t last_index = tail_counter/concurrent_queue_rep::n_queue & (base.items_per_page-1); + if( last_index==0 ) last_index = base.items_per_page; + + cur_page->next = make_copy( base, srcp, 0, last_index, g_index ); + cur_page = cur_page->next; + } + tail_page = cur_page; + } catch (...) { + make_invalid( g_index ); + } + } else { + head_page = tail_page = NULL; + } + return *this; +} + +concurrent_queue_base::page* micro_queue::make_copy( concurrent_queue_base& base, const concurrent_queue_base::page* src_page, size_t begin_in_page, size_t end_in_page, ticket& g_index ) +{ + page* new_page = base.allocate_page(); + new_page->next = NULL; + new_page->mask = src_page->mask; + for( ; begin_in_page!=end_in_page; ++begin_in_page, ++g_index ) + if( new_page->mask & uintptr(1)<((void*)1), 0}; + // mark it so that no more pushes are allowed. + invalid_page = &dummy; + { + spin_mutex::scoped_lock lock( page_mutex ); + tail_counter = k+concurrent_queue_rep::n_queue+1; + if( page* q = tail_page ) + q->next = static_cast(invalid_page); + else + head_page = static_cast(invalid_page); + tail_page = static_cast(invalid_page); + } + throw; +} + +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning( pop ) +#endif // warning 4146 is back + +//------------------------------------------------------------------------ +// concurrent_queue_base +//------------------------------------------------------------------------ +concurrent_queue_base_v3::concurrent_queue_base_v3( size_t item_size ) { + items_per_page = item_size<=8 ? 32 : + item_size<=16 ? 16 : + item_size<=32 ? 8 : + item_size<=64 ? 4 : + item_size<=128 ? 2 : + 1; + my_capacity = size_t(-1)/(item_size>1 ? 
item_size : 2); + my_rep = cache_aligned_allocator().allocate(1); + __TBB_ASSERT( (size_t)my_rep % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->head_counter % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->tail_counter % NFS_GetLineSize()==0, "alignment error" ); + __TBB_ASSERT( (size_t)&my_rep->array % NFS_GetLineSize()==0, "alignment error" ); + memset(my_rep,0,sizeof(concurrent_queue_rep)); + this->item_size = item_size; +#if __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE +#if _WIN32||_WIN64 + my_rep->var_wait_for_items = CreateEvent( NULL, TRUE/*manual reset*/, FALSE/*not signalled initially*/, NULL); + my_rep->var_wait_for_slots = CreateEvent( NULL, TRUE/*manual reset*/, FALSE/*not signalled initially*/, NULL); + InitializeCriticalSection( &my_rep->mtx_items_avail ); + InitializeCriticalSection( &my_rep->mtx_slots_avail ); +#else + // initialize pthread_mutex_t, and pthread_cond_t + pthread_mutexattr_t m_attr; + pthread_mutexattr_init( &m_attr ); +#if defined(PTHREAD_PRIO_INHERIT) && !__TBB_PRIO_INHERIT_BROKEN + pthread_mutexattr_setprotocol( &m_attr, PTHREAD_PRIO_INHERIT ); +#endif + pthread_mutex_init( &my_rep->mtx_items_avail, &m_attr ); + pthread_mutex_init( &my_rep->mtx_slots_avail, &m_attr ); + pthread_mutexattr_destroy( &m_attr ); + + pthread_condattr_t c_attr; + pthread_condattr_init( &c_attr ); + pthread_cond_init( &my_rep->var_wait_for_items, &c_attr ); + pthread_cond_init( &my_rep->var_wait_for_slots, &c_attr ); + pthread_condattr_destroy( &c_attr ); +#endif +#endif /* __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ +} + +concurrent_queue_base_v3::~concurrent_queue_base_v3() { + size_t nq = my_rep->n_queue; + for( size_t i=0; iarray[i].tail_page==NULL, "pages were not freed properly" ); +#if __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE +# if _WIN32||_WIN64 + CloseHandle( my_rep->var_wait_for_items ); + CloseHandle( my_rep->var_wait_for_slots ); + DeleteCriticalSection( &my_rep->mtx_items_avail ); + DeleteCriticalSection( &my_rep->mtx_slots_avail ); +# else + pthread_mutex_destroy( &my_rep->mtx_items_avail ); + pthread_mutex_destroy( &my_rep->mtx_slots_avail ); + pthread_cond_destroy( &my_rep->var_wait_for_items ); + pthread_cond_destroy( &my_rep->var_wait_for_slots ); +# endif +#endif /* __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ + cache_aligned_allocator().deallocate(my_rep,1); +} + +void concurrent_queue_base_v3::internal_push( const void* src ) { + concurrent_queue_rep& r = *my_rep; +#if !__TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE + ticket k = r.tail_counter++; + ptrdiff_t e = my_capacity; + if( e(my_capacity); + } + } + r.choose(k).push(src,k,*this); +#elif _WIN32||_WIN64 + ticket k = r.tail_counter++; + ptrdiff_t e = my_capacity; + atomic_backoff backoff; +#if DO_ITT_NOTIFY + bool sync_prepare_done = false; +#endif + + while( (ptrdiff_t)(k-r.head_counter)>=e ) { +#if DO_ITT_NOTIFY + if( !sync_prepare_done ) { + ITT_NOTIFY( sync_prepare, &sync_prepare_done ); + sync_prepare_done = true; + } +#endif + if( !backoff.bounded_pause() ) { + EnterCriticalSection( &r.mtx_slots_avail ); + r.n_waiting_producers++; + while( (ptrdiff_t)(k-r.head_counter)>=const_cast(my_capacity) ) { + uint32_t my_generation = r.producer_wait_generation; + for( ;; ) { + LeaveCriticalSection( &r.mtx_slots_avail ); + WaitForSingleObject( r.var_wait_for_slots, INFINITE ); + EnterCriticalSection( &r.mtx_slots_avail ); + if( r.n_producers_to_wakeup > 0 && r.producer_wait_generation != my_generation ) + break; + } + if( --r.n_producers_to_wakeup == 0 ) + ResetEvent( 
r.var_wait_for_slots ); + } + --r.n_waiting_producers; + LeaveCriticalSection( &r.mtx_slots_avail ); + break; + } + e = const_cast(my_capacity); + } +#if DO_ITT_NOTIFY + if( sync_prepare_done ) + ITT_NOTIFY( sync_acquired, &sync_prepare_done ); +#endif + + r.choose( k ).push( src, k, *this ); + + if( r.n_waiting_consumers>0 ) { + EnterCriticalSection( &r.mtx_items_avail ); + if( r.n_waiting_consumers>0 ) { + r.consumer_wait_generation++; + r.n_consumers_to_wakeup = r.n_waiting_consumers; + SetEvent( r.var_wait_for_items ); + } + LeaveCriticalSection( &r.mtx_items_avail ); + } +#else + ticket k = r.tail_counter++; + ptrdiff_t e = my_capacity; + atomic_backoff backoff; +#if DO_ITT_NOTIFY + bool sync_prepare_done = false; +#endif + while( (ptrdiff_t)(k-r.head_counter)>=e ) { +#if DO_ITT_NOTIFY + if( !sync_prepare_done ) { + ITT_NOTIFY( sync_prepare, &sync_prepare_done ); + sync_prepare_done = true; + } +#endif + if( !backoff.bounded_pause() ) { + // queue is full. go to sleep. let them go to sleep in order. + pthread_mutex_lock( &r.mtx_slots_avail ); + r.n_waiting_producers++; + while( (ptrdiff_t)(k-r.head_counter)>=const_cast(my_capacity) ) { + pthread_cond_wait( &r.var_wait_for_slots, &r.mtx_slots_avail ); + } + --r.n_waiting_producers; + pthread_mutex_unlock( &r.mtx_slots_avail ); + break; + } + e = const_cast(my_capacity); + } +#if DO_ITT_NOTIFY + if( sync_prepare_done ) + ITT_NOTIFY( sync_acquired, &sync_prepare_done ); +#endif + r.choose( k ).push( src, k, *this ); + + if( r.n_waiting_consumers>0 ) { + pthread_mutex_lock( &r.mtx_items_avail ); + // pthread_cond_broadcast() wakes up all consumers. + if( r.n_waiting_consumers>0 ) + pthread_cond_broadcast( &r.var_wait_for_items ); + pthread_mutex_unlock( &r.mtx_items_avail ); + } +#endif /* !__TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ +} + +void concurrent_queue_base_v3::internal_pop( void* dst ) { + concurrent_queue_rep& r = *my_rep; +#if !__TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE + ticket k; + do { + k = r.head_counter++; + } while( !r.choose(k).pop(dst,k,*this) ); +#elif _WIN32||_WIN64 + ticket k; + atomic_backoff backoff; +#if DO_ITT_NOTIFY + bool sync_prepare_done = false; +#endif + do { + k=r.head_counter++; + while( r.tail_counter<=k ) { +#if DO_ITT_NOTIFY + if( !sync_prepare_done ) { + ITT_NOTIFY( sync_prepare, dst ); + dst = (void*) ((intptr_t)dst | 1); + sync_prepare_done = true; + } +#endif + // Queue is empty; pause and re-try a few times + if( !backoff.bounded_pause() ) { + // it is really empty.. go to sleep + EnterCriticalSection( &r.mtx_items_avail ); + r.n_waiting_consumers++; + while( r.tail_counter<=k ) { + uint32_t my_generation = r.consumer_wait_generation; + for( ;; ) { + LeaveCriticalSection( &r.mtx_items_avail ); + WaitForSingleObject( r.var_wait_for_items, INFINITE ); + EnterCriticalSection( &r.mtx_items_avail ); + if( r.n_consumers_to_wakeup > 0 && r.consumer_wait_generation != my_generation ) + break; + } + if( --r.n_consumers_to_wakeup == 0 ) + ResetEvent( r.var_wait_for_items ); + } + --r.n_waiting_consumers; + LeaveCriticalSection( &r.mtx_items_avail ); + backoff.reset(); + break; // break from inner while + } + } // break to here + } while( !r.choose(k).pop(dst,k,*this) ); + + // wake up a producer.. 
+ if( r.n_waiting_producers>0 ) { + EnterCriticalSection( &r.mtx_slots_avail ); + if( r.n_waiting_producers>0 ) { + r.producer_wait_generation++; + r.n_producers_to_wakeup = r.n_waiting_producers; + SetEvent( r.var_wait_for_slots ); + } + LeaveCriticalSection( &r.mtx_slots_avail ); + } +#else + ticket k; + atomic_backoff backoff; +#if DO_ITT_NOTIFY + bool sync_prepare_done = false; +#endif + do { + k = r.head_counter++; + while( r.tail_counter<=k ) { +#if DO_ITT_NOTIFY + if( !sync_prepare_done ) { + ITT_NOTIFY( sync_prepare, dst ); + dst = (void*) ((intptr_t)dst | 1); + sync_prepare_done = true; + } +#endif + // Queue is empty; pause and re-try a few times + if( !backoff.bounded_pause() ) { + // it is really empty.. go to sleep + pthread_mutex_lock( &r.mtx_items_avail ); + r.n_waiting_consumers++; + while( r.tail_counter<=k ) + pthread_cond_wait( &r.var_wait_for_items, &r.mtx_items_avail ); + --r.n_waiting_consumers; + pthread_mutex_unlock( &r.mtx_items_avail ); + backoff.reset(); + break; + } + } + } while( !r.choose(k).pop(dst,k,*this) ); + + if( r.n_waiting_producers>0 ) { + pthread_mutex_lock( &r.mtx_slots_avail ); + if( r.n_waiting_producers>0 ) + pthread_cond_broadcast( &r.var_wait_for_slots ); + pthread_mutex_unlock( &r.mtx_slots_avail ); + } +#endif /* !__TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ +} + +bool concurrent_queue_base_v3::internal_pop_if_present( void* dst ) { + concurrent_queue_rep& r = *my_rep; + ticket k; + do { + k = r.head_counter; + for(;;) { + if( r.tail_counter<=k ) { + // Queue is empty + return false; + } + // Queue had item with ticket k when we looked. Attempt to get that item. + ticket tk=k; + k = r.head_counter.compare_and_swap( tk+1, tk ); + if( k==tk ) + break; + // Another thread snatched the item, retry. + } + } while( !r.choose( k ).pop( dst, k, *this ) ); + +#if __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE +#if _WIN32||_WIN64 + // wake up a producer.. + if( r.n_waiting_producers>0 ) { + EnterCriticalSection( &r.mtx_slots_avail ); + if( r.n_waiting_producers>0 ) { + r.producer_wait_generation++; + r.n_producers_to_wakeup = r.n_waiting_producers; + SetEvent( r.var_wait_for_slots ); + } + LeaveCriticalSection( &r.mtx_slots_avail ); + } +#else /* including MacOS */ + if( r.n_waiting_producers>0 ) { + pthread_mutex_lock( &r.mtx_slots_avail ); + if( r.n_waiting_producers>0 ) + pthread_cond_broadcast( &r.var_wait_for_slots ); + pthread_mutex_unlock( &r.mtx_slots_avail ); + } +#endif +#endif /* __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ + + return true; +} + +bool concurrent_queue_base_v3::internal_push_if_not_full( const void* src ) { + concurrent_queue_rep& r = *my_rep; + ticket k = r.tail_counter; + for(;;) { + if( (ptrdiff_t)(k-r.head_counter)>=my_capacity ) { + // Queue is full + return false; + } + // Queue had empty slot with ticket k when we looked. Attempt to claim that slot. + ticket tk=k; + k = r.tail_counter.compare_and_swap( tk+1, tk ); + if( k==tk ) + break; + // Another thread claimed the slot, so retry. 
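// Minimal standalone sketch of the blocking behaviour implemented by
// internal_push/internal_pop above: producers sleep while the queue is at
// capacity, consumers sleep while it is empty, and each side wakes the other
// after changing the state. The real code first spins with atomic_backoff and
// distributes items over per-ticket micro_queues; both refinements are omitted
// here, and bounded_queue is an invented name.
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>

template<typename T>
class bounded_queue {
    std::mutex mtx;
    std::condition_variable items_avail, slots_avail;
    std::deque<T> q;
    std::size_t capacity;
public:
    explicit bounded_queue( std::size_t cap ) : capacity(cap) {}
    void push( const T& x ) {
        std::unique_lock<std::mutex> lock(mtx);
        slots_avail.wait( lock, [&]{ return q.size() < capacity; } );
        q.push_back(x);
        items_avail.notify_one();       // a consumer may be sleeping on "empty"
    }
    void pop( T& x ) {
        std::unique_lock<std::mutex> lock(mtx);
        items_avail.wait( lock, [&]{ return !q.empty(); } );
        x = q.front(); q.pop_front();
        slots_avail.notify_one();       // a producer may be sleeping on "full"
    }
};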
+ } + r.choose(k).push(src,k,*this); + +#if __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE +#if _WIN32||_WIN64 + if( r.n_waiting_consumers>0 ) { + EnterCriticalSection( &r.mtx_items_avail ); + if( r.n_waiting_consumers>0 ) { + r.consumer_wait_generation++; + r.n_consumers_to_wakeup = r.n_waiting_consumers; + SetEvent( r.var_wait_for_items ); + } + LeaveCriticalSection( &r.mtx_items_avail ); + } +#else /* including MacOS */ + if( r.n_waiting_consumers>0 ) { + pthread_mutex_lock( &r.mtx_items_avail ); + if( r.n_waiting_consumers>0 ) + pthread_cond_broadcast( &r.var_wait_for_items ); + pthread_mutex_unlock( &r.mtx_items_avail ); + } +#endif +#endif /* __TBB_NO_BUSY_WAIT_IN_CONCURRENT_QUEUE */ + return true; +} + +ptrdiff_t concurrent_queue_base_v3::internal_size() const { + __TBB_ASSERT( sizeof(ptrdiff_t)<=sizeof(size_t), NULL ); + return ptrdiff_t(my_rep->tail_counter-my_rep->head_counter-my_rep->n_invalid_entries); +} + +bool concurrent_queue_base_v3::internal_empty() const { + ticket tc = my_rep->tail_counter; + ticket hc = my_rep->head_counter; + // if tc!=r.tail_counter, the queue was not empty at some point between the two reads. + return ( tc==my_rep->tail_counter && ptrdiff_t(tc-hc-my_rep->n_invalid_entries)<=0 ); +} + +void concurrent_queue_base_v3::internal_set_capacity( ptrdiff_t capacity, size_t /*item_size*/ ) { + my_capacity = capacity<0 ? concurrent_queue_rep::infinite_capacity : capacity; +} + +void concurrent_queue_base_v3::internal_finish_clear() { + size_t nq = my_rep->n_queue; + for( size_t i=0; iarray[i].tail_page; + __TBB_ASSERT( my_rep->array[i].head_page==tp, "at most one page should remain" ); + if( tp!=NULL) { + if( tp!=invalid_page ) deallocate_page( tp ); + my_rep->array[i].tail_page = NULL; + } + } +} + +void concurrent_queue_base_v3::internal_throw_exception() const { + throw bad_alloc(); +} + +void concurrent_queue_base_v3::assign( const concurrent_queue_base& src ) { + items_per_page = src.items_per_page; + my_capacity = src.my_capacity; + + // copy concurrent_queue_rep. + my_rep->head_counter = src.my_rep->head_counter; + my_rep->tail_counter = src.my_rep->tail_counter; + my_rep->n_invalid_entries = src.my_rep->n_invalid_entries; + + // copy micro_queues + for( size_t i = 0; in_queue; ++i ) + my_rep->array[i].assign( src.my_rep->array[i], *this); + + __TBB_ASSERT( my_rep->head_counter==src.my_rep->head_counter && my_rep->tail_counter==src.my_rep->tail_counter, + "the source concurrent queue should not be concurrently modified." 
); +} + +//------------------------------------------------------------------------ +// concurrent_queue_iterator_rep +//------------------------------------------------------------------------ +class concurrent_queue_iterator_rep: no_assign { +public: + ticket head_counter; + const concurrent_queue_base& my_queue; + concurrent_queue_base::page* array[concurrent_queue_rep::n_queue]; + concurrent_queue_iterator_rep( const concurrent_queue_base& queue ) : + head_counter(queue.my_rep->head_counter), + my_queue(queue) + { + const concurrent_queue_rep& rep = *queue.my_rep; + for( size_t k=0; ktail_counter ) { + item = NULL; + return true; + } else { + concurrent_queue_base::page* p = array[concurrent_queue_rep::index(k)]; + __TBB_ASSERT(p,NULL); + size_t i = k/concurrent_queue_rep::n_queue & (my_queue.items_per_page-1); + item = static_cast(static_cast(p+1)) + my_queue.item_size*i; + return (p->mask & uintptr(1)<().allocate(1); + new( my_rep ) concurrent_queue_iterator_rep(queue); + size_t k = my_rep->head_counter; + if( !my_rep->get_item(my_item, k) ) advance(); +} + +void concurrent_queue_iterator_base_v3::assign( const concurrent_queue_iterator_base& other ) { + if( my_rep!=other.my_rep ) { + if( my_rep ) { + cache_aligned_allocator().deallocate(my_rep, 1); + my_rep = NULL; + } + if( other.my_rep ) { + my_rep = cache_aligned_allocator().allocate(1); + new( my_rep ) concurrent_queue_iterator_rep( *other.my_rep ); + } + } + my_item = other.my_item; +} + +void concurrent_queue_iterator_base_v3::advance() { + __TBB_ASSERT( my_item, "attempt to increment iterator past end of queue" ); + size_t k = my_rep->head_counter; + const concurrent_queue_base& queue = my_rep->my_queue; +#if TBB_USE_ASSERT + void* tmp; + my_rep->get_item(tmp,k); + __TBB_ASSERT( my_item==tmp, NULL ); +#endif /* TBB_USE_ASSERT */ + size_t i = k/concurrent_queue_rep::n_queue & (queue.items_per_page-1); + if( i==queue.items_per_page-1 ) { + concurrent_queue_base::page*& root = my_rep->array[concurrent_queue_rep::index(k)]; + root = root->next; + } + // advance k + my_rep->head_counter = ++k; + if( !my_rep->get_item(my_item, k) ) advance(); +} + +concurrent_queue_iterator_base_v3::~concurrent_queue_iterator_base_v3() { + //delete my_rep; + cache_aligned_allocator().deallocate(my_rep, 1); + my_rep = NULL; +} + +} // namespace internal + +} // namespace tbb diff --git a/dep/tbb/src/tbb/concurrent_vector.cpp b/dep/tbb/src/tbb/concurrent_vector.cpp new file mode 100644 index 000000000..7dc51f490 --- /dev/null +++ b/dep/tbb/src/tbb/concurrent_vector.cpp @@ -0,0 +1,574 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
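// Sketch of the snapshot check used by concurrent_queue's internal_empty()
// above (in concurrent_queue.cpp): read the tail counter, then the head
// counter, then the tail counter again; if the tail did not move, the two
// values describe a consistent instant, otherwise the queue was non-empty at
// some point between the reads. appears_empty and its parameters are invented
// names, and std::atomic stands in for tbb::atomic.
#include <atomic>
#include <cstddef>

bool appears_empty( const std::atomic<std::size_t>& head,
                    const std::atomic<std::size_t>& tail ) {
    std::size_t t = tail.load(std::memory_order_acquire);
    std::size_t h = head.load(std::memory_order_acquire);
    return t == tail.load(std::memory_order_acquire)
           && static_cast<std::ptrdiff_t>(t - h) <= 0;
}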
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/concurrent_vector.h" +#include "tbb/cache_aligned_allocator.h" +#include "tbb/tbb_exception.h" +#include "tbb_misc.h" +#include "itt_notify.h" +#include + +#if defined(_MSC_VER) && defined(_Wp64) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (disable: 4267) +#endif + +using namespace std; + +namespace tbb { + +namespace internal { + class concurrent_vector_base_v3::helper :no_assign { +public: + //! memory page size + static const size_type page_size = 4096; + + inline static bool incompact_predicate(size_type size) { // assert size != 0, see source/test/test_vector_layout.cpp + return size < page_size || ((size-1)%page_size < page_size/2 && size < page_size * 128); // for more details + } + + inline static size_type find_segment_end(const concurrent_vector_base_v3 &v) { + segment_t *s = v.my_segment; + segment_index_t u = s==v.my_storage? pointers_per_short_table : pointers_per_long_table; + segment_index_t k = 0; + while( k < u && s[k].array > internal::vector_allocation_error_flag ) + ++k; + return k; + } + + //! assign first segment size. k - is index of last segment to be allocated, not a count of segments + inline static void assign_first_segment_if_neccessary(concurrent_vector_base_v3 &v, segment_index_t k) { + if( !v.my_first_block ) { + /* There was a suggestion to set first segment according to incompact_predicate: + while( k && !helper::incompact_predicate(segment_size( k ) * element_size) ) + --k; // while previous vector size is compact, decrement + // reasons to not do it: + // * constructor(n) is not ready to accept fragmented segments + // * backward compatibility due to that constructor + // * current version gives additional guarantee and faster init. + // * two calls to reserve() will give the same effect. + */ + v.my_first_block.compare_and_swap(k+1, 0); // store number of segments + } + } + + inline static void *allocate_segment(concurrent_vector_base_v3 &v, size_type n) { + void *ptr = v.vector_allocator_ptr(v, n); + if(!ptr) throw bad_alloc(); // check for bad allocation, throw exception + return ptr; + } + + //! Publish segment so other threads can see it. + inline static void publish_segment( segment_t& s, void* rhs ) { + // see also itt_store_pointer_with_release_v3() + ITT_NOTIFY( sync_releasing, &s.array ); + __TBB_store_with_release( s.array, rhs ); + } + + static size_type enable_segment(concurrent_vector_base_v3 &v, size_type k, size_type element_size) { + segment_t* s = v.my_segment; // TODO: optimize out as argument? Optimize accesses to my_first_block + __TBB_ASSERT( s[k].array <= internal::vector_allocation_error_flag, "concurrent operation during growth?" ); + if( !k ) { + assign_first_segment_if_neccessary(v, default_initial_segments-1); + try { + publish_segment(s[0], allocate_segment(v, segment_size(v.my_first_block) ) ); + } catch(...) 
{ // intercept exception here, assign internal::vector_allocation_error_flag value, re-throw exception + publish_segment(s[0], internal::vector_allocation_error_flag); throw; + } + return 2; + } + size_type m = segment_size(k); + if( !v.my_first_block ) // push_back only + spin_wait_while_eq( v.my_first_block, segment_index_t(0) ); + if( k < v.my_first_block ) { + // s[0].array is changed only once ( 0 -> !0 ) and points to uninitialized memory + void *array0 = __TBB_load_with_acquire(s[0].array); + if( !array0 ) { + // sync_prepare called only if there is a wait + ITT_NOTIFY(sync_prepare, &s[0].array ); + spin_wait_while_eq( s[0].array, (void*)0 ); + array0 = __TBB_load_with_acquire(s[0].array); + } + ITT_NOTIFY(sync_acquired, &s[0].array); + if( array0 <= internal::vector_allocation_error_flag ) { // check for internal::vector_allocation_error_flag of initial segment + publish_segment(s[k], internal::vector_allocation_error_flag); // and assign internal::vector_allocation_error_flag here + throw bad_last_alloc(); // throw custom exception + } + publish_segment( s[k], + static_cast( static_cast(array0) + segment_base(k)*element_size ) + ); + } else { + try { + publish_segment(s[k], allocate_segment(v, m)); + } catch(...) { // intercept exception here, assign internal::vector_allocation_error_flag value, re-throw exception + publish_segment(s[k], internal::vector_allocation_error_flag); throw; + } + } + return m; + } + + inline static void extend_table_if_necessary(concurrent_vector_base_v3 &v, size_type k, size_type start ) { + if(k >= pointers_per_short_table && v.my_segment == v.my_storage) + extend_segment_table(v, start ); + } + + static void extend_segment_table(concurrent_vector_base_v3 &v, size_type start) { + if( start > segment_size(pointers_per_short_table) ) start = segment_size(pointers_per_short_table); + // If other threads are trying to set pointers in the short segment, wait for them to finish their + // assigments before we copy the short segment to the long segment. 
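// Illustrative sketch of the segment geometry that helper::first_segment()/
// next_segment() below iterate over, assuming the doubling layout implied by
// "sz <<= 1": segment 0 covers element indices [0,2) and each segment k >= 1
// covers [2^k, 2^(k+1)), so capacity doubles with every enabled segment.
// seg_base/seg_size are invented names, not the actual TBB helpers.
#include <cstddef>
#include <cstdio>

std::size_t seg_base( std::size_t k ) { return k ? std::size_t(1) << k : 0; }
std::size_t seg_size( std::size_t k ) { return k ? std::size_t(1) << k : 2; }

int main() {
    for( std::size_t k = 0; k < 5; ++k )
        std::printf("segment %u: indices [%u, %u)\n",
                    unsigned(k), unsigned(seg_base(k)), unsigned(seg_base(k) + seg_size(k)));
    return 0;
}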
Note: grow_to_at_least depends on it + for( segment_index_t i = 0; segment_base(i) < start && v.my_segment == v.my_storage; i++ ) + if(!v.my_storage[i].array) { + ITT_NOTIFY(sync_prepare, &v.my_storage[i].array); + atomic_backoff backoff; + do backoff.pause(); while( v.my_segment == v.my_storage && !v.my_storage[i].array ); + ITT_NOTIFY(sync_acquired, &v.my_storage[i].array); + } + if( v.my_segment != v.my_storage ) return; + + segment_t* s = (segment_t*)NFS_Allocate( pointers_per_long_table, sizeof(segment_t), NULL ); + // if( !s ) throw bad_alloc() -- implemented in NFS_Allocate + memset( s, 0, pointers_per_long_table*sizeof(segment_t) ); + for( segment_index_t i = 0; i < pointers_per_short_table; i++) + s[i] = v.my_storage[i]; + if( v.my_segment.compare_and_swap( s, v.my_storage ) != v.my_storage ) + NFS_Free( s ); + } + + inline static segment_t &acquire_segment(concurrent_vector_base_v3 &v, size_type index, size_type element_size, bool owner) { + segment_t &s = v.my_segment[index]; // TODO: pass v.my_segment as arument + if( !__TBB_load_with_acquire(s.array) ) { // do not check for internal::vector_allocation_error_flag + if( owner ) { + enable_segment( v, index, element_size ); + } else { + ITT_NOTIFY(sync_prepare, &s.array); + spin_wait_while_eq( s.array, (void*)0 ); + ITT_NOTIFY(sync_acquired, &s.array); + } + } else { + ITT_NOTIFY(sync_acquired, &s.array); + } + if( s.array <= internal::vector_allocation_error_flag ) // check for internal::vector_allocation_error_flag + throw bad_last_alloc(); // throw custom exception, because it's hard to recover after internal::vector_allocation_error_flag correctly + return s; + } + + ///// non-static fields of helper for exception-safe iteration across segments + segment_t *table;// TODO: review all segment_index_t as just short type + size_type first_block, k, sz, start, finish, element_size; + helper(segment_t *segments, size_type fb, size_type esize, size_type index, size_type s, size_type f) throw() + : table(segments), first_block(fb), k(index), sz(0), start(s), finish(f), element_size(esize) {} + inline void first_segment() throw() { + __TBB_ASSERT( start <= finish, NULL ); + __TBB_ASSERT( first_block || !finish, NULL ); + if( k < first_block ) k = 0; // process solid segment at a time + size_type base = segment_base( k ); + __TBB_ASSERT( base <= start, NULL ); + finish -= base; start -= base; // rebase as offsets from segment k + sz = k ? 
base : segment_size( first_block ); // sz==base for k>0 + } + inline void next_segment() throw() { + finish -= sz; start = 0; // offsets from next segment + if( !k ) k = first_block; + else { ++k; sz <<= 1; } + } + template + inline size_type apply(const F &func) { + first_segment(); + while( sz < finish ) { // work for more than one segment + func( table[k], static_cast(table[k].array)+element_size*start, sz-start ); + next_segment(); + } + func( table[k], static_cast(table[k].array)+element_size*start, finish-start ); + return k; + } + inline void *get_segment_ptr(size_type index, bool wait) { + segment_t &s = table[index]; + if( !__TBB_load_with_acquire(s.array) && wait ) { + ITT_NOTIFY(sync_prepare, &s.array); + spin_wait_while_eq( s.array, (void*)0 ); + ITT_NOTIFY(sync_acquired, &s.array); + } + return s.array; + } + ~helper() { + if( sz >= finish ) return; // the work is done correctly + if( !sz ) { // allocation failed, restore the table + segment_index_t k_start = k, k_end = segment_index_of(finish-1); + if( segment_base( k_start ) < start ) + get_segment_ptr(k_start++, true); // wait + if( k_start < first_block ) { + void *array0 = get_segment_ptr(0, start>0); // wait if necessary + if( array0 && !k_start ) ++k_start; + if( array0 <= internal::vector_allocation_error_flag ) + for(; k_start < first_block && k_start <= k_end; ++k_start ) + publish_segment(table[k_start], internal::vector_allocation_error_flag); + else for(; k_start < first_block && k_start <= k_end; ++k_start ) + publish_segment(table[k_start], static_cast( + static_cast(array0) + segment_base(k_start)*element_size) ); + } + for(; k_start <= k_end; ++k_start ) // not in first block + if( !__TBB_load_with_acquire(table[k_start].array) ) + publish_segment(table[k_start], internal::vector_allocation_error_flag); + // fill alocated items + first_segment(); + goto recover; + } + while( sz <= finish ) { // there is still work for at least one segment + next_segment(); +recover: + void *array = table[k].array; + if( array > internal::vector_allocation_error_flag ) + std::memset( static_cast(array)+element_size*start, 0, ((sz internal::vector_allocation_error_flag ) + func( begin, n ); + } + }; +}; + +concurrent_vector_base_v3::~concurrent_vector_base_v3() { + segment_t* s = my_segment; + if( s != my_storage ) { + // Clear short segment. + for( segment_index_t i = 0; i < pointers_per_short_table; i++) + my_storage[i].array = NULL; +#if TBB_USE_DEBUG + for( segment_index_t i = 0; i < pointers_per_long_table; i++) + __TBB_ASSERT( my_segment[i].array <= internal::vector_allocation_error_flag, "Segment should have been freed. 
Please recompile with new TBB before using exceptions."); +#endif + my_segment = my_storage; + NFS_Free( s ); + } +} + +concurrent_vector_base_v3::size_type concurrent_vector_base_v3::internal_capacity() const { + return segment_base( helper::find_segment_end(*this) ); +} + +void concurrent_vector_base_v3::internal_throw_exception(size_type t) const { + switch(t) { + case 0: throw out_of_range("Index out of requested size range"); + case 1: throw range_error ("Index out of allocated segment slots"); + case 2: throw range_error ("Index is not allocated"); + } +} + +void concurrent_vector_base_v3::internal_reserve( size_type n, size_type element_size, size_type max_size ) { + if( n>max_size ) { + throw length_error("argument to concurrent_vector::reserve exceeds concurrent_vector::max_size()"); + } + __TBB_ASSERT( n, NULL ); + helper::assign_first_segment_if_neccessary(*this, segment_index_of(n-1)); + segment_index_t k = helper::find_segment_end(*this); + try { + for( ; segment_base(k)= pointers_per_short_table) + || src.my_segment[k].array <= internal::vector_allocation_error_flag ) { + my_early_size = b; break; + } + helper::extend_table_if_necessary(*this, k, 0); + size_type m = helper::enable_segment(*this, k, element_size); + if( m > n-b ) m = n-b; + my_early_size = b+m; + copy( my_segment[k].array, src.my_segment[k].array, m ); + } + } +} + +void concurrent_vector_base_v3::internal_assign( const concurrent_vector_base_v3& src, size_type element_size, internal_array_op1 destroy, internal_array_op2 assign, internal_array_op2 copy ) { + size_type n = src.my_early_size; + while( my_early_size>n ) { // TODO: improve + segment_index_t k = segment_index_of( my_early_size-1 ); + size_type b=segment_base(k); + size_type new_end = b>=n ? b : n; + __TBB_ASSERT( my_early_size>new_end, NULL ); + if( my_segment[k].array <= internal::vector_allocation_error_flag) // check vector was broken before + throw bad_last_alloc(); // throw custom exception + // destructors are supposed to not throw any exceptions + destroy( (char*)my_segment[k].array+element_size*(new_end-b), my_early_size-new_end ); + my_early_size = new_end; + } + size_type dst_initialized_size = my_early_size; + my_early_size = n; + helper::assign_first_segment_if_neccessary(*this, segment_index_of(n)); + size_type b; + for( segment_index_t k=0; (b=segment_base(k))= pointers_per_short_table) + || src.my_segment[k].array <= internal::vector_allocation_error_flag ) { // if source is damaged + my_early_size = b; break; // TODO: it may cause undestructed items + } + helper::extend_table_if_necessary(*this, k, 0); + if( !my_segment[k].array ) + helper::enable_segment(*this, k, element_size); + else if( my_segment[k].array <= internal::vector_allocation_error_flag ) + throw bad_last_alloc(); // throw custom exception + size_type m = k? 
segment_size(k) : 2; + if( m > n-b ) m = n-b; + size_type a = 0; + if( dst_initialized_size>b ) { + a = dst_initialized_size-b; + if( a>m ) a = m; + assign( my_segment[k].array, src.my_segment[k].array, a ); + m -= a; + a *= element_size; + } + if( m>0 ) + copy( (char*)my_segment[k].array+a, (char*)src.my_segment[k].array+a, m ); + } + __TBB_ASSERT( src.my_early_size==n, "detected use of concurrent_vector::operator= with right side that was concurrently modified" ); +} + +void* concurrent_vector_base_v3::internal_push_back( size_type element_size, size_type& index ) { + __TBB_ASSERT( sizeof(my_early_size)==sizeof(uintptr), NULL ); + size_type tmp = __TBB_FetchAndIncrementWacquire(&my_early_size); + index = tmp; + segment_index_t k_old = segment_index_of( tmp ); + size_type base = segment_base(k_old); + helper::extend_table_if_necessary(*this, k_old, tmp); + segment_t& s = helper::acquire_segment(*this, k_old, element_size, base==tmp); + size_type j_begin = tmp-base; + return (void*)((char*)s.array+element_size*j_begin); +} + +void concurrent_vector_base_v3::internal_grow_to_at_least( size_type new_size, size_type element_size, internal_array_op2 init, const void *src ) { + internal_grow_to_at_least_with_result( new_size, element_size, init, src ); +} + +concurrent_vector_base_v3::size_type concurrent_vector_base_v3::internal_grow_to_at_least_with_result( size_type new_size, size_type element_size, internal_array_op2 init, const void *src ) { + size_type e = my_early_size; + while( e= pointers_per_short_table && my_segment == my_storage ) { + spin_wait_while_eq( my_segment, my_storage ); + } + for( i = 0; i <= k_old; ++i ) { + segment_t &s = my_segment[i]; + if(!s.array) { + ITT_NOTIFY(sync_prepare, &s.array); + atomic_backoff backoff; + do backoff.pause(); + while( !__TBB_load_with_acquire(my_segment[i].array) ); // my_segment may change concurrently + ITT_NOTIFY(sync_acquired, &s.array); + } + if( my_segment[i].array <= internal::vector_allocation_error_flag ) + throw bad_last_alloc(); + } +#if TBB_USE_DEBUG + size_type capacity = internal_capacity(); + __TBB_ASSERT( capacity >= new_size, NULL); +#endif + return e; +} + +concurrent_vector_base_v3::size_type concurrent_vector_base_v3::internal_grow_by( size_type delta, size_type element_size, internal_array_op2 init, const void *src ) { + size_type result = my_early_size.fetch_and_add(delta); + internal_grow( result, result+delta, element_size, init, src ); + return result; +} + +void concurrent_vector_base_v3::internal_grow( const size_type start, size_type finish, size_type element_size, internal_array_op2 init, const void *src ) { + __TBB_ASSERT( start k_start && k_end >= range.first_block; --k_end ) // allocate segments in reverse order + helper::acquire_segment(*this, k_end, element_size, true/*for k_end>k_start*/); + for(; k_start <= k_end; ++k_start ) // but allocate first block in straight order + helper::acquire_segment(*this, k_start, element_size, segment_base( k_start ) >= start ); + range.apply( helper::init_body(init, src) ); +} + +void concurrent_vector_base_v3::internal_resize( size_type n, size_type element_size, size_type max_size, const void *src, + internal_array_op1 destroy, internal_array_op2 init ) { + size_type j = my_early_size; + if( n > j ) { // construct items + internal_reserve(n, element_size, max_size); + my_early_size = n; + helper for_each(my_segment, my_first_block, element_size, segment_index_of(j), j, n); + for_each.apply( helper::safe_init_body(init, src) ); + } else { + my_early_size = n; + helper 
for_each(my_segment, my_first_block, element_size, segment_index_of(n), n, j); + for_each.apply( helper::destroy_body(destroy) ); + } +} + +concurrent_vector_base_v3::segment_index_t concurrent_vector_base_v3::internal_clear( internal_array_op1 destroy ) { + __TBB_ASSERT( my_segment, NULL ); + size_type j = my_early_size; + my_early_size = 0; + helper for_each(my_segment, my_first_block, 0, 0, 0, j); // element_size is safe to be zero if 'start' is zero + j = for_each.apply( helper::destroy_body(destroy) ); + size_type i = helper::find_segment_end(*this); + return j < i? i : j+1; +} + +void *concurrent_vector_base_v3::internal_compact( size_type element_size, void *table, internal_array_op1 destroy, internal_array_op2 copy ) +{ + const size_type my_size = my_early_size; + const segment_index_t k_end = helper::find_segment_end(*this); // allocated segments + const segment_index_t k_stop = my_size? segment_index_of(my_size-1) + 1 : 0; // number of segments to store existing items: 0=>0; 1,2=>1; 3,4=>2; [5-8]=>3;.. + const segment_index_t first_block = my_first_block; // number of merged segments, getting values from atomics + + segment_index_t k = first_block; + if(k_stop < first_block) + k = k_stop; + else + while (k < k_stop && helper::incompact_predicate(segment_size( k ) * element_size) ) k++; + if(k_stop == k_end && k == first_block) + return NULL; + + segment_t *const segment_table = my_segment; + internal_segments_table &old = *static_cast( table ); + memset(&old, 0, sizeof(old)); + + if ( k != first_block && k ) // first segment optimization + { + // exception can occur here + void *seg = old.table[0] = helper::allocate_segment( *this, segment_size(k) ); + old.first_block = k; // fill info for freeing new segment if exception occurs + // copy items to the new segment + size_type my_segment_size = segment_size( first_block ); + for (segment_index_t i = 0, j = 0; i < k && j < my_size; j = my_segment_size) { + __TBB_ASSERT( segment_table[i].array > internal::vector_allocation_error_flag, NULL); + void *s = static_cast( + static_cast(seg) + segment_base(i)*element_size ); + if(j + my_segment_size >= my_size) my_segment_size = my_size - j; + try { // exception can occur here + copy( s, segment_table[i].array, my_segment_size ); + } catch(...) { // destroy all the already copied items + helper for_each(reinterpret_cast(&old.table[0]), old.first_block, element_size, + 0, 0, segment_base(i)+my_segment_size); + for_each.apply( helper::destroy_body(destroy) ); + throw; + } + my_segment_size = i? segment_size( ++i ) : segment_size( i = first_block ); + } + // commit the changes + memcpy(old.table, segment_table, k * sizeof(segment_t)); + for (segment_index_t i = 0; i < k; i++) { + segment_table[i].array = static_cast( + static_cast(seg) + segment_base(i)*element_size ); + } + old.first_block = first_block; my_first_block = k; // now, first_block != my_first_block + // destroy original copies + my_segment_size = segment_size( first_block ); // old.first_block actually + for (segment_index_t i = 0, j = 0; i < k && j < my_size; j = my_segment_size) { + if(j + my_segment_size >= my_size) my_segment_size = my_size - j; + // destructors are supposed to not throw any exceptions + destroy( old.table[i], my_segment_size ); + my_segment_size = i? 
segment_size( ++i ) : segment_size( i = first_block ); + } + } + // free unnecessary segments allocated by reserve() call + if ( k_stop < k_end ) { + old.first_block = first_block; + memcpy(old.table+k_stop, segment_table+k_stop, (k_end-k_stop) * sizeof(segment_t)); + memset(segment_table+k_stop, 0, (k_end-k_stop) * sizeof(segment_t)); + if( !k ) my_first_block = 0; + } + return table; +} + +void concurrent_vector_base_v3::internal_swap(concurrent_vector_base_v3& v) +{ + size_type my_sz = my_early_size, v_sz = v.my_early_size; + if(!my_sz && !v_sz) return; + size_type tmp = my_first_block; my_first_block = v.my_first_block; v.my_first_block = tmp; + bool my_short = (my_segment == my_storage), v_short = (v.my_segment == v.my_storage); + if ( my_short && v_short ) { // swap both tables + char tbl[pointers_per_short_table * sizeof(segment_t)]; + memcpy(tbl, my_storage, pointers_per_short_table * sizeof(segment_t)); + memcpy(my_storage, v.my_storage, pointers_per_short_table * sizeof(segment_t)); + memcpy(v.my_storage, tbl, pointers_per_short_table * sizeof(segment_t)); + } + else if ( my_short ) { // my -> v + memcpy(v.my_storage, my_storage, pointers_per_short_table * sizeof(segment_t)); + my_segment = v.my_segment; v.my_segment = v.my_storage; + } + else if ( v_short ) { // v -> my + memcpy(my_storage, v.my_storage, pointers_per_short_table * sizeof(segment_t)); + v.my_segment = my_segment; my_segment = my_storage; + } else { + segment_t *ptr = my_segment; my_segment = v.my_segment; v.my_segment = ptr; + } + my_early_size = v_sz; v.my_early_size = my_sz; +} + +} // namespace internal + +} // tbb diff --git a/dep/tbb/src/tbb/dynamic_link.cpp b/dep/tbb/src/tbb/dynamic_link.cpp new file mode 100644 index 000000000..f6de51099 --- /dev/null +++ b/dep/tbb/src/tbb/dynamic_link.cpp @@ -0,0 +1,133 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#include "dynamic_link.h" + +#ifndef LIBRARY_ASSERT +#include "tbb/tbb_stddef.h" +#define LIBRARY_ASSERT(x,y) __TBB_ASSERT(x,y) +#endif /* LIBRARY_ASSERT */ + +#if _WIN32||_WIN64 + #include /* alloca */ +#else + #include +#if __FreeBSD__ + #include /* alloca */ +#else + #include +#endif +#endif + +OPEN_INTERNAL_NAMESPACE + +#if __TBB_WEAK_SYMBOLS + +bool dynamic_link( void*, const dynamic_link_descriptor descriptors[], size_t n, size_t required ) +{ + if ( required == ~(size_t)0 ) + required = n; + LIBRARY_ASSERT( required<=n, "Number of required entry points exceeds their total number" ); + size_t k = 0; + // Check if the first required entries are present in what was loaded into our process + while ( k < required && descriptors[k].ptr ) + ++k; + if ( k < required ) + return false; + // Commit all the entry points. + for ( k = 0; k < n; ++k ) + *descriptors[k].handler = (pointer_to_handler) descriptors[k].ptr; + return true; +} + +#else /* !__TBB_WEAK_SYMBOLS */ + +bool dynamic_link( void* module, const dynamic_link_descriptor descriptors[], size_t n, size_t required ) +{ + pointer_to_handler *h = (pointer_to_handler*)alloca(n * sizeof(pointer_to_handler)); + if ( required == ~(size_t)0 ) + required = n; + LIBRARY_ASSERT( required<=n, "Number of required entry points exceeds their total number" ); + size_t k = 0; + for ( ; k < n; ++k ) { +#if _WIN32||_WIN64 + h[k] = pointer_to_handler(GetProcAddress( (HMODULE)module, descriptors[k].name )); +#else + // Lvalue casting is used; this way icc -strict-ansi does not warn about nonstandard pointer conversion + (void *&)h[k] = dlsym( module, descriptors[k].name ); +#endif /* _WIN32||_WIN64 */ + if ( !h[k] && k < required ) + return false; + } + LIBRARY_ASSERT( k == n, "if required entries are initialized, all entries are expected to be walked"); + // Commit the entry points. + // Cannot use memset here, because the writes must be atomic. + for( k = 0; k < n; ++k ) + *descriptors[k].handler = h[k]; + return true; +} + +#endif /* !__TBB_WEAK_SYMBOLS */ +bool dynamic_link( const char* library, const dynamic_link_descriptor descriptors[], size_t n, size_t required, dynamic_link_handle* handle ) +{ +#if _WIN32||_WIN64 + // Interpret non-NULL handle parameter as request to really link against another library. + if ( !handle && dynamic_link( GetModuleHandle(NULL), descriptors, n, required ) ) + // Target library was statically linked into this executable + return true; + // Prevent Windows from displaying silly message boxes if it fails to load library + // (e.g. because of MS runtime problems - one of those crazy manifest related ones) + UINT prev_mode = SetErrorMode (SEM_FAILCRITICALERRORS); + dynamic_link_handle module = LoadLibrary (library); + SetErrorMode (prev_mode); +#else + dynamic_link_handle module = dlopen( library, RTLD_LAZY ); +#endif /* _WIN32||_WIN64 */ + if( module ) { + if( !dynamic_link( module, descriptors, n, required ) ) { + // Return true if the library is there and it contains all the expected entry points. 
+ dynamic_unlink(module); + module = NULL; + } + } + if( handle ) + *handle = module; + return module!=NULL; +} + +void dynamic_unlink( dynamic_link_handle handle ) { + if( handle ) { +#if _WIN32||_WIN64 + FreeLibrary( handle ); +#else + dlclose( handle ); +#endif /* _WIN32||_WIN64 */ + } +} + +CLOSE_INTERNAL_NAMESPACE diff --git a/dep/tbb/src/tbb/dynamic_link.h b/dep/tbb/src/tbb/dynamic_link.h new file mode 100644 index 000000000..1439eca7e --- /dev/null +++ b/dep/tbb/src/tbb/dynamic_link.h @@ -0,0 +1,102 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __TBB_dynamic_link +#define __TBB_dynamic_link + +// Support for dynamically linking to a shared library. +// By default, the symbols defined here go in namespace tbb::internal. +// The symbols can be put in another namespace by defining the preprocessor +// symbols OPEN_INTERNAL_NAMESPACE and CLOSE_INTERNAL_NAMESPACE to open and +// close the other namespace. See default definition below for an example. + +#ifndef OPEN_INTERNAL_NAMESPACE +#define OPEN_INTERNAL_NAMESPACE namespace tbb { namespace internal { +#define CLOSE_INTERNAL_NAMESPACE }} +#endif /* OPEN_INTERNAL_NAMESPACE */ + +#include +#if _WIN32||_WIN64 +#include +#endif /* _WIN32||_WIN64 */ + +OPEN_INTERNAL_NAMESPACE + +//! Type definition for a pointer to a void somefunc(void) +typedef void (*pointer_to_handler)(); + +// Double cast through the void* from func_ptr in DLD macro is necessary to +// prevent warnings from some compilers (g++ 4.1) +#if __TBB_WEAK_SYMBOLS + +#define DLD(s,h) {(pointer_to_handler)&s, (pointer_to_handler*)(void*)(&h)} +//! Association between a handler name and location of pointer to it. +struct dynamic_link_descriptor { + //! pointer to the handler + pointer_to_handler ptr; + //! Pointer to the handler + pointer_to_handler* handler; +}; + +#else /* !__TBB_WEAK_SYMBOLS */ + +#define DLD(s,h) {#s, (pointer_to_handler*)(void*)(&h)} +//! Association between a handler name and location of pointer to it. +struct dynamic_link_descriptor { + //! Name of the handler + const char* name; + //! 
Pointer to the handler + pointer_to_handler* handler; +}; + +#endif /* !__TBB_WEAK_SYMBOLS */ + +#if _WIN32||_WIN64 +typedef HMODULE dynamic_link_handle; +#else +typedef void* dynamic_link_handle; +#endif /* _WIN32||_WIN64 */ + +//! Fill in dynamically linked handlers. +/** 'n' is the length of the array descriptors[]. + 'required' is the number of the initial entries in the array descriptors[] + that have to be found in order for the call to succeed. If the library and + all the required handlers are found, then the corresponding handler pointers + are set, and the return value is true. Otherwise the original array of + descriptors is left untouched and the return value is false. **/ +bool dynamic_link( const char* libraryname, + const dynamic_link_descriptor descriptors[], + size_t n, + size_t required = ~(size_t)0, + dynamic_link_handle* handle = 0 ); + +void dynamic_unlink( dynamic_link_handle handle ); + +CLOSE_INTERNAL_NAMESPACE + +#endif /* __TBB_dynamic_link */ diff --git a/dep/tbb/src/tbb/enumerable_thread_specific.cpp b/dep/tbb/src/tbb/enumerable_thread_specific.cpp new file mode 100644 index 000000000..f576fb3b6 --- /dev/null +++ b/dep/tbb/src/tbb/enumerable_thread_specific.cpp @@ -0,0 +1,172 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/enumerable_thread_specific.h" +#include "tbb/concurrent_queue.h" +#include "tbb/cache_aligned_allocator.h" +#include "tbb/atomic.h" +#include "tbb/spin_mutex.h" + +namespace tbb { + + namespace internal { + + // Manages fake TLS keys and fake TLS space + // Uses only a single native TLS key through use of an enumerable_thread_specific< ... 
, ets_key_per_instance > + class tls_single_key_manager { + + // Typedefs + typedef concurrent_vector local_vector_type; + typedef enumerable_thread_specific< local_vector_type, cache_aligned_allocator, ets_key_per_instance > my_ets_type; + typedef local_vector_type::size_type fake_key_t; + + // The fake TLS space + my_ets_type my_vectors; + + // The next never-yet-assigned fake TLS key + atomic< fake_key_t > next_key; + + // A Q of fake TLS keys that can be reused + typedef spin_mutex free_mutex_t; + free_mutex_t free_mutex; + + struct free_node_t { + fake_key_t key; + free_node_t *next; + }; + + cache_aligned_allocator< free_node_t > my_allocator; + free_node_t *free_stack; + + bool pop_if_present( fake_key_t &k ) { + free_node_t *n = NULL; + { + free_mutex_t::scoped_lock(free_mutex); + n = free_stack; + if (n) free_stack = n->next; + } + if ( n ) { + k = n->key; + my_allocator.deallocate(n,1); + return true; + } + return false; + } + + void push( fake_key_t &k ) { + free_node_t *n = my_allocator.allocate(1); + n->key = k; + { + free_mutex_t::scoped_lock(free_mutex); + n->next = free_stack; + free_stack = n; + } + } + + public: + + tls_single_key_manager() : free_stack(NULL) { + next_key = 0; + } + + ~tls_single_key_manager() { + free_node_t *n = free_stack; + while (n != NULL) { + free_node_t *next = n->next; + my_allocator.deallocate(n,1); + n = next; + } + } + + // creates or finds an available fake TLS key + inline void create_key( fake_key_t &k ) { + if ( !(free_stack && pop_if_present( k )) ) { + k = next_key.fetch_and_add(1); + } + } + + // resets the fake TLS space associated with the key and then recycles the key + inline void destroy_key( fake_key_t &k ) { + for ( my_ets_type::iterator i = my_vectors.begin(); i != my_vectors.end(); ++i ) { + local_vector_type &ivec = *i; + if (ivec.size() > k) + ivec[k] = NULL; + } + push(k); + } + + // sets the fake TLS space to point to the given value for this thread + inline void set_tls( fake_key_t &k, void *value ) { + local_vector_type &my_vector = my_vectors.local(); + local_vector_type::size_type size = my_vector.size(); + + if ( size <= k ) { + // We use grow_by so that we can initialize the pointers to NULL + my_vector.grow_by( k - size + 1, NULL ); + } + my_vector[k] = value; + } + + inline void *get_tls( fake_key_t &k ) { + local_vector_type &my_vector = my_vectors.local(); + if (my_vector.size() > k) + return my_vector[k]; + else + return NULL; + } + + }; + + // The single static instance of tls_single_key_manager + static tls_single_key_manager tls_key_manager; + + // The EXPORTED functions + void + tls_single_key_manager_v4::create_key( tls_key_t &k) { + tls_key_manager.create_key( k ); + } + + void + tls_single_key_manager_v4::destroy_key( tls_key_t &k) { + tls_key_manager.destroy_key( k ); + } + + void + tls_single_key_manager_v4::set_tls( tls_key_t &k, void *value) { + tls_key_manager.set_tls( k, value); + } + + void * + tls_single_key_manager_v4::get_tls( tls_key_t &k ) { + return tls_key_manager.get_tls( k ); + } + + } + +} + diff --git a/dep/tbb/src/tbb/gate.h b/dep/tbb/src/tbb/gate.h new file mode 100644 index 000000000..fb1283621 --- /dev/null +++ b/dep/tbb/src/tbb/gate.h @@ -0,0 +1,221 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. 
+ + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef _TBB_Gate_H +#define _TBB_Gate_H + +#include "itt_notify.h" + +namespace tbb { + +namespace internal { + +#if __TBB_RML +//! Fake version of Gate for use with RML. +/** Really just an atomic intptr_t with a compare-and-swap operation, + but wrapped in syntax that makes it look like a normal Gate object, + in order to minimize source changes for RML in task.cpp. */ +class Gate { +public: + typedef intptr_t state_t; + + //! Get current state of gate + state_t get_state() const { + return state; + } + +#if defined(_MSC_VER) && defined(_Wp64) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (disable: 4244) +#endif + + bool try_update( intptr_t value, intptr_t comparand ) { + return state.compare_and_swap(value,comparand)==comparand; + } +private: + atomic state; +}; + +#elif __TBB_USE_FUTEX + +//! Implementation of Gate based on futex. +/** Use this futex-based implementation where possible, because it is the simplest and usually fastest. */ +class Gate { +public: + typedef intptr_t state_t; + + Gate() { + ITT_SYNC_CREATE(&state, SyncType_Scheduler, SyncObj_Gate); + } + + //! Get current state of gate + state_t get_state() const { + return state; + } + //! Update state=value if state==comparand (flip==false) or state!=comparand (flip==true) + void try_update( intptr_t value, intptr_t comparand, bool flip=false ) { + __TBB_ASSERT( comparand!=0 || value!=0, "either value or comparand must be non-zero" ); +retry: + state_t old_state = state; + // First test for condition without using atomic operation + if( flip ? old_state!=comparand : old_state==comparand ) { + // Now atomically retest condition and set. + state_t s = state.compare_and_swap( value, old_state ); + if( s==old_state ) { + // compare_and_swap succeeded + if( value!=0 ) + futex_wakeup_all( &state ); // Update was successful and new state is not SNAPSHOT_EMPTY + } else { + // compare_and_swap failed. But for != case, failure may be spurious for our purposes if + // the value there is nonetheless not equal to value. This is a fairly rare event, so + // there is no need for backoff. In event of such a failure, we must retry. + if( flip && s!=value ) + goto retry; + } + } + } + //! Wait for state!=0. + void wait() { + if( state==0 ) + futex_wait( &state, 0 ); + } +private: + atomic state; +}; + +#elif USE_WINTHREAD + +class Gate { +public: + typedef intptr_t state_t; +private: + //! If state==0, then thread executing wait() suspend until state becomes non-zero. 
+ state_t state; + CRITICAL_SECTION critical_section; + HANDLE event; +public: + //! Initialize with count=0 + Gate() : state(0) { + event = CreateEvent( NULL, true, false, NULL ); + InitializeCriticalSection( &critical_section ); + ITT_SYNC_CREATE(&event, SyncType_Scheduler, SyncObj_Gate); + ITT_SYNC_CREATE(&critical_section, SyncType_Scheduler, SyncObj_GateLock); + } + ~Gate() { + // Fake prepare/acquired pair for Intel(R) Parallel Amplifier to correctly attribute the operations below + ITT_NOTIFY( sync_prepare, &event ); + CloseHandle( event ); + DeleteCriticalSection( &critical_section ); + ITT_NOTIFY( sync_acquired, &event ); + } + //! Get current state of gate + state_t get_state() const { + return state; + } + //! Update state=value if state==comparand (flip==false) or state!=comparand (flip==true) + void try_update( intptr_t value, intptr_t comparand, bool flip=false ) { + __TBB_ASSERT( comparand!=0 || value!=0, "either value or comparand must be non-zero" ); + EnterCriticalSection( &critical_section ); + state_t old = state; + if( flip ? old!=comparand : old==comparand ) { + state = value; + if( !old ) + SetEvent( event ); + else if( !value ) + ResetEvent( event ); + } + LeaveCriticalSection( &critical_section ); + } + //! Wait for state!=0. + void wait() { + if( state==0 ) { + WaitForSingleObject( event, INFINITE ); + } + } +}; + +#elif USE_PTHREAD + +class Gate { +public: + typedef intptr_t state_t; +private: + //! If state==0, then thread executing wait() suspend until state becomes non-zero. + state_t state; + pthread_mutex_t mutex; + pthread_cond_t cond; +public: + //! Initialize with count=0 + Gate() : state(0) + { + pthread_mutex_init( &mutex, NULL ); + pthread_cond_init( &cond, NULL); + ITT_SYNC_CREATE(&cond, SyncType_Scheduler, SyncObj_Gate); + ITT_SYNC_CREATE(&mutex, SyncType_Scheduler, SyncObj_GateLock); + } + ~Gate() { + pthread_cond_destroy( &cond ); + pthread_mutex_destroy( &mutex ); + } + //! Get current state of gate + state_t get_state() const { + return state; + } + //! Update state=value if state==comparand (flip==false) or state!=comparand (flip==true) + void try_update( intptr_t value, intptr_t comparand, bool flip=false ) { + __TBB_ASSERT( comparand!=0 || value!=0, "either value or comparand must be non-zero" ); + pthread_mutex_lock( &mutex ); + state_t old = state; + if( flip ? old!=comparand : old==comparand ) { + state = value; + if( !old ) + pthread_cond_broadcast( &cond ); + } + pthread_mutex_unlock( &mutex ); + } + //! Wait for state!=0. + void wait() { + if( state==0 ) { + pthread_mutex_lock( &mutex ); + while( state==0 ) { + pthread_cond_wait( &cond, &mutex ); + } + pthread_mutex_unlock( &mutex ); + } + } +}; + +#else +#error Must define USE_PTHREAD or USE_WINTHREAD +#endif /* threading kind */ + +} // namespace Internal + +} // namespace ThreadingBuildingBlocks + +#endif /* _TBB_Gate_H */ diff --git a/dep/tbb/src/tbb/ia32-masm/atomic_support.asm b/dep/tbb/src/tbb/ia32-masm/atomic_support.asm new file mode 100644 index 000000000..e22bc1caf --- /dev/null +++ b/dep/tbb/src/tbb/ia32-masm/atomic_support.asm @@ -0,0 +1,196 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. 
+; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. + +.686 +.model flat,c +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchadd1 +__TBB_machine_fetchadd1: + mov edx,4[esp] + mov eax,8[esp] + lock xadd [edx],al + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchstore1 +__TBB_machine_fetchstore1: + mov edx,4[esp] + mov eax,8[esp] + lock xchg [edx],al + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_cmpswp1 +__TBB_machine_cmpswp1: + mov edx,4[esp] + mov ecx,8[esp] + mov eax,12[esp] + lock cmpxchg [edx],cl + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchadd2 +__TBB_machine_fetchadd2: + mov edx,4[esp] + mov eax,8[esp] + lock xadd [edx],ax + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchstore2 +__TBB_machine_fetchstore2: + mov edx,4[esp] + mov eax,8[esp] + lock xchg [edx],ax + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_cmpswp2 +__TBB_machine_cmpswp2: + mov edx,4[esp] + mov ecx,8[esp] + mov eax,12[esp] + lock cmpxchg [edx],cx + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchadd4 +__TBB_machine_fetchadd4: + mov edx,4[esp] + mov eax,8[esp] + lock xadd [edx],eax + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchstore4 +__TBB_machine_fetchstore4: + mov edx,4[esp] + mov eax,8[esp] + lock xchg [edx],eax + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_cmpswp4 +__TBB_machine_cmpswp4: + mov edx,4[esp] + mov ecx,8[esp] + mov eax,12[esp] + lock cmpxchg [edx],ecx + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchadd8 +__TBB_machine_fetchadd8: + push ebx + push edi + mov edi,12[esp] + mov eax,[edi] + mov edx,4[edi] +__TBB_machine_fetchadd8_loop: + mov ebx,16[esp] + mov ecx,20[esp] + add ebx,eax + adc ecx,edx + lock cmpxchg8b qword ptr [edi] + jnz __TBB_machine_fetchadd8_loop + pop edi + pop ebx + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_fetchstore8 +__TBB_machine_fetchstore8: + push ebx + push edi + mov edi,12[esp] + mov ebx,16[esp] + mov ecx,20[esp] + mov eax,[edi] + mov edx,4[edi] +__TBB_machine_fetchstore8_loop: + lock cmpxchg8b qword ptr [edi] + jnz __TBB_machine_fetchstore8_loop + pop edi + pop ebx + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_cmpswp8 +__TBB_machine_cmpswp8: + push ebx + push edi + mov edi,12[esp] + mov ebx,16[esp] + mov ecx,20[esp] + mov eax,24[esp] + mov edx,28[esp] + lock cmpxchg8b qword ptr [edi] + pop edi + pop ebx + ret +.code + ALIGN 4 + PUBLIC c __TBB_machine_load8 +__TBB_machine_Load8: + ; If location is on stack, compiler may have failed to align it correctly, so we do dynamic check. 
+ mov ecx,4[esp] + test ecx,7 + jne load_slow + ; Load within a cache line + sub esp,12 + fild qword ptr [ecx] + fistp qword ptr [esp] + mov eax,[esp] + mov edx,4[esp] + add esp,12 + ret +load_slow: + ; Load is misaligned. Use cmpxchg8b. + push ebx + push edi + mov edi,ecx + xor eax,eax + xor ebx,ebx + xor ecx,ecx + xor edx,edx + lock cmpxchg8b qword ptr [edi] + pop edi + pop ebx + ret +EXTRN __TBB_machine_store8_slow:PROC +.code + ALIGN 4 + PUBLIC c __TBB_machine_store8 +__TBB_machine_Store8: + ; If location is on stack, compiler may have failed to align it correctly, so we do dynamic check. + mov ecx,4[esp] + test ecx,7 + jne __TBB_machine_store8_slow ;; tail call to tbb_misc.cpp + fild qword ptr 8[esp] + fistp qword ptr [ecx] + ret +end diff --git a/dep/tbb/src/tbb/ia32-masm/lock_byte.asm b/dep/tbb/src/tbb/ia32-masm/lock_byte.asm new file mode 100644 index 000000000..4f560c487 --- /dev/null +++ b/dep/tbb/src/tbb/ia32-masm/lock_byte.asm @@ -0,0 +1,46 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. + +; DO NOT EDIT - AUTOMATICALLY GENERATED FROM .s FILE +.686 +.model flat,c +.code + ALIGN 4 + PUBLIC c __TBB_machine_trylockbyte +__TBB_machine_trylockbyte: + mov edx,4[esp] + mov al,[edx] + mov cl,1 + test al,1 + jnz __TBB_machine_trylockbyte_contended + lock cmpxchg [edx],cl + jne __TBB_machine_trylockbyte_contended + mov eax,1 + ret +__TBB_machine_trylockbyte_contended: + xor eax,eax + ret +end diff --git a/dep/tbb/src/tbb/ia64-gas/atomic_support.s b/dep/tbb/src/tbb/ia64-gas/atomic_support.s new file mode 100644 index 000000000..17502894f --- /dev/null +++ b/dep/tbb/src/tbb/ia64-gas/atomic_support.s @@ -0,0 +1,678 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. 
+// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + +// DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 + + + .proc __TBB_machine_fetchadd1__TBB_full_fence# + .global __TBB_machine_fetchadd1__TBB_full_fence# +__TBB_machine_fetchadd1__TBB_full_fence: +{ + mf + br __TBB_machine_fetchadd1acquire +} + .endp __TBB_machine_fetchadd1__TBB_full_fence# + + .proc __TBB_machine_fetchadd1acquire# + .global __TBB_machine_fetchadd1acquire# +__TBB_machine_fetchadd1acquire: + + + + + + + + ld1 r9=[r32] +;; +Retry_1acquire: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg1.acq r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_1acquire + br.ret.sptk.many b0 +# 49 "" + .endp __TBB_machine_fetchadd1acquire# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore1__TBB_full_fence# + .global __TBB_machine_fetchstore1__TBB_full_fence# +__TBB_machine_fetchstore1__TBB_full_fence: + mf +;; + xchg1 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore1__TBB_full_fence# + + + .proc __TBB_machine_fetchstore1acquire# + .global __TBB_machine_fetchstore1acquire# +__TBB_machine_fetchstore1acquire: + xchg1 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore1acquire# +# 88 "" + .section .text + .align 16 + + + .proc __TBB_machine_cmpswp1__TBB_full_fence# + .global __TBB_machine_cmpswp1__TBB_full_fence# +__TBB_machine_cmpswp1__TBB_full_fence: +{ + mf + br __TBB_machine_cmpswp1acquire +} + .endp __TBB_machine_cmpswp1__TBB_full_fence# + + .proc __TBB_machine_cmpswp1acquire# + .global __TBB_machine_cmpswp1acquire# +__TBB_machine_cmpswp1acquire: + + zxt1 r34=r34 +;; + + mov ar.ccv=r34 +;; + cmpxchg1.acq r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp1acquire# +// DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 + + + .proc __TBB_machine_fetchadd2__TBB_full_fence# + .global __TBB_machine_fetchadd2__TBB_full_fence# +__TBB_machine_fetchadd2__TBB_full_fence: +{ + mf + br __TBB_machine_fetchadd2acquire +} + .endp __TBB_machine_fetchadd2__TBB_full_fence# + + .proc __TBB_machine_fetchadd2acquire# + .global __TBB_machine_fetchadd2acquire# +__TBB_machine_fetchadd2acquire: + + + + + + + + ld2 r9=[r32] +;; +Retry_2acquire: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg2.acq r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_2acquire + br.ret.sptk.many b0 +# 49 "" + .endp __TBB_machine_fetchadd2acquire# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore2__TBB_full_fence# + .global __TBB_machine_fetchstore2__TBB_full_fence# 
+__TBB_machine_fetchstore2__TBB_full_fence: + mf +;; + xchg2 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore2__TBB_full_fence# + + + .proc __TBB_machine_fetchstore2acquire# + .global __TBB_machine_fetchstore2acquire# +__TBB_machine_fetchstore2acquire: + xchg2 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore2acquire# +# 88 "" + .section .text + .align 16 + + + .proc __TBB_machine_cmpswp2__TBB_full_fence# + .global __TBB_machine_cmpswp2__TBB_full_fence# +__TBB_machine_cmpswp2__TBB_full_fence: +{ + mf + br __TBB_machine_cmpswp2acquire +} + .endp __TBB_machine_cmpswp2__TBB_full_fence# + + .proc __TBB_machine_cmpswp2acquire# + .global __TBB_machine_cmpswp2acquire# +__TBB_machine_cmpswp2acquire: + + zxt2 r34=r34 +;; + + mov ar.ccv=r34 +;; + cmpxchg2.acq r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp2acquire# +// DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 + + + .proc __TBB_machine_fetchadd4__TBB_full_fence# + .global __TBB_machine_fetchadd4__TBB_full_fence# +__TBB_machine_fetchadd4__TBB_full_fence: +{ + mf + br __TBB_machine_fetchadd4acquire +} + .endp __TBB_machine_fetchadd4__TBB_full_fence# + + .proc __TBB_machine_fetchadd4acquire# + .global __TBB_machine_fetchadd4acquire# +__TBB_machine_fetchadd4acquire: + + cmp.eq p6,p0=1,r33 + cmp.eq p8,p0=-1,r33 + (p6) br.cond.dptk Inc_4acquire + (p8) br.cond.dpnt Dec_4acquire +;; + + ld4 r9=[r32] +;; +Retry_4acquire: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg4.acq r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_4acquire + br.ret.sptk.many b0 + +Inc_4acquire: + fetchadd4.acq r8=[r32],1 + br.ret.sptk.many b0 +Dec_4acquire: + fetchadd4.acq r8=[r32],-1 + br.ret.sptk.many b0 + + .endp __TBB_machine_fetchadd4acquire# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore4__TBB_full_fence# + .global __TBB_machine_fetchstore4__TBB_full_fence# +__TBB_machine_fetchstore4__TBB_full_fence: + mf +;; + xchg4 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore4__TBB_full_fence# + + + .proc __TBB_machine_fetchstore4acquire# + .global __TBB_machine_fetchstore4acquire# +__TBB_machine_fetchstore4acquire: + xchg4 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore4acquire# +# 88 "" + .section .text + .align 16 + + + .proc __TBB_machine_cmpswp4__TBB_full_fence# + .global __TBB_machine_cmpswp4__TBB_full_fence# +__TBB_machine_cmpswp4__TBB_full_fence: +{ + mf + br __TBB_machine_cmpswp4acquire +} + .endp __TBB_machine_cmpswp4__TBB_full_fence# + + .proc __TBB_machine_cmpswp4acquire# + .global __TBB_machine_cmpswp4acquire# +__TBB_machine_cmpswp4acquire: + + zxt4 r34=r34 +;; + + mov ar.ccv=r34 +;; + cmpxchg4.acq r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp4acquire# +// DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 + + + .proc __TBB_machine_fetchadd8__TBB_full_fence# + .global __TBB_machine_fetchadd8__TBB_full_fence# +__TBB_machine_fetchadd8__TBB_full_fence: +{ + mf + br __TBB_machine_fetchadd8acquire +} + .endp __TBB_machine_fetchadd8__TBB_full_fence# + + .proc __TBB_machine_fetchadd8acquire# + .global __TBB_machine_fetchadd8acquire# +__TBB_machine_fetchadd8acquire: + + cmp.eq p6,p0=1,r33 + cmp.eq p8,p0=-1,r33 + (p6) br.cond.dptk Inc_8acquire + (p8) br.cond.dpnt Dec_8acquire +;; + + ld8 r9=[r32] +;; 
+Retry_8acquire: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg8.acq r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_8acquire + br.ret.sptk.many b0 + +Inc_8acquire: + fetchadd8.acq r8=[r32],1 + br.ret.sptk.many b0 +Dec_8acquire: + fetchadd8.acq r8=[r32],-1 + br.ret.sptk.many b0 + + .endp __TBB_machine_fetchadd8acquire# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore8__TBB_full_fence# + .global __TBB_machine_fetchstore8__TBB_full_fence# +__TBB_machine_fetchstore8__TBB_full_fence: + mf +;; + xchg8 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore8__TBB_full_fence# + + + .proc __TBB_machine_fetchstore8acquire# + .global __TBB_machine_fetchstore8acquire# +__TBB_machine_fetchstore8acquire: + xchg8 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore8acquire# +# 88 "" + .section .text + .align 16 + + + .proc __TBB_machine_cmpswp8__TBB_full_fence# + .global __TBB_machine_cmpswp8__TBB_full_fence# +__TBB_machine_cmpswp8__TBB_full_fence: +{ + mf + br __TBB_machine_cmpswp8acquire +} + .endp __TBB_machine_cmpswp8__TBB_full_fence# + + .proc __TBB_machine_cmpswp8acquire# + .global __TBB_machine_cmpswp8acquire# +__TBB_machine_cmpswp8acquire: + + + + + mov ar.ccv=r34 +;; + cmpxchg8.acq r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp8acquire# +// DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 +# 19 "" + .proc __TBB_machine_fetchadd1release# + .global __TBB_machine_fetchadd1release# +__TBB_machine_fetchadd1release: + + + + + + + + ld1 r9=[r32] +;; +Retry_1release: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg1.rel r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_1release + br.ret.sptk.many b0 +# 49 "" + .endp __TBB_machine_fetchadd1release# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore1release# + .global __TBB_machine_fetchstore1release# +__TBB_machine_fetchstore1release: + mf +;; + xchg1 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore1release# +# 88 "" + .section .text + .align 16 +# 101 "" + .proc __TBB_machine_cmpswp1release# + .global __TBB_machine_cmpswp1release# +__TBB_machine_cmpswp1release: + + zxt1 r34=r34 +;; + + mov ar.ccv=r34 +;; + cmpxchg1.rel r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp1release# +// DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 +# 19 "" + .proc __TBB_machine_fetchadd2release# + .global __TBB_machine_fetchadd2release# +__TBB_machine_fetchadd2release: + + + + + + + + ld2 r9=[r32] +;; +Retry_2release: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg2.rel r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_2release + br.ret.sptk.many b0 +# 49 "" + .endp __TBB_machine_fetchadd2release# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore2release# + .global __TBB_machine_fetchstore2release# +__TBB_machine_fetchstore2release: + mf +;; + xchg2 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore2release# +# 88 "" + .section .text + .align 16 +# 101 "" + .proc __TBB_machine_cmpswp2release# + .global __TBB_machine_cmpswp2release# +__TBB_machine_cmpswp2release: + + zxt2 r34=r34 +;; + + mov ar.ccv=r34 +;; + cmpxchg2.rel r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp2release# +// 
DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 +# 19 "" + .proc __TBB_machine_fetchadd4release# + .global __TBB_machine_fetchadd4release# +__TBB_machine_fetchadd4release: + + cmp.eq p6,p0=1,r33 + cmp.eq p8,p0=-1,r33 + (p6) br.cond.dptk Inc_4release + (p8) br.cond.dpnt Dec_4release +;; + + ld4 r9=[r32] +;; +Retry_4release: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg4.rel r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_4release + br.ret.sptk.many b0 + +Inc_4release: + fetchadd4.rel r8=[r32],1 + br.ret.sptk.many b0 +Dec_4release: + fetchadd4.rel r8=[r32],-1 + br.ret.sptk.many b0 + + .endp __TBB_machine_fetchadd4release# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore4release# + .global __TBB_machine_fetchstore4release# +__TBB_machine_fetchstore4release: + mf +;; + xchg4 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore4release# +# 88 "" + .section .text + .align 16 +# 101 "" + .proc __TBB_machine_cmpswp4release# + .global __TBB_machine_cmpswp4release# +__TBB_machine_cmpswp4release: + + zxt4 r34=r34 +;; + + mov ar.ccv=r34 +;; + cmpxchg4.rel r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp4release# +// DO NOT EDIT - AUTOMATICALLY GENERATED FROM tools/generate_atomic/ipf_generate.sh +# 1 "" +# 1 "" +# 1 "" +# 1 "" + + + + + + .section .text + .align 16 +# 19 "" + .proc __TBB_machine_fetchadd8release# + .global __TBB_machine_fetchadd8release# +__TBB_machine_fetchadd8release: + + cmp.eq p6,p0=1,r33 + cmp.eq p8,p0=-1,r33 + (p6) br.cond.dptk Inc_8release + (p8) br.cond.dpnt Dec_8release +;; + + ld8 r9=[r32] +;; +Retry_8release: + mov ar.ccv=r9 + mov r8=r9; + add r10=r9,r33 +;; + cmpxchg8.rel r9=[r32],r10,ar.ccv +;; + cmp.ne p7,p0=r8,r9 + (p7) br.cond.dpnt Retry_8release + br.ret.sptk.many b0 + +Inc_8release: + fetchadd8.rel r8=[r32],1 + br.ret.sptk.many b0 +Dec_8release: + fetchadd8.rel r8=[r32],-1 + br.ret.sptk.many b0 + + .endp __TBB_machine_fetchadd8release# +# 62 "" + .section .text + .align 16 + .proc __TBB_machine_fetchstore8release# + .global __TBB_machine_fetchstore8release# +__TBB_machine_fetchstore8release: + mf +;; + xchg8 r8=[r32],r33 + br.ret.sptk.many b0 + .endp __TBB_machine_fetchstore8release# +# 88 "" + .section .text + .align 16 +# 101 "" + .proc __TBB_machine_cmpswp8release# + .global __TBB_machine_cmpswp8release# +__TBB_machine_cmpswp8release: + + + + + mov ar.ccv=r34 +;; + cmpxchg8.rel r8=[r32],r33,ar.ccv + br.ret.sptk.many b0 + .endp __TBB_machine_cmpswp8release# diff --git a/dep/tbb/src/tbb/ia64-gas/ia64_misc.s b/dep/tbb/src/tbb/ia64-gas/ia64_misc.s new file mode 100644 index 000000000..999bfb9ba --- /dev/null +++ b/dep/tbb/src/tbb/ia64-gas/ia64_misc.s @@ -0,0 +1,35 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. 
+// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + + // RSE backing store pointer retrieval + .section .text + .align 16 + .proc __TBB_get_bsp# + .global __TBB_get_bsp# +__TBB_get_bsp: + mov r8=ar.bsp + br.ret.sptk.many b0 + .endp __TBB_get_bsp# diff --git a/dep/tbb/src/tbb/ia64-gas/lock_byte.s b/dep/tbb/src/tbb/ia64-gas/lock_byte.s new file mode 100644 index 000000000..e7f199d89 --- /dev/null +++ b/dep/tbb/src/tbb/ia64-gas/lock_byte.s @@ -0,0 +1,54 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + + // Support for class TinyLock + .section .text + .align 16 + // unsigned int __TBB_machine_trylockbyte( byte& flag ); + // r32 = address of flag + .proc __TBB_machine_trylockbyte# + .global __TBB_machine_trylockbyte# +ADDRESS_OF_FLAG=r32 +RETCODE=r8 +FLAG=r9 +BUSY=r10 +SCRATCH=r11 +__TBB_machine_trylockbyte: + ld1.acq FLAG=[ADDRESS_OF_FLAG] + mov BUSY=1 + mov RETCODE=0 +;; + cmp.ne p6,p0=0,FLAG + mov ar.ccv=r0 +(p6) br.ret.sptk.many b0 +;; + cmpxchg1.acq SCRATCH=[ADDRESS_OF_FLAG],BUSY,ar.ccv // Try to acquire lock +;; + cmp.eq p6,p0=0,SCRATCH +;; +(p6) mov RETCODE=1 + br.ret.sptk.many b0 + .endp __TBB_machine_trylockbyte# diff --git a/dep/tbb/src/tbb/ia64-gas/log2.s b/dep/tbb/src/tbb/ia64-gas/log2.s new file mode 100644 index 000000000..2a4288898 --- /dev/null +++ b/dep/tbb/src/tbb/ia64-gas/log2.s @@ -0,0 +1,67 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. 
+// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + + // Support for class ConcurrentVector + .section .text + .align 16 + // unsigned long __TBB_machine_lg( unsigned long x ); + // r32 = x + .proc __TBB_machine_lg# + .global __TBB_machine_lg# +__TBB_machine_lg: + shr r16=r32,1 // .x +;; + shr r17=r32,2 // ..x + or r32=r32,r16 // xx +;; + shr r16=r32,3 // ...xx + or r32=r32,r17 // xxx +;; + shr r17=r32,5 // .....xxx + or r32=r32,r16 // xxxxx +;; + shr r16=r32,8 // ........xxxxx + or r32=r32,r17 // xxxxxxxx +;; + shr r17=r32,13 + or r32=r32,r16 // 13x +;; + shr r16=r32,21 + or r32=r32,r17 // 21x +;; + shr r17=r32,34 + or r32=r32,r16 // 34x +;; + shr r16=r32,55 + or r32=r32,r17 // 55x +;; + or r32=r32,r16 // 64x +;; + popcnt r8=r32 +;; + add r8=-1,r8 + br.ret.sptk.many b0 + .endp __TBB_machine_lg# diff --git a/dep/tbb/src/tbb/ia64-gas/pause.s b/dep/tbb/src/tbb/ia64-gas/pause.s new file mode 100644 index 000000000..bead89bcd --- /dev/null +++ b/dep/tbb/src/tbb/ia64-gas/pause.s @@ -0,0 +1,41 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. 
This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + + .section .text + .align 16 + // void __TBB_machine_pause( long count ); + // r32 = count + .proc __TBB_machine_pause# + .global __TBB_machine_pause# +count = r32 +__TBB_machine_pause: + hint.m 0 + add count=-1,count +;; + cmp.eq p6,p7=0,count +(p7) br.cond.dpnt __TBB_machine_pause +(p6) br.ret.sptk.many b0 + .endp __TBB_machine_pause# diff --git a/dep/tbb/src/tbb/ibm_aix51/atomic_support.c b/dep/tbb/src/tbb/ibm_aix51/atomic_support.c new file mode 100644 index 000000000..2e052d772 --- /dev/null +++ b/dep/tbb/src/tbb/ibm_aix51/atomic_support.c @@ -0,0 +1,55 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include +#include + +/* This file must be compiled with gcc. The IBM compiler doesn't seem to + support inline assembly statements (October 2007). */ + +#ifdef __GNUC__ + +int32_t __TBB_machine_cas_32 (volatile void* ptr, int32_t value, int32_t comparand) { + __asm__ __volatile__ ("sync\n"); /* memory release operation */ + compare_and_swap ((atomic_p) ptr, &comparand, value); + __asm__ __volatile__ ("sync\n"); /* memory acquire operation */ + return comparand; +} + +int64_t __TBB_machine_cas_64 (volatile void* ptr, int64_t value, int64_t comparand) { + __asm__ __volatile__ ("sync\n"); /* memory release operation */ + compare_and_swaplp ((atomic_l) ptr, &comparand, value); + __asm__ __volatile__ ("sync\n"); /* memory acquire operation */ + return comparand; +} + +void __TBB_machine_flush () { + __asm__ __volatile__ ("sync\n"); +} + +#endif /* __GNUC__ */ diff --git a/dep/tbb/src/tbb/index.html b/dep/tbb/src/tbb/index.html new file mode 100644 index 000000000..c927b94a4 --- /dev/null +++ b/dep/tbb/src/tbb/index.html @@ -0,0 +1,32 @@ + + + +

+<H2>Overview</H2>
+This directory contains the source code of the TBB core components.
+
+<H2>Directories</H2>
+<DL>
+<DT><A HREF="tools_api">tools_api</A>
+<DD>Source code of the interface components provided by the Intel® Parallel Studio tools.
+<DT><A HREF="intel64-masm">intel64-masm</A>
+<DD>Assembly code for the Intel® 64 architecture.
+<DT><A HREF="ia32-masm">ia32-masm</A>
+<DD>Assembly code for IA32 architecture.
+<DT><A HREF="ia64-gas">ia64-gas</A>
+<DD>Assembly code for IA64 architecture.
+<DT><A HREF="ibm_aix51">ibm_aix51</A>
+<DD>Assembly code for AIX 5.1 port.
+</DL>
+
+<HR>
+<A HREF="../index.html">Up to parent directory</A>
+<P></P>
+Copyright © 2005-2009 Intel Corporation. All Rights Reserved.
+<P></P>
+Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
+registered trademarks or trademarks of Intel Corporation or its
+subsidiaries in the United States and other countries.
+<P></P>
+* Other names and brands may be claimed as the property of others. + + diff --git a/dep/tbb/src/tbb/intel64-masm/atomic_support.asm b/dep/tbb/src/tbb/intel64-masm/atomic_support.asm new file mode 100644 index 000000000..86a240864 --- /dev/null +++ b/dep/tbb/src/tbb/intel64-masm/atomic_support.asm @@ -0,0 +1,80 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. + +; DO NOT EDIT - AUTOMATICALLY GENERATED FROM .s FILE +.code + ALIGN 8 + PUBLIC __TBB_machine_fetchadd1 +__TBB_machine_fetchadd1: + mov rax,rdx + lock xadd [rcx],al + ret +.code + ALIGN 8 + PUBLIC __TBB_machine_fetchstore1 +__TBB_machine_fetchstore1: + mov rax,rdx + lock xchg [rcx],al + ret +.code + ALIGN 8 + PUBLIC __TBB_machine_cmpswp1 +__TBB_machine_cmpswp1: + mov rax,r8 + lock cmpxchg [rcx],dl + ret +.code + ALIGN 8 + PUBLIC __TBB_machine_fetchadd2 +__TBB_machine_fetchadd2: + mov rax,rdx + lock xadd [rcx],ax + ret +.code + ALIGN 8 + PUBLIC __TBB_machine_fetchstore2 +__TBB_machine_fetchstore2: + mov rax,rdx + lock xchg [rcx],ax + ret +.code + ALIGN 8 + PUBLIC __TBB_machine_cmpswp2 +__TBB_machine_cmpswp2: + mov rax,r8 + lock cmpxchg [rcx],dx + ret +.code + ALIGN 8 + PUBLIC __TBB_machine_pause +__TBB_machine_pause: +L1: + dw 090f3H; pause + add ecx,-1 + jne L1 + ret +end + diff --git a/dep/tbb/src/tbb/itt_notify.cpp b/dep/tbb/src/tbb/itt_notify.cpp new file mode 100644 index 000000000..27ebbfff9 --- /dev/null +++ b/dep/tbb/src/tbb/itt_notify.cpp @@ -0,0 +1,273 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "itt_notify.h" +#include "tbb/tbb_machine.h" + +#include + +namespace tbb { + namespace internal { + +#if __TBB_NEW_ITT_NOTIFY +#if DO_ITT_NOTIFY + + extern "C" int __TBB_load_ittnotify(); + + bool InitializeITT () { + return __TBB_load_ittnotify() != 0; + } + + +#endif /* DO_ITT_NOTIFY */ +#endif /* __TBB_NEW_ITT_NOTIFY */ + + void itt_store_pointer_with_release_v3( void* dst, void* src ) { + ITT_NOTIFY(sync_releasing, dst); + __TBB_store_with_release(*static_cast(dst),src); + } + + void* itt_load_pointer_with_acquire_v3( const void* src ) { + void* result = __TBB_load_with_acquire(*static_cast(src)); + ITT_NOTIFY(sync_acquired, const_cast(src)); + return result; + } + + void* itt_load_pointer_v3( const void* src ) { + void* result = *static_cast(src); + return result; + } + + void itt_set_sync_name_v3( void* obj, const tchar* name) { + ITT_SYNC_RENAME(obj, name); + (void)obj, (void)name; // Prevents compiler warning when ITT support is switched off + } + + } // namespace internal +} // namespace tbb + + +#if !__TBB_NEW_ITT_NOTIFY + +#include "tbb_misc.h" +#include "dynamic_link.h" +#include "tbb/cache_aligned_allocator.h" /* NFS_MaxLineSize */ + +#if _WIN32||_WIN64 + #include +#else /* !WIN */ + #include +#if __TBB_WEAK_SYMBOLS + #pragma weak __itt_notify_sync_prepare + #pragma weak __itt_notify_sync_acquired + #pragma weak __itt_notify_sync_releasing + #pragma weak __itt_notify_sync_cancel + #pragma weak __itt_thr_name_set + #pragma weak __itt_thread_set_name + #pragma weak __itt_sync_create + #pragma weak __itt_sync_rename + extern "C" { + void __itt_notify_sync_prepare(void *p); + void __itt_notify_sync_cancel(void *p); + void __itt_notify_sync_acquired(void *p); + void __itt_notify_sync_releasing(void *p); + int __itt_thr_name_set (void* p, int len); + void __itt_thread_set_name (const char* name); + void __itt_sync_create( void* obj, const char* name, const char* type, int attribute ); + void __itt_sync_rename( void* obj, const char* new_name ); + } +#endif /* __TBB_WEAK_SYMBOLS */ +#endif /* !WIN */ + +namespace tbb { +namespace internal { + +#if DO_ITT_NOTIFY + + +//! Table describing the __itt_notify handlers. 
+static const dynamic_link_descriptor ITT_HandlerTable[] = { + DLD( __itt_notify_sync_prepare, ITT_Handler_sync_prepare), + DLD( __itt_notify_sync_acquired, ITT_Handler_sync_acquired), + DLD( __itt_notify_sync_releasing, ITT_Handler_sync_releasing), + DLD( __itt_notify_sync_cancel, ITT_Handler_sync_cancel), +# if _WIN32||_WIN64 + DLD( __itt_thr_name_setW, ITT_Handler_thr_name_set), + DLD( __itt_thread_set_nameW, ITT_Handler_thread_set_name), +# else + DLD( __itt_thr_name_set, ITT_Handler_thr_name_set), + DLD( __itt_thread_set_name, ITT_Handler_thread_set_name), +# endif /* _WIN32 || _WIN64 */ + + +# if _WIN32||_WIN64 + DLD( __itt_sync_createW, ITT_Handler_sync_create), + DLD( __itt_sync_renameW, ITT_Handler_sync_rename) +# else + DLD( __itt_sync_create, ITT_Handler_sync_create), + DLD( __itt_sync_rename, ITT_Handler_sync_rename) +# endif +}; + +static const int ITT_HandlerTable_size = + sizeof(ITT_HandlerTable)/sizeof(dynamic_link_descriptor); + +// LIBITTNOTIFY_NAME is the name of the ITT notification library +# if _WIN32||_WIN64 +# define LIBITTNOTIFY_NAME "libittnotify.dll" +# elif __linux__ +# define LIBITTNOTIFY_NAME "libittnotify.so" +# else +# error Intel(R) Threading Tools not provided for this OS +# endif + +//! Performs tools support initialization. +/** Is called by DoOneTimeInitializations and ITT_DoOneTimeInitialization in + a protected (one-time) manner. Not to be invoked directly. **/ +bool InitializeITT() { + bool result = false; + // Check if we are running under a performance or correctness tool + bool t_checker = GetBoolEnvironmentVariable("KMP_FOR_TCHECK"); + bool t_profiler = GetBoolEnvironmentVariable("KMP_FOR_TPROFILE"); + __TBB_ASSERT(!(t_checker&&t_profiler), NULL); + if ( t_checker || t_profiler ) { + // Yes, we are in the tool mode. Try to load libittnotify library. + result = dynamic_link( LIBITTNOTIFY_NAME, ITT_HandlerTable, ITT_HandlerTable_size, 4 ); + } + if (result){ + if ( t_checker ) { + current_tool = ITC; + } else if ( t_profiler ) { + current_tool = ITP; + } + } else { + // Clear away the proxy (dummy) handlers + for (int i = 0; i < ITT_HandlerTable_size; i++) + *ITT_HandlerTable[i].handler = NULL; + current_tool = NONE; + } + PrintExtraVersionInfo( "ITT", result?"yes":"no" ); + return result; +} + +//! Performs one-time initialization of tools interoperability mechanisms. +/** Defined in task.cpp. Makes a protected do-once call to InitializeITT(). **/ +void ITT_DoOneTimeInitialization(); + +/** The following dummy_xxx functions are proxies that correspond to tool notification + APIs and are used to initialize corresponding pointers to the tool notifications + (ITT_Handler_xxx). When the first call to ITT_Handler_xxx takes place before + the whole library initialization (done by DoOneTimeInitializations) happened, + the proxy handler performs initialization of the tools support. After this + ITT_Handler_xxx will be set to either tool notification pointer or NULL. 
**/ +void dummy_sync_prepare( volatile void* ptr ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_sync_prepare!=&dummy_sync_prepare, NULL ); + if (ITT_Handler_sync_prepare) + (*ITT_Handler_sync_prepare) (ptr); +} + +void dummy_sync_acquired( volatile void* ptr ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_sync_acquired!=&dummy_sync_acquired, NULL ); + if (ITT_Handler_sync_acquired) + (*ITT_Handler_sync_acquired) (ptr); +} + +void dummy_sync_releasing( volatile void* ptr ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_sync_releasing!=&dummy_sync_releasing, NULL ); + if (ITT_Handler_sync_releasing) + (*ITT_Handler_sync_releasing) (ptr); +} + +void dummy_sync_cancel( volatile void* ptr ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_sync_cancel!=&dummy_sync_cancel, NULL ); + if (ITT_Handler_sync_cancel) + (*ITT_Handler_sync_cancel) (ptr); +} + +int dummy_thr_name_set( const tchar* str, int number ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_thr_name_set!=&dummy_thr_name_set, NULL ); + if (ITT_Handler_thr_name_set) + return (*ITT_Handler_thr_name_set) (str, number); + return -1; +} + +void dummy_thread_set_name( const tchar* name ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_thread_set_name!=&dummy_thread_set_name, NULL ); + if (ITT_Handler_thread_set_name) + (*ITT_Handler_thread_set_name)( name ); +} + +void dummy_sync_create( void* obj, const tchar* objname, const tchar* objtype, int /*attribute*/ ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_sync_create!=&dummy_sync_create, NULL ); + ITT_SYNC_CREATE( obj, objtype, objname ); +} + +void dummy_sync_rename( void* obj, const tchar* new_name ) { + ITT_DoOneTimeInitialization(); + __TBB_ASSERT( ITT_Handler_sync_rename!=&dummy_sync_rename, NULL ); + ITT_SYNC_RENAME(obj, new_name); +} + + + +//! Leading padding before the area where tool notification handlers are placed. +/** Prevents cache lines where the handler pointers are stored from thrashing. + Defined as extern to prevent compiler from placing the padding arrays separately + from the handler pointers (which are declared as extern). + Declared separately from definition to get rid of compiler warnings. **/ +extern char __ITT_Handler_leading_padding[NFS_MaxLineSize]; + +//! Trailing padding after the area where tool notification handlers are placed. 
+extern char __ITT_Handler_trailing_padding[NFS_MaxLineSize]; + +char __ITT_Handler_leading_padding[NFS_MaxLineSize] = {0}; +PointerToITT_Handler ITT_Handler_sync_prepare = &dummy_sync_prepare; +PointerToITT_Handler ITT_Handler_sync_acquired = &dummy_sync_acquired; +PointerToITT_Handler ITT_Handler_sync_releasing = &dummy_sync_releasing; +PointerToITT_Handler ITT_Handler_sync_cancel = &dummy_sync_cancel; +PointerToITT_thr_name_set ITT_Handler_thr_name_set = &dummy_thr_name_set; +PointerToITT_thread_set_name ITT_Handler_thread_set_name = &dummy_thread_set_name; +PointerToITT_sync_create ITT_Handler_sync_create = &dummy_sync_create; +PointerToITT_sync_rename ITT_Handler_sync_rename = &dummy_sync_rename; +char __ITT_Handler_trailing_padding[NFS_MaxLineSize] = {0}; + +target_tool current_tool = TO_BE_INITIALIZED; + +#endif /* DO_ITT_NOTIFY */ +} // namespace internal + +} // namespace tbb + +#endif /* !__TBB_NEW_ITT_NOTIFY */ diff --git a/dep/tbb/src/tbb/itt_notify.h b/dep/tbb/src/tbb/itt_notify.h new file mode 100644 index 000000000..db8aefcb8 --- /dev/null +++ b/dep/tbb/src/tbb/itt_notify.h @@ -0,0 +1,206 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef _TBB_ITT_NOTIFY +#define _TBB_ITT_NOTIFY + +#include "tbb/tbb_stddef.h" + +#if DO_ITT_NOTIFY +#if __TBB_NEW_ITT_NOTIFY + +#if _WIN32||_WIN64 + #ifndef UNICODE + #define UNICODE + #endif +#endif /* WIN */ + +#include "tools_api/ittnotify.h" + +#if _WIN32||_WIN64 + #undef _T + #undef __itt_event_create + #define __itt_event_create __itt_event_createA +#endif /* WIN */ + +#endif /* __TBB_NEW_ITT_NOTIFY */ + +#endif /* DO_ITT_NOTIFY */ + +namespace tbb { +//! Unicode support +#if _WIN32||_WIN64 + //! Unicode character type. Always wchar_t on Windows. + /** We do not use typedefs from Windows TCHAR family to keep consistence of TBB coding style. **/ + typedef wchar_t tchar; + //! Standard Windows macro to markup the string literals. + #define _T(string_literal) L ## string_literal +#if !__TBB_NEW_ITT_NOTIFY + #define tstrlen wcslen +#endif /* !__TBB_NEW_ITT_NOTIFY */ +#else /* !WIN */ + typedef char tchar; + //! Standard Windows style macro to markup the string literals. 
+ #define _T(string_literal) string_literal +#if !__TBB_NEW_ITT_NOTIFY + #define tstrlen strlen +#endif /* !__TBB_NEW_ITT_NOTIFY */ +#endif /* !WIN */ +} // namespace tbb + +#if DO_ITT_NOTIFY +namespace tbb { + //! Display names of internal synchronization types + extern const tchar + *SyncType_GlobalLock, + *SyncType_Scheduler; + //! Display names of internal synchronization components/scenarios + extern const tchar + *SyncObj_SchedulerInitialization, + *SyncObj_SchedulersList, + *SyncObj_TaskStealingLoop, + *SyncObj_WorkerTaskPool, + *SyncObj_MasterTaskPool, + *SyncObj_GateLock, + *SyncObj_Gate, + *SyncObj_SchedulerTermination, + *SyncObj_ContextsList + ; + + namespace internal { + void __TBB_EXPORTED_FUNC itt_set_sync_name_v3( void* obj, const tchar* name); + + } // namespace internal + +} // namespace tbb + +#if __TBB_NEW_ITT_NOTIFY +// const_cast() is necessary to cast off volatility +#define ITT_NOTIFY(name,obj) __itt_notify_##name(const_cast(static_cast(obj))) +#define ITT_THREAD_SET_NAME(name) __itt_thread_set_name(name) +#define ITT_SYNC_CREATE(obj, type, name) __itt_sync_create(obj, type, name, 2) +#define ITT_SYNC_RENAME(obj, name) __itt_sync_rename(obj, name) +#endif /* __TBB_NEW_ITT_NOTIFY */ + +#else /* !DO_ITT_NOTIFY */ + +#define ITT_NOTIFY(name,obj) ((void)0) +#define ITT_THREAD_SET_NAME(name) ((void)0) +#define ITT_SYNC_CREATE(obj, type, name) ((void)0) +#define ITT_SYNC_RENAME(obj, name) ((void)0) + +#endif /* !DO_ITT_NOTIFY */ + + +#if !__TBB_NEW_ITT_NOTIFY + +#if DO_ITT_NOTIFY + +namespace tbb { + +//! Identifies performance and correctness tools, which TBB sends special notifications to. +/** Enumerators must be ORable bit values. + + Initializing global tool indicator with TO_BE_INITIALIZED is required + to avoid bypassing early notification calls made through targeted macros until + initialization is performed from somewhere else. + + Yet this entails another problem. The first targeted calls that happen to go + into the proxy (dummy) handlers become promiscuous. **/ +enum target_tool { + NONE = 0ul, + ITC = 1ul, + ITP = 2ul, + TO_BE_INITIALIZED = ~0ul +}; + +namespace internal { + +//! Types of the tool notification functions (and corresponding proxy handlers). +typedef void (*PointerToITT_Handler)(volatile void*); +typedef int (*PointerToITT_thr_name_set)(const tchar*, int); +typedef void (*PointerToITT_thread_set_name)(const tchar*); + + +typedef void (*PointerToITT_sync_create)(void* obj, const tchar* type, const tchar* name, int attribute); +typedef void (*PointerToITT_sync_rename)(void* obj, const tchar* new_name); + +extern PointerToITT_Handler ITT_Handler_sync_prepare; +extern PointerToITT_Handler ITT_Handler_sync_acquired; +extern PointerToITT_Handler ITT_Handler_sync_releasing; +extern PointerToITT_Handler ITT_Handler_sync_cancel; +extern PointerToITT_thr_name_set ITT_Handler_thr_name_set; +extern PointerToITT_thread_set_name ITT_Handler_thread_set_name; +extern PointerToITT_sync_create ITT_Handler_sync_create; +extern PointerToITT_sync_rename ITT_Handler_sync_rename; + +extern target_tool current_tool; + +} // namespace internal + +} // namespace tbb + +//! Glues two tokens together. +#define ITT_HANDLER(name) tbb::internal::ITT_Handler_##name +#define CALL_ITT_HANDLER(name, arglist) ( ITT_HANDLER(name) ? (void)ITT_HANDLER(name)arglist : (void)0 ) + +//! Call routine itt_notify_(name) if corresponding handler is available. +/** For example, use ITT_NOTIFY(sync_releasing,x) to invoke __itt_notify_sync_releasing(x). 
+ Ordinarily, preprocessor token gluing games should be avoided. + But here, it seemed to be the best way to handle the issue. */ +#define ITT_NOTIFY(name,obj) CALL_ITT_HANDLER(name,(obj)) +//! The same as ITT_NOTIFY but also checks if we are running under appropriate tool. +/** Parameter tools is an ORed set of target_tool enumerators. **/ +#define ITT_NOTIFY_TOOL(tools,name,obj) ( ITT_HANDLER(name) && ((tools) & tbb::internal::current_tool) ? ITT_HANDLER(name)(obj) : (void)0 ) + +#define ITT_THREAD_SET_NAME(name) ( \ + ITT_HANDLER(thread_set_name) ? ITT_HANDLER(thread_set_name)(name) \ + : CALL_ITT_HANDLER(thr_name_set,(name, tstrlen(name))) ) + + +/** 2 is the value of __itt_attr_mutex attribute. **/ +#define ITT_SYNC_CREATE(obj, type, name) CALL_ITT_HANDLER(sync_create,(obj, type, name, 2)) +#define ITT_SYNC_RENAME(obj, name) CALL_ITT_HANDLER(sync_rename,(obj, name)) + + + +#else /* !DO_ITT_NOTIFY */ + +#define ITT_NOTIFY_TOOL(tools,name,obj) ((void)0) + +#endif /* !DO_ITT_NOTIFY */ + +#if DO_ITT_QUIET +#define ITT_QUIET(x) (__itt_thr_mode_set(__itt_thr_prop_quiet,(x)?__itt_thr_state_set:__itt_thr_state_clr)) +#else +#define ITT_QUIET(x) ((void)0) +#endif /* DO_ITT_QUIET */ + +#endif /* !__TBB_NEW_ITT_NOTIFY */ + +#endif /* _TBB_ITT_NOTIFY */ diff --git a/dep/tbb/src/tbb/itt_notify_proxy.c b/dep/tbb/src/tbb/itt_notify_proxy.c new file mode 100644 index 000000000..9d4e67222 --- /dev/null +++ b/dep/tbb/src/tbb/itt_notify_proxy.c @@ -0,0 +1,55 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#include "tbb/tbb_config.h" + +/* This declaration in particular shuts up "empty translation unit" warning */ +extern int __TBB_load_ittnotify(); + +#if __TBB_NEW_ITT_NOTIFY +#if DO_ITT_NOTIFY + +#if _WIN32||_WIN64 + #ifndef UNICODE + #define UNICODE + #endif +#endif /* WIN */ + +extern void ITT_DoOneTimeInitialization(); + +#define ITT_SIMPLE_INIT 1 +#define __itt_init_lib_name ITT_DoOneTimeInitialization + +#include "tools_api/ittnotify_static.c" + +int __TBB_load_ittnotify() { + return __itt_init_lib(); +} + +#endif /* DO_ITT_NOTIFY */ +#endif /* __TBB_NEW_ITT_NOTIFY */ diff --git a/dep/tbb/src/tbb/lin32-tbb-export.def b/dep/tbb/src/tbb/lin32-tbb-export.def new file mode 100644 index 000000000..5fc2f53b4 --- /dev/null +++ b/dep/tbb/src/tbb/lin32-tbb-export.def @@ -0,0 +1,316 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#include "tbb/tbb_config.h" + +{ +global: + +/* cache_aligned_allocator.cpp */ +_ZN3tbb8internal12NFS_AllocateEjjPv; +_ZN3tbb8internal15NFS_GetLineSizeEv; +_ZN3tbb8internal8NFS_FreeEPv; +_ZN3tbb8internal23allocate_via_handler_v3Ej; +_ZN3tbb8internal25deallocate_via_handler_v3EPv; +_ZN3tbb8internal17is_malloc_used_v3Ev; + +/* task.cpp v3 */ +_ZN3tbb4task13note_affinityEt; +_ZN3tbb4task22internal_set_ref_countEi; +_ZN3tbb4task28internal_decrement_ref_countEv; +_ZN3tbb4task22spawn_and_wait_for_allERNS_9task_listE; +_ZN3tbb4task4selfEv; +_ZN3tbb4task7destroyERS0_; +_ZNK3tbb4task26is_owned_by_current_threadEv; +_ZN3tbb8internal19allocate_root_proxy4freeERNS_4taskE; +_ZN3tbb8internal19allocate_root_proxy8allocateEj; +_ZN3tbb8internal28affinity_partitioner_base_v36resizeEj; +_ZNK3tbb8internal20allocate_child_proxy4freeERNS_4taskE; +_ZNK3tbb8internal20allocate_child_proxy8allocateEj; +_ZNK3tbb8internal27allocate_continuation_proxy4freeERNS_4taskE; +_ZNK3tbb8internal27allocate_continuation_proxy8allocateEj; +_ZNK3tbb8internal34allocate_additional_child_of_proxy4freeERNS_4taskE; +_ZNK3tbb8internal34allocate_additional_child_of_proxy8allocateEj; +_ZTIN3tbb4taskE; +_ZTSN3tbb4taskE; +_ZTVN3tbb4taskE; +_ZN3tbb19task_scheduler_init19default_num_threadsEv; +_ZN3tbb19task_scheduler_init10initializeEij; +_ZN3tbb19task_scheduler_init10initializeEi; +_ZN3tbb19task_scheduler_init9terminateEv; +_ZN3tbb8internal26task_scheduler_observer_v37observeEb; +_ZN3tbb10empty_task7executeEv; +_ZN3tbb10empty_taskD0Ev; +_ZN3tbb10empty_taskD1Ev; +_ZTIN3tbb10empty_taskE; +_ZTSN3tbb10empty_taskE; +_ZTVN3tbb10empty_taskE; + +/* exception handling support */ +#if __TBB_EXCEPTIONS +_ZNK3tbb8internal32allocate_root_with_context_proxy8allocateEj; +_ZNK3tbb8internal32allocate_root_with_context_proxy4freeERNS_4taskE; +_ZNK3tbb18task_group_context28is_group_execution_cancelledEv; +_ZN3tbb18task_group_context22cancel_group_executionEv; +_ZN3tbb18task_group_context26register_pending_exceptionEv; +_ZN3tbb18task_group_context5resetEv; +_ZN3tbb18task_group_context4initEv; +_ZN3tbb18task_group_contextD1Ev; +_ZN3tbb18task_group_contextD2Ev; +_ZNK3tbb18captured_exception4nameEv; +_ZNK3tbb18captured_exception4whatEv; +_ZN3tbb18captured_exception10throw_selfEv; +_ZN3tbb18captured_exception3setEPKcS2_; +_ZN3tbb18captured_exception4moveEv; +_ZN3tbb18captured_exception5clearEv; +_ZN3tbb18captured_exception7destroyEv; +_ZN3tbb18captured_exception8allocateEPKcS2_; +_ZN3tbb18captured_exceptionD0Ev; +_ZN3tbb18captured_exceptionD1Ev; +_ZTIN3tbb18captured_exceptionE; +_ZTSN3tbb18captured_exceptionE; +_ZTVN3tbb18captured_exceptionE; +_ZN3tbb13tbb_exceptionD2Ev; +_ZTIN3tbb13tbb_exceptionE; +_ZTSN3tbb13tbb_exceptionE; +_ZTVN3tbb13tbb_exceptionE; +_ZN3tbb14bad_last_allocD0Ev; +_ZN3tbb14bad_last_allocD1Ev; +_ZNK3tbb14bad_last_alloc4whatEv; +_ZTIN3tbb14bad_last_allocE; +_ZTSN3tbb14bad_last_allocE; +_ZTVN3tbb14bad_last_allocE; +#endif /* __TBB_EXCEPTIONS */ + +/* tbb_misc.cpp */ +_ZN3tbb17assertion_failureEPKciS1_S1_; +_ZN3tbb21set_assertion_handlerEPFvPKciS1_S1_E; +_ZN3tbb8internal36get_initial_auto_partitioner_divisorEv; +_ZN3tbb8internal13handle_perrorEiPKc; +_ZN3tbb8internal15runtime_warningEPKcz; +__TBB_machine_store8_slow_perf_warning; +__TBB_machine_store8_slow; +TBB_runtime_interface_version; +_ZN3tbb8internal33throw_bad_last_alloc_exception_v4Ev; + +/* itt_notify.cpp */ +_ZN3tbb8internal32itt_load_pointer_with_acquire_v3EPKv; +_ZN3tbb8internal33itt_store_pointer_with_release_v3EPvS1_; +_ZN3tbb8internal20itt_set_sync_name_v3EPvPKc; 
+_ZN3tbb8internal19itt_load_pointer_v3EPKv; + +/* pipeline.cpp */ +_ZTIN3tbb6filterE; +_ZTSN3tbb6filterE; +_ZTVN3tbb6filterE; +_ZN3tbb6filterD2Ev; +_ZN3tbb8pipeline10add_filterERNS_6filterE; +_ZN3tbb8pipeline12inject_tokenERNS_4taskE; +_ZN3tbb8pipeline13remove_filterERNS_6filterE; +_ZN3tbb8pipeline3runEj; +#if __TBB_EXCEPTIONS +_ZN3tbb8pipeline3runEjRNS_18task_group_contextE; +#endif +_ZN3tbb8pipeline5clearEv; +_ZN3tbb19thread_bound_filter12process_itemEv; +_ZN3tbb19thread_bound_filter16try_process_itemEv; +_ZTIN3tbb8pipelineE; +_ZTSN3tbb8pipelineE; +_ZTVN3tbb8pipelineE; +_ZN3tbb8pipelineC1Ev; +_ZN3tbb8pipelineC2Ev; +_ZN3tbb8pipelineD0Ev; +_ZN3tbb8pipelineD1Ev; +_ZN3tbb8pipelineD2Ev; + +/* queuing_rw_mutex.cpp */ +_ZN3tbb16queuing_rw_mutex18internal_constructEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock17upgrade_to_writerEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock19downgrade_to_readerEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock7acquireERS0_b; +_ZN3tbb16queuing_rw_mutex11scoped_lock7releaseEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock11try_acquireERS0_b; + +#if !TBB_NO_LEGACY +/* spin_rw_mutex.cpp v2 */ +_ZN3tbb13spin_rw_mutex16internal_upgradeEPS0_; +_ZN3tbb13spin_rw_mutex22internal_itt_releasingEPS0_; +_ZN3tbb13spin_rw_mutex23internal_acquire_readerEPS0_; +_ZN3tbb13spin_rw_mutex23internal_acquire_writerEPS0_; +_ZN3tbb13spin_rw_mutex18internal_downgradeEPS0_; +_ZN3tbb13spin_rw_mutex23internal_release_readerEPS0_; +_ZN3tbb13spin_rw_mutex23internal_release_writerEPS0_; +_ZN3tbb13spin_rw_mutex27internal_try_acquire_readerEPS0_; +_ZN3tbb13spin_rw_mutex27internal_try_acquire_writerEPS0_; +#endif + +/* spin_rw_mutex v3 */ +_ZN3tbb16spin_rw_mutex_v318internal_constructEv; +_ZN3tbb16spin_rw_mutex_v316internal_upgradeEv; +_ZN3tbb16spin_rw_mutex_v318internal_downgradeEv; +_ZN3tbb16spin_rw_mutex_v323internal_acquire_readerEv; +_ZN3tbb16spin_rw_mutex_v323internal_acquire_writerEv; +_ZN3tbb16spin_rw_mutex_v323internal_release_readerEv; +_ZN3tbb16spin_rw_mutex_v323internal_release_writerEv; +_ZN3tbb16spin_rw_mutex_v327internal_try_acquire_readerEv; +_ZN3tbb16spin_rw_mutex_v327internal_try_acquire_writerEv; + +/* spin_mutex.cpp */ +_ZN3tbb10spin_mutex18internal_constructEv; +_ZN3tbb10spin_mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb10spin_mutex11scoped_lock16internal_releaseEv; +_ZN3tbb10spin_mutex11scoped_lock20internal_try_acquireERS0_; + +/* mutex.cpp */ +_ZN3tbb5mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb5mutex11scoped_lock16internal_releaseEv; +_ZN3tbb5mutex11scoped_lock20internal_try_acquireERS0_; +_ZN3tbb5mutex16internal_destroyEv; +_ZN3tbb5mutex18internal_constructEv; + +/* recursive_mutex.cpp */ +_ZN3tbb15recursive_mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb15recursive_mutex11scoped_lock16internal_releaseEv; +_ZN3tbb15recursive_mutex11scoped_lock20internal_try_acquireERS0_; +_ZN3tbb15recursive_mutex16internal_destroyEv; +_ZN3tbb15recursive_mutex18internal_constructEv; + +/* QueuingMutex.cpp */ +_ZN3tbb13queuing_mutex18internal_constructEv; +_ZN3tbb13queuing_mutex11scoped_lock7acquireERS0_; +_ZN3tbb13queuing_mutex11scoped_lock7releaseEv; +_ZN3tbb13queuing_mutex11scoped_lock11try_acquireERS0_; + +#if !TBB_NO_LEGACY +/* concurrent_hash_map */ +_ZNK3tbb8internal21hash_map_segment_base23internal_grow_predicateEv; + +/* concurrent_queue.cpp v2 */ +_ZN3tbb8internal21concurrent_queue_base12internal_popEPv; +_ZN3tbb8internal21concurrent_queue_base13internal_pushEPKv; +_ZN3tbb8internal21concurrent_queue_base21internal_set_capacityEij; 
+_ZN3tbb8internal21concurrent_queue_base23internal_pop_if_presentEPv; +_ZN3tbb8internal21concurrent_queue_base25internal_push_if_not_fullEPKv; +_ZN3tbb8internal21concurrent_queue_baseC2Ej; +_ZN3tbb8internal21concurrent_queue_baseD2Ev; +_ZTIN3tbb8internal21concurrent_queue_baseE; +_ZTSN3tbb8internal21concurrent_queue_baseE; +_ZTVN3tbb8internal21concurrent_queue_baseE; +_ZN3tbb8internal30concurrent_queue_iterator_base6assignERKS1_; +_ZN3tbb8internal30concurrent_queue_iterator_base7advanceEv; +_ZN3tbb8internal30concurrent_queue_iterator_baseC2ERKNS0_21concurrent_queue_baseE; +_ZN3tbb8internal30concurrent_queue_iterator_baseD2Ev; +_ZNK3tbb8internal21concurrent_queue_base13internal_sizeEv; +#endif + +/* concurrent_queue v3 */ +/* constructors */ +_ZN3tbb8internal24concurrent_queue_base_v3C2Ej; +_ZN3tbb8internal33concurrent_queue_iterator_base_v3C2ERKNS0_24concurrent_queue_base_v3E; +/* destructors */ +_ZN3tbb8internal24concurrent_queue_base_v3D2Ev; +_ZN3tbb8internal33concurrent_queue_iterator_base_v3D2Ev; +/* typeinfo */ +_ZTIN3tbb8internal24concurrent_queue_base_v3E; +_ZTSN3tbb8internal24concurrent_queue_base_v3E; +/* vtable */ +_ZTVN3tbb8internal24concurrent_queue_base_v3E; +/* methods */ +_ZN3tbb8internal33concurrent_queue_iterator_base_v37advanceEv; +_ZN3tbb8internal33concurrent_queue_iterator_base_v36assignERKS1_; +_ZN3tbb8internal24concurrent_queue_base_v313internal_pushEPKv; +_ZN3tbb8internal24concurrent_queue_base_v325internal_push_if_not_fullEPKv; +_ZN3tbb8internal24concurrent_queue_base_v312internal_popEPv; +_ZN3tbb8internal24concurrent_queue_base_v323internal_pop_if_presentEPv; +_ZN3tbb8internal24concurrent_queue_base_v321internal_set_capacityEij; +_ZNK3tbb8internal24concurrent_queue_base_v313internal_sizeEv; +_ZNK3tbb8internal24concurrent_queue_base_v314internal_emptyEv; +_ZN3tbb8internal24concurrent_queue_base_v321internal_finish_clearEv; +_ZNK3tbb8internal24concurrent_queue_base_v324internal_throw_exceptionEv; +_ZN3tbb8internal24concurrent_queue_base_v36assignERKS1_; + +#if !TBB_NO_LEGACY +/* concurrent_vector.cpp v2 */ +_ZN3tbb8internal22concurrent_vector_base13internal_copyERKS1_jPFvPvPKvjE; +_ZN3tbb8internal22concurrent_vector_base14internal_clearEPFvPvjEb; +_ZN3tbb8internal22concurrent_vector_base15internal_assignERKS1_jPFvPvjEPFvS4_PKvjESA_; +_ZN3tbb8internal22concurrent_vector_base16internal_grow_byEjjPFvPvjE; +_ZN3tbb8internal22concurrent_vector_base16internal_reserveEjjj; +_ZN3tbb8internal22concurrent_vector_base18internal_push_backEjRj; +_ZN3tbb8internal22concurrent_vector_base25internal_grow_to_at_leastEjjPFvPvjE; +_ZNK3tbb8internal22concurrent_vector_base17internal_capacityEv; +#endif + +/* concurrent_vector v3 */ +_ZN3tbb8internal25concurrent_vector_base_v313internal_copyERKS1_jPFvPvPKvjE; +_ZN3tbb8internal25concurrent_vector_base_v314internal_clearEPFvPvjE; +_ZN3tbb8internal25concurrent_vector_base_v315internal_assignERKS1_jPFvPvjEPFvS4_PKvjESA_; +_ZN3tbb8internal25concurrent_vector_base_v316internal_grow_byEjjPFvPvPKvjES4_; +_ZN3tbb8internal25concurrent_vector_base_v316internal_reserveEjjj; +_ZN3tbb8internal25concurrent_vector_base_v318internal_push_backEjRj; +_ZN3tbb8internal25concurrent_vector_base_v325internal_grow_to_at_leastEjjPFvPvPKvjES4_; +_ZNK3tbb8internal25concurrent_vector_base_v317internal_capacityEv; +_ZN3tbb8internal25concurrent_vector_base_v316internal_compactEjPvPFvS2_jEPFvS2_PKvjE; +_ZN3tbb8internal25concurrent_vector_base_v313internal_swapERS1_; +_ZNK3tbb8internal25concurrent_vector_base_v324internal_throw_exceptionEj; 
+_ZN3tbb8internal25concurrent_vector_base_v3D2Ev; +_ZN3tbb8internal25concurrent_vector_base_v315internal_resizeEjjjPKvPFvPvjEPFvS4_S3_jE; +_ZN3tbb8internal25concurrent_vector_base_v337internal_grow_to_at_least_with_resultEjjPFvPvPKvjES4_; + +/* tbb_thread */ +_ZN3tbb8internal13tbb_thread_v314internal_startEPFPvS2_ES2_; +_ZN3tbb8internal13tbb_thread_v320hardware_concurrencyEv; +_ZN3tbb8internal13tbb_thread_v34joinEv; +_ZN3tbb8internal13tbb_thread_v36detachEv; +_ZN3tbb8internal15free_closure_v3EPv; +_ZN3tbb8internal15thread_sleep_v3ERKNS_10tick_count10interval_tE; +_ZN3tbb8internal15thread_yield_v3Ev; +_ZN3tbb8internal16thread_get_id_v3Ev; +_ZN3tbb8internal19allocate_closure_v3Ej; +_ZN3tbb8internal7move_v3ERNS0_13tbb_thread_v3ES2_; + +local: + +/* TBB symbols */ +*3tbb*; +*__TBB*; + +/* Intel Compiler (libirc) symbols */ +__intel_*; +_intel_*; +get_memcpy_largest_cachelinesize; +get_memcpy_largest_cache_size; +get_mem_ops_method; +init_mem_ops_method; +irc__get_msg; +irc__print; +override_mem_ops_method; +set_memcpy_largest_cachelinesize; +set_memcpy_largest_cache_size; + +}; diff --git a/dep/tbb/src/tbb/lin64-tbb-export.def b/dep/tbb/src/tbb/lin64-tbb-export.def new file mode 100644 index 000000000..40b245b47 --- /dev/null +++ b/dep/tbb/src/tbb/lin64-tbb-export.def @@ -0,0 +1,311 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#include "tbb/tbb_config.h" + +{ +global: + +/* cache_aligned_allocator.cpp */ +_ZN3tbb8internal12NFS_AllocateEmmPv; +_ZN3tbb8internal15NFS_GetLineSizeEv; +_ZN3tbb8internal8NFS_FreeEPv; +_ZN3tbb8internal23allocate_via_handler_v3Em; +_ZN3tbb8internal25deallocate_via_handler_v3EPv; +_ZN3tbb8internal17is_malloc_used_v3Ev; + +/* task.cpp v3 */ +_ZN3tbb4task13note_affinityEt; +_ZN3tbb4task22internal_set_ref_countEi; +_ZN3tbb4task28internal_decrement_ref_countEv; +_ZN3tbb4task22spawn_and_wait_for_allERNS_9task_listE; +_ZN3tbb4task4selfEv; +_ZN3tbb4task7destroyERS0_; +_ZNK3tbb4task26is_owned_by_current_threadEv; +_ZN3tbb8internal19allocate_root_proxy4freeERNS_4taskE; +_ZN3tbb8internal19allocate_root_proxy8allocateEm; +_ZN3tbb8internal28affinity_partitioner_base_v36resizeEj; +_ZNK3tbb8internal20allocate_child_proxy4freeERNS_4taskE; +_ZNK3tbb8internal20allocate_child_proxy8allocateEm; +_ZNK3tbb8internal27allocate_continuation_proxy4freeERNS_4taskE; +_ZNK3tbb8internal27allocate_continuation_proxy8allocateEm; +_ZNK3tbb8internal34allocate_additional_child_of_proxy4freeERNS_4taskE; +_ZNK3tbb8internal34allocate_additional_child_of_proxy8allocateEm; +_ZTIN3tbb4taskE; +_ZTSN3tbb4taskE; +_ZTVN3tbb4taskE; +_ZN3tbb19task_scheduler_init19default_num_threadsEv; +_ZN3tbb19task_scheduler_init10initializeEim; +_ZN3tbb19task_scheduler_init10initializeEi; +_ZN3tbb19task_scheduler_init9terminateEv; +_ZN3tbb8internal26task_scheduler_observer_v37observeEb; +_ZN3tbb10empty_task7executeEv; +_ZN3tbb10empty_taskD0Ev; +_ZN3tbb10empty_taskD1Ev; +_ZTIN3tbb10empty_taskE; +_ZTSN3tbb10empty_taskE; +_ZTVN3tbb10empty_taskE; + +/* exception handling support */ +#if __TBB_EXCEPTIONS +_ZNK3tbb8internal32allocate_root_with_context_proxy8allocateEm; +_ZNK3tbb8internal32allocate_root_with_context_proxy4freeERNS_4taskE; +_ZNK3tbb18task_group_context28is_group_execution_cancelledEv; +_ZN3tbb18task_group_context22cancel_group_executionEv; +_ZN3tbb18task_group_context26register_pending_exceptionEv; +_ZN3tbb18task_group_context5resetEv; +_ZN3tbb18task_group_context4initEv; +_ZN3tbb18task_group_contextD1Ev; +_ZN3tbb18task_group_contextD2Ev; +_ZNK3tbb18captured_exception4nameEv; +_ZNK3tbb18captured_exception4whatEv; +_ZN3tbb18captured_exception10throw_selfEv; +_ZN3tbb18captured_exception3setEPKcS2_; +_ZN3tbb18captured_exception4moveEv; +_ZN3tbb18captured_exception5clearEv; +_ZN3tbb18captured_exception7destroyEv; +_ZN3tbb18captured_exception8allocateEPKcS2_; +_ZN3tbb18captured_exceptionD0Ev; +_ZN3tbb18captured_exceptionD1Ev; +_ZTIN3tbb18captured_exceptionE; +_ZTSN3tbb18captured_exceptionE; +_ZTVN3tbb18captured_exceptionE; +_ZN3tbb13tbb_exceptionD2Ev; +_ZTIN3tbb13tbb_exceptionE; +_ZTSN3tbb13tbb_exceptionE; +_ZTVN3tbb13tbb_exceptionE; +_ZN3tbb14bad_last_allocD0Ev; +_ZN3tbb14bad_last_allocD1Ev; +_ZNK3tbb14bad_last_alloc4whatEv; +_ZTIN3tbb14bad_last_allocE; +_ZTSN3tbb14bad_last_allocE; +_ZTVN3tbb14bad_last_allocE; +#endif /* __TBB_EXCEPTIONS */ + +/* tbb_misc.cpp */ +_ZN3tbb17assertion_failureEPKciS1_S1_; +_ZN3tbb21set_assertion_handlerEPFvPKciS1_S1_E; +_ZN3tbb8internal36get_initial_auto_partitioner_divisorEv; +_ZN3tbb8internal13handle_perrorEiPKc; +_ZN3tbb8internal15runtime_warningEPKcz; +TBB_runtime_interface_version; +_ZN3tbb8internal33throw_bad_last_alloc_exception_v4Ev; + +/* itt_notify.cpp */ +_ZN3tbb8internal32itt_load_pointer_with_acquire_v3EPKv; +_ZN3tbb8internal33itt_store_pointer_with_release_v3EPvS1_; +_ZN3tbb8internal20itt_set_sync_name_v3EPvPKc; +_ZN3tbb8internal19itt_load_pointer_v3EPKv; + +/* pipeline.cpp */ 
+_ZTIN3tbb6filterE; +_ZTSN3tbb6filterE; +_ZTVN3tbb6filterE; +_ZN3tbb6filterD2Ev; +_ZN3tbb8pipeline10add_filterERNS_6filterE; +_ZN3tbb8pipeline12inject_tokenERNS_4taskE; +_ZN3tbb8pipeline13remove_filterERNS_6filterE; +_ZN3tbb8pipeline3runEm; +#if __TBB_EXCEPTIONS +_ZN3tbb8pipeline3runEmRNS_18task_group_contextE; +#endif +_ZN3tbb8pipeline5clearEv; +_ZN3tbb19thread_bound_filter12process_itemEv; +_ZN3tbb19thread_bound_filter16try_process_itemEv; +_ZTIN3tbb8pipelineE; +_ZTSN3tbb8pipelineE; +_ZTVN3tbb8pipelineE; +_ZN3tbb8pipelineC1Ev; +_ZN3tbb8pipelineC2Ev; +_ZN3tbb8pipelineD0Ev; +_ZN3tbb8pipelineD1Ev; +_ZN3tbb8pipelineD2Ev; + +/* queuing_rw_mutex.cpp */ +_ZN3tbb16queuing_rw_mutex18internal_constructEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock17upgrade_to_writerEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock19downgrade_to_readerEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock7acquireERS0_b; +_ZN3tbb16queuing_rw_mutex11scoped_lock7releaseEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock11try_acquireERS0_b; + +#if !TBB_NO_LEGACY +/* spin_rw_mutex.cpp v2 */ +_ZN3tbb13spin_rw_mutex16internal_upgradeEPS0_; +_ZN3tbb13spin_rw_mutex22internal_itt_releasingEPS0_; +_ZN3tbb13spin_rw_mutex23internal_acquire_readerEPS0_; +_ZN3tbb13spin_rw_mutex23internal_acquire_writerEPS0_; +_ZN3tbb13spin_rw_mutex18internal_downgradeEPS0_; +_ZN3tbb13spin_rw_mutex23internal_release_readerEPS0_; +_ZN3tbb13spin_rw_mutex23internal_release_writerEPS0_; +_ZN3tbb13spin_rw_mutex27internal_try_acquire_readerEPS0_; +_ZN3tbb13spin_rw_mutex27internal_try_acquire_writerEPS0_; +#endif + +/* spin_rw_mutex v3 */ +_ZN3tbb16spin_rw_mutex_v318internal_constructEv; +_ZN3tbb16spin_rw_mutex_v316internal_upgradeEv; +_ZN3tbb16spin_rw_mutex_v318internal_downgradeEv; +_ZN3tbb16spin_rw_mutex_v323internal_acquire_readerEv; +_ZN3tbb16spin_rw_mutex_v323internal_acquire_writerEv; +_ZN3tbb16spin_rw_mutex_v323internal_release_readerEv; +_ZN3tbb16spin_rw_mutex_v323internal_release_writerEv; +_ZN3tbb16spin_rw_mutex_v327internal_try_acquire_readerEv; +_ZN3tbb16spin_rw_mutex_v327internal_try_acquire_writerEv; + +/* spin_mutex.cpp */ +_ZN3tbb10spin_mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb10spin_mutex11scoped_lock16internal_releaseEv; +_ZN3tbb10spin_mutex11scoped_lock20internal_try_acquireERS0_; +_ZN3tbb10spin_mutex18internal_constructEv; + +/* mutex.cpp */ +_ZN3tbb5mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb5mutex11scoped_lock16internal_releaseEv; +_ZN3tbb5mutex11scoped_lock20internal_try_acquireERS0_; +_ZN3tbb5mutex16internal_destroyEv; +_ZN3tbb5mutex18internal_constructEv; + +/* recursive_mutex.cpp */ +_ZN3tbb15recursive_mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb15recursive_mutex11scoped_lock16internal_releaseEv; +_ZN3tbb15recursive_mutex11scoped_lock20internal_try_acquireERS0_; +_ZN3tbb15recursive_mutex16internal_destroyEv; +_ZN3tbb15recursive_mutex18internal_constructEv; + +/* QueuingMutex.cpp */ +_ZN3tbb13queuing_mutex18internal_constructEv; +_ZN3tbb13queuing_mutex11scoped_lock7acquireERS0_; +_ZN3tbb13queuing_mutex11scoped_lock7releaseEv; +_ZN3tbb13queuing_mutex11scoped_lock11try_acquireERS0_; + +#if !TBB_NO_LEGACY +/* concurrent_hash_map */ +_ZNK3tbb8internal21hash_map_segment_base23internal_grow_predicateEv; + +/* concurrent_queue.cpp v2 */ +_ZN3tbb8internal21concurrent_queue_base12internal_popEPv; +_ZN3tbb8internal21concurrent_queue_base13internal_pushEPKv; +_ZN3tbb8internal21concurrent_queue_base21internal_set_capacityElm; +_ZN3tbb8internal21concurrent_queue_base23internal_pop_if_presentEPv; 
+_ZN3tbb8internal21concurrent_queue_base25internal_push_if_not_fullEPKv; +_ZN3tbb8internal21concurrent_queue_baseC2Em; +_ZN3tbb8internal21concurrent_queue_baseD2Ev; +_ZTIN3tbb8internal21concurrent_queue_baseE; +_ZTSN3tbb8internal21concurrent_queue_baseE; +_ZTVN3tbb8internal21concurrent_queue_baseE; +_ZN3tbb8internal30concurrent_queue_iterator_base6assignERKS1_; +_ZN3tbb8internal30concurrent_queue_iterator_base7advanceEv; +_ZN3tbb8internal30concurrent_queue_iterator_baseC2ERKNS0_21concurrent_queue_baseE; +_ZN3tbb8internal30concurrent_queue_iterator_baseD2Ev; +_ZNK3tbb8internal21concurrent_queue_base13internal_sizeEv; +#endif + +/* concurrent_queue v3 */ +/* constructors */ +_ZN3tbb8internal24concurrent_queue_base_v3C2Em; +_ZN3tbb8internal33concurrent_queue_iterator_base_v3C2ERKNS0_24concurrent_queue_base_v3E; +/* destructors */ +_ZN3tbb8internal24concurrent_queue_base_v3D2Ev; +_ZN3tbb8internal33concurrent_queue_iterator_base_v3D2Ev; +/* typeinfo */ +_ZTIN3tbb8internal24concurrent_queue_base_v3E; +_ZTSN3tbb8internal24concurrent_queue_base_v3E; +/* vtable */ +_ZTVN3tbb8internal24concurrent_queue_base_v3E; +/* methods */ +_ZN3tbb8internal33concurrent_queue_iterator_base_v36assignERKS1_; +_ZN3tbb8internal33concurrent_queue_iterator_base_v37advanceEv; +_ZN3tbb8internal24concurrent_queue_base_v313internal_pushEPKv; +_ZN3tbb8internal24concurrent_queue_base_v325internal_push_if_not_fullEPKv; +_ZN3tbb8internal24concurrent_queue_base_v312internal_popEPv; +_ZN3tbb8internal24concurrent_queue_base_v323internal_pop_if_presentEPv; +_ZN3tbb8internal24concurrent_queue_base_v321internal_finish_clearEv; +_ZN3tbb8internal24concurrent_queue_base_v321internal_set_capacityElm; +_ZNK3tbb8internal24concurrent_queue_base_v313internal_sizeEv; +_ZNK3tbb8internal24concurrent_queue_base_v314internal_emptyEv; +_ZNK3tbb8internal24concurrent_queue_base_v324internal_throw_exceptionEv; +_ZN3tbb8internal24concurrent_queue_base_v36assignERKS1_; + +#if !TBB_NO_LEGACY +/* concurrent_vector.cpp v2 */ +_ZN3tbb8internal22concurrent_vector_base13internal_copyERKS1_mPFvPvPKvmE; +_ZN3tbb8internal22concurrent_vector_base14internal_clearEPFvPvmEb; +_ZN3tbb8internal22concurrent_vector_base15internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_; +_ZN3tbb8internal22concurrent_vector_base16internal_grow_byEmmPFvPvmE; +_ZN3tbb8internal22concurrent_vector_base16internal_reserveEmmm; +_ZN3tbb8internal22concurrent_vector_base18internal_push_backEmRm; +_ZN3tbb8internal22concurrent_vector_base25internal_grow_to_at_leastEmmPFvPvmE; +_ZNK3tbb8internal22concurrent_vector_base17internal_capacityEv; +#endif + +/* concurrent_vector v3 */ +_ZN3tbb8internal25concurrent_vector_base_v313internal_copyERKS1_mPFvPvPKvmE; +_ZN3tbb8internal25concurrent_vector_base_v314internal_clearEPFvPvmE; +_ZN3tbb8internal25concurrent_vector_base_v315internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_; +_ZN3tbb8internal25concurrent_vector_base_v316internal_grow_byEmmPFvPvPKvmES4_; +_ZN3tbb8internal25concurrent_vector_base_v316internal_reserveEmmm; +_ZN3tbb8internal25concurrent_vector_base_v318internal_push_backEmRm; +_ZN3tbb8internal25concurrent_vector_base_v325internal_grow_to_at_leastEmmPFvPvPKvmES4_; +_ZNK3tbb8internal25concurrent_vector_base_v317internal_capacityEv; +_ZN3tbb8internal25concurrent_vector_base_v316internal_compactEmPvPFvS2_mEPFvS2_PKvmE; +_ZN3tbb8internal25concurrent_vector_base_v313internal_swapERS1_; +_ZNK3tbb8internal25concurrent_vector_base_v324internal_throw_exceptionEm; +_ZN3tbb8internal25concurrent_vector_base_v3D2Ev; 
+_ZN3tbb8internal25concurrent_vector_base_v315internal_resizeEmmmPKvPFvPvmEPFvS4_S3_mE; +_ZN3tbb8internal25concurrent_vector_base_v337internal_grow_to_at_least_with_resultEmmPFvPvPKvmES4_; + +/* tbb_thread */ +_ZN3tbb8internal13tbb_thread_v320hardware_concurrencyEv; +_ZN3tbb8internal13tbb_thread_v36detachEv; +_ZN3tbb8internal16thread_get_id_v3Ev; +_ZN3tbb8internal15free_closure_v3EPv; +_ZN3tbb8internal13tbb_thread_v34joinEv; +_ZN3tbb8internal13tbb_thread_v314internal_startEPFPvS2_ES2_; +_ZN3tbb8internal19allocate_closure_v3Em; +_ZN3tbb8internal7move_v3ERNS0_13tbb_thread_v3ES2_; +_ZN3tbb8internal15thread_yield_v3Ev; +_ZN3tbb8internal15thread_sleep_v3ERKNS_10tick_count10interval_tE; + +local: + +/* TBB symbols */ +*3tbb*; +*__TBB*; + +/* Intel Compiler (libirc) symbols */ +__intel_*; +_intel_*; +get_msg_buf; +get_text_buf; +message_catalog; +print_buf; +irc__get_msg; +irc__print; + +}; diff --git a/dep/tbb/src/tbb/lin64ipf-tbb-export.def b/dep/tbb/src/tbb/lin64ipf-tbb-export.def new file mode 100644 index 000000000..22514d8f2 --- /dev/null +++ b/dep/tbb/src/tbb/lin64ipf-tbb-export.def @@ -0,0 +1,355 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#include "tbb/tbb_config.h" + +{ +global: + +/* cache_aligned_allocator.cpp */ +_ZN3tbb8internal12NFS_AllocateEmmPv; +_ZN3tbb8internal15NFS_GetLineSizeEv; +_ZN3tbb8internal8NFS_FreeEPv; +_ZN3tbb8internal23allocate_via_handler_v3Em; +_ZN3tbb8internal25deallocate_via_handler_v3EPv; +_ZN3tbb8internal17is_malloc_used_v3Ev; + +/* task.cpp v3 */ +_ZN3tbb4task13note_affinityEt; +_ZN3tbb4task22internal_set_ref_countEi; +_ZN3tbb4task28internal_decrement_ref_countEv; +_ZN3tbb4task22spawn_and_wait_for_allERNS_9task_listE; +_ZN3tbb4task4selfEv; +_ZN3tbb4task7destroyERS0_; +_ZNK3tbb4task26is_owned_by_current_threadEv; +_ZN3tbb8internal19allocate_root_proxy4freeERNS_4taskE; +_ZN3tbb8internal19allocate_root_proxy8allocateEm; +_ZN3tbb8internal28affinity_partitioner_base_v36resizeEj; +_ZNK3tbb8internal20allocate_child_proxy4freeERNS_4taskE; +_ZNK3tbb8internal20allocate_child_proxy8allocateEm; +_ZNK3tbb8internal27allocate_continuation_proxy4freeERNS_4taskE; +_ZNK3tbb8internal27allocate_continuation_proxy8allocateEm; +_ZNK3tbb8internal34allocate_additional_child_of_proxy4freeERNS_4taskE; +_ZNK3tbb8internal34allocate_additional_child_of_proxy8allocateEm; +_ZTIN3tbb4taskE; +_ZTSN3tbb4taskE; +_ZTVN3tbb4taskE; +_ZN3tbb19task_scheduler_init19default_num_threadsEv; +_ZN3tbb19task_scheduler_init10initializeEim; +_ZN3tbb19task_scheduler_init10initializeEi; +_ZN3tbb19task_scheduler_init9terminateEv; +_ZN3tbb8internal26task_scheduler_observer_v37observeEb; +_ZN3tbb10empty_task7executeEv; +_ZN3tbb10empty_taskD0Ev; +_ZN3tbb10empty_taskD1Ev; +_ZTIN3tbb10empty_taskE; +_ZTSN3tbb10empty_taskE; +_ZTVN3tbb10empty_taskE; + +/* exception handling support */ +#if __TBB_EXCEPTIONS +_ZNK3tbb8internal32allocate_root_with_context_proxy8allocateEm; +_ZNK3tbb8internal32allocate_root_with_context_proxy4freeERNS_4taskE; +_ZNK3tbb18task_group_context28is_group_execution_cancelledEv; +_ZN3tbb18task_group_context22cancel_group_executionEv; +_ZN3tbb18task_group_context26register_pending_exceptionEv; +_ZN3tbb18task_group_context5resetEv; +_ZN3tbb18task_group_context4initEv; +_ZN3tbb18task_group_contextD1Ev; +_ZN3tbb18task_group_contextD2Ev; +_ZNK3tbb18captured_exception4nameEv; +_ZNK3tbb18captured_exception4whatEv; +_ZN3tbb18captured_exception10throw_selfEv; +_ZN3tbb18captured_exception3setEPKcS2_; +_ZN3tbb18captured_exception4moveEv; +_ZN3tbb18captured_exception5clearEv; +_ZN3tbb18captured_exception7destroyEv; +_ZN3tbb18captured_exception8allocateEPKcS2_; +_ZN3tbb18captured_exceptionD0Ev; +_ZN3tbb18captured_exceptionD1Ev; +_ZTIN3tbb18captured_exceptionE; +_ZTSN3tbb18captured_exceptionE; +_ZTVN3tbb18captured_exceptionE; +_ZN3tbb13tbb_exceptionD2Ev; +_ZTIN3tbb13tbb_exceptionE; +_ZTSN3tbb13tbb_exceptionE; +_ZTVN3tbb13tbb_exceptionE; +_ZN3tbb14bad_last_allocD0Ev; +_ZN3tbb14bad_last_allocD1Ev; +_ZNK3tbb14bad_last_alloc4whatEv; +_ZTIN3tbb14bad_last_allocE; +_ZTSN3tbb14bad_last_allocE; +_ZTVN3tbb14bad_last_allocE; +#endif /* __TBB_EXCEPTIONS */ + +/* tbb_misc.cpp */ +_ZN3tbb17assertion_failureEPKciS1_S1_; +_ZN3tbb21set_assertion_handlerEPFvPKciS1_S1_E; +_ZN3tbb8internal36get_initial_auto_partitioner_divisorEv; +_ZN3tbb8internal13handle_perrorEiPKc; +_ZN3tbb8internal15runtime_warningEPKcz; +TBB_runtime_interface_version; +_ZN3tbb8internal33throw_bad_last_alloc_exception_v4Ev; + +/* itt_notify.cpp */ +_ZN3tbb8internal32itt_load_pointer_with_acquire_v3EPKv; +_ZN3tbb8internal33itt_store_pointer_with_release_v3EPvS1_; +_ZN3tbb8internal20itt_set_sync_name_v3EPvPKc; +_ZN3tbb8internal19itt_load_pointer_v3EPKv; + +/* pipeline.cpp */ 
+_ZTIN3tbb6filterE; +_ZTSN3tbb6filterE; +_ZTVN3tbb6filterE; +_ZN3tbb6filterD2Ev; +_ZN3tbb8pipeline10add_filterERNS_6filterE; +_ZN3tbb8pipeline12inject_tokenERNS_4taskE; +_ZN3tbb8pipeline13remove_filterERNS_6filterE; +_ZN3tbb8pipeline3runEm; +#if __TBB_EXCEPTIONS +_ZN3tbb8pipeline3runEmRNS_18task_group_contextE; +#endif +_ZN3tbb8pipeline5clearEv; +_ZN3tbb19thread_bound_filter12process_itemEv; +_ZN3tbb19thread_bound_filter16try_process_itemEv; +_ZTIN3tbb8pipelineE; +_ZTSN3tbb8pipelineE; +_ZTVN3tbb8pipelineE; +_ZN3tbb8pipelineC1Ev; +_ZN3tbb8pipelineC2Ev; +_ZN3tbb8pipelineD0Ev; +_ZN3tbb8pipelineD1Ev; +_ZN3tbb8pipelineD2Ev; + +/* queuing_rw_mutex.cpp */ +_ZN3tbb16queuing_rw_mutex18internal_constructEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock17upgrade_to_writerEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock19downgrade_to_readerEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock7acquireERS0_b; +_ZN3tbb16queuing_rw_mutex11scoped_lock7releaseEv; +_ZN3tbb16queuing_rw_mutex11scoped_lock11try_acquireERS0_b; + +#if !TBB_NO_LEGACY +/* spin_rw_mutex.cpp v2 */ +_ZN3tbb13spin_rw_mutex16internal_upgradeEPS0_; +_ZN3tbb13spin_rw_mutex22internal_itt_releasingEPS0_; +_ZN3tbb13spin_rw_mutex23internal_acquire_readerEPS0_; +_ZN3tbb13spin_rw_mutex23internal_acquire_writerEPS0_; +_ZN3tbb13spin_rw_mutex18internal_downgradeEPS0_; +_ZN3tbb13spin_rw_mutex23internal_release_readerEPS0_; +_ZN3tbb13spin_rw_mutex23internal_release_writerEPS0_; +_ZN3tbb13spin_rw_mutex27internal_try_acquire_readerEPS0_; +_ZN3tbb13spin_rw_mutex27internal_try_acquire_writerEPS0_; +#endif + +/* spin_rw_mutex v3 */ +_ZN3tbb16spin_rw_mutex_v318internal_constructEv; +_ZN3tbb16spin_rw_mutex_v316internal_upgradeEv; +_ZN3tbb16spin_rw_mutex_v318internal_downgradeEv; +_ZN3tbb16spin_rw_mutex_v323internal_acquire_readerEv; +_ZN3tbb16spin_rw_mutex_v323internal_acquire_writerEv; +_ZN3tbb16spin_rw_mutex_v323internal_release_readerEv; +_ZN3tbb16spin_rw_mutex_v323internal_release_writerEv; +_ZN3tbb16spin_rw_mutex_v327internal_try_acquire_readerEv; +_ZN3tbb16spin_rw_mutex_v327internal_try_acquire_writerEv; + +/* spin_mutex.cpp */ +_ZN3tbb10spin_mutex18internal_constructEv; +_ZN3tbb10spin_mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb10spin_mutex11scoped_lock16internal_releaseEv; +_ZN3tbb10spin_mutex11scoped_lock20internal_try_acquireERS0_; + +/* mutex.cpp */ +_ZN3tbb5mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb5mutex11scoped_lock16internal_releaseEv; +_ZN3tbb5mutex11scoped_lock20internal_try_acquireERS0_; +_ZN3tbb5mutex16internal_destroyEv; +_ZN3tbb5mutex18internal_constructEv; + +/* recursive_mutex.cpp */ +_ZN3tbb15recursive_mutex11scoped_lock16internal_acquireERS0_; +_ZN3tbb15recursive_mutex11scoped_lock16internal_releaseEv; +_ZN3tbb15recursive_mutex11scoped_lock20internal_try_acquireERS0_; +_ZN3tbb15recursive_mutex16internal_destroyEv; +_ZN3tbb15recursive_mutex18internal_constructEv; + +/* QueuingMutex.cpp */ +_ZN3tbb13queuing_mutex18internal_constructEv; +_ZN3tbb13queuing_mutex11scoped_lock7acquireERS0_; +_ZN3tbb13queuing_mutex11scoped_lock7releaseEv; +_ZN3tbb13queuing_mutex11scoped_lock11try_acquireERS0_; + +#if !TBB_NO_LEGACY +/* concurrent_hash_map */ +_ZNK3tbb8internal21hash_map_segment_base23internal_grow_predicateEv; + +/* concurrent_queue.cpp v2 */ +_ZN3tbb8internal21concurrent_queue_base12internal_popEPv; +_ZN3tbb8internal21concurrent_queue_base13internal_pushEPKv; +_ZN3tbb8internal21concurrent_queue_base21internal_set_capacityElm; +_ZN3tbb8internal21concurrent_queue_base23internal_pop_if_presentEPv; 
+_ZN3tbb8internal21concurrent_queue_base25internal_push_if_not_fullEPKv; +_ZN3tbb8internal21concurrent_queue_baseC2Em; +_ZN3tbb8internal21concurrent_queue_baseD2Ev; +_ZTIN3tbb8internal21concurrent_queue_baseE; +_ZTSN3tbb8internal21concurrent_queue_baseE; +_ZTVN3tbb8internal21concurrent_queue_baseE; +_ZN3tbb8internal30concurrent_queue_iterator_base6assignERKS1_; +_ZN3tbb8internal30concurrent_queue_iterator_base7advanceEv; +_ZN3tbb8internal30concurrent_queue_iterator_baseC2ERKNS0_21concurrent_queue_baseE; +_ZN3tbb8internal30concurrent_queue_iterator_baseD2Ev; +_ZNK3tbb8internal21concurrent_queue_base13internal_sizeEv; +#endif + +/* concurrent_queue v3 */ +/* constructors */ +_ZN3tbb8internal24concurrent_queue_base_v3C2Em; +_ZN3tbb8internal33concurrent_queue_iterator_base_v3C2ERKNS0_24concurrent_queue_base_v3E; +/* destructors */ +_ZN3tbb8internal24concurrent_queue_base_v3D2Ev; +_ZN3tbb8internal33concurrent_queue_iterator_base_v3D2Ev; +/* typeinfo */ +_ZTIN3tbb8internal24concurrent_queue_base_v3E; +_ZTSN3tbb8internal24concurrent_queue_base_v3E; +/* vtable */ +_ZTVN3tbb8internal24concurrent_queue_base_v3E; +/* methods */ +_ZN3tbb8internal33concurrent_queue_iterator_base_v36assignERKS1_; +_ZN3tbb8internal33concurrent_queue_iterator_base_v37advanceEv; +_ZN3tbb8internal24concurrent_queue_base_v313internal_pushEPKv; +_ZN3tbb8internal24concurrent_queue_base_v325internal_push_if_not_fullEPKv; +_ZN3tbb8internal24concurrent_queue_base_v312internal_popEPv; +_ZN3tbb8internal24concurrent_queue_base_v323internal_pop_if_presentEPv; +_ZN3tbb8internal24concurrent_queue_base_v321internal_finish_clearEv; +_ZN3tbb8internal24concurrent_queue_base_v321internal_set_capacityElm; +_ZNK3tbb8internal24concurrent_queue_base_v313internal_sizeEv; +_ZNK3tbb8internal24concurrent_queue_base_v314internal_emptyEv; +_ZNK3tbb8internal24concurrent_queue_base_v324internal_throw_exceptionEv; +_ZN3tbb8internal24concurrent_queue_base_v36assignERKS1_; + +#if !TBB_NO_LEGACY +/* concurrent_vector.cpp v2 */ +_ZN3tbb8internal22concurrent_vector_base13internal_copyERKS1_mPFvPvPKvmE; +_ZN3tbb8internal22concurrent_vector_base14internal_clearEPFvPvmEb; +_ZN3tbb8internal22concurrent_vector_base15internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_; +_ZN3tbb8internal22concurrent_vector_base16internal_grow_byEmmPFvPvmE; +_ZN3tbb8internal22concurrent_vector_base16internal_reserveEmmm; +_ZN3tbb8internal22concurrent_vector_base18internal_push_backEmRm; +_ZN3tbb8internal22concurrent_vector_base25internal_grow_to_at_leastEmmPFvPvmE; +_ZNK3tbb8internal22concurrent_vector_base17internal_capacityEv; +#endif + +/* concurrent_vector v3 */ +_ZN3tbb8internal25concurrent_vector_base_v313internal_copyERKS1_mPFvPvPKvmE; +_ZN3tbb8internal25concurrent_vector_base_v314internal_clearEPFvPvmE; +_ZN3tbb8internal25concurrent_vector_base_v315internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_; +_ZN3tbb8internal25concurrent_vector_base_v316internal_grow_byEmmPFvPvPKvmES4_; +_ZN3tbb8internal25concurrent_vector_base_v316internal_reserveEmmm; +_ZN3tbb8internal25concurrent_vector_base_v318internal_push_backEmRm; +_ZN3tbb8internal25concurrent_vector_base_v325internal_grow_to_at_leastEmmPFvPvPKvmES4_; +_ZNK3tbb8internal25concurrent_vector_base_v317internal_capacityEv; +_ZN3tbb8internal25concurrent_vector_base_v316internal_compactEmPvPFvS2_mEPFvS2_PKvmE; +_ZN3tbb8internal25concurrent_vector_base_v313internal_swapERS1_; +_ZNK3tbb8internal25concurrent_vector_base_v324internal_throw_exceptionEm; +_ZN3tbb8internal25concurrent_vector_base_v3D2Ev; 
+_ZN3tbb8internal25concurrent_vector_base_v315internal_resizeEmmmPKvPFvPvmEPFvS4_S3_mE; +_ZN3tbb8internal25concurrent_vector_base_v337internal_grow_to_at_least_with_resultEmmPFvPvPKvmES4_; + +/* tbb_thread */ +_ZN3tbb8internal13tbb_thread_v320hardware_concurrencyEv; +_ZN3tbb8internal13tbb_thread_v36detachEv; +_ZN3tbb8internal16thread_get_id_v3Ev; +_ZN3tbb8internal15free_closure_v3EPv; +_ZN3tbb8internal13tbb_thread_v34joinEv; +_ZN3tbb8internal13tbb_thread_v314internal_startEPFPvS2_ES2_; +_ZN3tbb8internal19allocate_closure_v3Em; +_ZN3tbb8internal7move_v3ERNS0_13tbb_thread_v3ES2_; +_ZN3tbb8internal15thread_yield_v3Ev; +_ZN3tbb8internal15thread_sleep_v3ERKNS_10tick_count10interval_tE; + +/* asm functions */ +__TBB_machine_fetchadd1__TBB_full_fence; +__TBB_machine_fetchadd2__TBB_full_fence; +__TBB_machine_fetchadd4__TBB_full_fence; +__TBB_machine_fetchadd8__TBB_full_fence; +__TBB_machine_fetchstore1__TBB_full_fence; +__TBB_machine_fetchstore2__TBB_full_fence; +__TBB_machine_fetchstore4__TBB_full_fence; +__TBB_machine_fetchstore8__TBB_full_fence; +__TBB_machine_fetchadd1acquire; +__TBB_machine_fetchadd1release; +__TBB_machine_fetchadd2acquire; +__TBB_machine_fetchadd2release; +__TBB_machine_fetchadd4acquire; +__TBB_machine_fetchadd4release; +__TBB_machine_fetchadd8acquire; +__TBB_machine_fetchadd8release; +__TBB_machine_fetchstore1acquire; +__TBB_machine_fetchstore1release; +__TBB_machine_fetchstore2acquire; +__TBB_machine_fetchstore2release; +__TBB_machine_fetchstore4acquire; +__TBB_machine_fetchstore4release; +__TBB_machine_fetchstore8acquire; +__TBB_machine_fetchstore8release; +__TBB_machine_cmpswp1acquire; +__TBB_machine_cmpswp1release; +__TBB_machine_cmpswp1__TBB_full_fence; +__TBB_machine_cmpswp2acquire; +__TBB_machine_cmpswp2release; +__TBB_machine_cmpswp2__TBB_full_fence; +__TBB_machine_cmpswp4acquire; +__TBB_machine_cmpswp4release; +__TBB_machine_cmpswp4__TBB_full_fence; +__TBB_machine_cmpswp8acquire; +__TBB_machine_cmpswp8release; +__TBB_machine_cmpswp8__TBB_full_fence; +__TBB_machine_lg; +__TBB_machine_lockbyte; +__TBB_machine_pause; +__TBB_machine_trylockbyte; + +local: + +/* TBB symbols */ +*3tbb*; +*__TBB*; + +/* Intel Compiler (libirc) symbols */ +__intel_*; +_intel_*; +?0_memcopyA; +?0_memcopyDu; +?0_memcpyD; +?1__memcpy; +?1__memmove; +?1__serial_memmove; +memcpy; +memset; + +}; diff --git a/dep/tbb/src/tbb/mac32-tbb-export.def b/dep/tbb/src/tbb/mac32-tbb-export.def new file mode 100644 index 000000000..9366805e0 --- /dev/null +++ b/dep/tbb/src/tbb/mac32-tbb-export.def @@ -0,0 +1,294 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. 
Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +# cache_aligned_allocator.cpp +__ZN3tbb8internal12NFS_AllocateEmmPv +__ZN3tbb8internal15NFS_GetLineSizeEv +__ZN3tbb8internal8NFS_FreeEPv +__ZN3tbb8internal23allocate_via_handler_v3Em +__ZN3tbb8internal25deallocate_via_handler_v3EPv +__ZN3tbb8internal17is_malloc_used_v3Ev + +# task.cpp v3 +__ZN3tbb4task13note_affinityEt +__ZN3tbb4task22internal_set_ref_countEi +__ZN3tbb4task28internal_decrement_ref_countEv +__ZN3tbb4task22spawn_and_wait_for_allERNS_9task_listE +__ZN3tbb4task4selfEv +__ZN3tbb4task7destroyERS0_ +__ZNK3tbb4task26is_owned_by_current_threadEv +__ZN3tbb8internal19allocate_root_proxy4freeERNS_4taskE +__ZN3tbb8internal19allocate_root_proxy8allocateEm +__ZN3tbb8internal28affinity_partitioner_base_v36resizeEj +__ZN3tbb8internal36get_initial_auto_partitioner_divisorEv +__ZNK3tbb8internal20allocate_child_proxy4freeERNS_4taskE +__ZNK3tbb8internal20allocate_child_proxy8allocateEm +__ZNK3tbb8internal27allocate_continuation_proxy4freeERNS_4taskE +__ZNK3tbb8internal27allocate_continuation_proxy8allocateEm +__ZNK3tbb8internal34allocate_additional_child_of_proxy4freeERNS_4taskE +__ZNK3tbb8internal34allocate_additional_child_of_proxy8allocateEm +__ZTIN3tbb4taskE +__ZTSN3tbb4taskE +__ZTVN3tbb4taskE +__ZN3tbb19task_scheduler_init19default_num_threadsEv +__ZN3tbb19task_scheduler_init10initializeEim +__ZN3tbb19task_scheduler_init10initializeEi +__ZN3tbb19task_scheduler_init9terminateEv +__ZN3tbb8internal26task_scheduler_observer_v37observeEb +__ZN3tbb10empty_task7executeEv +__ZN3tbb10empty_taskD0Ev +__ZN3tbb10empty_taskD1Ev +__ZTIN3tbb10empty_taskE +__ZTSN3tbb10empty_taskE +__ZTVN3tbb10empty_taskE + +# exception handling support +__ZNK3tbb8internal32allocate_root_with_context_proxy8allocateEm +__ZNK3tbb8internal32allocate_root_with_context_proxy4freeERNS_4taskE +__ZNK3tbb18task_group_context28is_group_execution_cancelledEv +__ZN3tbb18task_group_context22cancel_group_executionEv +__ZN3tbb18task_group_context26register_pending_exceptionEv +__ZN3tbb18task_group_context5resetEv +__ZN3tbb18task_group_context4initEv +__ZN3tbb18task_group_contextD1Ev +__ZN3tbb18task_group_contextD2Ev +__ZNK3tbb18captured_exception4nameEv +__ZNK3tbb18captured_exception4whatEv +__ZN3tbb18captured_exception10throw_selfEv +__ZN3tbb18captured_exception3setEPKcS2_ +__ZN3tbb18captured_exception4moveEv +__ZN3tbb18captured_exception5clearEv +__ZN3tbb18captured_exception7destroyEv +__ZN3tbb18captured_exception8allocateEPKcS2_ +__ZN3tbb18captured_exceptionD0Ev +__ZN3tbb18captured_exceptionD1Ev +__ZTIN3tbb18captured_exceptionE +__ZTSN3tbb18captured_exceptionE +__ZTVN3tbb18captured_exceptionE +__ZTIN3tbb13tbb_exceptionE +__ZTSN3tbb13tbb_exceptionE +__ZTVN3tbb13tbb_exceptionE +__ZN3tbb14bad_last_allocD0Ev +__ZN3tbb14bad_last_allocD1Ev +__ZNK3tbb14bad_last_alloc4whatEv +__ZTIN3tbb14bad_last_allocE +__ZTSN3tbb14bad_last_allocE +__ZTVN3tbb14bad_last_allocE + +# Symbols for std exception classes thrown from TBB +__ZNSt11range_errorD1Ev +__ZNSt12length_errorD1Ev +__ZNSt12out_of_rangeD1Ev +__ZTISt11range_error +__ZTISt12length_error +__ZTISt12out_of_range +__ZTSSt11range_error +__ZTSSt12length_error 
+__ZTSSt12out_of_range + +# tbb_misc.cpp +__ZN3tbb17assertion_failureEPKciS1_S1_ +__ZN3tbb21set_assertion_handlerEPFvPKciS1_S1_E +__ZN3tbb8internal13handle_perrorEiPKc +__ZN3tbb8internal15runtime_warningEPKcz +___TBB_machine_store8_slow_perf_warning +___TBB_machine_store8_slow +_TBB_runtime_interface_version +__ZN3tbb8internal33throw_bad_last_alloc_exception_v4Ev + +# itt_notify.cpp +__ZN3tbb8internal32itt_load_pointer_with_acquire_v3EPKv +__ZN3tbb8internal33itt_store_pointer_with_release_v3EPvS1_ +__ZN3tbb8internal19itt_load_pointer_v3EPKv +__ZN3tbb8internal20itt_set_sync_name_v3EPvPKc + +# pipeline.cpp +__ZTIN3tbb6filterE +__ZTSN3tbb6filterE +__ZTVN3tbb6filterE +__ZN3tbb6filterD2Ev +__ZN3tbb8pipeline10add_filterERNS_6filterE +__ZN3tbb8pipeline12inject_tokenERNS_4taskE +__ZN3tbb8pipeline13remove_filterERNS_6filterE +__ZN3tbb8pipeline3runEm +__ZN3tbb8pipeline3runEmRNS_18task_group_contextE +__ZN3tbb8pipeline5clearEv +__ZN3tbb19thread_bound_filter12process_itemEv +__ZN3tbb19thread_bound_filter16try_process_itemEv +__ZN3tbb8pipelineC1Ev +__ZN3tbb8pipelineC2Ev +__ZN3tbb8pipelineD0Ev +__ZN3tbb8pipelineD1Ev +__ZN3tbb8pipelineD2Ev +__ZTIN3tbb8pipelineE +__ZTSN3tbb8pipelineE +__ZTVN3tbb8pipelineE + +# queuing_rw_mutex.cpp +__ZN3tbb16queuing_rw_mutex11scoped_lock17upgrade_to_writerEv +__ZN3tbb16queuing_rw_mutex11scoped_lock19downgrade_to_readerEv +__ZN3tbb16queuing_rw_mutex11scoped_lock7acquireERS0_b +__ZN3tbb16queuing_rw_mutex11scoped_lock7releaseEv +__ZN3tbb16queuing_rw_mutex11scoped_lock11try_acquireERS0_b +__ZN3tbb16queuing_rw_mutex18internal_constructEv + +#if !TBB_NO_LEGACY +# spin_rw_mutex.cpp v2 +__ZN3tbb13spin_rw_mutex16internal_upgradeEPS0_ +__ZN3tbb13spin_rw_mutex22internal_itt_releasingEPS0_ +__ZN3tbb13spin_rw_mutex23internal_acquire_readerEPS0_ +__ZN3tbb13spin_rw_mutex23internal_acquire_writerEPS0_ +__ZN3tbb13spin_rw_mutex18internal_downgradeEPS0_ +__ZN3tbb13spin_rw_mutex23internal_release_readerEPS0_ +__ZN3tbb13spin_rw_mutex23internal_release_writerEPS0_ +__ZN3tbb13spin_rw_mutex27internal_try_acquire_readerEPS0_ +__ZN3tbb13spin_rw_mutex27internal_try_acquire_writerEPS0_ +#endif + +# spin_rw_mutex v3 +__ZN3tbb16spin_rw_mutex_v316internal_upgradeEv +__ZN3tbb16spin_rw_mutex_v318internal_downgradeEv +__ZN3tbb16spin_rw_mutex_v323internal_acquire_readerEv +__ZN3tbb16spin_rw_mutex_v323internal_acquire_writerEv +__ZN3tbb16spin_rw_mutex_v323internal_release_readerEv +__ZN3tbb16spin_rw_mutex_v323internal_release_writerEv +__ZN3tbb16spin_rw_mutex_v327internal_try_acquire_readerEv +__ZN3tbb16spin_rw_mutex_v327internal_try_acquire_writerEv +__ZN3tbb16spin_rw_mutex_v318internal_constructEv + +# spin_mutex.cpp +__ZN3tbb10spin_mutex11scoped_lock16internal_acquireERS0_ +__ZN3tbb10spin_mutex11scoped_lock16internal_releaseEv +__ZN3tbb10spin_mutex11scoped_lock20internal_try_acquireERS0_ +__ZN3tbb10spin_mutex18internal_constructEv + +# mutex.cpp +__ZN3tbb5mutex11scoped_lock16internal_acquireERS0_ +__ZN3tbb5mutex11scoped_lock16internal_releaseEv +__ZN3tbb5mutex11scoped_lock20internal_try_acquireERS0_ +__ZN3tbb5mutex16internal_destroyEv +__ZN3tbb5mutex18internal_constructEv + +# recursive_mutex.cpp +__ZN3tbb15recursive_mutex11scoped_lock16internal_acquireERS0_ +__ZN3tbb15recursive_mutex11scoped_lock16internal_releaseEv +__ZN3tbb15recursive_mutex11scoped_lock20internal_try_acquireERS0_ +__ZN3tbb15recursive_mutex16internal_destroyEv +__ZN3tbb15recursive_mutex18internal_constructEv + +# queuing_mutex.cpp +__ZN3tbb13queuing_mutex11scoped_lock7acquireERS0_ +__ZN3tbb13queuing_mutex11scoped_lock7releaseEv 
+__ZN3tbb13queuing_mutex11scoped_lock11try_acquireERS0_ +__ZN3tbb13queuing_mutex18internal_constructEv + +#if !TBB_NO_LEGACY +# concurrent_hash_map +__ZNK3tbb8internal21hash_map_segment_base23internal_grow_predicateEv + +# concurrent_queue.cpp v2 +__ZN3tbb8internal21concurrent_queue_base12internal_popEPv +__ZN3tbb8internal21concurrent_queue_base13internal_pushEPKv +__ZN3tbb8internal21concurrent_queue_base21internal_set_capacityEim +__ZN3tbb8internal21concurrent_queue_base23internal_pop_if_presentEPv +__ZN3tbb8internal21concurrent_queue_base25internal_push_if_not_fullEPKv +__ZN3tbb8internal21concurrent_queue_baseC2Em +__ZN3tbb8internal21concurrent_queue_baseD2Ev +__ZTIN3tbb8internal21concurrent_queue_baseE +__ZTSN3tbb8internal21concurrent_queue_baseE +__ZTVN3tbb8internal21concurrent_queue_baseE +__ZN3tbb8internal30concurrent_queue_iterator_base6assignERKS1_ +__ZN3tbb8internal30concurrent_queue_iterator_base7advanceEv +__ZN3tbb8internal30concurrent_queue_iterator_baseC2ERKNS0_21concurrent_queue_baseE +__ZN3tbb8internal30concurrent_queue_iterator_baseD2Ev +__ZNK3tbb8internal21concurrent_queue_base13internal_sizeEv +#endif + +# concurrent_queue v3 +# constructors +__ZN3tbb8internal33concurrent_queue_iterator_base_v3C2ERKNS0_24concurrent_queue_base_v3E +__ZN3tbb8internal24concurrent_queue_base_v3C2Em +# destructors +__ZN3tbb8internal33concurrent_queue_iterator_base_v3D2Ev +__ZN3tbb8internal24concurrent_queue_base_v3D2Ev +# typeinfo +__ZTIN3tbb8internal24concurrent_queue_base_v3E +__ZTSN3tbb8internal24concurrent_queue_base_v3E +#vtable +__ZTVN3tbb8internal24concurrent_queue_base_v3E +# methods +__ZN3tbb8internal33concurrent_queue_iterator_base_v37advanceEv +__ZN3tbb8internal33concurrent_queue_iterator_base_v36assignERKS1_ +__ZN3tbb8internal24concurrent_queue_base_v313internal_pushEPKv +__ZN3tbb8internal24concurrent_queue_base_v325internal_push_if_not_fullEPKv +__ZN3tbb8internal24concurrent_queue_base_v312internal_popEPv +__ZN3tbb8internal24concurrent_queue_base_v323internal_pop_if_presentEPv +__ZN3tbb8internal24concurrent_queue_base_v321internal_set_capacityEim +__ZNK3tbb8internal24concurrent_queue_base_v313internal_sizeEv +__ZNK3tbb8internal24concurrent_queue_base_v314internal_emptyEv +__ZN3tbb8internal24concurrent_queue_base_v321internal_finish_clearEv +__ZNK3tbb8internal24concurrent_queue_base_v324internal_throw_exceptionEv +__ZN3tbb8internal24concurrent_queue_base_v36assignERKS1_ + +#if !TBB_NO_LEGACY +# concurrent_vector.cpp v2 +__ZN3tbb8internal22concurrent_vector_base13internal_copyERKS1_mPFvPvPKvmE +__ZN3tbb8internal22concurrent_vector_base14internal_clearEPFvPvmEb +__ZN3tbb8internal22concurrent_vector_base15internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_ +__ZN3tbb8internal22concurrent_vector_base16internal_grow_byEmmPFvPvmE +__ZN3tbb8internal22concurrent_vector_base16internal_reserveEmmm +__ZN3tbb8internal22concurrent_vector_base18internal_push_backEmRm +__ZN3tbb8internal22concurrent_vector_base25internal_grow_to_at_leastEmmPFvPvmE +__ZNK3tbb8internal22concurrent_vector_base17internal_capacityEv +#endif + +# concurrent_vector v3 +__ZN3tbb8internal25concurrent_vector_base_v313internal_copyERKS1_mPFvPvPKvmE +__ZN3tbb8internal25concurrent_vector_base_v314internal_clearEPFvPvmE +__ZN3tbb8internal25concurrent_vector_base_v315internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_ +__ZN3tbb8internal25concurrent_vector_base_v316internal_grow_byEmmPFvPvPKvmES4_ +__ZN3tbb8internal25concurrent_vector_base_v316internal_reserveEmmm +__ZN3tbb8internal25concurrent_vector_base_v318internal_push_backEmRm 
+__ZN3tbb8internal25concurrent_vector_base_v325internal_grow_to_at_leastEmmPFvPvPKvmES4_ +__ZNK3tbb8internal25concurrent_vector_base_v317internal_capacityEv +__ZN3tbb8internal25concurrent_vector_base_v316internal_compactEmPvPFvS2_mEPFvS2_PKvmE +__ZN3tbb8internal25concurrent_vector_base_v313internal_swapERS1_ +__ZNK3tbb8internal25concurrent_vector_base_v324internal_throw_exceptionEm +__ZN3tbb8internal25concurrent_vector_base_v3D2Ev +__ZN3tbb8internal25concurrent_vector_base_v315internal_resizeEmmmPKvPFvPvmEPFvS4_S3_mE +__ZN3tbb8internal25concurrent_vector_base_v337internal_grow_to_at_least_with_resultEmmPFvPvPKvmES4_ + +# tbb_thread +__ZN3tbb8internal13tbb_thread_v314internal_startEPFPvS2_ES2_ +__ZN3tbb8internal13tbb_thread_v320hardware_concurrencyEv +__ZN3tbb8internal13tbb_thread_v34joinEv +__ZN3tbb8internal13tbb_thread_v36detachEv +__ZN3tbb8internal15free_closure_v3EPv +__ZN3tbb8internal15thread_sleep_v3ERKNS_10tick_count10interval_tE +__ZN3tbb8internal15thread_yield_v3Ev +__ZN3tbb8internal16thread_get_id_v3Ev +__ZN3tbb8internal19allocate_closure_v3Em +__ZN3tbb8internal7move_v3ERNS0_13tbb_thread_v3ES2_ diff --git a/dep/tbb/src/tbb/mac64-tbb-export.def b/dep/tbb/src/tbb/mac64-tbb-export.def new file mode 100644 index 000000000..c91c8ceb7 --- /dev/null +++ b/dep/tbb/src/tbb/mac64-tbb-export.def @@ -0,0 +1,292 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. 
+ +# cache_aligned_allocator.cpp +__ZN3tbb8internal12NFS_AllocateEmmPv +__ZN3tbb8internal15NFS_GetLineSizeEv +__ZN3tbb8internal8NFS_FreeEPv +__ZN3tbb8internal23allocate_via_handler_v3Em +__ZN3tbb8internal25deallocate_via_handler_v3EPv +__ZN3tbb8internal17is_malloc_used_v3Ev + +# task.cpp v3 +__ZN3tbb4task13note_affinityEt +__ZN3tbb4task22internal_set_ref_countEi +__ZN3tbb4task28internal_decrement_ref_countEv +__ZN3tbb4task22spawn_and_wait_for_allERNS_9task_listE +__ZN3tbb4task4selfEv +__ZN3tbb4task7destroyERS0_ +__ZNK3tbb4task26is_owned_by_current_threadEv +__ZN3tbb8internal19allocate_root_proxy4freeERNS_4taskE +__ZN3tbb8internal19allocate_root_proxy8allocateEm +__ZN3tbb8internal28affinity_partitioner_base_v36resizeEj +__ZN3tbb8internal36get_initial_auto_partitioner_divisorEv +__ZNK3tbb8internal20allocate_child_proxy4freeERNS_4taskE +__ZNK3tbb8internal20allocate_child_proxy8allocateEm +__ZNK3tbb8internal27allocate_continuation_proxy4freeERNS_4taskE +__ZNK3tbb8internal27allocate_continuation_proxy8allocateEm +__ZNK3tbb8internal34allocate_additional_child_of_proxy4freeERNS_4taskE +__ZNK3tbb8internal34allocate_additional_child_of_proxy8allocateEm +__ZTIN3tbb4taskE +__ZTSN3tbb4taskE +__ZTVN3tbb4taskE +__ZN3tbb19task_scheduler_init19default_num_threadsEv +__ZN3tbb19task_scheduler_init10initializeEim +__ZN3tbb19task_scheduler_init10initializeEi +__ZN3tbb19task_scheduler_init9terminateEv +__ZN3tbb8internal26task_scheduler_observer_v37observeEb +__ZN3tbb10empty_task7executeEv +__ZN3tbb10empty_taskD0Ev +__ZN3tbb10empty_taskD1Ev +__ZTIN3tbb10empty_taskE +__ZTSN3tbb10empty_taskE +__ZTVN3tbb10empty_taskE + +# exception handling support +__ZNK3tbb8internal32allocate_root_with_context_proxy8allocateEm +__ZNK3tbb8internal32allocate_root_with_context_proxy4freeERNS_4taskE +__ZNK3tbb18task_group_context28is_group_execution_cancelledEv +__ZN3tbb18task_group_context22cancel_group_executionEv +__ZN3tbb18task_group_context26register_pending_exceptionEv +__ZN3tbb18task_group_context5resetEv +__ZN3tbb18task_group_context4initEv +__ZN3tbb18task_group_contextD1Ev +__ZN3tbb18task_group_contextD2Ev +__ZNK3tbb18captured_exception4nameEv +__ZNK3tbb18captured_exception4whatEv +__ZN3tbb18captured_exception10throw_selfEv +__ZN3tbb18captured_exception3setEPKcS2_ +__ZN3tbb18captured_exception4moveEv +__ZN3tbb18captured_exception5clearEv +__ZN3tbb18captured_exception7destroyEv +__ZN3tbb18captured_exception8allocateEPKcS2_ +__ZN3tbb18captured_exceptionD0Ev +__ZN3tbb18captured_exceptionD1Ev +__ZTIN3tbb18captured_exceptionE +__ZTSN3tbb18captured_exceptionE +__ZTVN3tbb18captured_exceptionE +__ZTIN3tbb13tbb_exceptionE +__ZTSN3tbb13tbb_exceptionE +__ZTVN3tbb13tbb_exceptionE +__ZN3tbb14bad_last_allocD0Ev +__ZN3tbb14bad_last_allocD1Ev +__ZNK3tbb14bad_last_alloc4whatEv +__ZTIN3tbb14bad_last_allocE +__ZTSN3tbb14bad_last_allocE +__ZTVN3tbb14bad_last_allocE + +# Symbols for std exception classes thrown from TBB +__ZNSt11range_errorD1Ev +__ZNSt12length_errorD1Ev +__ZNSt12out_of_rangeD1Ev +__ZTISt11range_error +__ZTISt12length_error +__ZTISt12out_of_range +__ZTSSt11range_error +__ZTSSt12length_error +__ZTSSt12out_of_range + +# tbb_misc.cpp +__ZN3tbb17assertion_failureEPKciS1_S1_ +__ZN3tbb21set_assertion_handlerEPFvPKciS1_S1_E +__ZN3tbb8internal13handle_perrorEiPKc +__ZN3tbb8internal15runtime_warningEPKcz +__ZN3tbb8internal33throw_bad_last_alloc_exception_v4Ev +_TBB_runtime_interface_version + +# itt_notify.cpp +__ZN3tbb8internal32itt_load_pointer_with_acquire_v3EPKv +__ZN3tbb8internal33itt_store_pointer_with_release_v3EPvS1_ 
+__ZN3tbb8internal19itt_load_pointer_v3EPKv +__ZN3tbb8internal20itt_set_sync_name_v3EPvPKc + +# pipeline.cpp +__ZTIN3tbb6filterE +__ZTSN3tbb6filterE +__ZTVN3tbb6filterE +__ZN3tbb6filterD2Ev +__ZN3tbb8pipeline10add_filterERNS_6filterE +__ZN3tbb8pipeline12inject_tokenERNS_4taskE +__ZN3tbb8pipeline13remove_filterERNS_6filterE +__ZN3tbb8pipeline3runEm +__ZN3tbb8pipeline3runEmRNS_18task_group_contextE +__ZN3tbb8pipeline5clearEv +__ZN3tbb19thread_bound_filter12process_itemEv +__ZN3tbb19thread_bound_filter16try_process_itemEv +__ZN3tbb8pipelineC1Ev +__ZN3tbb8pipelineC2Ev +__ZN3tbb8pipelineD0Ev +__ZN3tbb8pipelineD1Ev +__ZN3tbb8pipelineD2Ev +__ZTIN3tbb8pipelineE +__ZTSN3tbb8pipelineE +__ZTVN3tbb8pipelineE + +# queuing_rw_mutex.cpp +__ZN3tbb16queuing_rw_mutex11scoped_lock17upgrade_to_writerEv +__ZN3tbb16queuing_rw_mutex11scoped_lock19downgrade_to_readerEv +__ZN3tbb16queuing_rw_mutex11scoped_lock7acquireERS0_b +__ZN3tbb16queuing_rw_mutex11scoped_lock7releaseEv +__ZN3tbb16queuing_rw_mutex11scoped_lock11try_acquireERS0_b +__ZN3tbb16queuing_rw_mutex18internal_constructEv + +#if !TBB_NO_LEGACY +# spin_rw_mutex.cpp v2 +__ZN3tbb13spin_rw_mutex16internal_upgradeEPS0_ +__ZN3tbb13spin_rw_mutex22internal_itt_releasingEPS0_ +__ZN3tbb13spin_rw_mutex23internal_acquire_readerEPS0_ +__ZN3tbb13spin_rw_mutex23internal_acquire_writerEPS0_ +__ZN3tbb13spin_rw_mutex18internal_downgradeEPS0_ +__ZN3tbb13spin_rw_mutex23internal_release_readerEPS0_ +__ZN3tbb13spin_rw_mutex23internal_release_writerEPS0_ +__ZN3tbb13spin_rw_mutex27internal_try_acquire_readerEPS0_ +__ZN3tbb13spin_rw_mutex27internal_try_acquire_writerEPS0_ +#endif + +# spin_rw_mutex v3 +__ZN3tbb16spin_rw_mutex_v316internal_upgradeEv +__ZN3tbb16spin_rw_mutex_v318internal_downgradeEv +__ZN3tbb16spin_rw_mutex_v323internal_acquire_readerEv +__ZN3tbb16spin_rw_mutex_v323internal_acquire_writerEv +__ZN3tbb16spin_rw_mutex_v323internal_release_readerEv +__ZN3tbb16spin_rw_mutex_v323internal_release_writerEv +__ZN3tbb16spin_rw_mutex_v327internal_try_acquire_readerEv +__ZN3tbb16spin_rw_mutex_v327internal_try_acquire_writerEv +__ZN3tbb16spin_rw_mutex_v318internal_constructEv + +# spin_mutex.cpp +__ZN3tbb10spin_mutex11scoped_lock16internal_acquireERS0_ +__ZN3tbb10spin_mutex11scoped_lock16internal_releaseEv +__ZN3tbb10spin_mutex11scoped_lock20internal_try_acquireERS0_ +__ZN3tbb10spin_mutex18internal_constructEv + +# mutex.cpp +__ZN3tbb5mutex11scoped_lock16internal_acquireERS0_ +__ZN3tbb5mutex11scoped_lock16internal_releaseEv +__ZN3tbb5mutex11scoped_lock20internal_try_acquireERS0_ +__ZN3tbb5mutex16internal_destroyEv +__ZN3tbb5mutex18internal_constructEv + +# recursive_mutex.cpp +__ZN3tbb15recursive_mutex11scoped_lock16internal_acquireERS0_ +__ZN3tbb15recursive_mutex11scoped_lock16internal_releaseEv +__ZN3tbb15recursive_mutex11scoped_lock20internal_try_acquireERS0_ +__ZN3tbb15recursive_mutex16internal_destroyEv +__ZN3tbb15recursive_mutex18internal_constructEv + +# queuing_mutex.cpp +__ZN3tbb13queuing_mutex11scoped_lock7acquireERS0_ +__ZN3tbb13queuing_mutex11scoped_lock7releaseEv +__ZN3tbb13queuing_mutex11scoped_lock11try_acquireERS0_ +__ZN3tbb13queuing_mutex18internal_constructEv + +#if !TBB_NO_LEGACY +# concurrent_hash_map +__ZNK3tbb8internal21hash_map_segment_base23internal_grow_predicateEv + +# concurrent_queue.cpp v2 +__ZN3tbb8internal21concurrent_queue_base12internal_popEPv +__ZN3tbb8internal21concurrent_queue_base13internal_pushEPKv +__ZN3tbb8internal21concurrent_queue_base21internal_set_capacityElm +__ZN3tbb8internal21concurrent_queue_base23internal_pop_if_presentEPv 
+__ZN3tbb8internal21concurrent_queue_base25internal_push_if_not_fullEPKv +__ZN3tbb8internal21concurrent_queue_baseC2Em +__ZN3tbb8internal21concurrent_queue_baseD2Ev +__ZTIN3tbb8internal21concurrent_queue_baseE +__ZTSN3tbb8internal21concurrent_queue_baseE +__ZTVN3tbb8internal21concurrent_queue_baseE +__ZN3tbb8internal30concurrent_queue_iterator_base6assignERKS1_ +__ZN3tbb8internal30concurrent_queue_iterator_base7advanceEv +__ZN3tbb8internal30concurrent_queue_iterator_baseC2ERKNS0_21concurrent_queue_baseE +__ZN3tbb8internal30concurrent_queue_iterator_baseD2Ev +__ZNK3tbb8internal21concurrent_queue_base13internal_sizeEv +#endif + +# concurrent_queue v3 +# constructors +__ZN3tbb8internal33concurrent_queue_iterator_base_v3C2ERKNS0_24concurrent_queue_base_v3E +__ZN3tbb8internal24concurrent_queue_base_v3C2Em +# destructors +__ZN3tbb8internal33concurrent_queue_iterator_base_v3D2Ev +__ZN3tbb8internal24concurrent_queue_base_v3D2Ev +# typeinfo +__ZTIN3tbb8internal24concurrent_queue_base_v3E +__ZTSN3tbb8internal24concurrent_queue_base_v3E +#vtable +__ZTVN3tbb8internal24concurrent_queue_base_v3E +# methods +__ZN3tbb8internal33concurrent_queue_iterator_base_v36assignERKS1_ +__ZN3tbb8internal33concurrent_queue_iterator_base_v37advanceEv +__ZN3tbb8internal24concurrent_queue_base_v313internal_pushEPKv +__ZN3tbb8internal24concurrent_queue_base_v325internal_push_if_not_fullEPKv +__ZN3tbb8internal24concurrent_queue_base_v312internal_popEPv +__ZN3tbb8internal24concurrent_queue_base_v323internal_pop_if_presentEPv +__ZN3tbb8internal24concurrent_queue_base_v321internal_finish_clearEv +__ZN3tbb8internal24concurrent_queue_base_v321internal_set_capacityElm +__ZNK3tbb8internal24concurrent_queue_base_v313internal_sizeEv +__ZNK3tbb8internal24concurrent_queue_base_v314internal_emptyEv +__ZNK3tbb8internal24concurrent_queue_base_v324internal_throw_exceptionEv +__ZN3tbb8internal24concurrent_queue_base_v36assignERKS1_ + +#if !TBB_NO_LEGACY +# concurrent_vector.cpp v2 +__ZN3tbb8internal22concurrent_vector_base13internal_copyERKS1_mPFvPvPKvmE +__ZN3tbb8internal22concurrent_vector_base14internal_clearEPFvPvmEb +__ZN3tbb8internal22concurrent_vector_base15internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_ +__ZN3tbb8internal22concurrent_vector_base16internal_grow_byEmmPFvPvmE +__ZN3tbb8internal22concurrent_vector_base16internal_reserveEmmm +__ZN3tbb8internal22concurrent_vector_base18internal_push_backEmRm +__ZN3tbb8internal22concurrent_vector_base25internal_grow_to_at_leastEmmPFvPvmE +__ZNK3tbb8internal22concurrent_vector_base17internal_capacityEv +#endif + +# concurrent_vector v3 +__ZN3tbb8internal25concurrent_vector_base_v313internal_copyERKS1_mPFvPvPKvmE +__ZN3tbb8internal25concurrent_vector_base_v314internal_clearEPFvPvmE +__ZN3tbb8internal25concurrent_vector_base_v315internal_assignERKS1_mPFvPvmEPFvS4_PKvmESA_ +__ZN3tbb8internal25concurrent_vector_base_v316internal_grow_byEmmPFvPvPKvmES4_ +__ZN3tbb8internal25concurrent_vector_base_v316internal_reserveEmmm +__ZN3tbb8internal25concurrent_vector_base_v318internal_push_backEmRm +__ZN3tbb8internal25concurrent_vector_base_v325internal_grow_to_at_leastEmmPFvPvPKvmES4_ +__ZNK3tbb8internal25concurrent_vector_base_v317internal_capacityEv +__ZN3tbb8internal25concurrent_vector_base_v316internal_compactEmPvPFvS2_mEPFvS2_PKvmE +__ZN3tbb8internal25concurrent_vector_base_v313internal_swapERS1_ +__ZNK3tbb8internal25concurrent_vector_base_v324internal_throw_exceptionEm +__ZN3tbb8internal25concurrent_vector_base_v3D2Ev 
+__ZN3tbb8internal25concurrent_vector_base_v315internal_resizeEmmmPKvPFvPvmEPFvS4_S3_mE +__ZN3tbb8internal25concurrent_vector_base_v337internal_grow_to_at_least_with_resultEmmPFvPvPKvmES4_ + +# tbb_thread +__ZN3tbb8internal13tbb_thread_v320hardware_concurrencyEv +__ZN3tbb8internal13tbb_thread_v36detachEv +__ZN3tbb8internal16thread_get_id_v3Ev +__ZN3tbb8internal15free_closure_v3EPv +__ZN3tbb8internal13tbb_thread_v34joinEv +__ZN3tbb8internal13tbb_thread_v314internal_startEPFPvS2_ES2_ +__ZN3tbb8internal19allocate_closure_v3Em +__ZN3tbb8internal7move_v3ERNS0_13tbb_thread_v3ES2_ +__ZN3tbb8internal15thread_yield_v3Ev +__ZN3tbb8internal15thread_sleep_v3ERKNS_10tick_count10interval_tE diff --git a/dep/tbb/src/tbb/mutex.cpp b/dep/tbb/src/tbb/mutex.cpp new file mode 100644 index 000000000..3c619b69c --- /dev/null +++ b/dep/tbb/src/tbb/mutex.cpp @@ -0,0 +1,148 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/mutex.h" +#include "itt_notify.h" + +namespace tbb { + void mutex::scoped_lock::internal_acquire( mutex& m ) { + +#if _WIN32||_WIN64 + switch( m.state ) { + case INITIALIZED: + case HELD: + EnterCriticalSection( &m.impl ); + // If a thread comes here, and another thread holds the lock, it will block + // in EnterCriticalSection. When it returns from EnterCriticalSection, + // m.state must be set to INITIALIZED. If the same thread tries to acquire a lock it + // aleady holds, the the lock is in HELD state, thus will cause the assertion to fail. 
+ __TBB_ASSERT(m.state!=HELD, "mutex::scoped_lock: deadlock caused by attempt to reacquire held mutex"); + m.state = HELD; + break; + case DESTROYED: + __TBB_ASSERT(false,"mutex::scoped_lock: mutex already destroyed"); + break; + default: + __TBB_ASSERT(false,"mutex::scoped_lock: illegal mutex state"); + break; + } +#else + int error_code = pthread_mutex_lock(&m.impl); + __TBB_ASSERT_EX(!error_code,"mutex::scoped_lock: pthread_mutex_lock failed"); +#endif /* _WIN32||_WIN64 */ + my_mutex = &m; + } + +void mutex::scoped_lock::internal_release() { + __TBB_ASSERT( my_mutex, "mutex::scoped_lock: not holding a mutex" ); +#if _WIN32||_WIN64 + switch( my_mutex->state ) { + case INITIALIZED: + __TBB_ASSERT(false,"mutex::scoped_lock: try to release the lock without acquisition"); + break; + case HELD: + my_mutex->state = INITIALIZED; + LeaveCriticalSection(&my_mutex->impl); + break; + case DESTROYED: + __TBB_ASSERT(false,"mutex::scoped_lock: mutex already destroyed"); + break; + default: + __TBB_ASSERT(false,"mutex::scoped_lock: illegal mutex state"); + break; + } +#else + int error_code = pthread_mutex_unlock(&my_mutex->impl); + __TBB_ASSERT_EX(!error_code, "mutex::scoped_lock: pthread_mutex_unlock failed"); +#endif /* _WIN32||_WIN64 */ + my_mutex = NULL; +} + +bool mutex::scoped_lock::internal_try_acquire( mutex& m ) { +#if _WIN32||_WIN64 + switch( m.state ) { + case INITIALIZED: + case HELD: + break; + case DESTROYED: + __TBB_ASSERT(false,"mutex::scoped_lock: mutex already destroyed"); + break; + default: + __TBB_ASSERT(false,"mutex::scoped_lock: illegal mutex state"); + break; + } +#endif /* _WIN32||_WIN64 */ + + bool result; +#if _WIN32||_WIN64 + result = TryEnterCriticalSection(&m.impl)!=0; + if( result ) { + __TBB_ASSERT(m.state!=HELD, "mutex::scoped_lock: deadlock caused by attempt to reacquire held mutex"); + m.state = HELD; + } +#else + result = pthread_mutex_trylock(&m.impl)==0; +#endif /* _WIN32||_WIN64 */ + if( result ) + my_mutex = &m; + return result; +} + +void mutex::internal_construct() { +#if _WIN32||_WIN64 + InitializeCriticalSection(&impl); + state = INITIALIZED; +#else + int error_code = pthread_mutex_init(&impl,NULL); + if( error_code ) + tbb::internal::handle_perror(error_code,"mutex: pthread_mutex_init failed"); +#endif /* _WIN32||_WIN64*/ + ITT_SYNC_CREATE(&impl, _T("tbb::mutex"), _T("")); +} + +void mutex::internal_destroy() { +#if _WIN32||_WIN64 + switch( state ) { + case INITIALIZED: + DeleteCriticalSection(&impl); + break; + case DESTROYED: + __TBB_ASSERT(false,"mutex: already destroyed"); + break; + default: + __TBB_ASSERT(false,"mutex: illegal state for destruction"); + break; + } + state = DESTROYED; +#else + int error_code = pthread_mutex_destroy(&impl); + __TBB_ASSERT_EX(!error_code,"mutex: pthread_mutex_destroy failed"); +#endif /* _WIN32||_WIN64 */ +} + +} // namespace tbb diff --git a/dep/tbb/src/tbb/pipeline.cpp b/dep/tbb/src/tbb/pipeline.cpp new file mode 100644 index 000000000..822609be1 --- /dev/null +++ b/dep/tbb/src/tbb/pipeline.cpp @@ -0,0 +1,687 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. 
+
+    Threading Building Blocks is distributed in the hope that it will be
+    useful, but WITHOUT ANY WARRANTY; without even the implied warranty
+    of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with Threading Building Blocks; if not, write to the Free Software
+    Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
+
+    As a special exception, you may use this file as part of a free software
+    library without restriction. Specifically, if other files instantiate
+    templates or use macros or inline functions from this file, or you compile
+    this file and link it with other files to produce an executable, this
+    file does not by itself cause the resulting executable to be covered by
+    the GNU General Public License. This exception does not however
+    invalidate any other reasons why the executable file might be covered by
+    the GNU General Public License.
+*/
+
+#include "tbb/pipeline.h"
+#include "tbb/spin_mutex.h"
+#include "tbb/cache_aligned_allocator.h"
+#include "itt_notify.h"
+
+
+namespace tbb {
+
+namespace internal {
+
+//! This structure is used to store task information in an input buffer
+struct task_info {
+    void* my_object;
+    //! Invalid unless a task went through an ordered stage.
+    Token my_token;
+    //! False until my_token is set.
+    bool my_token_ready;
+    //! True if my_object is valid.
+    bool is_valid;
+    //! Set to initial state (no object, no token)
+    void reset() {
+        my_object = NULL;
+        my_token = 0;
+        my_token_ready = false;
+        is_valid = false;
+    }
+};
+//! A buffer of input items for a filter.
+/** Each item is a task_info, inserted into a position in the buffer corresponding to a Token. */
+class input_buffer {
+    friend class tbb::internal::pipeline_root_task;
+    friend class tbb::thread_bound_filter;
+
+    typedef Token size_type;
+
+    //! Array of deferred tasks that cannot yet start executing.
+    task_info* array;
+
+    //! Size of array
+    /** Always 0 or a power of 2 */
+    size_type array_size;
+
+    //! Lowest token that can start executing.
+    /** All prior Token have already been seen. */
+    Token low_token;
+
+    //! Serializes updates.
+    spin_mutex array_mutex;
+
+    //! Resize "array".
+    /** Caller is responsible for acquiring a lock on "array_mutex". */
+    void grow( size_type minimum_size );
+
+    //! Initial size for "array"
+    /** Must be a power of 2 */
+    static const size_type initial_buffer_size = 4;
+
+    //! Used only for out of order buffer.
+    Token high_token;
+
+    //! True for ordered filter, false otherwise.
+    bool is_ordered;
+
+    //! True for thread-bound filter, false otherwise.
+    bool is_bound;
+public:
+    //! Construct empty buffer.
+    input_buffer( bool is_ordered_, bool is_bound_ ) :
+        array(NULL), array_size(0),
+        low_token(0), high_token(0),
+        is_ordered(is_ordered_), is_bound(is_bound_) {
+        grow(initial_buffer_size);
+        __TBB_ASSERT( array, NULL );
+    }
+
+    //! Destroy the buffer.
+    ~input_buffer() {
+        __TBB_ASSERT( array, NULL );
+        cache_aligned_allocator<task_info>().deallocate(array,array_size);
+        poison_pointer( array );
+    }
+
+    //! Put a token into the buffer.
+    /** If task information was placed into buffer, returns true;
+        otherwise returns false, informing the caller to create and spawn a task.
+    */
+    // Using template to avoid explicit dependency on stage_task
+    template<typename StageTask>
+    bool put_token( StageTask& putter ) {
+        {
+            spin_mutex::scoped_lock lock( array_mutex );
+            Token token;
+            if( is_ordered ) {
+                if( !putter.my_token_ready ) {
+                    putter.my_token = high_token++;
+                    putter.my_token_ready = true;
+                }
+                token = putter.my_token;
+            } else
+                token = high_token++;
+            __TBB_ASSERT( (tokendiff_t)(token-low_token)>=0, NULL );
+            if( token!=low_token || is_bound ) {
+                // Trying to put token that is beyond low_token.
+                // Need to wait until low_token catches up before dispatching.
+                if( token-low_token>=array_size )
+                    grow( token-low_token+1 );
+                ITT_NOTIFY( sync_releasing, this );
+                putter.put_task_info(array[token&array_size-1]);
+                return true;
+            }
+        }
+        return false;
+    }
+
+    //! Note that processing of a token is finished.
+    /** Fires up processing of the next token, if processing was deferred. */
+    // Using template to avoid explicit dependency on stage_task
+    template<typename StageTask>
+    void note_done( Token token, StageTask& spawner ) {
+        task_info wakee;
+        wakee.reset();
+        {
+            spin_mutex::scoped_lock lock( array_mutex );
+            if( !is_ordered || token==low_token ) {
+                // Wake the next task
+                task_info& item = array[++low_token & array_size-1];
+                ITT_NOTIFY( sync_acquired, this );
+                wakee = item;
+                item.is_valid = false;
+            }
+        }
+        if( wakee.is_valid )
+            spawner.spawn_stage_task(wakee);
+    }
+
+#if __TBB_EXCEPTIONS
+    //! The method destroys all data in filters to prevent memory leaks
+    void clear( filter* my_filter ) {
+        long t=low_token;
+        for( size_type i=0; i<array_size; ++i, ++t ){
+            task_info& temp = array[t&array_size-1];
+            if (temp.is_valid ) {
+                my_filter->finalize(temp.my_object);
+                temp.is_valid = false;
+            }
+        }
+    }
+#endif
+
+    bool return_item(task_info& info, bool advance) {
+        spin_mutex::scoped_lock lock( array_mutex );
+        task_info& item = array[low_token&array_size-1];
+        ITT_NOTIFY( sync_acquired, this );
+        if( item.is_valid ) {
+            info = item;
+            item.is_valid = false;
+            if (advance) low_token++;
+            return true;
+        }
+        return false;
+    }
+
+    void put_item( task_info& info ) {
+        info.is_valid = true;
+        spin_mutex::scoped_lock lock( array_mutex );
+        Token token;
+        if( is_ordered ) {
+            if( !info.my_token_ready ) {
+                info.my_token = high_token++;
+                info.my_token_ready = true;
+            }
+            token = info.my_token;
+        } else
+            token = high_token++;
+        __TBB_ASSERT( (tokendiff_t)(token-low_token)>=0, NULL );
+        if( token-low_token>=array_size )
+            grow( token-low_token+1 );
+        ITT_NOTIFY( sync_releasing, this );
+        array[token&array_size-1] = info;
+    }
+};
+
+void input_buffer::grow( size_type minimum_size ) {
+    size_type old_size = array_size;
+    size_type new_size = old_size ? 2*old_size : initial_buffer_size;
+    while( new_size<minimum_size )
+        new_size*=2;
+    task_info* new_array = cache_aligned_allocator<task_info>().allocate(new_size);
+    task_info* old_array = array;
+    for( size_type i=0; i<new_size; ++i )
+        new_array[i].is_valid = false;
+    long t=low_token;
+    for( size_type i=0; i<old_size; ++i, ++t )
+        new_array[t&new_size-1] = old_array[t&old_size-1];
+    array = new_array;
+    array_size = new_size;
+    if( old_array )
+        cache_aligned_allocator<task_info>().deallocate(old_array,old_size);
+}
+
+class stage_task: public task, public task_info {
+private:
+    friend class tbb::pipeline;
+    pipeline& my_pipeline;
+    filter* my_filter;
+    //! True if this task has not yet read the input.
+    bool my_at_start;
+public:
+    //! Construct stage_task for first stage in a pipeline.
+    /** Such a stage has not read any input yet. */
+    stage_task( pipeline& pipeline ) :
+        my_pipeline(pipeline),
+        my_filter(pipeline.filter_list),
+        my_at_start(true)
+    {
+        task_info::reset();
+    }
+    //! Construct stage_task for a subsequent stage in a pipeline.
+    stage_task( pipeline& pipeline, filter* filter_, const task_info& info ) :
+        task_info(info),
+        my_pipeline(pipeline),
+        my_filter(filter_),
+        my_at_start(false)
+    {}
+    //!
Roughly equivalent to the constructor of input stage task + void reset() { + task_info::reset(); + my_filter = my_pipeline.filter_list; + my_at_start = true; + } + //! The virtual task execution method + /*override*/ task* execute(); +#if __TBB_EXCEPTIONS + ~stage_task() + { + if (my_filter && my_object && (my_filter->my_filter_mode & filter::version_mask) >= __TBB_PIPELINE_VERSION(4)) { + __TBB_ASSERT(is_cancelled(), "Trying to finalize the task that wasn't cancelled"); + my_filter->finalize(my_object); + my_object = NULL; + } + } +#endif // __TBB_EXCEPTIONS + //! Creates and spawns stage_task from task_info + void spawn_stage_task(const task_info& info) + { + stage_task* clone = new (allocate_additional_child_of(*parent())) + stage_task( my_pipeline, my_filter, info ); + spawn(*clone); + } + //! Puts current task information + void put_task_info(task_info &where_to_put ) { + where_to_put.my_object = my_object; + where_to_put.my_token = my_token; + where_to_put.my_token_ready = my_token_ready; + where_to_put.is_valid = true; + } +}; + +task* stage_task::execute() { + __TBB_ASSERT( !my_at_start || !my_object, NULL ); + __TBB_ASSERT( !my_filter->is_bound(), NULL ); + if( my_at_start ) { + if( my_filter->is_serial() ) { + my_object = (*my_filter)(my_object); + if( my_object ) { + if( my_filter->is_ordered() ) { + my_token = my_pipeline.token_counter++; // ideally, with relaxed semantics + my_token_ready = true; + } else if( (my_filter->my_filter_mode & my_filter->version_mask) >= __TBB_PIPELINE_VERSION(5) ) { + if( my_pipeline.has_thread_bound_filters ) + my_pipeline.token_counter++; // ideally, with relaxed semantics + } + if( !my_filter->next_filter_in_pipeline ) { + reset(); + goto process_another_stage; + } else { + ITT_NOTIFY( sync_releasing, &my_pipeline.input_tokens ); + if( --my_pipeline.input_tokens>0 ) + spawn( *new( allocate_additional_child_of(*parent()) ) stage_task( my_pipeline ) ); + } + } else { + my_pipeline.end_of_input = true; + return NULL; + } + } else /*not is_serial*/ { + if( my_pipeline.end_of_input ) + return NULL; + if( (my_filter->my_filter_mode & my_filter->version_mask) >= __TBB_PIPELINE_VERSION(5) ) { + if( my_pipeline.has_thread_bound_filters ) + my_pipeline.token_counter++; + } + ITT_NOTIFY( sync_releasing, &my_pipeline.input_tokens ); + if( --my_pipeline.input_tokens>0 ) + spawn( *new( allocate_additional_child_of(*parent()) ) stage_task( my_pipeline ) ); + my_object = (*my_filter)(my_object); + if( !my_object ) { + my_pipeline.end_of_input = true; + if( (my_filter->my_filter_mode & my_filter->version_mask) >= __TBB_PIPELINE_VERSION(5) ) { + if( my_pipeline.has_thread_bound_filters ) + my_pipeline.token_counter--; + } + return NULL; + } + } + my_at_start = false; + } else { + my_object = (*my_filter)(my_object); + if( my_filter->is_serial() ) + my_filter->my_input_buffer->note_done(my_token, *this); + } + my_filter = my_filter->next_filter_in_pipeline; + if( my_filter ) { + // There is another filter to execute. + // Crank up priority a notch. 
+ add_to_depth(1); + if( my_filter->is_serial() ) { + // The next filter must execute tokens in order + if( my_filter->my_input_buffer->put_token(*this) ){ + // Can't proceed with the same item + if( my_filter->is_bound() ) { + // Find the next non-thread-bound filter + do { + my_filter = my_filter->next_filter_in_pipeline; + } while( my_filter && my_filter->is_bound() ); + // Check if there is an item ready to process + if( my_filter && my_filter->my_input_buffer->return_item(*this, !my_filter->is_serial()) ) + goto process_another_stage; + } + my_filter = NULL; // To prevent deleting my_object twice if exception occurs + return NULL; + } + } + } else { + // Reached end of the pipe. + if( ++my_pipeline.input_tokens>1 || my_pipeline.end_of_input || my_pipeline.filter_list->is_bound() ) + return NULL; // No need to recycle for new input + ITT_NOTIFY( sync_acquired, &my_pipeline.input_tokens ); + // Recycle as an input stage task. + reset(); + } +process_another_stage: + /* A semi-hackish way to reexecute the same task object immediately without spawning. + recycle_as_continuation marks the task for future execution, + and then 'this' pointer is returned to bypass spawning. */ + recycle_as_continuation(); + return this; +} + +class pipeline_root_task: public task { + pipeline& my_pipeline; + bool do_segment_scanning; + + /*override*/ task* execute() { + if( !my_pipeline.end_of_input ) + if( !my_pipeline.filter_list->is_bound() ) + if( my_pipeline.input_tokens > 0 ) { + recycle_as_continuation(); + set_ref_count(1); + return new( allocate_child() ) stage_task( my_pipeline ); + } + if( do_segment_scanning ) { + filter* current_filter = my_pipeline.filter_list->next_segment; + /* first non-thread-bound filter that follows thread-bound one + and may have valid items to process */ + filter* first_suitable_filter = current_filter; + while( current_filter ) { + __TBB_ASSERT( !current_filter->is_bound(), "filter is thread-bound?" ); + __TBB_ASSERT( current_filter->prev_filter_in_pipeline->is_bound(), "previous filter is not thread-bound?" ); + if( !my_pipeline.end_of_input + || (tokendiff_t)(my_pipeline.token_counter - current_filter->my_input_buffer->low_token) > 0 ) + { + task_info info; + info.reset(); + if( current_filter->my_input_buffer->return_item(info, !current_filter->is_serial()) ) { + set_ref_count(1); + recycle_as_continuation(); + return new( allocate_child() ) stage_task( my_pipeline, current_filter, info); + } + current_filter = current_filter->next_segment; + if( !current_filter ) { + if( !my_pipeline.end_of_input ) { + recycle_as_continuation(); + return this; + } + current_filter = first_suitable_filter; + __TBB_Yield(); + } + } else { + /* The preceding pipeline segment is empty. + Fast-forward to the next post-TBF segment. 
*/ + first_suitable_filter = first_suitable_filter->next_segment; + current_filter = first_suitable_filter; + } + } /* end of while */ + return NULL; + } else { + if( !my_pipeline.end_of_input ) { + recycle_as_continuation(); + return this; + } + return NULL; + } + } +public: + pipeline_root_task( pipeline& pipeline ): my_pipeline(pipeline), do_segment_scanning(false) + { + __TBB_ASSERT( my_pipeline.filter_list, NULL ); + filter* first = my_pipeline.filter_list; + if( (first->my_filter_mode & first->version_mask) >= __TBB_PIPELINE_VERSION(5) ) { + // Scanning the pipeline for segments + filter* head_of_previous_segment = first; + for( filter* subfilter=first->next_filter_in_pipeline; + subfilter!=NULL; + subfilter=subfilter->next_filter_in_pipeline ) + { + if( subfilter->prev_filter_in_pipeline->is_bound() && !subfilter->is_bound() ) { + do_segment_scanning = true; + head_of_previous_segment->next_segment = subfilter; + head_of_previous_segment = subfilter; + } + } + } + } +}; + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warnings + // Suppress compiler warning about constant conditional expression + #pragma warning (disable: 4127) +#endif + +// The class destroys end_counter and clears all input buffers if pipeline was cancelled. +class pipeline_cleaner: internal::no_copy { + pipeline& my_pipeline; +public: + pipeline_cleaner(pipeline& _pipeline) : + my_pipeline(_pipeline) + {} + ~pipeline_cleaner(){ +#if __TBB_EXCEPTIONS + if (my_pipeline.end_counter->is_cancelled()) // Pipeline was cancelled + my_pipeline.clear_filters(); +#endif + my_pipeline.end_counter = NULL; + } +}; + +} // namespace internal + +void pipeline::inject_token( task& ) { + __TBB_ASSERT(0,"illegal call to inject_token"); +} + +#if __TBB_EXCEPTIONS +void pipeline::clear_filters() { + for( filter* f = filter_list; f; f = f->next_filter_in_pipeline ) { + if ((f->my_filter_mode & filter::version_mask) >= __TBB_PIPELINE_VERSION(4)) + if( internal::input_buffer* b = f->my_input_buffer ) + b->clear(f); + } +} +#endif + +pipeline::pipeline() : + filter_list(NULL), + filter_end(NULL), + end_counter(NULL), + end_of_input(false), + has_thread_bound_filters(false) +{ + token_counter = 0; + input_tokens = 0; +} + +pipeline::~pipeline() { + clear(); +} + +void pipeline::clear() { + filter* next; + for( filter* f = filter_list; f; f=next ) { + if( internal::input_buffer* b = f->my_input_buffer ) { + delete b; + f->my_input_buffer = NULL; + } + next=f->next_filter_in_pipeline; + f->next_filter_in_pipeline = filter::not_in_pipeline(); + if ( (f->my_filter_mode & filter::version_mask) >= __TBB_PIPELINE_VERSION(3) ) { + f->prev_filter_in_pipeline = filter::not_in_pipeline(); + f->my_pipeline = NULL; + } + if ( (f->my_filter_mode & filter::version_mask) >= __TBB_PIPELINE_VERSION(5) ) + f->next_segment = NULL; + } + filter_list = filter_end = NULL; +} + +void pipeline::add_filter( filter& filter_ ) { +#if TBB_USE_ASSERT + if ( (filter_.my_filter_mode & filter::version_mask) >= __TBB_PIPELINE_VERSION(3) ) + __TBB_ASSERT( filter_.prev_filter_in_pipeline==filter::not_in_pipeline(), "filter already part of pipeline?" ); + __TBB_ASSERT( filter_.next_filter_in_pipeline==filter::not_in_pipeline(), "filter already part of pipeline?" 
); + __TBB_ASSERT( !end_counter, "invocation of add_filter on running pipeline" ); +#endif + if ( (filter_.my_filter_mode & filter::version_mask) >= __TBB_PIPELINE_VERSION(3) ) { + filter_.my_pipeline = this; + filter_.prev_filter_in_pipeline = filter_end; + if ( filter_list == NULL) + filter_list = &filter_; + else + filter_end->next_filter_in_pipeline = &filter_; + filter_.next_filter_in_pipeline = NULL; + filter_end = &filter_; + } + else + { + if( !filter_end ) + filter_end = reinterpret_cast(&filter_list); + + *reinterpret_cast(filter_end) = &filter_; + filter_end = reinterpret_cast(&filter_.next_filter_in_pipeline); + *reinterpret_cast(filter_end) = NULL; + } + if( (filter_.my_filter_mode & filter_.version_mask) >= __TBB_PIPELINE_VERSION(5) ) { + if( filter_.is_serial() ) { + if( filter_.is_bound() ) + has_thread_bound_filters = true; + filter_.my_input_buffer = new internal::input_buffer( filter_.is_ordered(), filter_.is_bound() ); + } + else { + if( filter_.prev_filter_in_pipeline && filter_.prev_filter_in_pipeline->is_bound() ) + filter_.my_input_buffer = new internal::input_buffer( false, false ); + } + } else { + if( filter_.is_serial() ) { + filter_.my_input_buffer = new internal::input_buffer( filter_.is_ordered(), false ); + } + } + +} + +void pipeline::remove_filter( filter& filter_ ) { + if (&filter_ == filter_list) + filter_list = filter_.next_filter_in_pipeline; + else { + __TBB_ASSERT( filter_.prev_filter_in_pipeline, "filter list broken?" ); + filter_.prev_filter_in_pipeline->next_filter_in_pipeline = filter_.next_filter_in_pipeline; + } + if (&filter_ == filter_end) + filter_end = filter_.prev_filter_in_pipeline; + else { + __TBB_ASSERT( filter_.next_filter_in_pipeline, "filter list broken?" ); + filter_.next_filter_in_pipeline->prev_filter_in_pipeline = filter_.prev_filter_in_pipeline; + } + if( internal::input_buffer* b = filter_.my_input_buffer ) { + delete b; + filter_.my_input_buffer = NULL; + } + filter_.next_filter_in_pipeline = filter_.prev_filter_in_pipeline = filter::not_in_pipeline(); + if ( (filter_.my_filter_mode & filter::version_mask) >= __TBB_PIPELINE_VERSION(5) ) + filter_.next_segment = NULL; + filter_.my_pipeline = NULL; +} + +void pipeline::run( size_t max_number_of_live_tokens +#if __TBB_EXCEPTIONS + , tbb::task_group_context& context +#endif + ) { + __TBB_ASSERT( max_number_of_live_tokens>0, "pipeline::run must have at least one token" ); + __TBB_ASSERT( !end_counter, "pipeline already running?" 
); + if( filter_list ) { + internal::pipeline_cleaner my_pipeline_cleaner(*this); + end_of_input = false; +#if __TBB_EXCEPTIONS + end_counter = new( task::allocate_root(context) ) internal::pipeline_root_task( *this ); +#else + end_counter = new( task::allocate_root() ) internal::pipeline_root_task( *this ); +#endif + input_tokens = internal::Token(max_number_of_live_tokens); + // Start execution of tasks + task::spawn_root_and_wait( *end_counter ); + } +} + +#if __TBB_EXCEPTIONS +void pipeline::run( size_t max_number_of_live_tokens ) { + tbb::task_group_context context; + run(max_number_of_live_tokens, context); +} +#endif // __TBB_EXCEPTIONS + +filter::~filter() { + if ( (my_filter_mode & version_mask) >= __TBB_PIPELINE_VERSION(3) ) { + if ( next_filter_in_pipeline != filter::not_in_pipeline() ) { + __TBB_ASSERT( prev_filter_in_pipeline != filter::not_in_pipeline(), "probably filter list is broken" ); + my_pipeline->remove_filter(*this); + } else + __TBB_ASSERT( prev_filter_in_pipeline == filter::not_in_pipeline(), "probably filter list is broken" ); + } else { + __TBB_ASSERT( next_filter_in_pipeline==filter::not_in_pipeline(), "cannot destroy filter that is part of pipeline" ); + } +} + +thread_bound_filter::result_type thread_bound_filter::process_item() { + return internal_process_item(true); +} + +thread_bound_filter::result_type thread_bound_filter::try_process_item() { + return internal_process_item(false); +} + +thread_bound_filter::result_type thread_bound_filter::internal_process_item(bool is_blocking) { + internal::task_info info; + info.reset(); + + if( !prev_filter_in_pipeline ) { + if( my_pipeline->end_of_input ) + return end_of_stream; + while( my_pipeline->input_tokens == 0 ) { + if( is_blocking ) + __TBB_Yield(); + else + return item_not_available; + } + info.my_object = (*this)(info.my_object); + if( info.my_object ) { + my_pipeline->input_tokens--; + if( is_ordered() ) { + info.my_token = my_pipeline->token_counter; + info.my_token_ready = true; + } + my_pipeline->token_counter++; // ideally, with relaxed semantics + } else { + my_pipeline->end_of_input = true; + return end_of_stream; + } + } else { /* this is not an input filter */ + while( !my_input_buffer->return_item(info, /*advance=*/true) ) { + if( my_pipeline->end_of_input && my_input_buffer->low_token == my_pipeline->token_counter ) + return end_of_stream; + if( is_blocking ) + __TBB_Yield(); + else + return item_not_available; + } + info.my_object = (*this)(info.my_object); + } + if( next_filter_in_pipeline ) { + next_filter_in_pipeline->my_input_buffer->put_item(info); + } else { + my_pipeline->input_tokens++; + } + + return success; +} + +} // tbb + diff --git a/dep/tbb/src/tbb/private_server.cpp b/dep/tbb/src/tbb/private_server.cpp new file mode 100644 index 000000000..cda558e81 --- /dev/null +++ b/dep/tbb/src/tbb/private_server.cpp @@ -0,0 +1,346 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "../rml/include/rml_tbb.h" +#include "../rml/server/thread_monitor.h" +#include "tbb/atomic.h" +#include "tbb/cache_aligned_allocator.h" +#include "tbb/spin_mutex.h" +#include "tbb/tbb_thread.h" + +using rml::internal::thread_monitor; + +namespace tbb { +namespace internal { +namespace rml { + +class private_server; + +class private_worker: no_copy { + //! State in finite-state machine that controls the worker. + /** State diagram: + open --> normal --> quit + | + V + plugged + */ + enum state_t { + //! *this is initialized + st_init, + //! Associated thread is doing normal life sequence. + st_normal, + //! Associated thread is end normal life sequence. + st_quit, + //! Associated thread should skip normal life sequence, because private_server is shutting down. + st_plugged + }; + atomic my_state; + + //! Associated server + private_server& my_server; + + //! Associated client + tbb_client& my_client; + + //! index used for avoiding the 64K aliasing problem + const size_t my_index; + + //! Monitor for sleeping when there is no work to do. + /** The invariant that holds for sleeping workers is: + "my_slack<=0 && my_state==st_normal && I am on server's list of asleep threads" */ + thread_monitor my_thread_monitor; + + //! Link for list of sleeping workers + private_worker* my_next; + + friend class private_server; + + //! Actions executed by the associated thread + void run(); + + //! Called by a thread (usually not the associated thread) to commence termination. + void start_shutdown(); + + static __RML_DECL_THREAD_ROUTINE thread_routine( void* arg ); + +protected: + private_worker( private_server& server, tbb_client& client, const size_t i ) : + my_server(server), + my_client(client), + my_index(i) + { + my_state = st_init; + } + +}; + +static const size_t cache_line_size = tbb::internal::NFS_MaxLineSize; + + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Suppress overzealous compiler warnings about uninstantiatble class + #pragma warning(push) + #pragma warning(disable:4510 4610) +#endif +class padded_private_worker: public private_worker { + char pad[cache_line_size - sizeof(private_worker)%cache_line_size]; +public: + padded_private_worker( private_server& server, tbb_client& client, const size_t i ) : private_worker(server,client,i) {} +}; +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning(pop) +#endif + +class private_server: public tbb_server, no_copy { + tbb_client& my_client; + const tbb_client::size_type my_n_thread; + + //! Number of jobs that could use their associated thread minus number of active threads. + /** If negative, indicates oversubscription. + If positive, indicates that more threads should run. 
+ Can be lowered asynchronously, but must be raised only while holding my_asleep_list_mutex, + because raising it impacts the invariant for sleeping threads. */ + atomic my_slack; + + //! Counter used to determine when to delete this. + atomic my_ref_count; + + padded_private_worker* my_thread_array; + + //! List of workers that are asleep or committed to sleeping until notified by another thread. + tbb::atomic my_asleep_list_root; + + //! Protects my_asleep_list_root + tbb::spin_mutex my_asleep_list_mutex; + +#if TBB_USE_ASSERT + atomic my_net_slack_requests; +#endif /* TBB_USE_ASSERT */ + + //! Used for double-check idiom + bool has_sleepers() const { + return my_asleep_list_root!=NULL; + } + + //! Try to add t to list of sleeping workers + bool try_insert_in_asleep_list( private_worker& t ); + + //! Equivalent of adding additional_slack to my_slack and waking up to 2 threads if my_slack permits. + void wake_some( int additional_slack ); + + virtual ~private_server(); + + void remove_server_ref() { + if( --my_ref_count==0 ) { + my_client.acknowledge_close_connection(); + this->~private_server(); + tbb::cache_aligned_allocator().deallocate( this, 1 ); + } + } + + friend class private_worker; +public: + private_server( tbb_client& client ); + + /*override*/ version_type version() const { + return 0; + } + + /*override*/ void request_close_connection() { + for( size_t i=0; i(arg); + AVOID_64K_ALIASING( self->my_index ); + self->run(); + return NULL; +} +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning(pop) +#endif + +void private_worker::start_shutdown() { + state_t s; + // Transition from st_init or st_normal to st_plugged or st_quit + do { + s = my_state; + __TBB_ASSERT( s==st_init||s==st_normal, NULL ); + } while( my_state.compare_and_swap( s==st_init? st_plugged : st_quit, s )!=s ); + if( s==st_normal ) { + // May have invalidated invariant for sleeping, so wake up the thread. + // Note that the notify() here occurs without maintaining invariants for my_slack. + // It does not matter, because my_state==st_quit overrides checking of my_slack. 
+ my_thread_monitor.notify(); + } +} + +void private_worker::run() { + if( my_state.compare_and_swap( st_normal, st_init )==st_init ) { + ::rml::job& j = *my_client.create_one_job(); + --my_server.my_slack; + while( my_state==st_normal ) { + if( my_server.my_slack>=0 ) { + my_client.process(j); + } else { + thread_monitor::cookie c; + // Prepare to wait + my_thread_monitor.prepare_wait(c); + // Check/set the invariant for sleeping + if( my_state==st_normal && my_server.try_insert_in_asleep_list(*this) ) { + my_thread_monitor.commit_wait(c); + // Propagate chain reaction + if( my_server.has_sleepers() ) + my_server.wake_some(0); + } else { + // Invariant broken + my_thread_monitor.cancel_wait(); + } + } + } + my_client.cleanup(j); + ++my_server.my_slack; + } + my_server.remove_server_ref(); +} + +//------------------------------------------------------------------------ +// Methods of private_server +//------------------------------------------------------------------------ +private_server::private_server( tbb_client& client ) : + my_client(client), + my_n_thread(client.max_job_count()), + my_thread_array(NULL) +{ + my_ref_count = my_n_thread+1; + my_slack = 0; +#if TBB_USE_ASSERT + my_net_slack_requests = 0; +#endif /* TBB_USE_ASSERT */ + my_asleep_list_root = NULL; + size_t stack_size = client.min_stack_size(); + my_thread_array = tbb::cache_aligned_allocator().allocate( my_n_thread ); + memset( my_thread_array, 0, sizeof(private_worker)*my_n_thread ); + // FIXME - use recursive chain reaction to launch the threads. + for( size_t i=0; i().deallocate( my_thread_array, my_n_thread ); + tbb::internal::poison_pointer( my_thread_array ); +} + +inline bool private_server::try_insert_in_asleep_list( private_worker& t ) { + tbb::spin_mutex::scoped_lock lock(my_asleep_list_mutex); + // Contribute to slack under lock so that if another takes that unit of slack, + // it sees us sleeping on the list and wakes us up. + int k = ++my_slack; + if( k<=0 ) { + t.my_next = my_asleep_list_root; + my_asleep_list_root = &t; + return true; + } else { + --my_slack; + return false; + } +} + +void private_server::wake_some( int additional_slack ) { + __TBB_ASSERT( additional_slack>=0, NULL ); + private_worker* wakee[2]; + private_worker**w = wakee; + { + tbb::spin_mutex::scoped_lock lock(my_asleep_list_mutex); + while( my_asleep_list_root && w0 ) { + --additional_slack; + } else { + // Try to claim unit of slack + int old; + do { + old = my_slack; + if( old<=0 ) goto done; + } while( my_slack.compare_and_swap(old-1,old)!=old ); + } + // Pop sleeping worker to combine with claimed unit of slack + my_asleep_list_root = (*w++ = my_asleep_list_root)->my_next; + } + if( additional_slack ) { + // Contribute our unused slack to my_slack. + my_slack += additional_slack; + } + } +done: + while( w>wakee ) + (*--w)->my_thread_monitor.notify(); +} + +void private_server::adjust_job_count_estimate( int delta ) { +#if TBB_USE_ASSERT + my_net_slack_requests+=delta; +#endif /* TBB_USE_ASSERT */ + if( delta<0 ) { + my_slack+=delta; + } else if( delta>0 ) { + wake_some( delta ); + } +} + +//! Factory method called from task.cpp to create a private_server. 
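+// Illustrative call sequence (a sketch, not part of the library sources): the
+// scheduler in task.cpp drives this class only through the rml::tbb_server
+// interface implemented above.  The object named the_client below is
+// hypothetical; the real client is TBB's task scheduler.
+//
+//     tbb_server* server = make_private_server( the_client );  // the_client: hypothetical tbb_client
+//
+//     server->adjust_job_count_estimate( 4 );    // work arrived: raise my_slack and
+//                                                // wake sleeping workers as slack permits
+//     server->adjust_job_count_estimate( -4 );   // work drained: lower my_slack so idle
+//                                                // workers go back to sleep
+//
+//     server->request_close_connection();        // begin worker shutdown; the server
+//                                                // destroys itself once the last
+//                                                // remove_server_ref() drops the count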
+tbb_server* make_private_server( tbb_client& client ) { + return new( tbb::cache_aligned_allocator().allocate(1) ) private_server(client); +} + +} // namespace rml +} // namespace internal +} // namespace tbb diff --git a/dep/tbb/src/tbb/queuing_mutex.cpp b/dep/tbb/src/tbb/queuing_mutex.cpp new file mode 100644 index 000000000..db2b986f0 --- /dev/null +++ b/dep/tbb/src/tbb/queuing_mutex.cpp @@ -0,0 +1,117 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/tbb_machine.h" +#include "tbb/tbb_stddef.h" +#include "tbb_misc.h" +#include "tbb/queuing_mutex.h" +#include "itt_notify.h" + + +namespace tbb { + +using namespace internal; + +//! A method to acquire queuing_mutex lock +void queuing_mutex::scoped_lock::acquire( queuing_mutex& m ) +{ + __TBB_ASSERT( !this->mutex, "scoped_lock is already holding a mutex"); + + // Must set all fields before the fetch_and_store, because once the + // fetch_and_store executes, *this becomes accessible to other threads. + mutex = &m; + next = NULL; + going = 0; + + // The fetch_and_store must have release semantics, because we are + // "sending" the fields initialized above to other processors. + scoped_lock* pred = m.q_tail.fetch_and_store(this); + if( pred ) { + ITT_NOTIFY(sync_prepare, mutex); + __TBB_ASSERT( !pred->next, "the predecessor has another successor!"); + pred->next = this; + spin_wait_while_eq( going, 0ul ); + } + ITT_NOTIFY(sync_acquired, mutex); + + // Force acquire so that user's critical section receives correct values + // from processor that was previously in the user's critical section. + __TBB_load_with_acquire(going); +} + +//! A method to acquire queuing_mutex if it is free +bool queuing_mutex::scoped_lock::try_acquire( queuing_mutex& m ) +{ + __TBB_ASSERT( !this->mutex, "scoped_lock is already holding a mutex"); + + // Must set all fields before the fetch_and_store, because once the + // fetch_and_store executes, *this becomes accessible to other threads. + next = NULL; + going = 0; + + if( m.q_tail ) return false; + // The CAS must have release semantics, because we are + // "sending" the fields initialized above to other processors. 
+ scoped_lock* pred = m.q_tail.compare_and_swap(this, NULL); + + // Force acquire so that user's critical section receives correct values + // from processor that was previously in the user's critical section. + // try_acquire should always have acquire semantic, even if failed. + __TBB_load_with_acquire(going); + if( !pred ) { + mutex = &m; + ITT_NOTIFY(sync_acquired, mutex); + return true; + } else return false; +} + +//! A method to release queuing_mutex lock +void queuing_mutex::scoped_lock::release( ) +{ + __TBB_ASSERT(this->mutex!=NULL, "no lock acquired"); + + ITT_NOTIFY(sync_releasing, mutex); + if( !next ) { + if( this == mutex->q_tail.compare_and_swap(NULL, this) ) { + // this was the only item in the queue, and the queue is now empty. + goto done; + } + // Someone in the queue + spin_wait_while_eq( next, (scoped_lock*)0 ); + } + __TBB_ASSERT(next,NULL); + __TBB_store_with_release(next->going, 1); +done: + initialize(); +} + +void queuing_mutex::internal_construct() { + ITT_SYNC_CREATE(this, _T("tbb::queuing_mutex"), _T("")); +} + +} // namespace tbb diff --git a/dep/tbb/src/tbb/queuing_rw_mutex.cpp b/dep/tbb/src/tbb/queuing_rw_mutex.cpp new file mode 100644 index 000000000..4c7034737 --- /dev/null +++ b/dep/tbb/src/tbb/queuing_rw_mutex.cpp @@ -0,0 +1,505 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +/** Before making any changes in the implementation, please emulate algorithmic changes + with SPIN tool using /tools/spin_models/ReaderWriterMutex.pml. + There could be some code looking as "can be restructured" but its structure does matter! */ + +#include "tbb/tbb_machine.h" +#include "tbb/tbb_stddef.h" +#include "tbb/tbb_machine.h" +#include "tbb/queuing_rw_mutex.h" +#include "itt_notify.h" + + +namespace tbb { + +using namespace internal; + +//! Flag bits in a state_t that specify information about a locking request. 
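+// Usage sketch (not part of the library sources): the implementation below is
+// driven through the public tbb::queuing_rw_mutex::scoped_lock API declared in
+// include/tbb/queuing_rw_mutex.h.  The variable shared_value and the function
+// name update_if_unset are hypothetical, used only to show the canonical
+// read-then-upgrade pattern.
+//
+//     #include "tbb/queuing_rw_mutex.h"
+//
+//     tbb::queuing_rw_mutex rw_mutex;
+//     int shared_value = 0;                       // hypothetical shared state
+//
+//     void update_if_unset() {
+//         // Take the lock as a reader first; the queuing mutex is fair (FIFO).
+//         tbb::queuing_rw_mutex::scoped_lock lock( rw_mutex, /*write=*/false );
+//         if( shared_value == 0 ) {
+//             // upgrade_to_writer() returns false if the lock had to be released
+//             // and re-acquired, so the condition must be checked again.
+//             if( !lock.upgrade_to_writer() && shared_value != 0 )
+//                 return;                         // another writer got there first
+//             shared_value = 42;
+//         }
+//     }                                           // released by ~scoped_lock()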
+enum state_t_flags { + STATE_NONE = 0, + STATE_WRITER = 1, + STATE_READER = 1<<1, + STATE_READER_UNBLOCKNEXT = 1<<2, + STATE_COMBINED_WAITINGREADER = STATE_READER | STATE_READER_UNBLOCKNEXT, + STATE_ACTIVEREADER = 1<<3, + STATE_COMBINED_READER = STATE_COMBINED_WAITINGREADER | STATE_ACTIVEREADER, + STATE_UPGRADE_REQUESTED = 1<<4, + STATE_UPGRADE_WAITING = 1<<5, + STATE_UPGRADE_LOSER = 1<<6, + STATE_COMBINED_UPGRADING = STATE_UPGRADE_WAITING | STATE_UPGRADE_LOSER +}; + +const unsigned char RELEASED = 0; +const unsigned char ACQUIRED = 1; + +template +inline atomic& as_atomic( T& t ) { + return *(atomic*)&t; +} + +inline bool queuing_rw_mutex::scoped_lock::try_acquire_internal_lock() +{ + return as_atomic(internal_lock).compare_and_swap(ACQUIRED,RELEASED) == RELEASED; +} + +inline void queuing_rw_mutex::scoped_lock::acquire_internal_lock() +{ + // Usually, we would use the test-test-and-set idiom here, with exponential backoff. + // But so far, experiments indicate there is no value in doing so here. + while( !try_acquire_internal_lock() ) { + __TBB_Pause(1); + } +} + +inline void queuing_rw_mutex::scoped_lock::release_internal_lock() +{ + __TBB_store_with_release(internal_lock,RELEASED); +} + +inline void queuing_rw_mutex::scoped_lock::wait_for_release_of_internal_lock() +{ + spin_wait_until_eq(internal_lock, RELEASED); +} + +inline void queuing_rw_mutex::scoped_lock::unblock_or_wait_on_internal_lock( uintptr_t flag ) { + if( flag ) + wait_for_release_of_internal_lock(); + else + release_internal_lock(); +} + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warnings + #pragma warning (push) + #pragma warning (disable: 4311 4312) +#endif + +//! A view of a T* with additional functionality for twiddling low-order bits. +template +class tricky_atomic_pointer: no_copy { +public: + typedef typename atomic_rep::word word; + + template + static T* fetch_and_add( T* volatile * location, word addend ) { + return reinterpret_cast( atomic_traits::fetch_and_add(location, addend) ); + } + template + static T* fetch_and_store( T* volatile * location, T* value ) { + return reinterpret_cast( atomic_traits::fetch_and_store(location, reinterpret_cast(value)) ); + } + template + static T* compare_and_swap( T* volatile * location, T* value, T* comparand ) { + return reinterpret_cast( + atomic_traits::compare_and_swap(location, reinterpret_cast(value), + reinterpret_cast(comparand)) + ); + } + + T* & ref; + tricky_atomic_pointer( T*& original ) : ref(original) {}; + tricky_atomic_pointer( T* volatile & original ) : ref(original) {}; + T* operator&( word operand2 ) const { + return reinterpret_cast( reinterpret_cast(ref) & operand2 ); + } + T* operator|( word operand2 ) const { + return reinterpret_cast( reinterpret_cast(ref) | operand2 ); + } +}; + +typedef tricky_atomic_pointer tricky_pointer; + +#if defined(_MSC_VER) && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warnings + #pragma warning (pop) +#endif + +//! Mask for low order bit of a pointer. 
+static const tricky_pointer::word FLAG = 0x1; + +inline +uintptr get_flag( queuing_rw_mutex::scoped_lock* ptr ) { + return uintptr(tricky_pointer(ptr)&FLAG); +} + +//------------------------------------------------------------------------ +// Methods of queuing_rw_mutex::scoped_lock +//------------------------------------------------------------------------ + +void queuing_rw_mutex::scoped_lock::acquire( queuing_rw_mutex& m, bool write ) +{ + __TBB_ASSERT( !this->mutex, "scoped_lock is already holding a mutex"); + + // Must set all fields before the fetch_and_store, because once the + // fetch_and_store executes, *this becomes accessible to other threads. + mutex = &m; + prev = NULL; + next = NULL; + going = 0; + state = state_t(write ? STATE_WRITER : STATE_READER); + internal_lock = RELEASED; + + queuing_rw_mutex::scoped_lock* pred = m.q_tail.fetch_and_store(this); + + if( write ) { // Acquiring for write + + if( pred ) { + ITT_NOTIFY(sync_prepare, mutex); + pred = tricky_pointer(pred) & ~FLAG; + __TBB_ASSERT( !( tricky_pointer(pred) & FLAG ), "use of corrupted pointer!" ); + __TBB_ASSERT( !pred->next, "the predecessor has another successor!"); + // ensure release semantics on IPF + __TBB_store_with_release(pred->next,this); + spin_wait_until_eq(going, 1); + } + + } else { // Acquiring for read +#if DO_ITT_NOTIFY + bool sync_prepare_done = false; +#endif + if( pred ) { + unsigned short pred_state; + __TBB_ASSERT( !this->prev, "the predecessor is already set" ); + if( tricky_pointer(pred)&FLAG ) { + /* this is only possible if pred is an upgrading reader and it signals us to wait */ + pred_state = STATE_UPGRADE_WAITING; + pred = tricky_pointer(pred) & ~FLAG; + } else { + // Load pred->state now, because once pred->next becomes + // non-NULL, we must assume that *pred might be destroyed. + pred_state = pred->state.compare_and_swap(STATE_READER_UNBLOCKNEXT, STATE_READER); + } + this->prev = pred; + __TBB_ASSERT( !( tricky_pointer(pred) & FLAG ), "use of corrupted pointer!" ); + __TBB_ASSERT( !pred->next, "the predecessor has another successor!"); + // ensure release semantics on IPF + __TBB_store_with_release(pred->next,this); + if( pred_state != STATE_ACTIVEREADER ) { +#if DO_ITT_NOTIFY + sync_prepare_done = true; + ITT_NOTIFY(sync_prepare, mutex); +#endif + spin_wait_until_eq(going, 1); + } + } + unsigned short old_state = state.compare_and_swap(STATE_ACTIVEREADER, STATE_READER); + if( old_state!=STATE_READER ) { +#if DO_ITT_NOTIFY + if( !sync_prepare_done ) + ITT_NOTIFY(sync_prepare, mutex); +#endif + // Failed to become active reader -> need to unblock the next waiting reader first + __TBB_ASSERT( state==STATE_READER_UNBLOCKNEXT, "unexpected state" ); + spin_wait_while_eq(next, (scoped_lock*)NULL); + /* state should be changed before unblocking the next otherwise it might finish + and another thread can get our old state and left blocked */ + state = STATE_ACTIVEREADER; + // ensure release semantics on IPF + __TBB_store_with_release(next->going,1); + } + } + + ITT_NOTIFY(sync_acquired, mutex); + + // Force acquire so that user's critical section receives correct values + // from processor that was previously in the user's critical section. + __TBB_load_with_acquire(going); +} + +bool queuing_rw_mutex::scoped_lock::try_acquire( queuing_rw_mutex& m, bool write ) +{ + __TBB_ASSERT( !this->mutex, "scoped_lock is already holding a mutex"); + + // Must set all fields before the fetch_and_store, because once the + // fetch_and_store executes, *this becomes accessible to other threads. 
+ prev = NULL; + next = NULL; + going = 0; + state = state_t(write ? STATE_WRITER : STATE_ACTIVEREADER); + internal_lock = RELEASED; + + if( m.q_tail ) return false; + // The CAS must have release semantics, because we are + // "sending" the fields initialized above to other processors. + queuing_rw_mutex::scoped_lock* pred = m.q_tail.compare_and_swap(this, NULL); + + // Force acquire so that user's critical section receives correct values + // from processor that was previously in the user's critical section. + // try_acquire should always have acquire semantic, even if failed. + __TBB_load_with_acquire(going); + + if( !pred ) { + mutex = &m; + ITT_NOTIFY(sync_acquired, mutex); + return true; + } else return false; + +} + +void queuing_rw_mutex::scoped_lock::release( ) +{ + __TBB_ASSERT(this->mutex!=NULL, "no lock acquired"); + + ITT_NOTIFY(sync_releasing, mutex); + + if( state == STATE_WRITER ) { // Acquired for write + + // The logic below is the same as "writerUnlock", but restructured to remove "return" in the middle of routine. + // In the statement below, acquire semantics of reading 'next' is required + // so that following operations with fields of 'next' are safe. + scoped_lock* n = __TBB_load_with_acquire(next); + if( !n ) { + if( this == mutex->q_tail.compare_and_swap(NULL, this) ) { + // this was the only item in the queue, and the queue is now empty. + goto done; + } + spin_wait_while_eq( next, (scoped_lock*)NULL ); + n = next; + } + n->going = 2; // protect next queue node from being destroyed too early + if( n->state==STATE_UPGRADE_WAITING ) { + // the next waiting for upgrade means this writer was upgraded before. + acquire_internal_lock(); + queuing_rw_mutex::scoped_lock* tmp = tricky_pointer::fetch_and_store(&(n->prev), NULL); + n->state = STATE_UPGRADE_LOSER; + __TBB_store_with_release(n->going,1); + unblock_or_wait_on_internal_lock(get_flag(tmp)); + } else { + __TBB_ASSERT( state & (STATE_COMBINED_WAITINGREADER | STATE_WRITER), "unexpected state" ); + __TBB_ASSERT( !( tricky_pointer(n->prev) & FLAG ), "use of corrupted pointer!" ); + n->prev = NULL; + // ensure release semantics on IPF + __TBB_store_with_release(n->going,1); + } + + } else { // Acquired for read + + queuing_rw_mutex::scoped_lock *tmp = NULL; +retry: + // Addition to the original paper: Mark this->prev as in use + queuing_rw_mutex::scoped_lock *pred = tricky_pointer::fetch_and_add(&(this->prev), FLAG); + + if( pred ) { + if( !(pred->try_acquire_internal_lock()) ) + { + // Failed to acquire the lock on pred. The predecessor either unlinks or upgrades. + // In the second case, it could or could not know my "in use" flag - need to check + tmp = tricky_pointer::compare_and_swap(&(this->prev), pred, tricky_pointer(pred)|FLAG ); + if( !(tricky_pointer(tmp)&FLAG) ) { + // Wait for the predecessor to change this->prev (e.g. 
during unlink) + spin_wait_while_eq( this->prev, tricky_pointer(pred)|FLAG ); + // Now owner of pred is waiting for _us_ to release its lock + pred->release_internal_lock(); + } + else ; // The "in use" flag is back -> the predecessor didn't get it and will release itself; nothing to do + + tmp = NULL; + goto retry; + } + __TBB_ASSERT(pred && pred->internal_lock==ACQUIRED, "predecessor's lock is not acquired"); + this->prev = pred; + acquire_internal_lock(); + + __TBB_store_with_release(pred->next,reinterpret_cast(NULL)); + + if( !next && this != mutex->q_tail.compare_and_swap(pred, this) ) { + spin_wait_while_eq( next, (void*)NULL ); + } + __TBB_ASSERT( !get_flag(next), "use of corrupted pointer" ); + + // ensure acquire semantics of reading 'next' + if( __TBB_load_with_acquire(next) ) { // I->next != nil + // Equivalent to I->next->prev = I->prev but protected against (prev[n]&FLAG)!=0 + tmp = tricky_pointer::fetch_and_store(&(next->prev), pred); + // I->prev->next = I->next; + __TBB_ASSERT(this->prev==pred, NULL); + __TBB_store_with_release(pred->next,next); + } + // Safe to release in the order opposite to acquiring which makes the code simplier + pred->release_internal_lock(); + + } else { // No predecessor when we looked + acquire_internal_lock(); // "exclusiveLock(&I->EL)" + // ensure acquire semantics of reading 'next' + scoped_lock* n = __TBB_load_with_acquire(next); + if( !n ) { + if( this != mutex->q_tail.compare_and_swap(NULL, this) ) { + spin_wait_while_eq( next, (scoped_lock*)NULL ); + n = next; + } else { + goto unlock_self; + } + } + n->going = 2; // protect next queue node from being destroyed too early + tmp = tricky_pointer::fetch_and_store(&(n->prev), NULL); + // ensure release semantics on IPF + __TBB_store_with_release(n->going,1); + } +unlock_self: + unblock_or_wait_on_internal_lock(get_flag(tmp)); + } +done: + spin_wait_while_eq( going, 2 ); + + initialize(); +} + +bool queuing_rw_mutex::scoped_lock::downgrade_to_reader() +{ + __TBB_ASSERT( state==STATE_WRITER, "no sense to downgrade a reader" ); + + ITT_NOTIFY(sync_releasing, mutex); + + // ensure acquire semantics of reading 'next' + if( ! __TBB_load_with_acquire(next) ) { + state = STATE_READER; + if( this==mutex->q_tail ) { + unsigned short old_state = state.compare_and_swap(STATE_ACTIVEREADER, STATE_READER); + if( old_state==STATE_READER ) { + goto downgrade_done; + } + } + /* wait for the next to register */ + spin_wait_while_eq( next, (void*)NULL ); + } + __TBB_ASSERT( next, "still no successor at this point!" ); + if( next->state & STATE_COMBINED_WAITINGREADER ) + __TBB_store_with_release(next->going,1); + else if( next->state==STATE_UPGRADE_WAITING ) + // the next waiting for upgrade means this writer was upgraded before. + next->state = STATE_UPGRADE_LOSER; + state = STATE_ACTIVEREADER; + +downgrade_done: + return true; +} + +bool queuing_rw_mutex::scoped_lock::upgrade_to_writer() +{ + __TBB_ASSERT( state==STATE_ACTIVEREADER, "only active reader can be upgraded" ); + + queuing_rw_mutex::scoped_lock * tmp; + queuing_rw_mutex::scoped_lock * me = this; + + ITT_NOTIFY(sync_releasing, mutex); + state = STATE_UPGRADE_REQUESTED; +requested: + __TBB_ASSERT( !( tricky_pointer(next) & FLAG ), "use of corrupted pointer!" 
); + acquire_internal_lock(); + if( this != mutex->q_tail.compare_and_swap(tricky_pointer(me)|FLAG, this) ) { + spin_wait_while_eq( next, (void*)NULL ); + queuing_rw_mutex::scoped_lock * n; + n = tricky_pointer::fetch_and_add(&(this->next), FLAG); + unsigned short n_state = n->state; + /* the next reader can be blocked by our state. the best thing to do is to unblock it */ + if( n_state & STATE_COMBINED_WAITINGREADER ) + __TBB_store_with_release(n->going,1); + tmp = tricky_pointer::fetch_and_store(&(n->prev), this); + unblock_or_wait_on_internal_lock(get_flag(tmp)); + if( n_state & (STATE_COMBINED_READER | STATE_UPGRADE_REQUESTED) ) { + // save n|FLAG for simplicity of following comparisons + tmp = tricky_pointer(n)|FLAG; + atomic_backoff backoff; + while(next==tmp) { + if( state & STATE_COMBINED_UPGRADING ) { + if( __TBB_load_with_acquire(next)==tmp ) + next = n; + goto waiting; + } + backoff.pause(); + } + __TBB_ASSERT(next!=(tricky_pointer(n)|FLAG), NULL); + goto requested; + } else { + __TBB_ASSERT( n_state & (STATE_WRITER | STATE_UPGRADE_WAITING), "unexpected state"); + __TBB_ASSERT( (tricky_pointer(n)|FLAG)==next, NULL); + next = n; + } + } else { + /* We are in the tail; whoever comes next is blocked by q_tail&FLAG */ + release_internal_lock(); + } // if( this != mutex->q_tail... ) + state.compare_and_swap(STATE_UPGRADE_WAITING, STATE_UPGRADE_REQUESTED); + +waiting: + __TBB_ASSERT( !( tricky_pointer(next) & FLAG ), "use of corrupted pointer!" ); + __TBB_ASSERT( state & STATE_COMBINED_UPGRADING, "wrong state at upgrade waiting_retry" ); + __TBB_ASSERT( me==this, NULL ); + ITT_NOTIFY(sync_prepare, mutex); + /* if noone was blocked by the "corrupted" q_tail, turn it back */ + mutex->q_tail.compare_and_swap( this, tricky_pointer(me)|FLAG ); + queuing_rw_mutex::scoped_lock * pred; + pred = tricky_pointer::fetch_and_add(&(this->prev), FLAG); + if( pred ) { + bool success = pred->try_acquire_internal_lock(); + pred->state.compare_and_swap(STATE_UPGRADE_WAITING, STATE_UPGRADE_REQUESTED); + if( !success ) { + tmp = tricky_pointer::compare_and_swap(&(this->prev), pred, tricky_pointer(pred)|FLAG ); + if( tricky_pointer(tmp)&FLAG ) { + spin_wait_while_eq(this->prev, pred); + pred = this->prev; + } else { + spin_wait_while_eq( this->prev, tricky_pointer(pred)|FLAG ); + pred->release_internal_lock(); + } + } else { + this->prev = pred; + pred->release_internal_lock(); + spin_wait_while_eq(this->prev, pred); + pred = this->prev; + } + if( pred ) + goto waiting; + } else { + // restore the corrupted prev field for possible further use (e.g. if downgrade back to reader) + this->prev = pred; + } + __TBB_ASSERT( !pred && !this->prev, NULL ); + + // additional lifetime issue prevention checks + // wait for the successor to finish working with my fields + wait_for_release_of_internal_lock(); + // now wait for the predecessor to finish working with my fields + spin_wait_while_eq( going, 2 ); + // there is an acquire semantics statement in the end of spin_wait_while_eq. + + bool result = ( state != STATE_UPGRADE_LOSER ); + state = STATE_WRITER; + going = 1; + + ITT_NOTIFY(sync_acquired, mutex); + return result; +} + +void queuing_rw_mutex::internal_construct() { + ITT_SYNC_CREATE(this, _T("tbb::queuing_rw_mutex"), _T("")); +} + +} // namespace tbb diff --git a/dep/tbb/src/tbb/recursive_mutex.cpp b/dep/tbb/src/tbb/recursive_mutex.cpp new file mode 100644 index 000000000..95e62906c --- /dev/null +++ b/dep/tbb/src/tbb/recursive_mutex.cpp @@ -0,0 +1,143 @@ +/* + Copyright 2005-2009 Intel Corporation. 
All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/recursive_mutex.h" +#include "itt_notify.h" + +namespace tbb { + +void recursive_mutex::scoped_lock::internal_acquire( recursive_mutex& m ) { +#if _WIN32||_WIN64 + switch( m.state ) { + case INITIALIZED: + // since we cannot look into the internal of the CriticalSection object + // we won't know how many times the lock has been acquired, and thus + // we won't know when we may safely set the state back to INITIALIZED + // if we change the state to HELD as in mutex.cpp. 
thus, we won't change + // the state for recursive_mutex + EnterCriticalSection( &m.impl ); + break; + case DESTROYED: + __TBB_ASSERT(false,"recursive_mutex::scoped_lock: mutex already destroyed"); + break; + default: + __TBB_ASSERT(false,"recursive_mutex::scoped_lock: illegal mutex state"); + break; + } +#else + int error_code = pthread_mutex_lock(&m.impl); + __TBB_ASSERT_EX(!error_code,"recursive_mutex::scoped_lock: pthread_mutex_lock failed"); +#endif /* _WIN32||_WIN64 */ + my_mutex = &m; +} + +void recursive_mutex::scoped_lock::internal_release() { + __TBB_ASSERT( my_mutex, "recursive_mutex::scoped_lock: not holding a mutex" ); +#if _WIN32||_WIN64 + switch( my_mutex->state ) { + case INITIALIZED: + LeaveCriticalSection( &my_mutex->impl ); + break; + case DESTROYED: + __TBB_ASSERT(false,"recursive_mutex::scoped_lock: mutex already destroyed"); + break; + default: + __TBB_ASSERT(false,"recursive_mutex::scoped_lock: illegal mutex state"); + break; + } +#else + int error_code = pthread_mutex_unlock(&my_mutex->impl); + __TBB_ASSERT_EX(!error_code, "recursive_mutex::scoped_lock: pthread_mutex_unlock failed"); +#endif /* _WIN32||_WIN64 */ + my_mutex = NULL; +} + +bool recursive_mutex::scoped_lock::internal_try_acquire( recursive_mutex& m ) { +#if _WIN32||_WIN64 + switch( m.state ) { + case INITIALIZED: + break; + case DESTROYED: + __TBB_ASSERT(false,"recursive_mutex::scoped_lock: mutex already destroyed"); + break; + default: + __TBB_ASSERT(false,"recursive_mutex::scoped_lock: illegal mutex state"); + break; + } +#endif /* _WIN32||_WIN64 */ + bool result; +#if _WIN32||_WIN64 + result = TryEnterCriticalSection(&m.impl)!=0; +#else + result = pthread_mutex_trylock(&m.impl)==0; +#endif /* _WIN32||_WIN64 */ + if( result ) + my_mutex = &m; + return result; +} + +void recursive_mutex::internal_construct() { +#if _WIN32||_WIN64 + InitializeCriticalSection(&impl); + state = INITIALIZED; +#else + pthread_mutexattr_t mtx_attr; + int error_code = pthread_mutexattr_init( &mtx_attr ); + if( error_code ) + tbb::internal::handle_perror(error_code,"recursive_mutex: pthread_mutexattr_init failed"); + + pthread_mutexattr_settype( &mtx_attr, PTHREAD_MUTEX_RECURSIVE ); + error_code = pthread_mutex_init( &impl, &mtx_attr ); + if( error_code ) + tbb::internal::handle_perror(error_code,"recursive_mutex: pthread_mutex_init failed"); + pthread_mutexattr_destroy( &mtx_attr ); +#endif /* _WIN32||_WIN64*/ + ITT_SYNC_CREATE(&impl, _T("tbb::recursive_mutex"), _T("")); +} + +void recursive_mutex::internal_destroy() { +#if _WIN32||_WIN64 + switch( state ) { + case INITIALIZED: + DeleteCriticalSection(&impl); + break; + case DESTROYED: + __TBB_ASSERT(false,"recursive_mutex: already destroyed"); + break; + default: + __TBB_ASSERT(false,"recursive_mutex: illegal state for destruction"); + break; + } + state = DESTROYED; +#else + int error_code = pthread_mutex_destroy(&impl); + __TBB_ASSERT_EX(!error_code,"recursive_mutex: pthread_mutex_destroy failed"); +#endif /* _WIN32||_WIN64 */ +} + +} // namespace tbb diff --git a/dep/tbb/src/tbb/spin_mutex.cpp b/dep/tbb/src/tbb/spin_mutex.cpp new file mode 100644 index 000000000..e233ffb33 --- /dev/null +++ b/dep/tbb/src/tbb/spin_mutex.cpp @@ -0,0 +1,68 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. 
+ + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/tbb_machine.h" +#include "tbb/spin_mutex.h" +#include "itt_notify.h" +#include "tbb_misc.h" + +namespace tbb { + +void spin_mutex::scoped_lock::internal_acquire( spin_mutex& m ) { + __TBB_ASSERT( !my_mutex, "already holding a lock on a spin_mutex" ); + ITT_NOTIFY(sync_prepare, &m); + my_unlock_value = __TBB_LockByte(m.flag); + my_mutex = &m; + ITT_NOTIFY(sync_acquired, &m); +} + +void spin_mutex::scoped_lock::internal_release() { + __TBB_ASSERT( my_mutex, "release on spin_mutex::scoped_lock that is not holding a lock" ); + __TBB_ASSERT( !(my_unlock_value&1), "corrupted scoped_lock?" ); + + ITT_NOTIFY(sync_releasing, my_mutex); + __TBB_store_with_release(my_mutex->flag, static_cast(my_unlock_value)); + my_mutex = NULL; +} + +bool spin_mutex::scoped_lock::internal_try_acquire( spin_mutex& m ) { + __TBB_ASSERT( !my_mutex, "already holding a lock on a spin_mutex" ); + bool result = bool( __TBB_TryLockByte(m.flag) ); + if( result ) { + my_unlock_value = 0; + my_mutex = &m; + ITT_NOTIFY(sync_acquired, &m); + } + return result; +} + +void spin_mutex::internal_construct() { + ITT_SYNC_CREATE(this, _T("tbb::spin_mutex"), _T("")); +} + +} // namespace tbb diff --git a/dep/tbb/src/tbb/spin_rw_mutex.cpp b/dep/tbb/src/tbb/spin_rw_mutex.cpp new file mode 100644 index 000000000..b3ce9d851 --- /dev/null +++ b/dep/tbb/src/tbb/spin_rw_mutex.cpp @@ -0,0 +1,174 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "tbb/spin_rw_mutex.h" +#include "tbb/tbb_machine.h" +#include "itt_notify.h" + +#if defined(_MSC_VER) && defined(_Wp64) + // Workaround for overzealous compiler warnings in /Wp64 mode + #pragma warning (disable: 4244) +#endif + +namespace tbb { + +template // a template can work with private spin_rw_mutex::state_t +static inline T CAS(volatile T &addr, T newv, T oldv) { + // ICC (9.1 and 10.1 tried) unable to do implicit conversion + // from "volatile T*" to "volatile void*", so explicit cast added. + return T(__TBB_CompareAndSwapW((volatile void *)&addr, (intptr_t)newv, (intptr_t)oldv)); +} + +//! Acquire write lock on the given mutex. +bool spin_rw_mutex_v3::internal_acquire_writer() +{ + ITT_NOTIFY(sync_prepare, this); + internal::atomic_backoff backoff; + for(;;) { + state_t s = const_cast(state); // ensure reloading + if( !(s & BUSY) ) { // no readers, no writers + if( CAS(state, WRITER, s)==s ) + break; // successfully stored writer flag + backoff.reset(); // we could be very close to complete op. + } else if( !(s & WRITER_PENDING) ) { // no pending writers + __TBB_AtomicOR(&state, WRITER_PENDING); + } + backoff.pause(); + } + ITT_NOTIFY(sync_acquired, this); + return false; +} + +//! Release writer lock on the given mutex +void spin_rw_mutex_v3::internal_release_writer() +{ + ITT_NOTIFY(sync_releasing, this); + __TBB_AtomicAND( &state, READERS ); +} + +//! Acquire read lock on given mutex. +void spin_rw_mutex_v3::internal_acquire_reader() +{ + ITT_NOTIFY(sync_prepare, this); + internal::atomic_backoff backoff; + for(;;) { + state_t s = const_cast(state); // ensure reloading + if( !(s & (WRITER|WRITER_PENDING)) ) { // no writer or write requests + state_t t = (state_t)__TBB_FetchAndAddW( &state, (intptr_t) ONE_READER ); + if( !( t&WRITER )) + break; // successfully stored increased number of readers + // writer got there first, undo the increment + __TBB_FetchAndAddW( &state, -(intptr_t)ONE_READER ); + } + backoff.pause(); + } + + ITT_NOTIFY(sync_acquired, this); + __TBB_ASSERT( state & READERS, "invalid state of a read lock: no readers" ); +} + +//! Upgrade reader to become a writer. +/** Returns true if the upgrade happened without re-acquiring the lock and false if opposite */ +bool spin_rw_mutex_v3::internal_upgrade() +{ + state_t s = state; + __TBB_ASSERT( s & READERS, "invalid state before upgrade: no readers " ); + // check and set writer-pending flag + // required conditions: either no pending writers, or we are the only reader + // (with multiple readers and pending writer, another upgrade could have been requested) + while( (s & READERS)==ONE_READER || !(s & WRITER_PENDING) ) { + state_t old_s = s; + if( (s=CAS(state, s | WRITER | WRITER_PENDING, s))==old_s ) { + internal::atomic_backoff backoff; + ITT_NOTIFY(sync_prepare, this); + // the state should be 0...0111, i.e. 
1 reader and waiting writer; + // both new readers and writers are blocked + while( (state & READERS) != ONE_READER ) // more than 1 reader + backoff.pause(); + __TBB_ASSERT((state&(WRITER_PENDING|WRITER))==(WRITER_PENDING|WRITER),"invalid state when upgrading to writer"); + + __TBB_FetchAndAddW( &state, - (intptr_t)(ONE_READER+WRITER_PENDING)); + ITT_NOTIFY(sync_acquired, this); + return true; // successfully upgraded + } + } + // slow reacquire + internal_release_reader(); + return internal_acquire_writer(); // always returns false +} + +//! Downgrade writer to a reader +void spin_rw_mutex_v3::internal_downgrade() { + ITT_NOTIFY(sync_releasing, this); + __TBB_FetchAndAddW( &state, (intptr_t)(ONE_READER-WRITER)); + __TBB_ASSERT( state & READERS, "invalid state after downgrade: no readers" ); +} + +//! Release read lock on the given mutex +void spin_rw_mutex_v3::internal_release_reader() +{ + __TBB_ASSERT( state & READERS, "invalid state of a read lock: no readers" ); + ITT_NOTIFY(sync_releasing, this); // release reader + __TBB_FetchAndAddWrelease( &state,-(intptr_t)ONE_READER); +} + +//! Try to acquire write lock on the given mutex +bool spin_rw_mutex_v3::internal_try_acquire_writer() +{ + // for a writer: only possible to acquire if no active readers or writers + state_t s = state; + if( !(s & BUSY) ) // no readers, no writers; mask is 1..1101 + if( CAS(state, WRITER, s)==s ) { + ITT_NOTIFY(sync_acquired, this); + return true; // successfully stored writer flag + } + return false; +} + +//! Try to acquire read lock on the given mutex +bool spin_rw_mutex_v3::internal_try_acquire_reader() +{ + // for a reader: acquire if no active or waiting writers + state_t s = state; + if( !(s & (WRITER|WRITER_PENDING)) ) { // no writers + state_t t = (state_t)__TBB_FetchAndAddW( &state, (intptr_t) ONE_READER ); + if( !( t&WRITER )) { // got the lock + ITT_NOTIFY(sync_acquired, this); + return true; // successfully stored increased number of readers + } + // writer got there first, undo the increment + __TBB_FetchAndAddW( &state, -(intptr_t)ONE_READER ); + } + return false; +} + + +void spin_rw_mutex_v3::internal_construct() { + ITT_SYNC_CREATE(this, _T("tbb::spin_rw_mutex"), _T("")); +} +} // namespace tbb diff --git a/dep/tbb/src/tbb/task.cpp b/dep/tbb/src/tbb/task.cpp new file mode 100644 index 000000000..857052270 --- /dev/null +++ b/dep/tbb/src/tbb/task.cpp @@ -0,0 +1,3912 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +/* This file contains the TBB task scheduler. There are many classes + lumped together here because very few are exposed to the outside + world, and by putting them in a single translation unit, the + compiler's optimizer might be able to do a better job. */ + +#if USE_PTHREAD + + // Some pthreads documentation says that must be first header. + #include + #define __TBB_THREAD_ROUTINE + +#elif USE_WINTHREAD + + #include + #include /* Need _beginthreadex from there */ + #include /* Need _alloca from there */ + #define __TBB_THREAD_ROUTINE WINAPI + +#else + + #error Must define USE_PTHREAD or USE_WINTHREAD + +#endif + +#include +#include +#include +#include +#include +#include +#include "tbb/tbb_stddef.h" + +/* Temporarily change "private" to "public" while including "tbb/task.h". + This hack allows us to avoid publishing internal types and methods + in the public header files just for sake of friend declarations. */ +#define private public +#include "tbb/task.h" +#if __TBB_EXCEPTIONS +#include "tbb/tbb_exception.h" +#endif /* __TBB_EXCEPTIONS */ +#undef private + +#include "tbb/task_scheduler_init.h" +#include "tbb/cache_aligned_allocator.h" +#include "tbb/tbb_machine.h" +#include "tbb/mutex.h" +#include "tbb/atomic.h" +#if __TBB_SCHEDULER_OBSERVER +#include "tbb/task_scheduler_observer.h" +#include "tbb/spin_rw_mutex.h" +#include "tbb/aligned_space.h" +#endif /* __TBB_SCHEDULER_OBSERVER */ +#if __TBB_EXCEPTIONS +#include "tbb/spin_mutex.h" +#endif /* __TBB_EXCEPTIONS */ + +#include "tbb/partitioner.h" + +#include "../rml/include/rml_tbb.h" + +namespace tbb { + namespace internal { + namespace rml { + tbb_server* make_private_server( tbb_client& client ); + } + } +} + +#if DO_TBB_TRACE +#include +#define TBB_TRACE(x) ((void)std::printf x) +#else +#define TBB_TRACE(x) ((void)(0)) +#endif /* DO_TBB_TRACE */ + +#if TBB_USE_ASSERT +#define COUNT_TASK_NODES 1 +#endif /* TBB_USE_ASSERT */ + +/* If nonzero, then gather statistics */ +#ifndef STATISTICS +#define STATISTICS 0 +#endif /* STATISTICS */ + +#if STATISTICS +#define GATHER_STATISTIC(x) (x) +#else +#define GATHER_STATISTIC(x) ((void)0) +#endif /* STATISTICS */ + +#if __TBB_EXCEPTIONS +// The standard offsetof macro does not work for us since its usage is restricted +// by POD-types only. Using 0x1000 (not NULL) is necessary to appease GCC. +#define __TBB_offsetof(class_name, member_name) \ + ((ptrdiff_t)&(reinterpret_cast(0x1000)->member_name) - 0x1000) +// Returns address of the object containing a member with the given name and address +#define __TBB_get_object_addr(class_name, member_name, member_addr) \ + reinterpret_cast((char*)member_addr - __TBB_offsetof(class_name, member_name)) +#endif /* __TBB_EXCEPTIONS */ + +// This macro is an attempt to get rid of ugly ifdefs in the shared parts of the code. +// It drops the second argument depending on whether the controlling macro is defined. +// The first argument is just a convenience allowing to keep comma before the macro usage. 
+#if __TBB_EXCEPTIONS + #define __TBB_CONTEXT_ARG(arg1, context) arg1, context +#else /* !__TBB_EXCEPTIONS */ + #define __TBB_CONTEXT_ARG(arg1, context) arg1 +#endif /* !__TBB_EXCEPTIONS */ + +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Workaround for overzealous compiler warnings + // These particular warnings are so ubquitous that no attempt is made to narrow + // the scope of the warnings. + #pragma warning (disable: 4100 4127 4312 4244 4267 4706) +#endif + +// internal headers +#include "tbb_misc.h" +#include "itt_notify.h" +#include "tls.h" + +namespace tbb { + +using namespace std; + +#if DO_ITT_NOTIFY + const tchar + *SyncType_GlobalLock = _T("TbbGlobalLock"), + *SyncType_Scheduler = _T("%Constant") + ; + const tchar + *SyncObj_SchedulerInitialization = _T("TbbSchedulerInitialization"), + *SyncObj_SchedulersList = _T("TbbSchedulersList"), + *SyncObj_WorkerLifeCycleMgmt = _T("TBB Scheduler"), + *SyncObj_TaskStealingLoop = _T("TBB Scheduler"), + *SyncObj_WorkerTaskPool = _T("TBB Scheduler"), + *SyncObj_MasterTaskPool = _T("TBB Scheduler"), + *SyncObj_TaskPoolSpinning = _T("TBB Scheduler"), + *SyncObj_Mailbox = _T("TBB Scheduler"), + *SyncObj_TaskReturnList = _T("TBB Scheduler"), + *SyncObj_GateLock = _T("TBB Scheduler"), + *SyncObj_Gate = _T("TBB Scheduler"), + *SyncObj_ContextsList = _T("TBB Scheduler") + ; +#endif /* DO_ITT_NOTIFY */ + +namespace internal { + +const stack_size_type MByte = 1<<20; +#if !defined(__TBB_WORDSIZE) +const stack_size_type ThreadStackSize = 1*MByte; +#elif __TBB_WORDSIZE<=4 +const stack_size_type ThreadStackSize = 2*MByte; +#else +const stack_size_type ThreadStackSize = 4*MByte; +#endif + +#if USE_PTHREAD +typedef void* thread_routine_return_type; +#else +typedef unsigned thread_routine_return_type; +#endif + +//------------------------------------------------------------------------ +// General utility section +//------------------------------------------------------------------------ + +#if TBB_USE_ASSERT + #define __TBB_POISON_DEQUE 1 +#endif /* TBB_USE_ASSERT */ + +#if __TBB_POISON_DEQUE + #if __ia64__ + task* const poisoned_taskptr = (task*)0xDDEEAADDDEADBEEF; + #elif _WIN64 + task* const poisoned_taskptr = (task*)0xDDEEAADDDEADBEEF; + #else + task* const poisoned_taskptr = (task*)0xDEADBEEF; + #endif + + #define __TBB_POISON_TASK_PTR(ptr) ptr = poisoned_taskptr + #define __TBB_ASSERT_VALID_TASK_PTR(ptr) __TBB_ASSERT( ptr != poisoned_taskptr, "task pointer in the deque is poisoned" ) +#else /* !__TBB_POISON_DEQUE */ + #define __TBB_POISON_TASK_PTR(ptr) ((void)0) + #define __TBB_ASSERT_VALID_TASK_PTR(ptr) ((void)0) +#endif /* !__TBB_POISON_DEQUE */ + + +//! Vector that grows without reallocations, and stores items in the reverse order. +/** Requires to initialize its first segment with a preallocated memory chunk + (usually it is static array or an array allocated on the stack). + The second template parameter specifies maximal number of segments. Each next + segment is twice as large as the previous one. 
**/ +template +class fast_reverse_vector +{ +public: + fast_reverse_vector ( T* initial_segment, size_t segment_size ) + : m_cur_segment(initial_segment) + , m_cur_segment_size(segment_size) + , m_pos(segment_size) + , m_num_segments(0) + , m_size(0) + { + __TBB_ASSERT ( initial_segment && segment_size, "Nonempty initial segment must be supplied"); + } + + ~fast_reverse_vector () + { + for ( size_t i = 1; i < m_num_segments; ++i ) + NFS_Free( m_segments[i] ); + } + + size_t size () const { return m_size + m_cur_segment_size - m_pos; } + + void push_back ( const T& val ) + { + if ( !m_pos ) { + m_segments[m_num_segments++] = m_cur_segment; + __TBB_ASSERT ( m_num_segments < max_segments, "Maximal capacity exceeded" ); + m_size += m_cur_segment_size; + m_cur_segment_size *= 2; + m_pos = m_cur_segment_size; + m_cur_segment = (T*)NFS_Allocate( m_cur_segment_size * sizeof(T), 1, NULL ); + } + m_cur_segment[--m_pos] = val; + } + + //! Copies the contents of the vector into the dst array. + /** Can only be used when T is a POD type, as copying does not invoke copy constructors. **/ + void copy_memory ( T* dst ) const + { + size_t size = m_cur_segment_size - m_pos; + memcpy( dst, m_cur_segment + m_pos, size * sizeof(T) ); + dst += size; + size = m_cur_segment_size / 2; + for ( long i = (long)m_num_segments - 1; i >= 0; --i ) { + memcpy( dst, m_segments[i], size * sizeof(T) ); + dst += size; + size /= 2; + } + } + +protected: + //! The current (not completely filled) segment + T *m_cur_segment; + + //! Capacity of m_cur_segment + size_t m_cur_segment_size; + + //! Insertion position in m_cur_segment + size_t m_pos; + + //! Array of filled segments (has fixed size specified by the second template parameter) + T *m_segments[max_segments]; + + //! Number of filled segments (the size of m_segments) + size_t m_num_segments; + + //! Number of items in the segments in m_segments + size_t m_size; + +}; // class fast_reverse_vector + +//------------------------------------------------------------------------ +// End of general utility section +//------------------------------------------------------------------------ + +//! Alignment for a task object +const size_t task_alignment = 16; + +//! Number of bytes reserved for a task prefix +/** If not exactly sizeof(task_prefix), the extra bytes *precede* the task_prefix. */ +const size_t task_prefix_reservation_size = ((sizeof(internal::task_prefix)-1)/task_alignment+1)*task_alignment; + +template class CustomScheduler; + + +class mail_outbox; + +struct task_proxy: public task { + static const intptr pool_bit = 1; + static const intptr mailbox_bit = 2; + /* All but two low-order bits represent a (task*). + Two low-order bits mean: + 1 = proxy is/was/will be in task pool + 2 = proxy is/was/will be in mailbox */ + intptr task_and_tag; + + //! Pointer to next task_proxy in a mailbox + task_proxy* next_in_mailbox; + + //! Mailbox to which this was mailed. + mail_outbox* outbox; +}; + +//! Internal representation of mail_outbox, without padding. +class unpadded_mail_outbox { +protected: + //! Pointer to first task_proxy in mailbox, or NULL if box is empty. + task_proxy* my_first; + + //! Pointer to last task_proxy in mailbox, or NULL if box is empty. + /** Low-order bit set to 1 to represent lock on the box. */ + task_proxy* my_last; + + //! Owner of mailbox is not executing a task, and has drained its own task pool. + bool my_is_idle; +}; + +//! Class representing where mail is put. +/** Padded to occupy a cache line. 
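    Giving each outbox a cache line of its own presumably avoids false sharing between
    neighboring mailboxes when their owners and senders touch them concurrently (editorial
    inference; the original comment does not state the rationale).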
*/ +class mail_outbox: unpadded_mail_outbox { + char pad[NFS_MaxLineSize-sizeof(unpadded_mail_outbox)]; + + //! Acquire lock on the box. + task_proxy* acquire() { + atomic_backoff backoff; + for(;;) { + // No fence on load, because subsequent compare-and-swap has the necessary fence. + intptr last = (intptr)my_last; + if( (last&1)==0 && __TBB_CompareAndSwapW(&my_last,last|1,last)==last) { + __TBB_ASSERT( (my_first==NULL)==((intptr(my_last)&~1)==0), NULL ); + return (task_proxy*)last; + } + backoff.pause(); + } + } + task_proxy* internal_pop() { + //! No fence on load of my_first, because if it is NULL, there's nothing further to read from another thread. + task_proxy* result = my_first; + if( result ) { + if( task_proxy* f = __TBB_load_with_acquire(result->next_in_mailbox) ) { + // No lock required + __TBB_store_with_release( my_first, f ); + } else { + // acquire() has the necessary fence. + task_proxy* l = acquire(); + __TBB_ASSERT(result==my_first,NULL); + if( !(my_first = result->next_in_mailbox) ) + l=0; + __TBB_store_with_release( my_last, l ); + } + } + return result; + } +public: + friend class mail_inbox; + + //! Push task_proxy onto the mailbox queue of another thread. + void push( task_proxy& t ) { + __TBB_ASSERT(&t!=NULL, NULL); + t.next_in_mailbox = NULL; + if( task_proxy* l = acquire() ) { + l->next_in_mailbox = &t; + } else { + my_first=&t; + } + // Fence required because caller is sending the task_proxy to another thread. + __TBB_store_with_release( my_last, &t ); + } +#if TBB_USE_ASSERT + //! Verify that *this is initialized empty mailbox. + /** Raise assertion if *this is not in initialized state, or sizeof(this) is wrong. + Instead of providing a constructor, we provide this assertion, because for + brevity and speed, we depend upon a memset to initialize instances of this class */ + void assert_is_initialized() const { + __TBB_ASSERT( sizeof(*this)==NFS_MaxLineSize, NULL ); + __TBB_ASSERT( !my_first, NULL ); + __TBB_ASSERT( !my_last, NULL ); + __TBB_ASSERT( !my_is_idle, NULL ); + } +#endif /* TBB_USE_ASSERT */ + + //! Drain the mailbox + intptr drain() { + intptr k = 0; + // No fences here because other threads have already quit. + for( ; task_proxy* t = my_first; ++k ) { + my_first = t->next_in_mailbox; + NFS_Free((char*)t-task_prefix_reservation_size); + } + return k; + } + + //! True if thread that owns this mailbox is looking for work. + bool recipient_is_idle() { + return my_is_idle; + } +}; + +//! Class representing source of mail. +class mail_inbox { + //! Corresponding sink where mail that we receive will be put. + mail_outbox* my_putter; +public: + //! Construct unattached inbox + mail_inbox() : my_putter(NULL) {} + + //! Attach inbox to a corresponding outbox. + void attach( mail_outbox& putter ) { + __TBB_ASSERT(!my_putter,"already attached"); + my_putter = &putter; + } + //! Detach inbox from its outbox + void detach() { + __TBB_ASSERT(my_putter,"not attached"); + my_putter = NULL; + } + //! Get next piece of mail, or NULL if mailbox is empty. + task_proxy* pop() { + return my_putter->internal_pop(); + } + //! Indicate whether thread that reads this mailbox is idle. + /** Raises assertion failure if mailbox is redundantly marked as not idle. */ + void set_is_idle( bool value ) { + if( my_putter ) { + __TBB_ASSERT( my_putter->my_is_idle || value, "attempt to redundantly mark mailbox as not idle" ); + my_putter->my_is_idle = value; + } + } +#if TBB_USE_ASSERT + //! Indicate whether thread that reads this mailbox is idle. 
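    // (Debug-only counterpart of set_is_idle() above: it merely checks, via __TBB_ASSERT,
    //  that the recorded idleness matches the caller's expectation, and always returns true
    //  so it can be used inside assertion expressions.)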
+ bool assert_is_idle( bool value ) const { + __TBB_ASSERT( !my_putter || my_putter->my_is_idle==value, NULL ); + return true; + } +#endif /* TBB_USE_ASSERT */ +#if DO_ITT_NOTIFY + //! Get pointer to corresponding outbox used for ITT_NOTIFY calls. + void* outbox() const {return my_putter;} +#endif /* DO_ITT_NOTIFY */ +}; + +#if __TBB_SCHEDULER_OBSERVER +//------------------------------------------------------------------------ +// observer_proxy +//------------------------------------------------------------------------ +class observer_proxy { + friend class task_scheduler_observer_v3; + //! Reference count used for garbage collection. + /** 1 for reference from my task_scheduler_observer. + 1 for each local_last_observer_proxy that points to me. + No accounting for predecessor in the global list. + No accounting for global_last_observer_proxy that points to me. */ + atomic gc_ref_count; + //! Pointer to next task_scheduler_observer + /** Valid even when *this has been removed from the global list. */ + observer_proxy* next; + //! Pointer to previous task_scheduler_observer in global list. + observer_proxy* prev; + //! Associated observer + task_scheduler_observer* observer; + //! Account for removing reference from p. No effect if p is NULL. + void remove_ref_slow(); + void remove_from_list(); + observer_proxy( task_scheduler_observer_v3& wo ); +public: + static observer_proxy* process_list( observer_proxy* local_last, bool is_worker, bool is_entry ); +}; +#endif /* __TBB_SCHEDULER_OBSERVER */ + + +//------------------------------------------------------------------------ +// Arena +//------------------------------------------------------------------------ + +class Arena; +class GenericScheduler; + +struct WorkerDescriptor { + //! NULL until worker is published. -1 if worker should not be published. + GenericScheduler* scheduler; + +}; + +//! The useful contents of an ArenaPrefix +class UnpaddedArenaPrefix: no_copy + ,rml::tbb_client +{ + friend class GenericScheduler; + template friend class internal::CustomScheduler; + friend class Arena; + friend class Governor; + friend struct WorkerDescriptor; + + //! Arena slot to try to acquire first for the next new master. + unsigned limit; + + //! Number of masters that own this arena. + /** This may be smaller than the number of masters who have entered the arena. */ + unsigned number_of_masters; + + //! Total number of slots in the arena + const unsigned number_of_slots; + + //! Number of workers that belong to this arena + const unsigned number_of_workers; + + //! Pointer to the RML server object that services requests for this arena. + rml::tbb_server* server; + //! Counter used to allocate job indices + tbb::atomic next_job_index; + + //! Stack size of worker threads + stack_size_type stack_size; + + //! Array of workers. + WorkerDescriptor* worker_list; + +#if COUNT_TASK_NODES + //! Net number of nodes that have been allocated from heap. + /** Updated each time a scheduler is destroyed. */ + atomic task_node_count; +#endif /* COUNT_TASK_NODES */ + + //! Estimate of number of available tasks. + /** The estimate is either 0 (SNAPSHOT_EMPTY), infinity (SNAPSHOT_FULL), or a special value. + The implementation of Arena::is_busy_or_empty requires that pool_state_t be unsigned. */ + typedef uintptr_t pool_state_t; + + //! Current estimate of number of available tasks. 
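    // (Inferred from check_if_pool_is_empty() further below: the "special value" mentioned
    //  above is in practice the address of the Arena itself, used as a unique "busy" marker
    //  while one thread takes a snapshot of the arena slots.)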
+ tbb::atomic pool_state; + +protected: + UnpaddedArenaPrefix( unsigned number_of_slots_, unsigned number_of_workers_ ) : + number_of_masters(1), + number_of_slots(number_of_slots_), + number_of_workers(number_of_workers_) + { +#if COUNT_TASK_NODES + task_node_count = 0; +#endif /* COUNT_TASK_NODES */ + limit = number_of_workers_; + server = NULL; + stack_size = 0; + next_job_index = 0; + } + void open_connection_to_rml(); + +private: + //! Return reference to corresponding arena. + Arena& arena(); + + /*override*/ version_type version() const { + return 0; + } + + /*override*/ unsigned max_job_count() const { + return number_of_workers; + } + + /*override*/ size_t min_stack_size() const { + return stack_size; + } + + /*override*/ policy_type policy() const { + return throughput; + } + + /*override*/ job* create_one_job(); + + /*override*/ void cleanup( job& j ); + + /*override*/ void acknowledge_close_connection(); + + /*override*/ void process( job& j ); +}; + +//! The prefix to Arena with padding. +class ArenaPrefix: public UnpaddedArenaPrefix { + //! Padding to fill out to multiple of cache line size. + char pad[(sizeof(UnpaddedArenaPrefix)/NFS_MaxLineSize+1)*NFS_MaxLineSize-sizeof(UnpaddedArenaPrefix)]; + +public: + ArenaPrefix( unsigned number_of_slots_, unsigned number_of_workers_ ) : + UnpaddedArenaPrefix(number_of_slots_,number_of_workers_) + { + } +}; + + +struct ArenaSlot { + // Task pool (the deque of task pointers) of the scheduler that owns this slot + /** Also is used to specify if the slot is empty or locked: + 0 - empty + -1 - locked **/ + task** task_pool; + + //! Index of the first ready task in the deque. + /** Modified by thieves, and by the owner during compaction/reallocation **/ + size_t head; + + //! Padding to avoid false sharing caused by the thieves accessing this slot + char pad1[NFS_MaxLineSize - sizeof(size_t) - sizeof(task**)]; + + //! Index of the element following the last ready task in the deque. + /** Modified by the owner thread. **/ + size_t tail; + + //! Padding to avoid false sharing caused by the thieves accessing the next slot + char pad2[NFS_MaxLineSize - sizeof(size_t)]; +}; + + +class Arena { + friend class UnpaddedArenaPrefix; + friend class GenericScheduler; + template friend class internal::CustomScheduler; + friend class Governor; + friend struct WorkerDescriptor; + + //! Get reference to prefix portion + ArenaPrefix& prefix() const {return ((ArenaPrefix*)(void*)this)[-1];} + + //! Get reference to mailbox corresponding to given affinity_id. + mail_outbox& mailbox( affinity_id id ) { + __TBB_ASSERT( 0 count; + + //! Platform specific code to acquire resources. + static void acquire_resources(); + + //! Platform specific code to release resources. + static void release_resources(); + + static bool InitializationDone; + + // Scenarios are possible when tools interop has to be initialized before the + // TBB itself. This imposes a requirement that the global initialization lock + // has to support valid static initialization, and does not issue any tool + // notifications in any build mode. + typedef unsigned char mutex_type; + + // Global initialization lock + static mutex_type InitializationLock; + +public: + static void lock() { __TBB_LockByte( InitializationLock ); } + + static void unlock() { __TBB_store_with_release( InitializationLock, 0 ); } + + static bool initialization_done() { return __TBB_load_with_acquire(InitializationDone); } + + //! Add initial reference to resources. 
+ /** We assume that dynamic loading of the library prevents any other threads from entering the library + until this constructor has finished running. */ + __TBB_InitOnce() { add_ref(); } + + //! Remove the initial reference to resources. + /** This is not necessarily the last reference if other threads are still running. + If the extra reference from DoOneTimeInitializations is present, remove it as well.*/ + ~__TBB_InitOnce(); + + //! Add reference to resources. If first reference added, acquire the resources. + static void add_ref() { + if( ++count==1 ) + acquire_resources(); + } + //! Remove reference to resources. If last reference removed, release the resources. + static void remove_ref() { + int k = --count; + __TBB_ASSERT(k>=0,"removed __TBB_InitOnce ref that was not added?"); + if( k==0 ) + release_resources(); + } +}; // class __TBB_InitOnce + +//------------------------------------------------------------------------ +// Class Governor +//------------------------------------------------------------------------ + +//! The class handles access to the single instance of Arena, and to TLS to keep scheduler instances. +/** It also supports automatic on-demand intialization of the TBB scheduler. + The class contains only static data members and methods.*/ +class Governor { + friend class __TBB_InitOnce; + friend void ITT_DoUnsafeOneTimeInitialization (); + + static basic_tls theTLS; + static Arena* theArena; + static mutex theArenaMutex; + + //! Create key for thread-local storage. + static void create_tls() { +#if USE_PTHREAD + int status = theTLS.create(auto_terminate); +#else + int status = theTLS.create(); +#endif + if( status ) + handle_perror(status, "TBB failed to initialize TLS storage\n"); + } + + //! Destroy the thread-local storage key. + static void destroy_tls() { +#if TBB_USE_ASSERT + if( __TBB_InitOnce::initialization_done() && theTLS.get() ) + fprintf(stderr, "TBB is unloaded while tbb::task_scheduler_init object is alive?"); +#endif + int status = theTLS.destroy(); + if( status ) + handle_perror(status, "TBB failed to destroy TLS storage"); + } + + //! Obtain the instance of arena to register a new master thread + /** If there is no active arena, create one. */ + static Arena* obtain_arena( int number_of_threads, stack_size_type thread_stack_size ) + { + mutex::scoped_lock lock( theArenaMutex ); + Arena* a = theArena; + if( a ) { + a->prefix().number_of_masters += 1; + } else { + if( number_of_threads==task_scheduler_init::automatic ) + number_of_threads = task_scheduler_init::default_num_threads(); + a = Arena::allocate_arena( 2*number_of_threads, number_of_threads-1, + thread_stack_size?thread_stack_size:ThreadStackSize ); + __TBB_ASSERT( a->prefix().number_of_masters==1, NULL ); + // Publish the Arena. + // A memory release fence is not required here, because workers have not started yet, + // and concurrent masters inspect theArena while holding theArenaMutex. + __TBB_ASSERT( !theArena, NULL ); + theArena = a; + // Must create server under lock, otherwise second master might see arena without a server. + a->prefix().open_connection_to_rml(); + } + return a; + } + + //! The internal routine to undo automatic initialization. + /** The signature is written with void* so that the routine + can be the destructor argument to pthread_key_create. */ + static void auto_terminate(void* scheduler); + +public: + //! Processes scheduler initialization request (possibly nested) in a master thread + /** If necessary creates new instance of arena and/or local scheduler. 
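        For example, a nested tbb::task_scheduler_init in a thread that already owns a
        scheduler simply increments that scheduler's ref_count instead of creating a second
        instance (see the implementation further below).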
+ The auto_init argument specifies if the call is due to automatic initialization. **/ + static GenericScheduler* init_scheduler( int num_threads, stack_size_type stack_size, bool auto_init = false ); + + //! Processes scheduler termination request (possibly nested) in a master thread + static void terminate_scheduler( GenericScheduler* s ); + + //! Dereference arena when a master thread stops using TBB. + /** If no more masters in the arena, terminate workers and destroy it. */ + static void finish_with_arena() { + mutex::scoped_lock lock( theArenaMutex ); + Arena* a = theArena; + __TBB_ASSERT( a, "theArena is missing" ); + if( --(a->prefix().number_of_masters) ) + a = NULL; + else { + theArena = NULL; + // Must do this while holding lock, otherwise terminate message might reach + // RML thread *after* initialize message reaches it for the next arena, which + // which causes TLS to be set to new value before old one is erased! + a->terminate_workers(); + } + } + + static size_t number_of_workers_in_arena() { + __TBB_ASSERT( theArena, "thread did not activate a task_scheduler_init object?" ); + // No fence required to read theArena, because it does not change after the thread starts. + return theArena->prefix().number_of_workers; + } + + //! Register TBB scheduler instance in thread local storage. + inline static void sign_on(GenericScheduler* s); + + //! Unregister TBB scheduler instance from thread local storage. + inline static void sign_off(GenericScheduler* s); + + //! Used to check validity of the local scheduler TLS contents. + static bool is_set ( GenericScheduler* s ) { return theTLS.get() == s; } + + //! Obtain the thread local instance of the TBB scheduler. + /** If the scheduler has not been initialized yet, initialization is done automatically. + Note that auto-initialized scheduler instance is destroyed only when its thread terminates. **/ + static GenericScheduler* local_scheduler () { + GenericScheduler* s = theTLS.get(); + return s ? s : init_scheduler( task_scheduler_init::automatic, 0, true ); + } + + //! Undo automatic initialization if necessary; call when a thread exits. + static void terminate_auto_initialized_scheduler() { + auto_terminate( theTLS.get() ); + } +}; // class Governor + +//------------------------------------------------------------------------ +// Begin shared data layout. +// +// The following global data items are read-only after initialization. +// The first item is aligned on a 128 byte boundary so that it starts a new cache line. +//------------------------------------------------------------------------ + +basic_tls Governor::theTLS; +Arena * Governor::theArena; +mutex Governor::theArenaMutex; + +//! Number of hardware threads +/** One more than the default number of workers. */ +static int DefaultNumberOfThreads; + +//! T::id for the scheduler traits type T to use for the scheduler +/** For example, the default value is DefaultSchedulerTraits::id. */ +static int SchedulerTraitsId; + +//! Counter of references to global shared resources such as TLS. +atomic __TBB_InitOnce::count; + +__TBB_InitOnce::mutex_type __TBB_InitOnce::InitializationLock; + +//! Flag that is set to true after one-time initializations are done. +bool __TBB_InitOnce::InitializationDone; + +#if DO_ITT_NOTIFY + static bool ITT_Present; + static bool ITT_InitializationDone; +#endif + +static rml::tbb_factory rml_server_factory; +//! Set to true if private statically linked RML server should be used instead of shared server. 
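// (Set in DoOneTimeInitializations() below when rml_server_factory.open() fails, i.e. when
//  no shared RML server library is available.)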
+static bool use_private_rml; + +#if !(_WIN32||_WIN64) || __TBB_TASK_CPP_DIRECTLY_INCLUDED + static __TBB_InitOnce __TBB_InitOnceHiddenInstance; +#endif + +#if __TBB_SCHEDULER_OBSERVER +typedef spin_rw_mutex::scoped_lock task_scheduler_observer_mutex_scoped_lock; +/** aligned_space used here to shut up warnings when mutex destructor is called while threads are still using it. */ +static aligned_space the_task_scheduler_observer_mutex; +static observer_proxy* global_first_observer_proxy; +static observer_proxy* global_last_observer_proxy; +#endif /* __TBB_SCHEDULER_OBSERVER */ + +//! Table of primes used by fast random-number generator. +/** Also serves to keep anything else from being placed in the same + cache line as the global data items preceding it. */ +static const unsigned Primes[] = { + 0x9e3779b1, 0xffe6cc59, 0x2109f6dd, 0x43977ab5, + 0xba5703f5, 0xb495a877, 0xe1626741, 0x79695e6b, + 0xbc98c09f, 0xd5bee2b3, 0x287488f9, 0x3af18231, + 0x9677cd4d, 0xbe3a6929, 0xadc6a877, 0xdcf0674b, + 0xbe4d6fe9, 0x5f15e201, 0x99afc3fd, 0xf3f16801, + 0xe222cfff, 0x24ba5fdb, 0x0620452d, 0x79f149e3, + 0xc8b93f49, 0x972702cd, 0xb07dd827, 0x6c97d5ed, + 0x085a3d61, 0x46eb5ea7, 0x3d9910ed, 0x2e687b5b, + 0x29609227, 0x6eb081f1, 0x0954c4e1, 0x9d114db9, + 0x542acfa9, 0xb3e6bd7b, 0x0742d917, 0xe9f3ffa7, + 0x54581edb, 0xf2480f45, 0x0bb9288f, 0xef1affc7, + 0x85fa0ca7, 0x3ccc14db, 0xe6baf34b, 0x343377f7, + 0x5ca19031, 0xe6d9293b, 0xf0a9f391, 0x5d2e980b, + 0xfc411073, 0xc3749363, 0xb892d829, 0x3549366b, + 0x629750ad, 0xb98294e5, 0x892d9483, 0xc235baf3, + 0x3d2402a3, 0x6bdef3c9, 0xbec333cd, 0x40c9520f +}; + +#if STATISTICS +//! Class for collecting statistics +/** There should be only one instance of this class. + Results are written to a file "statistics.txt" in tab-separated format. */ +static class statistics { +public: + statistics() { + my_file = fopen("statistics.txt","w"); + if( !my_file ) { + perror("fopen(\"statistics.txt\"\")"); + exit(1); + } + fprintf(my_file,"%13s\t%13s\t%13s\t%13s\t%13s\t%13s\n", "execute", "steal", "mail", "proxy_execute", "proxy_steal", "proxy_bypass" ); + } + ~statistics() { + fclose(my_file); + } + void record( long execute_count, long steal_count, long mail_received_count, + long proxy_execute_count, long proxy_steal_count, long proxy_bypass_count ) { + mutex::scoped_lock lock(my_mutex); + fprintf (my_file,"%13ld\t%13ld\t%13ld\t%13ld\t%13ld\t%13ld\n", execute_count, steal_count, mail_received_count, + proxy_execute_count, proxy_steal_count, proxy_bypass_count ); + } +private: + //! File into which statistics are written. + FILE* my_file; + //! Mutex that serializes accesses to my_file + mutex my_mutex; +} the_statistics; +#endif /* STATISTICS */ + +#if __TBB_EXCEPTIONS + struct scheduler_list_node_t { + scheduler_list_node_t *my_prev, + *my_next; + }; + + //! Head of the list of master thread schedulers. + static scheduler_list_node_t the_scheduler_list_head; + + //! Mutex protecting access to the list of schedulers. + static mutex the_scheduler_list_mutex; + +//! Counter that is incremented whenever new cancellation signal is sent to a task group. +/** Together with GenericScheduler::local_cancel_count forms cross-thread signaling + mechanism that allows to avoid locking at the hot path of normal execution flow. + + When a descendant task group context is being registered or unregistered, + the global and local counters are compared. 
If they differ, it means that + a cancellation signal is being propagated, and registration/deregistration + routines take slower branch that may block (at most one thread of the pool + can be blocked at any moment). Otherwise the control path is lock-free and fast. **/ + static uintptr_t global_cancel_count = 0; + + //! Context to be associated with dummy tasks of worker threads schedulers. + /** It is never used for its direct purpose, and is introduced solely for the sake + of avoiding one extra conditional branch in the end of wait_for_all method. **/ + static task_group_context dummy_context(task_group_context::isolated); +#endif /* __TBB_EXCEPTIONS */ + +//------------------------------------------------------------------------ +// End of shared data layout +//------------------------------------------------------------------------ + +//! Amount of time to pause between steals. +/** The default values below were found to be best empirically for K-Means + on the 32-way Altix and 4-way (*2 for HT) fxqlin04. */ +#if __TBB_ipf +static const long PauseTime = 1500; +#else +static const long PauseTime = 80; +#endif + +//------------------------------------------------------------------------ +// One-time Initializations +//------------------------------------------------------------------------ + +//! Defined in cache_aligned_allocator.cpp +extern void initialize_cache_aligned_allocator(); + +#if DO_ITT_NOTIFY +//! Performs initialization of tools support. +/** Defined in itt_notify.cpp. Must be called in a protected do-once manner. + \return true if notification hooks were installed, false otherwise. **/ +bool InitializeITT(); + +/** Thread-unsafe lazy one-time initialization of tools interop. + Used by both dummy handlers and general TBB one-time initialization routine. **/ +void ITT_DoUnsafeOneTimeInitialization () { + if ( !ITT_InitializationDone ) { + ITT_Present = InitializeITT(); + ITT_InitializationDone = true; + ITT_SYNC_CREATE(&Governor::theArenaMutex, SyncType_GlobalLock, SyncObj_SchedulerInitialization); + } +} + +/** Thread-safe lazy one-time initialization of tools interop. + Used by dummy handlers only. **/ +extern "C" +void ITT_DoOneTimeInitialization() { + __TBB_InitOnce::lock(); + ITT_DoUnsafeOneTimeInitialization(); + __TBB_InitOnce::unlock(); +} +#endif /* DO_ITT_NOTIFY */ + +//! Performs thread-safe lazy one-time general TBB initialization. +void DoOneTimeInitializations() { + __TBB_InitOnce::lock(); + // No fence required for load of InitializationDone, because we are inside a critical section. + if( !__TBB_InitOnce::InitializationDone ) { + __TBB_InitOnce::add_ref(); + if( GetBoolEnvironmentVariable("TBB_VERSION") ) + PrintVersion(); + bool have_itt = false; +#if DO_ITT_NOTIFY + ITT_DoUnsafeOneTimeInitialization(); + have_itt = ITT_Present; +#endif /* DO_ITT_NOTIFY */ + initialize_cache_aligned_allocator(); + ::rml::factory::status_type status = rml_server_factory.open(); + if( status!=::rml::factory::st_success ) { + use_private_rml = true; + PrintExtraVersionInfo( "RML", "private" ); + } else { + PrintExtraVersionInfo( "RML", "shared" ); + rml_server_factory.call_with_server_info( PrintRMLVersionInfo, (void*)"" ); + } + if( !have_itt ) + SchedulerTraitsId = IntelSchedulerTraits::id; +#if __TBB_EXCEPTIONS + else { + ITT_SYNC_CREATE(&the_scheduler_list_mutex, SyncType_GlobalLock, SyncObj_SchedulersList); + } +#endif /* __TBB_EXCEPTIONS */ + PrintExtraVersionInfo( "SCHEDULER", + SchedulerTraitsId==IntelSchedulerTraits::id ? 
"Intel" : "default" ); +#if __TBB_EXCEPTIONS + the_scheduler_list_head.my_next = &the_scheduler_list_head; + the_scheduler_list_head.my_prev = &the_scheduler_list_head; +#endif /* __TBB_EXCEPTIONS */ + __TBB_InitOnce::InitializationDone = true; + } + __TBB_InitOnce::unlock(); +} + +//------------------------------------------------------------------------ +// Methods of class __TBB_InitOnce +//------------------------------------------------------------------------ + +__TBB_InitOnce::~__TBB_InitOnce() { + remove_ref(); + // It is assumed that InitializationDone is not set after file-scope destructors start running, + // and thus no race on InitializationDone is possible. + if( initialization_done() ) { + // Remove reference that we added in DoOneTimeInitializations. + remove_ref(); + } +} + +void __TBB_InitOnce::acquire_resources() { + Governor::create_tls(); +} + +void __TBB_InitOnce::release_resources() { + rml_server_factory.close(); + Governor::destroy_tls(); +} + +#if (_WIN32||_WIN64) && !__TBB_TASK_CPP_DIRECTLY_INCLUDED +//! Windows "DllMain" that handles startup and shutdown of dynamic library. +extern "C" bool WINAPI DllMain( HANDLE /*hinstDLL*/, DWORD reason, LPVOID /*lpvReserved*/ ) { + switch( reason ) { + case DLL_PROCESS_ATTACH: + __TBB_InitOnce::add_ref(); + break; + case DLL_PROCESS_DETACH: + __TBB_InitOnce::remove_ref(); + // It is assumed that InitializationDone is not set after DLL_PROCESS_DETACH, + // and thus no race on InitializationDone is possible. + if( __TBB_InitOnce::initialization_done() ) { + // Remove reference that we added in DoOneTimeInitializations. + __TBB_InitOnce::remove_ref(); + } + break; + case DLL_THREAD_DETACH: + Governor::terminate_auto_initialized_scheduler(); + break; + } + return true; +} +#endif /* (_WIN32||_WIN64) && !__TBB_TASK_CPP_DIRECTLY_INCLUDED */ + +//------------------------------------------------------------------------ +// FastRandom +//------------------------------------------------------------------------ + +//! A fast random number generator. +/** Uses linear congruential method. */ +class FastRandom { + unsigned x, a; +public: + //! Get a random number. + unsigned short get() { + unsigned short r = x>>16; + x = x*a+1; + return r; + } + //! Construct a random number generator. + FastRandom( unsigned seed ) { + x = seed; + a = Primes[seed%(sizeof(Primes)/sizeof(Primes[0]))]; + } +}; + +//------------------------------------------------------------------------ +// GenericScheduler +//------------------------------------------------------------------------ + +// A pure virtual destructor should still have a body +// so the one for tbb::internal::scheduler::~scheduler() is provided here +scheduler::~scheduler( ) {} + + #define EmptyTaskPool ((task**)0u) + #define LockedTaskPool ((task**)~0u) + + #define LocalSpawn local_spawn + +//! Cilk-style task scheduler. +/** None of the fields here are every read or written by threads other than + the thread that creates the instance. + + Class GenericScheduler is an abstract base class that contains most of the scheduler, + except for tweaks specific to processors and tools (e.g. VTune). + The derived template class CustomScheduler fills in the tweaks. 
*/ +class GenericScheduler: public scheduler + ,public ::rml::job +{ + friend class tbb::task; + friend class UnpaddedArenaPrefix; + friend class Arena; + friend class allocate_root_proxy; + friend class Governor; +#if __TBB_EXCEPTIONS + friend class allocate_root_with_context_proxy; + friend class tbb::task_group_context; +#endif /* __TBB_EXCEPTIONS */ +#if __TBB_SCHEDULER_OBSERVER + friend class task_scheduler_observer_v3; +#endif /* __TBB_SCHEDULER_OBSERVER */ + friend class scheduler; + template friend class internal::CustomScheduler; + + //! If sizeof(task) is <=quick_task_size, it is handled on a free list instead of malloc'd. + static const size_t quick_task_size = 256-task_prefix_reservation_size; + + //! Definitions for bits in task_prefix::extra_state + enum internal_state_t { + //! Tag for TBB <3.0 tasks. + es_version_2_task = 0, + //! Tag for TBB 3.0 tasks. + es_version_3_task = 1, + //! Tag for TBB 3.0 task_proxy. + es_task_proxy = 2, + //! Set if ref_count might be changed by another thread. Used for debugging. + es_ref_count_active = 0x40 + }; + + static bool is_version_3_task( task& t ) { + return (t.prefix().extra_state & 0x3F)==0x1; + } + + //! Position in the call stack specifying its maximal filling when stealing is still allowed + uintptr_t my_stealing_threshold; +#if __TBB_ipf + //! Position in the RSE backup area specifying its maximal filling when stealing is still allowed + uintptr_t my_rsb_stealing_threshold; +#endif + + static const size_t null_arena_index = ~0u; + + //! Index of the arena slot the scheduler occupies now, or occupied last time. + size_t arena_index; + + //! Capacity of ready tasks deque (number of elements - pointers to task). + size_t task_pool_size; + + //! Dummy slot used when scheduler is not in arena + /** Only its "head" and "tail" members are ever used. The scheduler uses + the "task_pool" shortcut to access the task deque. **/ + ArenaSlot dummy_slot; + + //! Pointer to the slot in the arena we own at the moment. + /** When out of arena it points to this scheduler's dummy_slot. **/ + mutable ArenaSlot* arena_slot; + + bool in_arena () const { return arena_slot != &dummy_slot; } + + bool is_local_task_pool_empty () { + return arena_slot->task_pool == EmptyTaskPool || arena_slot->head >= arena_slot->tail; + } + + //! The arena that I own (if master) or belong to (if worker) + Arena* const arena; + + //! Random number generator used for picking a random victim from which to steal. + FastRandom random; + + //! Free list of small tasks that can be reused. + task* free_list; + + //! Innermost task whose task::execute() is running. + task* innermost_running_task; + + //! Fake root task created by slave threads. + /** The task is used as the "parent" argument to method wait_for_all. */ + task* dummy_task; + + //! Reference count for scheduler + /** Number of task_scheduler_init objects that point to this scheduler */ + long ref_count; + + mail_inbox inbox; + + void attach_mailbox( affinity_id id ) { + __TBB_ASSERT(id>0,NULL); + inbox.attach( arena->mailbox(id) ); + my_affinity_id = id; + } + + //! The mailbox id assigned to this scheduler. + /** The id is assigned upon first entry into the arena. + TODO: how are id's being garbage collected? + TODO: master thread may enter arena and leave and then reenter. + We want to give it the same affinity_id upon reentry, if practical. + */ + affinity_id my_affinity_id; + + /* A couple of bools can be located here because space is otherwise just padding after my_affinity_id. */ + + //! 
True if this is assigned to thread local storage by registering with Governor. + bool is_registered; + + //! True if *this was created by automatic TBB initialization + bool is_auto_initialized; + +#if __TBB_SCHEDULER_OBSERVER + //! Last observer_proxy processed by this scheduler + observer_proxy* local_last_observer_proxy; + + //! Notify any entry observers that have been created since the last call by this thread. + void notify_entry_observers() { + local_last_observer_proxy = observer_proxy::process_list(local_last_observer_proxy,is_worker(),/*is_entry=*/true); + } + + //! Notify all exit observers that this thread is no longer participating in task scheduling. + void notify_exit_observers( bool is_worker ) { + observer_proxy::process_list(local_last_observer_proxy,is_worker,/*is_entry=*/false); + } +#endif /* __TBB_SCHEDULER_OBSERVER */ + +#if COUNT_TASK_NODES + //! Net number of big task objects that have been allocated but not yet freed. + intptr task_node_count; +#endif /* COUNT_TASK_NODES */ + +#if STATISTICS + long current_active; + long current_length; + //! Number of big tasks that have been malloc'd. + /** To find total number of tasks malloc'd, compute (current_big_malloc+small_task_count) */ + long current_big_malloc; + long execute_count; + //! Number of tasks stolen + long steal_count; + //! Number of tasks received from mailbox + long mail_received_count; + long proxy_execute_count; + long proxy_steal_count; + long proxy_bypass_count; +#endif /* STATISTICS */ + + //! Sets up the data necessary for the stealing limiting heuristics + void init_stack_info (); + + //! Returns true if stealing is allowed + bool can_steal () { + int anchor; +#if __TBB_ipf + return my_stealing_threshold < (uintptr_t)&anchor && (uintptr_t)__TBB_get_bsp() < my_rsb_stealing_threshold; +#else + return my_stealing_threshold < (uintptr_t)&anchor; +#endif + } + + //! Actions common to enter_arena and try_enter_arena + void do_enter_arena(); + + //! Used by workers to enter the arena + /** Does not lock the task pool in case if arena slot has been successfully grabbed. **/ + void enter_arena(); + + //! Used by masters to try to enter the arena + /** Does not lock the task pool in case if arena slot has been successfully grabbed. **/ + void try_enter_arena(); + + //! Leave the arena + void leave_arena(); + + //! Locks victim's task pool, and returns pointer to it. The pointer can be NULL. + task** lock_task_pool( ArenaSlot* victim_arena_slot ) const; + + //! Unlocks victim's task pool + void unlock_task_pool( ArenaSlot* victim_arena_slot, task** victim_task_pool ) const; + + + //! Locks the local task pool + void acquire_task_pool() const; + + //! Unlocks the local task pool + void release_task_pool() const; + + //! Get a task from the local pool. + //! Checks if t is affinitized to another thread, and if so, bundles it as proxy. + /** Returns either t or proxy containing t. **/ + task* prepare_for_spawning( task* t ); + + /** Called only by the pool owner. + Returns the pointer to the task or NULL if the pool is empty. + In the latter case compacts the pool. **/ + task* get_task(); + + //! Attempt to get a task from the mailbox. + /** Called only by the thread that owns *this. + Gets a task only if there is one not yet executed by another thread. + If successful, unlinks the task and returns a pointer to it. + Otherwise returns NULL. */ + task* get_mailbox_task(); + + //! True if t is a task_proxy + static bool is_proxy( const task& t ) { + return t.prefix().extra_state==es_task_proxy; + } + + //! 
Extracts task pointer from task_proxy, and frees the proxy. + /** Return NULL if underlying task was claimed by mailbox. */ + task* strip_proxy( task_proxy* result ); + + //! Steal task from another scheduler's ready pool. + task* steal_task( ArenaSlot& victim_arena_slot ); + + /** Initial size of the task deque sufficient to serve without reallocation + 4 nested paralle_for calls with iteration space of 65535 grains each. **/ + static const size_t min_task_pool_size = 64; + + //! Allocate task pool containing at least n elements. + task** allocate_task_pool( size_t n ); + + //! Deallocate task pool that was allocated by means of allocate_task_pool. + static void free_task_pool( task** pool ) { + __TBB_ASSERT( pool, "attempt to free NULL TaskPool" ); + NFS_Free( pool ); + } + + //! Grow ready task deque to at least n elements. + void grow( size_t n ); + + //! Initialize a scheduler for a master thread. + static GenericScheduler* create_master( Arena* a ); + + //! Perform necessary cleanup when a master thread stops using TBB. + void cleanup_master(); + + //! Initialize a scheduler for a worker thread. + static GenericScheduler* create_worker( Arena& a, size_t index ); + + + //! Top-level routine for worker threads + /** Argument arg is a WorkerDescriptor*, cast to a (void*). */ + static thread_routine_return_type __TBB_THREAD_ROUTINE worker_routine( void* arg ); + + //! Perform necessary cleanup when a worker thread finishes. + static void cleanup_worker( void* arg ); + +protected: + GenericScheduler( Arena* arena ); + +#if TBB_USE_ASSERT + //! Check that internal data structures are in consistent state. + /** Raises __TBB_ASSERT failure if inconsistency is found. */ + bool assert_okay() const; +#endif /* TBB_USE_ASSERT */ + +public: + void local_spawn( task& first, task*& next ); + void local_spawn_root_and_wait( task& first, task*& next ); + + /*override*/ + void spawn( task& first, task*& next ) { + Governor::local_scheduler()->local_spawn( first, next ); + } + /*override*/ + void spawn_root_and_wait( task& first, task*& next ) { + Governor::local_scheduler()->local_spawn_root_and_wait( first, next ); + } + + //! Allocate and construct a scheduler object. + static GenericScheduler* allocate_scheduler( Arena* arena ); + + //! Destroy and deallocate scheduler that was created with method allocate. + void free_scheduler(); + + //! Allocate task object, either from the heap or a free list. + /** Returns uninitialized task object with initialized prefix. */ + task& allocate_task( size_t number_of_bytes, + __TBB_CONTEXT_ARG(task* parent, task_group_context* context) ); + + //! Optimization hint to free_task that enables it omit unnecessary tests and code. + enum hint { + //! No hint + no_hint=0, + //! Task is known to have been allocated by this scheduler + is_local=1, + //! Task is known to be a small task. + /** Task should be returned to the free list of *some* scheduler, possibly not this scheduler. */ + is_small=2, + //! Bitwise-OR of is_local and is_small. + /** Task should be returned to free list of this scheduler. */ + is_small_local=3 + }; + + //! Put task on free list. + /** Does not call destructor. */ + template + void free_task( task& t ); + + void free_task_proxy( task_proxy& tp ) { +#if TBB_USE_ASSERT + poison_pointer( tp.outbox ); + poison_pointer( tp.next_in_mailbox ); + tp.task_and_tag = 0xDEADBEEF; +#endif /* TBB_USE_ASSERT */ + free_task(tp); + } + + //! Return task object to the memory allocator. 
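    // (Layout reminder, added for clarity: the task_prefix occupies the bytes immediately
    //  preceding the task object in the same heap block, which is why deallocate_task()
    //  subtracts task_prefix_reservation_size from the object address before calling
    //  NFS_Free, mirroring the addition done in allocate_task().)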
+ void deallocate_task( task& t ) { +#if TBB_USE_ASSERT + task_prefix& p = t.prefix(); + p.state = 0xFF; + p.extra_state = 0xFF; + poison_pointer(p.next); +#endif /* TBB_USE_ASSERT */ + NFS_Free((char*)&t-task_prefix_reservation_size); +#if COUNT_TASK_NODES + task_node_count -= 1; +#endif /* COUNT_TASK_NODES */ + } + + //! True if running on a worker thread, false otherwise. + inline bool is_worker() { + return arena_index < arena->prefix().number_of_workers; + } + +#if TEST_ASSEMBLY_ROUTINES + /** Defined in test_assembly.cpp */ + void test_assembly_routines(); +#endif /* TEST_ASSEMBLY_ROUTINES */ + +#if COUNT_TASK_NODES + intptr get_task_node_count( bool count_arena_workers = false ) { + return task_node_count + (count_arena_workers? arena->workers_task_node_count(): 0); + } +#endif /* COUNT_TASK_NODES */ + + //! Special value used to mark return_list as not taking any more entries. + static task* plugged_return_list() {return (task*)(intptr)(-1);} + + //! Number of small tasks that have been allocated by this scheduler. + intptr small_task_count; + + //! List of small tasks that have been returned to this scheduler by other schedulers. + task* return_list; + + //! Free a small task t that that was allocated by a different scheduler + void free_nonlocal_small_task( task& t ); + +#if __TBB_EXCEPTIONS + //! Padding isolating thread local members from members that can be written to by other threads. + char _padding1[NFS_MaxLineSize - sizeof(context_list_node_t)]; + + //! Head of the thread specific list of task group contexts. + context_list_node_t context_list_head; + + //! Mutex protecting access to the list of task group contexts. + spin_mutex context_list_mutex; + + //! Used to form the list of master thread schedulers. + scheduler_list_node_t my_node; + + //! Thread local counter of cancellation requests. + /** When this counter equals global_cancel_count, the cancellation state known + to this thread is synchronized with the global cancellation state. + \sa #global_cancel_count **/ + uintptr_t local_cancel_count; + + //! Propagates cancellation request to all descendants of the argument context. + void propagate_cancellation ( task_group_context* ctx ); + + //! Propagates cancellation request to contexts registered by this scheduler. + void propagate_cancellation (); +#endif /* __TBB_EXCEPTIONS */ +}; // class GenericScheduler + +//------------------------------------------------------------------------ +// auto_empty_task +//------------------------------------------------------------------------ + +//! Smart holder for the empty task class with automatic destruction +class auto_empty_task { + task* my_task; + GenericScheduler* my_scheduler; +public: + auto_empty_task ( __TBB_CONTEXT_ARG(GenericScheduler *s, task_group_context* context) ) + : my_task( new(&s->allocate_task(sizeof(empty_task), __TBB_CONTEXT_ARG(NULL, context))) empty_task ) + , my_scheduler(s) + {} + // empty_task has trivial destructor, so there's no need to call it. 
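    // (Illustrative use, a sketch based on the interface above rather than code from the
    //  original source:
    //      auto_empty_task dummy( __TBB_CONTEXT_ARG(s, ctx) );
    //      dummy.prefix().ref_count = 2;   // e.g. one child plus the wait itself
    //      ... spawn work that uses &dummy as its parent ...
    //  the empty_task is returned to the scheduler's free list automatically when dummy
    //  goes out of scope.)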
+ ~auto_empty_task () { my_scheduler->free_task(*my_task); } + + operator task& () { return *my_task; } + task* operator & () { return my_task; } + task_prefix& prefix () { return my_task->prefix(); } +}; // class auto_empty_task + +//------------------------------------------------------------------------ +// Methods of class Governor that need full definition of GenericScheduler +//------------------------------------------------------------------------ + +void Governor::sign_on(GenericScheduler* s) { + __TBB_ASSERT( !s->is_registered, NULL ); + s->is_registered = true; + __TBB_InitOnce::add_ref(); + theTLS.set(s); +} + +void Governor::sign_off(GenericScheduler* s) { + if( s->is_registered ) { +#if USE_PTHREAD + __TBB_ASSERT( theTLS.get()==s || (!s->is_worker() && !theTLS.get()), "attempt to unregister a wrong scheduler instance" ); +#else + __TBB_ASSERT( theTLS.get()==s, "attempt to unregister a wrong scheduler instance" ); +#endif /* USE_PTHREAD */ + theTLS.set(NULL); + s->is_registered = false; + __TBB_InitOnce::remove_ref(); + } +} + +GenericScheduler* Governor::init_scheduler( int num_threads, stack_size_type stack_size, bool auto_init ) { + if( !__TBB_InitOnce::initialization_done() ) + DoOneTimeInitializations(); + GenericScheduler* s = theTLS.get(); + if( s ) { + s->ref_count += 1; + return s; + } + s = GenericScheduler::create_master( obtain_arena(num_threads, stack_size) ); + __TBB_ASSERT(s, "Somehow a local scheduler creation for a master thread failed"); + s->is_auto_initialized = auto_init; + return s; +} + +void Governor::terminate_scheduler( GenericScheduler* s ) { + __TBB_ASSERT( s == theTLS.get(), "Attempt to terminate non-local scheduler instance" ); + if( !--(s->ref_count) ) + s->cleanup_master(); +} + +void Governor::auto_terminate(void* arg){ + GenericScheduler* s = static_cast(arg); + if( s && s->is_auto_initialized ) { + if( !--(s->ref_count) ) { + if ( !theTLS.get() && !s->is_local_task_pool_empty() ) { + // This thread's TLS slot is already cleared. But in order to execute + // remaining tasks cleanup_master() will need TLS correctly set. + // So we temporarily restore its value. + theTLS.set(s); + s->cleanup_master(); + theTLS.set(NULL); + } + else + s->cleanup_master(); + } + } +} + +//------------------------------------------------------------------------ +// GenericScheduler implementation +//------------------------------------------------------------------------ + +inline task& GenericScheduler::allocate_task( size_t number_of_bytes, + __TBB_CONTEXT_ARG(task* parent, task_group_context* context) ) { + GATHER_STATISTIC(current_active+=1); + task* t = free_list; + if( number_of_bytes<=quick_task_size ) { + if( t ) { + GATHER_STATISTIC(current_length-=1); + __TBB_ASSERT( t->state()==task::freed, "free list of tasks is corrupted" ); + free_list = t->prefix().next; + } else if( return_list ) { + // No fence required for read of return_list above, because __TBB_FetchAndStoreW has a fence. 
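            // (return_list is a lock-free LIFO holding small tasks that other schedulers
            //  freed on this scheduler's behalf, see free_nonlocal_small_task() below; the
            //  fetch-and-store grabs the whole chain in one shot and leaves the list empty.)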
+ t = (task*)__TBB_FetchAndStoreW( &return_list, 0 ); + __TBB_ASSERT( t, "another thread emptied the return_list" ); + __TBB_ASSERT( t->prefix().origin==this, "task returned to wrong return_list" ); + ITT_NOTIFY( sync_acquired, &return_list ); + free_list = t->prefix().next; + } else { + t = (task*)((char*)NFS_Allocate( task_prefix_reservation_size+quick_task_size, 1, NULL ) + task_prefix_reservation_size ); +#if COUNT_TASK_NODES + ++task_node_count; +#endif /* COUNT_TASK_NODES */ + t->prefix().origin = this; + ++small_task_count; + } + } else { + GATHER_STATISTIC(current_big_malloc+=1); + t = (task*)((char*)NFS_Allocate( task_prefix_reservation_size+number_of_bytes, 1, NULL ) + task_prefix_reservation_size ); +#if COUNT_TASK_NODES + ++task_node_count; +#endif /* COUNT_TASK_NODES */ + t->prefix().origin = NULL; + } + task_prefix& p = t->prefix(); +#if __TBB_EXCEPTIONS + p.context = context; +#endif /* __TBB_EXCEPTIONS */ + p.owner = this; + p.ref_count = 0; + // Assign some not outrageously out-of-place value for a while + p.depth = 0; + p.parent = parent; + // In TBB 3.0 and later, the constructor for task sets extra_state to indicate the version of the tbb/task.h header. + // In TBB 2.0 and earlier, the constructor leaves extra_state as zero. + p.extra_state = 0; + p.affinity = 0; + p.state = task::allocated; + return *t; +} + +template +inline void GenericScheduler::free_task( task& t ) { + GATHER_STATISTIC(current_active-=1); + task_prefix& p = t.prefix(); + // Verify that optimization hints are correct. + __TBB_ASSERT( h!=is_small_local || p.origin==this, NULL ); + __TBB_ASSERT( !(h&is_small) || p.origin, NULL ); +#if TBB_USE_ASSERT + p.depth = 0xDEADBEEF; + p.ref_count = 0xDEADBEEF; + poison_pointer(p.owner); +#endif /* TBB_USE_ASSERT */ + __TBB_ASSERT( 1L<(t.prefix().origin); + __TBB_ASSERT( &s!=this, NULL ); + for(;;) { + task* old = s.return_list; + if( old==plugged_return_list() ) + break; + // Atomically insert t at head of s.return_list + t.prefix().next = old; + ITT_NOTIFY( sync_releasing, &s.return_list ); + if( __TBB_CompareAndSwapW( &s.return_list, (intptr)&t, (intptr)old )==(intptr)old ) + return; + } + deallocate_task(t); + if( __TBB_FetchAndDecrementWrelease( &s.small_task_count )==1 ) { + // We freed the last task allocated by scheduler s, so it's our responsibility + // to free the scheduler. + NFS_Free( &s ); + } +} + +//------------------------------------------------------------------------ +// CustomScheduler +//------------------------------------------------------------------------ + +//! A scheduler with a customized evaluation loop. +/** The customization can use SchedulerTraits to make decisions without needing a run-time check. */ +template +class CustomScheduler: private GenericScheduler { + //! Scheduler loop that dispatches tasks. + /** If child is non-NULL, it is dispatched first. + Then, until "parent" has a reference count of 1, other task are dispatched or stolen. */ + void local_wait_for_all( task& parent, task* child ); + + /*override*/ + void wait_for_all( task& parent, task* child ) { + static_cast(Governor::local_scheduler())->local_wait_for_all( parent, child ); + } + + typedef CustomScheduler scheduler_type; + + //! 
Construct a CustomScheduler + CustomScheduler( Arena* arena ) : GenericScheduler(arena) {} + + static bool tally_completion_of_one_predecessor( task& s ) { + task_prefix& p = s.prefix(); + if( SchedulerTraits::itt_possible ) + ITT_NOTIFY(sync_releasing, &p.ref_count); + if( SchedulerTraits::has_slow_atomic && p.ref_count==1 ) { + p.ref_count=0; + } else { + reference_count k = __TBB_FetchAndDecrementWrelease(&p.ref_count); + __TBB_ASSERT( k>0, "completion of task caused parent's reference count to underflow" ); + if( k!=1 ) + return false; + } + if( SchedulerTraits::itt_possible ) + ITT_NOTIFY(sync_acquired, &p.ref_count); + return true; + } + +public: + static GenericScheduler* allocate_scheduler( Arena* arena ) { + __TBB_ASSERT( arena, "missing arena" ); + scheduler_type* s = (scheduler_type*)NFS_Allocate(sizeof(scheduler_type),1,NULL); + new( s ) scheduler_type( arena ); + __TBB_ASSERT( s->assert_okay(), NULL ); + ITT_SYNC_CREATE(s, SyncType_Scheduler, SyncObj_TaskPoolSpinning); + return s; + } +}; + +//------------------------------------------------------------------------ +// AssertOkay +//------------------------------------------------------------------------ +#if TBB_USE_ASSERT +/** Logically, this method should be a member of class task. + But we do not want to publish it, so it is here instead. */ +static bool AssertOkay( const task& task ) { + __TBB_ASSERT( &task!=NULL, NULL ); + __TBB_ASSERT( (uintptr)&task % task_alignment == 0, "misaligned task" ); + __TBB_ASSERT( (unsigned)task.state()<=(unsigned)task::recycle, "corrupt task (invalid state)" ); + return true; +} +#endif /* TBB_USE_ASSERT */ + +//------------------------------------------------------------------------ +// Methods of Arena +//------------------------------------------------------------------------ +Arena* Arena::allocate_arena( unsigned number_of_slots, unsigned number_of_workers, stack_size_type stack_size) { + __TBB_ASSERT( sizeof(ArenaPrefix) % NFS_GetLineSize()==0, "ArenaPrefix not multiple of cache line size" ); + __TBB_ASSERT( sizeof(mail_outbox)==NFS_MaxLineSize, NULL ); + size_t n = sizeof(ArenaPrefix) + number_of_slots*(sizeof(mail_outbox)+sizeof(ArenaSlot)); + + unsigned char* storage = (unsigned char*)NFS_Allocate( n, 1, NULL ); + memset( storage, 0, n ); + Arena* a = (Arena*)(storage + sizeof(ArenaPrefix)+ number_of_slots*(sizeof(mail_outbox))); + __TBB_ASSERT( sizeof(a->slot[0]) % NFS_GetLineSize()==0, "Arena::slot size not multiple of cache line size" ); + __TBB_ASSERT( (uintptr)a % NFS_GetLineSize()==0, NULL ); + new( &a->prefix() ) ArenaPrefix( number_of_slots, number_of_workers ); + + // Allocate the worker_list + WorkerDescriptor * w = new WorkerDescriptor[number_of_workers]; + memset( w, 0, sizeof(WorkerDescriptor)*(number_of_workers)); + a->prefix().worker_list = w; + +#if TBB_USE_ASSERT + // Verify that earlier memset initialized the mailboxes. 
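    // (Layout of the single NFS_Allocate block, inferred from the pointer arithmetic above
    //  rather than stated in the original source:
    //      [ number_of_slots mail_outboxes | ArenaPrefix | ArenaSlot array ]
    //  'a' points at the slot array, prefix() sits immediately before it, and mailbox(id)
    //  is reached at a negative offset from the prefix.)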
+ for( unsigned j=1; j<=number_of_slots; ++j ) { + a->mailbox(j).assert_is_initialized(); + } +#endif /* TBB_USE_ASSERT */ + + a->prefix().stack_size = stack_size; + size_t k; + // Mark each worker slot as locked and unused + for( k=0; kslot + k, SyncType_Scheduler, SyncObj_WorkerTaskPool); + ITT_SYNC_CREATE(&w[k].scheduler, SyncType_Scheduler, SyncObj_WorkerLifeCycleMgmt); + ITT_SYNC_CREATE(&a->mailbox(k+1), SyncType_Scheduler, SyncObj_Mailbox); + } + // Mark rest of slots as unused + for( ; kslot + k, SyncType_Scheduler, SyncObj_MasterTaskPool); + ITT_SYNC_CREATE(&a->mailbox(k+1), SyncType_Scheduler, SyncObj_Mailbox); + } + + return a; +} + +inline void Arena::mark_pool_full() { + // Double-check idiom that is deliberately sloppy about memory fences. + // Technically, to avoid missed wakeups, there should be a full memory fence between the point we + // released the task pool (i.e. spawned task) and read the gate's state. However, adding such a + // fence might hurt overall performance more than it helps, because the fence would be executed + // on every task pool release, even when stealing does not occur. Since TBB allows parallelism, + // but never promises parallelism, the missed wakeup is not a correctness problem. + pool_state_t snapshot = prefix().pool_state; + if( is_busy_or_empty(snapshot) ) { + // Attempt to mark as full. The compare_and_swap below is a little unusual because the + // result is compared to a value that can be different than the comparand argument. + if( prefix().pool_state.compare_and_swap( SNAPSHOT_FULL, snapshot )==SNAPSHOT_EMPTY ) { + if( snapshot!=SNAPSHOT_EMPTY ) { + // This thread initialized s1 to "busy" and then another thread transitioned + // pool_state to "empty" in the meantime, which caused the compare_and_swap above + // to fail. Attempt to transition pool_state from "empty" to "full". + if( prefix().pool_state.compare_and_swap( SNAPSHOT_FULL, SNAPSHOT_EMPTY )!=SNAPSHOT_EMPTY ) { + // Some other thread transitioned pool_state from "empty", and hence became + // responsible for waking up workers. + return; + } + } + // This thread transitioned pool from empty to full state, and thus is responsible for + // telling RML that there is work to do. + prefix().server->adjust_job_count_estimate( int(prefix().number_of_workers) ); + } + } +} + +bool Arena::check_if_pool_is_empty() +{ + for(;;) { + pool_state_t snapshot = prefix().pool_state; + switch( snapshot ) { + case SNAPSHOT_EMPTY: + case SNAPSHOT_SERVER_GOING_AWAY: + return true; + case SNAPSHOT_FULL: { + // Use unique id for "busy" in order to avoid ABA problems. + const pool_state_t busy = pool_state_t(this); + // Request permission to take snapshot + if( prefix().pool_state.compare_and_swap( busy, SNAPSHOT_FULL )==SNAPSHOT_FULL ) { + // Got permission. Take the snapshot. + size_t n = prefix().limit; + size_t k; + for( k=0; k=n ) { + if( prefix().pool_state.compare_and_swap( SNAPSHOT_EMPTY, busy )==busy ) { + // This thread transitioned pool to empty state, and thus is responsible for + // telling RML that there is no other work to do. + prefix().server->adjust_job_count_estimate( -int(prefix().number_of_workers) ); + return true; + } + } else { + // Undo previous transition SNAPSHOT_FULL-->busy, unless another thread undid it. + prefix().pool_state.compare_and_swap( SNAPSHOT_FULL, busy ); + } + } + } + return false; + } + default: + // Another thread is taking a snapshot. 
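                // (Summary of the pool_state protocol used here and in mark_pool_full():
                //  the state moves between SNAPSHOT_EMPTY and SNAPSHOT_FULL, passing through
                //  a transient per-arena "busy" value while one thread inspects the slots;
                //  the transitions into and out of SNAPSHOT_FULL are what drive
                //  server->adjust_job_count_estimate() to wake or retire workers.)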
+ return false; + } + } +} + +void Arena::terminate_workers() { + for(;;) { + pool_state_t snapshot = prefix().pool_state; + if( snapshot==SNAPSHOT_SERVER_GOING_AWAY ) + break; + if( prefix().pool_state.compare_and_swap( SNAPSHOT_SERVER_GOING_AWAY, snapshot )==snapshot ) { + if( snapshot!=SNAPSHOT_EMPTY ) + prefix().server->adjust_job_count_estimate( -int(prefix().number_of_workers) ); + break; + } + } + prefix().server->request_close_connection(); +} + + +#if COUNT_TASK_NODES +intptr Arena::workers_task_node_count() { + intptr result = 0; + for( unsigned i=0; itask_node_count; + } + return result; +} +#endif + +//------------------------------------------------------------------------ +// Methods of GenericScheduler +//------------------------------------------------------------------------ +#if _MSC_VER && !defined(__INTEL_COMPILER) + // Suppress overzealous compiler warning about using 'this' in base initializer list. + #pragma warning(push) + #pragma warning(disable:4355) +#endif + +GenericScheduler::GenericScheduler( Arena* arena_ ) : + arena_index(null_arena_index), + task_pool_size(0), + arena_slot(&dummy_slot), + arena(arena_), + random( unsigned(this-(GenericScheduler*)NULL) ), + free_list(NULL), + innermost_running_task(NULL), + dummy_task(NULL), + ref_count(1), + my_affinity_id(0), + is_registered(false), + is_auto_initialized(false), +#if __TBB_SCHEDULER_OBSERVER + local_last_observer_proxy(NULL), +#endif /* __TBB_SCHEDULER_OBSERVER */ +#if COUNT_TASK_NODES + task_node_count(0), +#endif /* COUNT_TASK_NODES */ +#if STATISTICS + current_active(0), + current_length(0), + current_big_malloc(0), + execute_count(0), + steal_count(0), + mail_received_count(0), + proxy_execute_count(0), + proxy_steal_count(0), + proxy_bypass_count(0), +#endif /* STATISTICS */ + small_task_count(1), // Extra 1 is a guard reference + return_list(NULL) +{ + dummy_slot.task_pool = allocate_task_pool( min_task_pool_size ); + dummy_slot.head = dummy_slot.tail = 0; + dummy_task = &allocate_task( sizeof(task), __TBB_CONTEXT_ARG(NULL, NULL) ); +#if __TBB_EXCEPTIONS + context_list_head.my_prev = &context_list_head; + context_list_head.my_next = &context_list_head; + ITT_SYNC_CREATE(&context_list_mutex, SyncType_Scheduler, SyncObj_ContextsList); +#endif /* __TBB_EXCEPTIONS */ + dummy_task->prefix().ref_count = 2; + ITT_SYNC_CREATE(&dummy_task->prefix().ref_count, SyncType_Scheduler, SyncObj_WorkerLifeCycleMgmt); + ITT_SYNC_CREATE(&return_list, SyncType_Scheduler, SyncObj_TaskReturnList); + __TBB_ASSERT( assert_okay(), "constructor error" ); +} + +#if _MSC_VER && !defined(__INTEL_COMPILER) + #pragma warning(pop) +#endif // warning 4355 is back + +#if TBB_USE_ASSERT +bool GenericScheduler::assert_okay() const { +#if TBB_USE_ASSERT>=2||TEST_ASSEMBLY_ROUTINES + acquire_task_pool(); + task** tp = dummy_slot.task_pool; + __TBB_ASSERT( task_pool_size >= min_task_pool_size, NULL ); + __TBB_ASSERT( arena_slot->head <= arena_slot->tail, NULL ); + for ( size_t i = arena_slot->head; i < arena_slot->tail; ++i ) { + __TBB_ASSERT( (uintptr_t)tp[i] + 1 > 1u, "nil or invalid task pointer in the deque" ); + __TBB_ASSERT( tp[i]->prefix().state == task::ready || + tp[i]->prefix().extra_state == es_task_proxy, "task in the deque has invalid state" ); + } + release_task_pool(); +#endif /* TBB_USE_ASSERT>=2||TEST_ASSEMBLY_ROUTINES */ + return true; +} +#endif /* TBB_USE_ASSERT */ + +#if __TBB_EXCEPTIONS + +void GenericScheduler::propagate_cancellation () { + spin_mutex::scoped_lock lock(context_list_mutex); + // Acquire fence is 
necessary to ensure that the subsequent node->my_next load + // returned the correct value in case it was just inserted in another thread. + // The fence also ensures visibility of the correct my_parent value. + context_list_node_t *node = __TBB_load_with_acquire(context_list_head.my_next); + while ( node != &context_list_head ) { + task_group_context *ctx = __TBB_get_object_addr(task_group_context, my_node, node); + // The absence of acquire fence while reading my_cancellation_requested may result + // in repeated traversals of the same parents chain if another group (precedent or + // descendant) belonging to the tree being canceled sends cancellation request of + // its own around the same time. + if ( !ctx->my_cancellation_requested ) + ctx->propagate_cancellation_from_ancestors(); + node = node->my_next; + __TBB_ASSERT( ctx->is_alive(), "Walked into a destroyed context while propagating cancellation" ); + } +} + +/** Propagates cancellation down the tree of dependent contexts by walking each + thread's local list of contexts **/ +void GenericScheduler::propagate_cancellation ( task_group_context* ctx ) { + __TBB_ASSERT ( ctx->my_cancellation_requested, "No cancellation request in the context" ); + // The whole propagation algorithm is under the lock in order to ensure correctness + // in case of parallel cancellations at the different levels of the context tree. + // See the note 2 at the bottom of the file. + mutex::scoped_lock lock(the_scheduler_list_mutex); + // Advance global cancellation state + __TBB_FetchAndAddWrelease(&global_cancel_count, 1); + // First propagate to workers using arena to access their context lists + size_t num_workers = arena->prefix().number_of_workers; + for ( size_t i = 0; i < num_workers; ++i ) { + // No fence is necessary here since the context list of worker's scheduler + // can contain anything of interest only after the first stealing was done + // by that worker. And doing it applies the necessary fence + GenericScheduler *s = arena->prefix().worker_list[i].scheduler; + // If the worker is in the middle of its startup sequence, skip it. + if ( s ) + s->propagate_cancellation(); + } + // Then propagate to masters using the global list of master's schedulers + scheduler_list_node_t *node = the_scheduler_list_head.my_next; + while ( node != &the_scheduler_list_head ) { + __TBB_get_object_addr(GenericScheduler, my_node, node)->propagate_cancellation(); + node = node->my_next; + } + // Now sync up the local counters + for ( size_t i = 0; i < num_workers; ++i ) { + GenericScheduler *s = arena->prefix().worker_list[i].scheduler; + // If the worker is in the middle of its startup sequence, skip it. + if ( s ) + s->local_cancel_count = global_cancel_count; + } + node = the_scheduler_list_head.my_next; + while ( node != &the_scheduler_list_head ) { + __TBB_get_object_addr(GenericScheduler, my_node, node)->local_cancel_count = global_cancel_count; + node = node->my_next; + } +} +#endif /* __TBB_EXCEPTIONS */ + + + +void GenericScheduler::init_stack_info () { + // Stacks are growing top-down. Highest address is called "stack base", + // and the lowest is "stack limit". +#if USE_WINTHREAD +#if defined(_MSC_VER)&&_MSC_VER<1400 && !_WIN64 + NT_TIB *pteb = (NT_TIB*)__TBB_machine_get_current_teb(); +#else + NT_TIB *pteb = (NT_TIB*)NtCurrentTeb(); +#endif + __TBB_ASSERT( &pteb < pteb->StackBase && &pteb > pteb->StackLimit, "invalid stack info in TEB" ); + __TBB_ASSERT( arena->prefix().stack_size>0, "stack_size not initialized?" 
); + // When a thread is created with the attribute STACK_SIZE_PARAM_IS_A_RESERVATION, stack limit + // in the TIB points to the committed part of the stack only. This renders the expression + // "(uintptr_t)pteb->StackBase / 2 + (uintptr_t)pteb->StackLimit / 2" virtually useless. + // Thus for worker threads we use the explicit stack size we used while creating them. + // And for master threads we rely on the following fact and assumption: + // - the default stack size of a master thread on Windows is 1M; + // - if it was explicitly set by the application it is at least as large as the size of a worker stack. + if ( is_worker() || arena->prefix().stack_size < MByte ) + my_stealing_threshold = (uintptr_t)pteb->StackBase - arena->prefix().stack_size / 2; + else + my_stealing_threshold = (uintptr_t)pteb->StackBase - MByte / 2; +#else /* USE_PTHREAD */ + // There is no portable way to get stack base address in Posix, so we use + // non-portable method (on all modern Linux) or the simplified approach + // based on the common sense assumptions. The most important assumption + // is that the main thread's stack size is not less than that of other threads. + size_t stack_size = arena->prefix().stack_size; + void *stack_base = &stack_size; +#if __TBB_ipf + void *rsb_base = __TBB_get_bsp(); +#endif +#if __linux__ + size_t np_stack_size = 0; + void *stack_limit = NULL; + pthread_attr_t attr_stack, np_attr_stack; + if( 0 == pthread_getattr_np(pthread_self(), &np_attr_stack) ) { + if ( 0 == pthread_attr_getstack(&np_attr_stack, &stack_limit, &np_stack_size) ) { + if ( 0 == pthread_attr_init(&attr_stack) ) { + if ( 0 == pthread_attr_getstacksize(&attr_stack, &stack_size) ) + { + stack_base = (char*)stack_limit + np_stack_size; + if ( np_stack_size < stack_size ) { + // We are in a secondary thread. Use reliable data. +#if __TBB_ipf + // IA64 stack is split into RSE backup and memory parts + rsb_base = stack_limit; + stack_size = np_stack_size/2; +#else + stack_size = np_stack_size; +#endif /* !__TBB_ipf */ + } + // We are either in the main thread or this thread stack + // is bigger that that of the main one. As we cannot discern + // these cases we fall back to the default (heuristic) values. + } + pthread_attr_destroy(&attr_stack); + } + } + pthread_attr_destroy(&np_attr_stack); + } +#endif /* __linux__ */ + __TBB_ASSERT( stack_size>0, "stack size must be positive" ); + my_stealing_threshold = (uintptr_t)((char*)stack_base - stack_size/2); +#if __TBB_ipf + my_rsb_stealing_threshold = (uintptr_t)((char*)rsb_base + stack_size/2); +#endif +#endif /* USE_PTHREAD */ +} + +task** GenericScheduler::allocate_task_pool( size_t n ) { + __TBB_ASSERT( n > task_pool_size, "Cannot shrink the task pool" ); + size_t byte_size = ((n * sizeof(task*) + NFS_MaxLineSize - 1) / NFS_MaxLineSize) * NFS_MaxLineSize; + task_pool_size = byte_size / sizeof(task*); + task** new_pool = (task**)NFS_Allocate( byte_size, 1, NULL ); + // No need to clear the fresh deque since valid items are designated by the head and tail members. 
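//------------------------------------------------------------------------
// allocate_task_pool() above rounds the requested deque size up to a whole
// number of cache lines before asking the cache-aligned allocator for memory,
// so the pool never shares a line with unrelated data. A standalone sketch of
// that size computation under the assumption of a fixed line size; the names
// are illustrative, not TBB APIs.
#include <cstddef>

inline std::size_t round_up_to_line( std::size_t bytes, std::size_t line_size ) {
    // line_size is assumed to be a positive cache-line size, e.g. 64 or 128.
    return ( bytes + line_size - 1 ) / line_size * line_size;
}

inline std::size_t pool_bytes_for( std::size_t n_slots, std::size_t line_size ) {
    // Same shape as the byte_size computation above, with void* standing in
    // for task* purely to keep the example self-contained.
    return round_up_to_line( n_slots * sizeof(void*), line_size );
}
//------------------------------------------------------------------------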
+#if TBB_USE_ASSERT>=2
+    // But clear it in the high vigilance debug mode
+    memset( new_pool, -1, n );
+#endif /* TBB_USE_ASSERT>=2 */
+    return new_pool;
+}
+
+void GenericScheduler::grow( size_t new_size ) {
+    __TBB_ASSERT( assert_okay(), NULL );
+    if ( new_size < 2 * task_pool_size )
+        new_size = 2 * task_pool_size;
+    task** new_pool = allocate_task_pool( new_size ); // updates task_pool_size
+    task** old_pool = dummy_slot.task_pool;
+    acquire_task_pool();    // requires the old dummy_slot.task_pool value
+    // arena_slot->tail should not be updated before arena_slot->head because their
+    // values are used by other threads to check if this task pool is empty.
+    size_t new_tail = arena_slot->tail - arena_slot->head;
+    __TBB_ASSERT( new_tail <= task_pool_size, "new task pool is too short" );
+    memcpy( new_pool, old_pool + arena_slot->head, new_tail * sizeof(task*) );
+    arena_slot->head = 0;
+    arena_slot->tail = new_tail;
+    dummy_slot.task_pool = new_pool;
+    release_task_pool();    // updates the task pool pointer in our arena slot
+    free_task_pool( old_pool );
+    __TBB_ASSERT( assert_okay(), NULL );
+}
+
+
+GenericScheduler* GenericScheduler::allocate_scheduler( Arena* arena ) {
+    switch( SchedulerTraitsId ) {
+        /* DefaultSchedulerTraits::id is listed explicitly as a case so that the host compiler
+           will issue an error message if it is the same as another id in the list. */
+        default:
+        case DefaultSchedulerTraits::id:
+            return CustomScheduler<DefaultSchedulerTraits>::allocate_scheduler(arena);
+        case IntelSchedulerTraits::id:
+            return CustomScheduler<IntelSchedulerTraits>::allocate_scheduler(arena);
+    }
+}
+
+void GenericScheduler::free_scheduler() {
+    if( in_arena() ) {
+        acquire_task_pool();
+        leave_arena();
+    }
+#if __TBB_EXCEPTIONS
+    task_group_context* &context = dummy_task->prefix().context;
+    // Only master thread's dummy task has a context
+    if ( context != &dummy_context) {
+        //! \todo Add assertion that master's dummy task context does not have children
+        context->task_group_context::~task_group_context();
+        NFS_Free(context);
+        {
+            mutex::scoped_lock lock(the_scheduler_list_mutex);
+            my_node.my_next->my_prev = my_node.my_prev;
+            my_node.my_prev->my_next = my_node.my_next;
+        }
+    }
+#endif /* __TBB_EXCEPTIONS */
+    free_task( *dummy_task );
+
+    // k accounts for a guard reference and each task that we deallocate.
+    intptr k = 1;
+    for(;;) {
+        while( task* t = free_list ) {
+            free_list = t->prefix().next;
+            deallocate_task(*t);
+            ++k;
+        }
+        if( return_list==plugged_return_list() )
+            break;
+        free_list = (task*)__TBB_FetchAndStoreW( &return_list, (intptr)plugged_return_list() );
+    }
+
+#if COUNT_TASK_NODES
+    arena->prefix().task_node_count += task_node_count;
+#endif /* COUNT_TASK_NODES */
+#if STATISTICS
+    the_statistics.record( execute_count, steal_count, mail_received_count,
+                           proxy_execute_count, proxy_steal_count, proxy_bypass_count );
+#endif /* STATISTICS */
+    free_task_pool( dummy_slot.task_pool );
+    dummy_slot.task_pool = NULL;
+    // Update small_task_count last. Doing so sooner might cause another thread to free *this.
+    __TBB_ASSERT( small_task_count>=k, "small_task_count corrupted" );
+    Governor::sign_off(this);
+    if( __TBB_FetchAndAddW( &small_task_count, -k )==k )
+        NFS_Free( this );
+}
+
+/** ATTENTION:
+    This method is mostly the same as GenericScheduler::lock_task_pool(), with
+    a little different logic of slot state checks (slot is either locked or points
+    to our task pool).
+    Thus if either of them is changed, consider changing the counterpart as well.
**/ +inline void GenericScheduler::acquire_task_pool() const { + if ( !in_arena() ) + return; // we are not in arena - nothing to lock + atomic_backoff backoff; + bool sync_prepare_done = false; + for(;;) { +#if TBB_USE_ASSERT + __TBB_ASSERT( arena_slot == arena->slot + arena_index, "invalid arena slot index" ); + // Local copy of the arena slot task pool pointer is necessary for the next + // assertion to work correctly to exclude asynchronous state transition effect. + task** tp = arena_slot->task_pool; + __TBB_ASSERT( tp == LockedTaskPool || tp == dummy_slot.task_pool, "slot ownership corrupt?" ); +#endif + if( arena_slot->task_pool != LockedTaskPool && + __TBB_CompareAndSwapW( &arena_slot->task_pool, (intptr_t)LockedTaskPool, + (intptr_t)dummy_slot.task_pool ) == (intptr_t)dummy_slot.task_pool ) + { + // We acquired our own slot + ITT_NOTIFY(sync_acquired, arena_slot); + break; + } + else if( !sync_prepare_done ) { + // Start waiting + ITT_NOTIFY(sync_prepare, arena_slot); + sync_prepare_done = true; + } + // Someone else acquired a lock, so pause and do exponential backoff. + backoff.pause(); +#if TEST_ASSEMBLY_ROUTINES + __TBB_ASSERT( arena_slot->task_pool == LockedTaskPool || + arena_slot->task_pool == dummy_slot.task_pool, NULL ); +#endif /* TEST_ASSEMBLY_ROUTINES */ + } + __TBB_ASSERT( arena_slot->task_pool == LockedTaskPool, "not really acquired task pool" ); +} // GenericScheduler::acquire_task_pool + +inline void GenericScheduler::release_task_pool() const { + if ( !in_arena() ) + return; // we are not in arena - nothing to unlock + __TBB_ASSERT( arena_slot, "we are not in arena" ); + __TBB_ASSERT( arena_slot->task_pool == LockedTaskPool, "arena slot is not locked" ); + ITT_NOTIFY(sync_releasing, arena_slot); + __TBB_store_with_release( arena_slot->task_pool, dummy_slot.task_pool ); +} + +/** ATTENTION: + This method is mostly the same as GenericScheduler::acquire_task_pool(), + with a little different logic of slot state checks (slot can be empty, locked + or point to any task pool other than ours, and asynchronous transitions between + all these states are possible). + Thus if any of them is changed, consider changing the counterpart as well **/ +inline task** GenericScheduler::lock_task_pool( ArenaSlot* victim_arena_slot ) const { + task** victim_task_pool; + atomic_backoff backoff; + bool sync_prepare_done = false; + for(;;) { + victim_task_pool = victim_arena_slot->task_pool; + // TODO: Investigate the effect of bailing out on the locked pool without trying to lock it. + // When doing this update assertion in the end of the method. + if ( victim_task_pool == EmptyTaskPool ) { + // The victim thread emptied its task pool - nothing to lock + if( sync_prepare_done ) + ITT_NOTIFY(sync_cancel, victim_arena_slot); + break; + } + if( victim_task_pool != LockedTaskPool && + __TBB_CompareAndSwapW( &victim_arena_slot->task_pool, + (intptr_t)LockedTaskPool, (intptr_t)victim_task_pool ) == (intptr_t)victim_task_pool ) + { + // We've locked victim's task pool + ITT_NOTIFY(sync_acquired, victim_arena_slot); + break; + } + else if( !sync_prepare_done ) { + // Start waiting + ITT_NOTIFY(sync_prepare, victim_arena_slot); + sync_prepare_done = true; + } + // Someone else acquired a lock, so pause and do exponential backoff. + backoff.pause(); + } + __TBB_ASSERT( victim_task_pool == EmptyTaskPool || + (victim_arena_slot->task_pool == LockedTaskPool && victim_task_pool != LockedTaskPool), + "not really locked victim's task pool?" 
);
+    return victim_task_pool;
+} // GenericScheduler::lock_task_pool
+
+inline void GenericScheduler::unlock_task_pool( ArenaSlot* victim_arena_slot,
+                                                task** victim_task_pool ) const {
+    __TBB_ASSERT( victim_arena_slot, "empty victim arena slot pointer" );
+    __TBB_ASSERT( victim_arena_slot->task_pool == LockedTaskPool, "victim arena slot is not locked" );
+    ITT_NOTIFY(sync_releasing, victim_arena_slot);
+    __TBB_store_with_release( victim_arena_slot->task_pool, victim_task_pool );
+}
+
+
+inline task* GenericScheduler::prepare_for_spawning( task* t ) {
+    __TBB_ASSERT( t->state()==task::allocated, "attempt to spawn task that is not in 'allocated' state" );
+    t->prefix().owner = this;
+    t->prefix().state = task::ready;
+#if TBB_USE_ASSERT
+    if( task* parent = t->parent() ) {
+        internal::reference_count ref_count = parent->prefix().ref_count;
+        __TBB_ASSERT( ref_count>=0, "attempt to spawn task whose parent has a ref_count<0" );
+        __TBB_ASSERT( ref_count!=0, "attempt to spawn task whose parent has a ref_count==0 (forgot to set_ref_count?)" );
+        parent->prefix().extra_state |= es_ref_count_active;
+    }
+#endif /* TBB_USE_ASSERT */
+    affinity_id dst_thread = t->prefix().affinity;
+    __TBB_ASSERT( dst_thread == 0 || is_version_3_task(*t), "backwards compatibility to TBB 2.0 tasks is broken" );
+    if( dst_thread != 0 && dst_thread != my_affinity_id ) {
+        task_proxy& proxy = (task_proxy&)allocate_task( sizeof(task_proxy),
+                                                        __TBB_CONTEXT_ARG(NULL, NULL) );
+        // Mark as a proxy
+        proxy.prefix().extra_state = es_task_proxy;
+        proxy.outbox = &arena->mailbox(dst_thread);
+        proxy.task_and_tag = intptr(t)|3;
+        proxy.next_in_mailbox = NULL;
+        ITT_NOTIFY( sync_releasing, proxy.outbox );
+        // Mail the proxy - after this point t may be destroyed by another thread at any moment.
+        proxy.outbox->push(proxy);
+        return &proxy;
+    }
+    return t;
+}
+
+/** Conceptually, this method should be a member of class scheduler.
+    But doing so would force us to publish class scheduler in the headers. */
+void GenericScheduler::local_spawn( task& first, task*& next ) {
+    __TBB_ASSERT( Governor::is_set(this), NULL );
+    __TBB_ASSERT( assert_okay(), NULL );
+    if ( &first.prefix().next == &next ) {
+        // Single task is being spawned
+        if ( arena_slot->tail == task_pool_size ) {
+            // 1 compensates for head possibly temporarily incremented by a thief
+            if ( arena_slot->head > 1 ) {
+                // Move the busy part of the deque to the beginning of the allocated space
+                acquire_task_pool();
+                arena_slot->tail -= arena_slot->head;
+                memmove( dummy_slot.task_pool, dummy_slot.task_pool + arena_slot->head, arena_slot->tail * sizeof(task*) );
+                arena_slot->head = 0;
+                release_task_pool();
+            }
+            else {
+                grow( task_pool_size + 1 );
+            }
+        }
+        dummy_slot.task_pool[arena_slot->tail] = prepare_for_spawning( &first );
+        ITT_NOTIFY(sync_releasing, arena_slot);
+        // The following store with release is required on ia64 only
+        size_t new_tail = arena_slot->tail + 1;
+        __TBB_store_with_release( arena_slot->tail, new_tail );
+        __TBB_ASSERT ( arena_slot->tail <= task_pool_size, "task deque end was overwritten" );
+    }
+    else {
+        // Task list is being spawned
+        const size_t initial_capacity = 64;
+        task *arr[initial_capacity];
+        fast_reverse_vector<task*> tasks(arr, initial_capacity);
+        task *t_next = NULL;
+        for( task* t = &first; ; t = t_next ) {
+            // After prepare_for_spawning returns t may already have been destroyed.
+            // So milk it while it is alive.
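//------------------------------------------------------------------------
// prepare_for_spawning() above stores intptr(t)|3 into the proxy: task objects
// are aligned, so the two low bits of the pointer are free and serve as
// "still referenced by the task pool" and "still referenced by the mailbox"
// flags. Whichever side claims the task clears the pointer part with a
// compare-and-swap and leaves only the other side's flag behind. A minimal
// standalone sketch of that packing, using std::atomic instead of TBB's
// primitives; the flag values and names are illustrative, not TBB's.
#include <atomic>
#include <cstdint>

const std::uintptr_t pool_flag    = 1;  // proxy still reachable from the task pool
const std::uintptr_t mailbox_flag = 2;  // proxy still reachable from the mailbox

inline std::uintptr_t pack( void* task_ptr ) {
    // Requires the pointed-to object to be at least 4-byte aligned.
    return reinterpret_cast<std::uintptr_t>(task_ptr) | pool_flag | mailbox_flag;
}

// Either side may claim the task; the winner of the compare-and-swap takes it
// and leaves the loser's flag as the only remaining content.
inline void* try_claim( std::atomic<std::uintptr_t>& task_and_tag, std::uintptr_t other_side_flag ) {
    std::uintptr_t tat = task_and_tag.load( std::memory_order_acquire );
    if( (tat & 3) == 3 && task_and_tag.compare_exchange_strong( tat, other_side_flag ) )
        return reinterpret_cast<void*>( tat & ~std::uintptr_t(3) );
    return nullptr; // already claimed by the other side
}
//------------------------------------------------------------------------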
+ bool end = &t->prefix().next == &next; + t_next = t->prefix().next; + tasks.push_back( prepare_for_spawning(t) ); + if( end ) + break; + } + size_t num_tasks = tasks.size(); + __TBB_ASSERT ( arena_index != null_arena_index, "invalid arena slot index" ); + if ( arena_slot->tail + num_tasks > task_pool_size ) { + // 1 compensates for head possibly temporarily incremented by a thief + size_t new_size = arena_slot->tail - arena_slot->head + num_tasks + 1; + if ( new_size <= task_pool_size ) { + // Move the busy part of the deque to the beginning of the allocated space + acquire_task_pool(); + arena_slot->tail -= arena_slot->head; + memmove( dummy_slot.task_pool, dummy_slot.task_pool + arena_slot->head, arena_slot->tail * sizeof(task*) ); + arena_slot->head = 0; + release_task_pool(); + } + else { + grow( new_size ); + } + } +#if DO_ITT_NOTIFY + else { + // The preceding if-branch issues the same ittnotify inside release_task_pool() or grow() methods + ITT_NOTIFY(sync_releasing, arena_slot); + } +#endif /* DO_ITT_NOTIFY */ + tasks.copy_memory( dummy_slot.task_pool + arena_slot->tail ); + // The following store with release is required on ia64 only + size_t new_tail = arena_slot->tail + num_tasks; + __TBB_store_with_release( arena_slot->tail, new_tail ); + __TBB_ASSERT ( arena_slot->tail <= task_pool_size, "task deque end was overwritten" ); + } + if ( !in_arena() ) { + if ( is_worker() ) + enter_arena(); + else + try_enter_arena(); + } + + arena->mark_pool_full(); + __TBB_ASSERT( assert_okay(), NULL ); + + TBB_TRACE(("%p.internal_spawn exit\n", this )); +} + +void GenericScheduler::local_spawn_root_and_wait( task& first, task*& next ) { + __TBB_ASSERT( Governor::is_set(this), NULL ); + __TBB_ASSERT( &first, NULL ); + auto_empty_task dummy( __TBB_CONTEXT_ARG(this, first.prefix().context) ); + internal::reference_count n = 0; + for( task* t=&first; ; t=t->prefix().next ) { + ++n; + __TBB_ASSERT( !t->prefix().parent, "not a root task, or already running" ); + t->prefix().parent = &dummy; + if( &t->prefix().next==&next ) break; +#if __TBB_EXCEPTIONS + __TBB_ASSERT( t->prefix().context == t->prefix().next->prefix().context, + "all the root tasks in list must share the same context"); +#endif /* __TBB_EXCEPTIONS */ + } + dummy.prefix().ref_count = n+1; + if( n>1 ) + LocalSpawn( *first.prefix().next, next ); + TBB_TRACE(("spawn_root_and_wait((task_list*)%p): calling %p.loop\n",&first,this)); + wait_for_all( dummy, &first ); + TBB_TRACE(("spawn_root_and_wait((task_list*)%p): return\n",&first)); +} + +inline task* GenericScheduler::get_mailbox_task() { + __TBB_ASSERT( my_affinity_id>0, "not in arena" ); + task* result = NULL; + while( task_proxy* t = inbox.pop() ) { + intptr tat = __TBB_load_with_acquire(t->task_and_tag); + __TBB_ASSERT( tat==task_proxy::mailbox_bit || (tat==(tat|3)&&tat!=3), NULL ); + if( tat!=task_proxy::mailbox_bit && __TBB_CompareAndSwapW( &t->task_and_tag, task_proxy::pool_bit, tat )==tat ) { + // Successfully grabbed the task, and left pool seeker with job of freeing the proxy. + ITT_NOTIFY( sync_acquired, inbox.outbox() ); + result = (task*)(tat & ~3); + result->prefix().owner = this; + break; + } + free_task_proxy( *t ); + } + return result; +} + +inline task* GenericScheduler::strip_proxy( task_proxy* tp ) { + __TBB_ASSERT( tp->prefix().extra_state==es_task_proxy, NULL ); + intptr tat = __TBB_load_with_acquire(tp->task_and_tag); + if( (tat&3)==3 ) { + // proxy is shared by a pool and a mailbox. + // Attempt to transition it to "empty proxy in mailbox" state. 
+        if( __TBB_CompareAndSwapW( &tp->task_and_tag, task_proxy::mailbox_bit, tat )==tat ) {
+            // Successfully grabbed the task, and left the mailbox with the job of freeing the proxy.
+            return (task*)(tat&~3);
+        }
+        __TBB_ASSERT( tp->task_and_tag==task_proxy::pool_bit, NULL );
+    } else {
+        // We have exclusive access to the proxy
+        __TBB_ASSERT( (tat&3)==task_proxy::pool_bit, "task did not come from pool?" );
+        __TBB_ASSERT ( !(tat&~3), "Empty proxy in the pool contains non-zero task pointer" );
+    }
+#if TBB_USE_ASSERT
+    tp->prefix().state = task::allocated;
+#endif
+    free_task_proxy( *tp );
+    // Another thread grabbed the underlying task via their mailbox
+    return NULL;
+}
+
+inline task* GenericScheduler::get_task() {
+    task* result = NULL;
+retry:
+    --arena_slot->tail;
+    __TBB_rel_acq_fence();
+    if ( (intptr_t)arena_slot->head > (intptr_t)arena_slot->tail ) {
+        acquire_task_pool();
+        if ( (intptr_t)arena_slot->head <= (intptr_t)arena_slot->tail ) {
+            // The thief backed off - grab the task
+            __TBB_ASSERT_VALID_TASK_PTR( dummy_slot.task_pool[arena_slot->tail] );
+            result = dummy_slot.task_pool[arena_slot->tail];
+            __TBB_POISON_TASK_PTR( dummy_slot.task_pool[arena_slot->tail] );
+        }
+        else {
+            __TBB_ASSERT ( arena_slot->head == arena_slot->tail + 1, "victim/thief arbitration algorithm failure" );
+        }
+        if ( (intptr_t)arena_slot->head < (intptr_t)arena_slot->tail ) {
+            release_task_pool();
+        }
+        else {
+            // In any case the deque is empty now, so compact it
+            arena_slot->head = arena_slot->tail = 0;
+            if ( in_arena() )
+                leave_arena();
+        }
+    }
+    else {
+        __TBB_ASSERT_VALID_TASK_PTR( dummy_slot.task_pool[arena_slot->tail] );
+        result = dummy_slot.task_pool[arena_slot->tail];
+        __TBB_POISON_TASK_PTR( dummy_slot.task_pool[arena_slot->tail] );
+    }
+    if( result && is_proxy(*result) ) {
+        result = strip_proxy((task_proxy*)result);
+        if( !result ) {
+            goto retry;
+        }
+        GATHER_STATISTIC( ++proxy_execute_count );
+        // Following assertion should be true because TBB 2.0 tasks never specify affinity, and hence are not proxied.
+        __TBB_ASSERT( is_version_3_task(*result), "backwards compatibility with TBB 2.0 broken" );
+        // Task affinity has changed.
+        innermost_running_task = result;
+        result->note_affinity(my_affinity_id);
+    }
+    return result;
+} // GenericScheduler::get_task
+
+task* GenericScheduler::steal_task( ArenaSlot& victim_slot ) {
+    task** victim_pool = lock_task_pool( &victim_slot );
+    if ( !victim_pool )
+        return NULL;
+    const size_t none = ~0u;
+    size_t first_skipped_proxy = none;
+    task* result = NULL;
+retry:
+    ++victim_slot.head;
+    __TBB_rel_acq_fence();
+    if ( (intptr_t)victim_slot.head > (intptr_t)victim_slot.tail ) {
+        --victim_slot.head;
+    }
+    else {
+        __TBB_ASSERT_VALID_TASK_PTR( victim_pool[victim_slot.head - 1]);
+        result = victim_pool[victim_slot.head - 1];
+        if( is_proxy(*result) ) {
+            task_proxy& tp = *static_cast<task_proxy*>(result);
+            // If task will likely be grabbed by whom it was mailed to, skip it.
+            if( (tp.task_and_tag & 3) == 3 && tp.outbox->recipient_is_idle() ) {
+                if ( first_skipped_proxy == none )
+                    first_skipped_proxy = victim_slot.head - 1;
+                result = NULL;
+                goto retry;
+            }
+        }
+        __TBB_POISON_TASK_PTR(victim_pool[victim_slot.head - 1]);
+    }
+    if ( first_skipped_proxy != none ) {
+        if ( result ) {
+            victim_pool[victim_slot.head - 1] = victim_pool[first_skipped_proxy];
+            __TBB_POISON_TASK_PTR( victim_pool[first_skipped_proxy] );
+            __TBB_store_with_release( victim_slot.head, first_skipped_proxy + 1 );
+        }
+        else
+            __TBB_store_with_release( victim_slot.head, first_skipped_proxy );
+    }
+    unlock_task_pool( &victim_slot, victim_pool );
+    return result;
+}
+
+
+#define ConcurrentWaitsEnabled(t) (t.prefix().context->my_version_and_traits & task_group_context::concurrent_wait)
+#define CancellationInfoPresent(t) (t->prefix().context->my_cancellation_requested)
+
+#if TBB_USE_CAPTURED_EXCEPTION
+    inline tbb_exception* TbbCurrentException( task_group_context*, tbb_exception* src) { return src->move(); }
+    inline tbb_exception* TbbCurrentException( task_group_context*, captured_exception* src) { return src; }
+#else
+    // Using a macro instead of an inline function here allows us to avoid evaluation of the
+    // TbbCapturedException expression when exact propagation is enabled for the context.
+    #define TbbCurrentException(context, TbbCapturedException) \
+        context->my_version_and_traits & task_group_context::exact_exception \
+            ? tbb_exception_ptr::allocate() \
+            : tbb_exception_ptr::allocate( *(TbbCapturedException) );
+#endif /* !TBB_USE_CAPTURED_EXCEPTION */
+
+#define TbbRegisterCurrentException(context, TbbCapturedException) \
+    if ( context->cancel_group_execution() ) { \
+        /* We are the first to signal cancellation, so store the exception that caused it. */ \
+        context->my_exception = TbbCurrentException( context, TbbCapturedException ); \
+    }
+
+#define TbbCatchAll(context) \
+    catch ( tbb_exception& exc ) { \
+        TbbRegisterCurrentException( context, &exc ); \
+    } catch ( std::exception& exc ) { \
+        TbbRegisterCurrentException( context, captured_exception::allocate(typeid(exc).name(), exc.what()) ); \
+    } catch ( ... ) { \
+        TbbRegisterCurrentException( context, captured_exception::allocate("...", "Unidentified exception") );\
+    }
+
+template<typename SchedulerTraits>
+void CustomScheduler<SchedulerTraits>::local_wait_for_all( task& parent, task* child ) {
+    __TBB_ASSERT( Governor::is_set(this), NULL );
+    if( child ) {
+        child->prefix().owner = this;
+    }
+    __TBB_ASSERT( parent.ref_count() >= (child && child->parent() == &parent ? 2 : 1), "ref_count is too small" );
+    __TBB_ASSERT( assert_okay(), NULL );
+    // Using parent's refcount in sync_prepare (in the stealing loop below) is
+    // a workaround for TP. We need to name it here to display correctly in Ampl.
+    if( SchedulerTraits::itt_possible )
+        ITT_SYNC_CREATE(&parent.prefix().ref_count, SyncType_Scheduler, SyncObj_TaskStealingLoop);
+#if __TBB_EXCEPTIONS
+    __TBB_ASSERT( parent.prefix().context || (is_worker() && &parent == dummy_task), "parent task does not have context" );
+#endif /* __TBB_EXCEPTIONS */
+    task* t = child;
+    // Constants all_work_done and all_local_work_done are actually unreachable
+    // refcount values that prevent early quitting the dispatch loop. They are
+    // defined to be in the middle of the range of negative values representable
+    // by the reference_count type.
+ static const reference_count + // For nested dispatch loops in masters and any dispatch loops in workers + parents_work_done = 1, + // For outermost dispatch loops in masters + all_work_done = (reference_count)3 << (sizeof(reference_count) * 8 - 2), + // For termination dispatch loops in masters + all_local_work_done = all_work_done + 1; + reference_count quit_point; + if( innermost_running_task == dummy_task ) { + // We are in the outermost task dispatch loop of a master thread, + __TBB_ASSERT( !is_worker(), NULL ); + quit_point = &parent == dummy_task ? all_local_work_done : all_work_done; + } else { + quit_point = parents_work_done; + } + task* old_innermost_running_task = innermost_running_task; +#if __TBB_EXCEPTIONS +exception_was_caught: + try { +#endif /* __TBB_EXCEPTIONS */ + // Outer loop steals tasks when necessary. + for(;;) { + // Middle loop evaluates tasks that are pulled off "array". + do { + // Inner loop evaluates tasks that are handed directly to us by other tasks. + while(t) { + __TBB_ASSERT( inbox.assert_is_idle(false), NULL ); +#if TBB_USE_ASSERT + __TBB_ASSERT(!is_proxy(*t),"unexpected proxy"); + __TBB_ASSERT( t->prefix().owner==this, NULL ); +#if __TBB_EXCEPTIONS + if ( !t->prefix().context->my_cancellation_requested ) +#endif + __TBB_ASSERT( 1L<state() & (1L<prefix().state = task::executing; +#if __TBB_EXCEPTIONS + if ( !t->prefix().context->my_cancellation_requested ) +#endif + { + TBB_TRACE(("%p.wait_for_all: %p.execute\n",this,t)); + GATHER_STATISTIC( ++execute_count ); + t_next = t->execute(); +#if STATISTICS + if (t_next) { + affinity_id next_affinity=t_next->prefix().affinity; + if (next_affinity != 0 && next_affinity != my_affinity_id) + GATHER_STATISTIC( ++proxy_bypass_count ); + } +#endif + } + if( t_next ) { + __TBB_ASSERT( t_next->state()==task::allocated, + "if task::execute() returns task, it must be marked as allocated" ); + // The store here has a subtle secondary effect - it fetches *t_next into cache. + t_next->prefix().owner = this; + } + __TBB_ASSERT(assert_okay(),NULL); + switch( task::state_type(t->prefix().state) ) { + case task::executing: { + // this block was copied below to case task::recycle + // when making changes, check it too + task* s = t->parent(); + __TBB_ASSERT( innermost_running_task==t, NULL ); + __TBB_ASSERT( t->prefix().ref_count==0, "Task still has children after it has been executed" ); + t->~task(); + if( s ) { + if( tally_completion_of_one_predecessor(*s) ) { +#if TBB_USE_ASSERT + s->prefix().extra_state &= ~es_ref_count_active; +#endif /* TBB_USE_ASSERT */ + s->prefix().owner = this; + + if( !t_next ) { + t_next = s; + } else { + LocalSpawn( *s, s->prefix().next ); + __TBB_ASSERT(assert_okay(),NULL); + } + } + } + free_task( *t ); + break; + } + + case task::recycle: { // state set by recycle_as_safe_continuation() + t->prefix().state = task::allocated; + // for safe continuation, need atomically decrement ref_count; + // the block was copied from above case task::executing, and changed. + // Use "s" here as name for t, so that code resembles case task::executing more closely. + task* const& s = t; + if( tally_completion_of_one_predecessor(*s) ) { + // Unused load is put here for sake of inserting an "acquire" fence. +#if TBB_USE_ASSERT + s->prefix().extra_state &= ~es_ref_count_active; + __TBB_ASSERT( s->prefix().owner==this, "ownership corrupt?" 
); +#endif /* TBB_USE_ASSERT */ + if( !t_next ) { + t_next = s; + } else { + LocalSpawn( *s, s->prefix().next ); + __TBB_ASSERT(assert_okay(),NULL); + } + } + break; + } + + case task::reexecute: // set by recycle_to_reexecute() + __TBB_ASSERT( t_next && t_next != t, "reexecution requires that method 'execute' return another task" ); + TBB_TRACE(("%p.wait_for_all: put task %p back into array",this,t)); + t->prefix().state = task::allocated; + LocalSpawn( *t, t->prefix().next ); + __TBB_ASSERT(assert_okay(),NULL); + break; +#if TBB_USE_ASSERT + case task::allocated: + break; + case task::ready: + __TBB_ASSERT( false, "task is in READY state upon return from method execute()" ); + break; + default: + __TBB_ASSERT( false, "illegal state" ); +#else + default: // just to shut up some compilation warnings + break; +#endif /* TBB_USE_ASSERT */ + } + + t = t_next; + } // end of scheduler bypass loop + __TBB_ASSERT(assert_okay(),NULL); + + // If the parent's descendants are finished with and we are not in + // the outermost dispatch loop of a master thread, then we are done. + // This is necessary to prevent unbounded stack growth in case of deep + // wait_for_all nesting. + // Note that we cannot return from master's outermost dispatch loop + // until we process all the tasks in the local pool, since in case + // of multiple masters this could have left some of them forever + // waiting for their stolen children to be processed. + if ( parent.prefix().ref_count == quit_point ) + break; + t = get_task(); + __TBB_ASSERT(!t || !is_proxy(*t),"unexpected proxy"); +#if TBB_USE_ASSERT + __TBB_ASSERT(assert_okay(),NULL); + if(t) { + AssertOkay(*t); + __TBB_ASSERT( t->prefix().owner==this, "thread got task that it does not own" ); + } +#endif /* TBB_USE_ASSERT */ + } while( t ); // end of local task array processing loop + + if ( quit_point == all_local_work_done ) { + __TBB_ASSERT( arena_slot == &dummy_slot && arena_slot->head == 0 && arena_slot->tail == 0, NULL ); + innermost_running_task = old_innermost_running_task; + return; + } + inbox.set_is_idle( true ); + __TBB_ASSERT( arena->prefix().number_of_workers>0||parent.prefix().ref_count==1, "deadlock detected" ); + // The state "failure_count==-1" is used only when itt_possible is true, + // and denotes that a sync_prepare has not yet been issued. + for( int failure_count = -static_cast(SchedulerTraits::itt_possible);; ++failure_count) { + if( parent.prefix().ref_count==1 ) { + if( SchedulerTraits::itt_possible ) { + if( failure_count!=-1 ) { + ITT_NOTIFY(sync_prepare, &parent.prefix().ref_count); + // Notify Intel(R) Thread Profiler that thread has stopped spinning. + ITT_NOTIFY(sync_acquired, this); + } + ITT_NOTIFY(sync_acquired, &parent.prefix().ref_count); + } + inbox.set_is_idle( false ); + goto done; + } + // Try to steal a task from a random victim. + size_t n = arena->prefix().limit; + if( n>1 ) { + if( !my_affinity_id || !(t=get_mailbox_task()) ) { + if ( !can_steal() ) + goto fail; + size_t k = random.get() % (n-1); + ArenaSlot* victim = &arena->slot[k]; + // The following condition excludes the master that might have + // already taken our previous place in the arena from the list . + // of potential victims. But since such a situation can take + // place only in case of significant oversubscription, keeping + // the checks simple seems to be preferable to complicating the code. 
+ if( k >= arena_index ) + ++victim; // Adjusts random distribution to exclude self + t = steal_task( *victim ); + if( !t ) goto fail; + if( is_proxy(*t) ) { + t = strip_proxy((task_proxy*)t); + if( !t ) goto fail; + GATHER_STATISTIC( ++proxy_steal_count ); + } + GATHER_STATISTIC( ++steal_count ); + if( is_version_3_task(*t) ) { + innermost_running_task = t; + t->note_affinity( my_affinity_id ); + } + } else { + GATHER_STATISTIC( ++mail_received_count ); + } + __TBB_ASSERT(t,NULL); +#if __TBB_SCHEDULER_OBSERVER + // No memory fence required for read of global_last_observer_proxy, because prior fence on steal/mailbox suffices. + if( local_last_observer_proxy!=global_last_observer_proxy ) { + notify_entry_observers(); + } +#endif /* __TBB_SCHEDULER_OBSERVER */ + { + if( SchedulerTraits::itt_possible ) { + if( failure_count!=-1 ) { + // FIXME - might be victim, or might be selected from a mailbox + // Notify Intel(R) Thread Profiler that thread has stopped spinning. + ITT_NOTIFY(sync_acquired, this); + // FIXME - might be victim, or might be selected from a mailbox + } + } + __TBB_ASSERT(t,NULL); + inbox.set_is_idle( false ); + break; + } + } +fail: + if( SchedulerTraits::itt_possible && failure_count==-1 ) { + // The first attempt to steal work failed, so notify Intel(R) Thread Profiler that + // the thread has started spinning. Ideally, we would do this notification + // *before* the first failed attempt to steal, but at that point we do not + // know that the steal will fail. + ITT_NOTIFY(sync_prepare, this); + failure_count = 0; + } + // Pause, even if we are going to yield, because the yield might return immediately. + __TBB_Pause(PauseTime); + int yield_threshold = 2*int(n); + if( failure_count>=yield_threshold ) { + __TBB_Yield(); + if( failure_count>=yield_threshold+100 ) { + if( !old_innermost_running_task && arena->check_if_pool_is_empty() ) { + // Current thread was created by RML and has nothing to do, so return it to the RML. + // For purposes of affinity support, the thread is considered idle while it is in RML. + // Restore innermost_running_task to its original value. 
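//------------------------------------------------------------------------
// The stealing loop above picks its victim with "k = random % (n-1); if k is
// at or beyond our own slot index then ++k", which draws a uniformly
// distributed slot while excluding the thief's own slot. The same trick in
// isolation; names are illustrative, not TBB APIs.
#include <cstddef>

inline std::size_t pick_victim( std::size_t n_slots, std::size_t my_index,
                                std::size_t random_value ) {
    // Preconditions: n_slots >= 2 and my_index < n_slots.
    std::size_t k = random_value % ( n_slots - 1 );
    if( k >= my_index )
        ++k;    // skip over our own slot while keeping the distribution uniform
    return k;
}
//------------------------------------------------------------------------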
+ innermost_running_task = NULL; + return; + } + failure_count = yield_threshold; + } + } + } + __TBB_ASSERT(t,NULL); + __TBB_ASSERT(!is_proxy(*t),"unexpected proxy"); + t->prefix().owner = this; + } // end of stealing loop +#if __TBB_EXCEPTIONS + } TbbCatchAll( t->prefix().context ); + + if( task::state_type(t->prefix().state) == task::recycle ) { // state set by recycle_as_safe_continuation() + t->prefix().state = task::allocated; + // for safe continuation, need to atomically decrement ref_count; + if( SchedulerTraits::itt_possible ) + ITT_NOTIFY(sync_releasing, &t->prefix().ref_count); + if( __TBB_FetchAndDecrementWrelease(&t->prefix().ref_count)==1 ) { + if( SchedulerTraits::itt_possible ) + ITT_NOTIFY(sync_acquired, &t->prefix().ref_count); + }else{ + t = NULL; + } + } + goto exception_was_caught; +#endif /* __TBB_EXCEPTIONS */ +done: + if ( !ConcurrentWaitsEnabled(parent) ) + parent.prefix().ref_count = 0; +#if TBB_USE_ASSERT + parent.prefix().extra_state &= ~es_ref_count_active; +#endif /* TBB_USE_ASSERT */ + innermost_running_task = old_innermost_running_task; +#if __TBB_EXCEPTIONS + __TBB_ASSERT(parent.prefix().context && dummy_task->prefix().context, NULL); + task_group_context* parent_ctx = parent.prefix().context; + if ( parent_ctx->my_cancellation_requested ) { + task_group_context::exception_container_type *pe = parent_ctx->my_exception; + if ( innermost_running_task == dummy_task && parent_ctx == dummy_task->prefix().context ) { + // We are in the outermost task dispatch loop of a master thread, and + // the whole task tree has been collapsed. So we may clear cancellation data. + parent_ctx->my_cancellation_requested = 0; + __TBB_ASSERT(dummy_task->prefix().context == parent_ctx || !CancellationInfoPresent(dummy_task), + "Unexpected exception or cancellation data in the dummy task"); + // If possible, add assertion that master's dummy task context does not have children + } + if ( pe ) + pe->throw_self(); + } + __TBB_ASSERT(!is_worker() || !CancellationInfoPresent(dummy_task), + "Worker's dummy task context modified"); + __TBB_ASSERT(innermost_running_task != dummy_task || !CancellationInfoPresent(dummy_task), + "Unexpected exception or cancellation data in the master's dummy task"); +#endif /* __TBB_EXCEPTIONS */ + __TBB_ASSERT( assert_okay(), NULL ); +} + +#undef CancellationInfoPresent + +inline void GenericScheduler::do_enter_arena() { + arena_slot = &arena->slot[arena_index]; + __TBB_ASSERT ( arena_slot->head == arena_slot->tail, "task deque of a free slot must be empty" ); + arena_slot->head = dummy_slot.head; + arena_slot->tail = dummy_slot.tail; + // Release signal on behalf of previously spawned tasks (when this thread was not in arena yet) + ITT_NOTIFY(sync_releasing, arena_slot); + __TBB_store_with_release( arena_slot->task_pool, dummy_slot.task_pool ); + // We'll leave arena only when it's empty, so clean up local instances of indices. + dummy_slot.head = dummy_slot.tail = 0; +} + +void GenericScheduler::enter_arena() { + __TBB_ASSERT ( is_worker(), "only workers should use enter_arena()" ); + __TBB_ASSERT ( arena, "no arena: initialization not completed?" ); + __TBB_ASSERT ( !in_arena(), "worker already in arena?" ); + __TBB_ASSERT ( arena_index < arena->prefix().number_of_workers, "invalid worker arena slot index" ); + __TBB_ASSERT ( arena->slot[arena_index].task_pool == EmptyTaskPool, "someone else grabbed my arena slot?" 
); + do_enter_arena(); +} + +void GenericScheduler::try_enter_arena() { + __TBB_ASSERT ( !is_worker(), "only masters should use try_enter_arena()" ); + __TBB_ASSERT ( arena, "no arena: initialization not completed?" ); + __TBB_ASSERT ( !in_arena(), "master already in arena?" ); + __TBB_ASSERT ( arena_index >= arena->prefix().number_of_workers && + arena_index < arena->prefix().number_of_slots, "invalid arena slot hint value" ); + + + size_t h = arena_index; + // We do not lock task pool upon successful entering arena + if( arena->slot[h].task_pool != EmptyTaskPool || + __TBB_CompareAndSwapW( &arena->slot[h].task_pool, (intptr_t)LockedTaskPool, + (intptr_t)EmptyTaskPool ) != (intptr_t)EmptyTaskPool ) + { + // Hinted arena slot is already busy, try some of the others at random + unsigned first = arena->prefix().number_of_workers, + last = arena->prefix().number_of_slots; + unsigned n = last - first - 1; + /// \todo Is this limit reasonable? + size_t max_attempts = last - first; + for (;;) { + size_t k = first + random.get() % n; + if( k >= h ) + ++k; // Adjusts random distribution to exclude previously tried slot + h = k; + if( arena->slot[h].task_pool == EmptyTaskPool && + __TBB_CompareAndSwapW( &arena->slot[h].task_pool, (intptr_t)LockedTaskPool, + (intptr_t)EmptyTaskPool ) == (intptr_t)EmptyTaskPool ) + { + break; + } + if ( --max_attempts == 0 ) { + // After so many attempts we are still unable to find a vacant arena slot. + // Cease the vain effort and work outside of arena for a while. + return; + } + } + } + // Successfully claimed a slot in the arena. + ITT_NOTIFY(sync_acquired, &arena->slot[h]); + __TBB_ASSERT ( arena->slot[h].task_pool == LockedTaskPool, "Arena slot is not actually acquired" ); + arena_index = h; + do_enter_arena(); + attach_mailbox( affinity_id(h+1) ); +} + +void GenericScheduler::leave_arena() { + __TBB_ASSERT( in_arena(), "Not in arena" ); + // Do not reset arena_index. It will be used to (attempt to) re-acquire the slot next time + __TBB_ASSERT( &arena->slot[arena_index] == arena_slot, "Arena slot and slot index mismatch" ); + __TBB_ASSERT ( arena_slot->task_pool == LockedTaskPool, "Task pool must be locked when leaving arena" ); + __TBB_ASSERT ( arena_slot->head == arena_slot->tail, "Cannot leave arena when the task pool is not empty" ); + if ( !is_worker() ) { + my_affinity_id = 0; + inbox.detach(); + } + ITT_NOTIFY(sync_releasing, &arena->slot[arena_index]); + __TBB_store_with_release( arena_slot->task_pool, EmptyTaskPool ); + arena_slot = &dummy_slot; +} + + +GenericScheduler* GenericScheduler::create_worker( Arena& a, size_t index ) { + GenericScheduler* s = GenericScheduler::allocate_scheduler(&a); + + // Put myself into the arena +#if __TBB_EXCEPTIONS + s->dummy_task->prefix().context = &dummy_context; + // Sync up the local cancellation state with the global one. No need for fence here. + s->local_cancel_count = global_cancel_count; +#endif /* __TBB_EXCEPTIONS */ + s->attach_mailbox( index+1 ); + s->arena_index = index; + s->init_stack_info(); + + __TBB_store_with_release( a.prefix().worker_list[index].scheduler, s ); + return s; +} + + +GenericScheduler* GenericScheduler::create_master( Arena* arena ) { + GenericScheduler* s = GenericScheduler::allocate_scheduler( arena ); + task& t = *s->dummy_task; + s->innermost_running_task = &t; + t.prefix().ref_count = 1; + Governor::sign_on(s); +#if __TBB_EXCEPTIONS + // Context to be used by root tasks by default (if the user has not specified one). 
+ // Allocation is done by NFS allocator because we cannot reuse memory allocated + // for task objects since the free list is empty at the moment. + t.prefix().context = new ( NFS_Allocate(sizeof(task_group_context), 1, NULL) ) task_group_context(task_group_context::isolated); + scheduler_list_node_t &node = s->my_node; + { + mutex::scoped_lock lock(the_scheduler_list_mutex); + node.my_next = the_scheduler_list_head.my_next; + node.my_prev = &the_scheduler_list_head; + the_scheduler_list_head.my_next->my_prev = &node; + the_scheduler_list_head.my_next = &node; +#endif /* __TBB_EXCEPTIONS */ + unsigned last = arena->prefix().number_of_slots, + cur_limit = arena->prefix().limit; + // This slot index assignment is just a hint to ... + if ( cur_limit < last ) { + // ... to prevent competition between the first few masters. + s->arena_index = cur_limit++; + // In the absence of exception handling this code is a subject to data + // race in case of multiple masters concurrently entering empty arena. + // But it does not affect correctness, and can only result in a few + // masters competing for the same arena slot during the first acquisition. + // The cost of competition is low in comparison to that of oversubscription. + arena->prefix().limit = cur_limit; + } + else { + // ... to minimize the probability of competition between multiple masters. + unsigned first = arena->prefix().number_of_workers; + s->arena_index = first + s->random.get() % (last - first); + } +#if __TBB_EXCEPTIONS + } +#endif + s->init_stack_info(); +#if __TBB_EXCEPTIONS + // Sync up the local cancellation state with the global one. No need for fence here. + s->local_cancel_count = global_cancel_count; +#endif + __TBB_ASSERT( &task::self()==&t, NULL ); +#if __TBB_SCHEDULER_OBSERVER + // Process any existing observers. 
+    s->notify_entry_observers();
+#endif /* __TBB_SCHEDULER_OBSERVER */
+    return s;
+}
+
+
+void GenericScheduler::cleanup_worker( void* arg ) {
+    TBB_TRACE(("%p.cleanup_worker entered\n",arg));
+    GenericScheduler& s = *(GenericScheduler*)arg;
+    __TBB_ASSERT( s.dummy_slot.task_pool, "cleaning up worker with missing task pool" );
+#if __TBB_SCHEDULER_OBSERVER
+    s.notify_exit_observers(/*is_worker=*/true);
+#endif /* __TBB_SCHEDULER_OBSERVER */
+    __TBB_ASSERT( s.arena_slot->task_pool == EmptyTaskPool || s.arena_slot->head == s.arena_slot->tail,
+                  "worker has unfinished work at run down" );
+    s.free_scheduler();
+}
+
+void GenericScheduler::cleanup_master() {
+    TBB_TRACE(("%p.cleanup_master entered\n",this));
+    GenericScheduler& s = *this; // for similarity with cleanup_worker
+    __TBB_ASSERT( s.dummy_slot.task_pool, "cleaning up master with missing task pool" );
+#if __TBB_SCHEDULER_OBSERVER
+    s.notify_exit_observers(/*is_worker=*/false);
+#endif /* __TBB_SCHEDULER_OBSERVER */
+    if ( !is_local_task_pool_empty() ) {
+        __TBB_ASSERT ( Governor::is_set(this), "TLS slot is cleared before the task pool cleanup" );
+        s.wait_for_all( *dummy_task, NULL );
+        __TBB_ASSERT ( Governor::is_set(this), "Other thread reused our TLS key during the task pool cleanup" );
+    }
+    s.free_scheduler();
+    Governor::finish_with_arena();
+}
+
+//------------------------------------------------------------------------
+// UnpaddedArenaPrefix
+//------------------------------------------------------------------------
+inline Arena& UnpaddedArenaPrefix::arena() {
+    return *static_cast<Arena*>(static_cast<void*>( static_cast<ArenaPrefix*>(this)+1 ));
+}
+
+void UnpaddedArenaPrefix::process( job& j ) {
+    GenericScheduler& s = static_cast<GenericScheduler&>(j);
+    __TBB_ASSERT( Governor::is_set(&s), NULL );
+    __TBB_ASSERT( !s.innermost_running_task, NULL );
+    s.wait_for_all(*s.dummy_task,NULL);
+    __TBB_ASSERT( !s.innermost_running_task, NULL );
+}
+
+void UnpaddedArenaPrefix::cleanup( job& j ) {
+    GenericScheduler& s = static_cast<GenericScheduler&>(j);
+    GenericScheduler::cleanup_worker( &s );
+}
+
+void UnpaddedArenaPrefix::open_connection_to_rml() {
+    __TBB_ASSERT( !server, NULL );
+    __TBB_ASSERT( stack_size>0, NULL );
+    if( !use_private_rml ) {
+        ::rml::factory::status_type status = rml_server_factory.make_server( server, *this );
+        if( status==::rml::factory::st_success ) {
+            __TBB_ASSERT( server, NULL );
+            return;
+        }
+        use_private_rml = true;
+        fprintf(stderr,"warning from TBB: make_server failed with status %x, falling back on private rml",status);
+    }
+    server = rml::make_private_server( *this );
+}
+
+void UnpaddedArenaPrefix::acknowledge_close_connection() {
+    arena().free_arena();
+}
+
+::rml::job* UnpaddedArenaPrefix::create_one_job() {
+    GenericScheduler* s = GenericScheduler::create_worker( arena(), next_job_index++ );
+    Governor::sign_on(s);
+    return s;
+}
+
+//------------------------------------------------------------------------
+// Methods of allocate_root_proxy
+//------------------------------------------------------------------------
+task& allocate_root_proxy::allocate( size_t size ) {
+    internal::GenericScheduler* v = Governor::local_scheduler();
+    __TBB_ASSERT( v, "thread did not activate a task_scheduler_init object?"
); +#if __TBB_EXCEPTIONS + task_prefix& p = v->innermost_running_task->prefix(); +#endif + // New root task becomes part of the currently running task's cancellation context + return v->allocate_task( size, __TBB_CONTEXT_ARG(NULL, p.context) ); +} + +void allocate_root_proxy::free( task& task ) { + internal::GenericScheduler* v = Governor::local_scheduler(); + __TBB_ASSERT( v, "thread does not have initialized task_scheduler_init object?" ); +#if __TBB_EXCEPTIONS + // No need to do anything here as long as there is no context -> task connection +#endif /* __TBB_EXCEPTIONS */ + v->free_task( task ); +} + +#if __TBB_EXCEPTIONS +//------------------------------------------------------------------------ +// Methods of allocate_root_with_context_proxy +//------------------------------------------------------------------------ +task& allocate_root_with_context_proxy::allocate( size_t size ) const { + internal::GenericScheduler* v = Governor::local_scheduler(); + __TBB_ASSERT( v, "thread did not activate a task_scheduler_init object?" ); + task_prefix& p = v->innermost_running_task->prefix(); + task& t = v->allocate_task( size, __TBB_CONTEXT_ARG(NULL, &my_context) ); + // The supported usage model prohibits concurrent initial binding. Thus we + // do not need interlocked operations or fences here. + if ( my_context.my_kind == task_group_context::binding_required ) { + __TBB_ASSERT ( my_context.my_owner, "Context without owner" ); + __TBB_ASSERT ( !my_context.my_parent, "Parent context set before initial binding" ); + // If we are in the outermost task dispatch loop of a master thread, then + // there is nothing to bind this context to, and we skip the binding part. + if ( v->innermost_running_task != v->dummy_task ) { + // By not using the fence here we get faster code in case of normal execution + // flow in exchange of a bit higher probability that in cases when cancellation + // is in flight we will take deeper traversal branch. Normally cache coherency + // mechanisms are efficient enough to deliver updated value most of the time. + uintptr_t local_count_snapshot = ((GenericScheduler*)my_context.my_owner)->local_cancel_count; + __TBB_store_with_release(my_context.my_parent, p.context); + uintptr_t global_count_snapshot = __TBB_load_with_acquire(global_cancel_count); + if ( !my_context.my_cancellation_requested ) { + if ( local_count_snapshot == global_count_snapshot ) { + // It is possible that there is active cancellation request in our + // parents chain. Fortunately the equality of the local and global + // counters means that if this is the case it's already been propagated + // to our parent. + my_context.my_cancellation_requested = p.context->my_cancellation_requested; + } else { + // Another thread was propagating cancellation request at the moment + // when we set our parent, but since we do not use locks we could've + // been skipped. So we have to make sure that we get the cancellation + // request if one of our ancestors has been canceled. + my_context.propagate_cancellation_from_ancestors(); + } + } + } + my_context.my_kind = task_group_context::binding_completed; + } + // else the context either has already been associated with its parent or is isolated + return t; +} + +void allocate_root_with_context_proxy::free( task& task ) const { + internal::GenericScheduler* v = Governor::local_scheduler(); + __TBB_ASSERT( v, "thread does not have initialized task_scheduler_init object?" ); + // No need to do anything here as long as unbinding is performed by context destructor only. 
+ v->free_task( task ); +} +#endif /* __TBB_EXCEPTIONS */ + +//------------------------------------------------------------------------ +// Methods of allocate_continuation_proxy +//------------------------------------------------------------------------ +task& allocate_continuation_proxy::allocate( size_t size ) const { + task& t = *((task*)this); + __TBB_ASSERT( AssertOkay(t), NULL ); + GenericScheduler* s = Governor::local_scheduler(); + task* parent = t.parent(); + t.prefix().parent = NULL; + return s->allocate_task( size, __TBB_CONTEXT_ARG(parent, t.prefix().context) ); +} + +void allocate_continuation_proxy::free( task& mytask ) const { + // Restore the parent as it was before the corresponding allocate was called. + ((task*)this)->prefix().parent = mytask.parent(); + Governor::local_scheduler()->free_task(mytask); +} + +//------------------------------------------------------------------------ +// Methods of allocate_child_proxy +//------------------------------------------------------------------------ +task& allocate_child_proxy::allocate( size_t size ) const { + task& t = *((task*)this); + __TBB_ASSERT( AssertOkay(t), NULL ); + GenericScheduler* s = Governor::local_scheduler(); + return s->allocate_task( size, __TBB_CONTEXT_ARG(&t, t.prefix().context) ); +} + +void allocate_child_proxy::free( task& mytask ) const { + Governor::local_scheduler()->free_task(mytask); +} + +//------------------------------------------------------------------------ +// Methods of allocate_additional_child_of_proxy +//------------------------------------------------------------------------ +task& allocate_additional_child_of_proxy::allocate( size_t size ) const { + __TBB_ASSERT( AssertOkay(self), NULL ); + parent.increment_ref_count(); + GenericScheduler* s = Governor::local_scheduler(); + return s->allocate_task( size, __TBB_CONTEXT_ARG(&parent, parent.prefix().context) ); +} + +void allocate_additional_child_of_proxy::free( task& task ) const { + // Undo the increment. We do not check the result of the fetch-and-decrement. + // We could consider be spawning the task if the fetch-and-decrement returns 1. + // But we do not know that was the programmer's intention. + // Furthermore, if it was the programmer's intention, the program has a fundamental + // race condition (that we warn about in Reference manual), because the + // reference count might have become zero before the corresponding call to + // allocate_additional_child_of_proxy::allocate. + parent.internal_decrement_ref_count(); + Governor::local_scheduler()->free_task(task); +} + +//------------------------------------------------------------------------ +// Support for auto_partitioner +//------------------------------------------------------------------------ +size_t get_initial_auto_partitioner_divisor() { + const size_t X_FACTOR = 4; + return X_FACTOR * (Governor::number_of_workers_in_arena()+1); +} + +//------------------------------------------------------------------------ +// Methods of affinity_partitioner_base_v3 +//------------------------------------------------------------------------ +void affinity_partitioner_base_v3::resize( unsigned factor ) { + // Check factor to avoid asking for number of workers while there might be no arena. + size_t new_size = factor ? factor*(Governor::number_of_workers_in_arena()+1) : 0; + if( new_size!=my_size ) { + if( my_array ) { + NFS_Free( my_array ); + // Following two assignments must be done here for sake of exception safety. 
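//------------------------------------------------------------------------
// allocate_additional_child_of_proxy above bumps the parent's reference count
// *before* the new child exists, so the parent cannot be observed as complete
// while the child is still being constructed; free() merely undoes that
// increment. The shape of the protocol, reduced to std::atomic; all names are
// illustrative, not TBB APIs.
#include <atomic>

struct parent_counter {
    std::atomic<long> ref_count;
    explicit parent_counter( long n ) : ref_count(n) {}
};

// Reserve a slot for a child that is about to be created.
inline void begin_add_child( parent_counter& parent ) {
    parent.ref_count.fetch_add( 1, std::memory_order_relaxed );
}

// Undo the reservation if the child is destroyed without ever being spawned,
// mirroring allocate_additional_child_of_proxy::free() above.
inline void cancel_add_child( parent_counter& parent ) {
    parent.ref_count.fetch_sub( 1, std::memory_order_release );
}
//------------------------------------------------------------------------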
+ my_array = NULL; + my_size = 0; + } + if( new_size ) { + my_array = static_cast<affinity_id*>(NFS_Allocate(new_size,sizeof(affinity_id), NULL )); + memset( my_array, 0, sizeof(affinity_id)*new_size ); + my_size = new_size; + } + } +} + +} // namespace internal + +using namespace tbb::internal; + +#if __TBB_EXCEPTIONS + +//------------------------------------------------------------------------ +// captured_exception +//------------------------------------------------------------------------ + +inline +void copy_string ( char*& dst, const char* src ) { + if ( src ) { + size_t len = strlen(src) + 1; + dst = (char*)allocate_via_handler_v3(len); + strncpy (dst, src, len); + } + else + dst = NULL; +} + +void captured_exception::set ( const char* name, const char* info ) throw() +{ + copy_string(const_cast<char*&>(my_exception_name), name); + copy_string(const_cast<char*&>(my_exception_info), info); +} + +void captured_exception::clear () throw() { + deallocate_via_handler_v3 (const_cast<char*>(my_exception_name)); + deallocate_via_handler_v3 (const_cast<char*>(my_exception_info)); +} + +captured_exception* captured_exception::move () throw() { + captured_exception *e = (captured_exception*)allocate_via_handler_v3(sizeof(captured_exception)); + if ( e ) { + ::new (e) captured_exception(); + e->my_exception_name = my_exception_name; + e->my_exception_info = my_exception_info; + e->my_dynamic = true; + my_exception_name = my_exception_info = NULL; + } + return e; +} + +void captured_exception::destroy () throw() { + __TBB_ASSERT ( my_dynamic, "Method destroy can be used only on objects created by clone or allocate" ); + if ( my_dynamic ) { + this->captured_exception::~captured_exception(); + deallocate_via_handler_v3 (this); + } +} + +captured_exception* captured_exception::allocate ( const char* name, const char* info ) { + captured_exception *e = (captured_exception*)allocate_via_handler_v3( sizeof(captured_exception) ); + if ( e ) { + ::new (e) captured_exception(name, info); + e->my_dynamic = true; + } + return e; +} + +const char* captured_exception::name() const throw() { + return my_exception_name; +} + +const char* captured_exception::what() const throw() { + return my_exception_info; +} + + +//------------------------------------------------------------------------ +// tbb_exception_ptr +//------------------------------------------------------------------------ + +#if !TBB_USE_CAPTURED_EXCEPTION + +namespace internal { + +template<typename T> +tbb_exception_ptr* AllocateExceptionContainer( const T& src ) { + tbb_exception_ptr *eptr = (tbb_exception_ptr*)allocate_via_handler_v3( sizeof(tbb_exception_ptr) ); + if ( eptr ) + new (eptr) tbb_exception_ptr(src); + return eptr; +} + +tbb_exception_ptr* tbb_exception_ptr::allocate () { + return AllocateExceptionContainer( std::current_exception() ); +} + +tbb_exception_ptr* tbb_exception_ptr::allocate ( const tbb_exception& ) { + return AllocateExceptionContainer( std::current_exception() ); +} + +tbb_exception_ptr* tbb_exception_ptr::allocate ( captured_exception& src ) { + tbb_exception_ptr *res = AllocateExceptionContainer( src ); + src.destroy(); + return res; +} + +void tbb_exception_ptr::destroy () throw() { + this->tbb_exception_ptr::~tbb_exception_ptr(); + deallocate_via_handler_v3 (this); +} + +} // namespace internal +#endif /* !TBB_USE_CAPTURED_EXCEPTION */ + + +//------------------------------------------------------------------------ +// task_group_context +//------------------------------------------------------------------------ + +task_group_context::~task_group_context () { + if (
my_kind != isolated ) { + GenericScheduler *s = (GenericScheduler*)my_owner; + __TBB_ASSERT ( Governor::is_set(s), "Task group context is destructed by wrong thread" ); + my_node.my_next->my_prev = my_node.my_prev; + uintptr_t local_count_snapshot = s->local_cancel_count; + my_node.my_prev->my_next = my_node.my_next; + __TBB_rel_acq_fence(); + if ( local_count_snapshot != global_cancel_count ) { + // Another thread was propagating cancellation request when we removed + // ourselves from the list. We must ensure that it does not access us + // when this destructor finishes. We'll be able to acquire the lock + // below only after the other thread finishes with us. + spin_mutex::scoped_lock lock(s->context_list_mutex); + } + } +#if TBB_USE_DEBUG + my_version_and_traits = 0xDeadBeef; +#endif /* TBB_USE_DEBUG */ + if ( my_exception ) + my_exception->destroy(); +} + +void task_group_context::init () { + __TBB_ASSERT ( sizeof(uintptr_t) < 32, "Layout of my_version_and_traits must be reconsidered on this platform" ); + __TBB_ASSERT ( sizeof(task_group_context) == 2 * NFS_MaxLineSize, "Context class has wrong size - check padding and members alignment" ); + __TBB_ASSERT ( (uintptr_t(this) & (sizeof(my_cancellation_requested) - 1)) == 0, "Context is improperly aligned" ); + __TBB_ASSERT ( my_kind == isolated || my_kind == bound, "Context can be created only as isolated or bound" ); + my_parent = NULL; + my_cancellation_requested = 0; + my_exception = NULL; + if ( my_kind == bound ) { + GenericScheduler *s = Governor::local_scheduler(); + my_owner = s; + __TBB_ASSERT ( my_owner, "Thread has not activated a task_scheduler_init object?" ); + // Backward links are used by this thread only, thus no fences are necessary + my_node.my_prev = &s->context_list_head; + s->context_list_head.my_next->my_prev = &my_node; + // The only operation on the thread local list of contexts that may be performed + // concurrently is its traversal by another thread while propagating cancellation + // request. Therefore the release fence below is necessary to ensure that the new + // value of my_node.my_next is visible to the traversing thread + // after it reads new value of v->context_list_head.my_next. + my_node.my_next = s->context_list_head.my_next; + __TBB_store_with_release(s->context_list_head.my_next, &my_node); + } +} + +bool task_group_context::cancel_group_execution () { + __TBB_ASSERT ( my_cancellation_requested == 0 || my_cancellation_requested == 1, "Invalid cancellation state"); + if ( my_cancellation_requested || __TBB_CompareAndSwapW(&my_cancellation_requested, 1, 0) ) { + // This task group has already been canceled + return false; + } + Governor::local_scheduler()->propagate_cancellation(this); + return true; +} + +bool task_group_context::is_group_execution_cancelled () const { + return my_cancellation_requested != 0; +} + +// IMPORTANT: It is assumed that this method is not used concurrently! +void task_group_context::reset () { + //! \todo Add assertion that this context does not have children + // No fences are necessary since this context can be accessed from another thread + // only after stealing happened (which means necessary fences were used). 
+ if ( my_exception ) { + my_exception->destroy(); + my_exception = NULL; + } + my_cancellation_requested = 0; +} + +void task_group_context::propagate_cancellation_from_ancestors () { + task_group_context *parent = my_parent; + while ( parent && !parent->my_cancellation_requested ) + parent = parent->my_parent; + if ( parent ) { + // One of our ancestor groups was canceled. Cancel all its descendants. + task_group_context *ctx = this; + do { + __TBB_store_with_release(ctx->my_cancellation_requested, 1); + ctx = ctx->my_parent; + } while ( ctx != parent ); + } +} + +void task_group_context::register_pending_exception () { + if ( my_cancellation_requested ) + return; + try { + throw; + } TbbCatchAll( this ); +} + +#endif /* __TBB_EXCEPTIONS */ + +//------------------------------------------------------------------------ +// task +//------------------------------------------------------------------------ + +void task::internal_set_ref_count( int count ) { + __TBB_ASSERT( count>=0, "count must not be negative" ); + __TBB_ASSERT( !(prefix().extra_state&GenericScheduler::es_ref_count_active), "ref_count race detected" ); + ITT_NOTIFY(sync_releasing, &prefix().ref_count); + prefix().ref_count = count; +} + +internal::reference_count task::internal_decrement_ref_count() { + ITT_NOTIFY( sync_releasing, &prefix().ref_count ); + internal::reference_count k = __TBB_FetchAndDecrementWrelease( &prefix().ref_count ); + __TBB_ASSERT( k>=1, "task's reference count underflowed" ); + if( k==1 ) + ITT_NOTIFY( sync_acquired, &prefix().ref_count ); + return k-1; +} + +task& task::self() { + GenericScheduler *v = Governor::local_scheduler(); + __TBB_ASSERT( v->assert_okay(), NULL ); + __TBB_ASSERT( v->innermost_running_task, NULL ); + return *v->innermost_running_task; +} + +bool task::is_owned_by_current_thread() const { + return true; +} + +void task::destroy( task& victim ) { + __TBB_ASSERT( victim.prefix().ref_count== (ConcurrentWaitsEnabled(victim) ? 1 : 0), "Task being destroyed must not have children" ); + __TBB_ASSERT( victim.state()==task::allocated, "illegal state for victim task" ); + task* parent = victim.parent(); + victim.~task(); + if( parent ) { + __TBB_ASSERT( parent->state()==task::allocated, "attempt to destroy child of running or corrupted parent?" ); + parent->internal_decrement_ref_count(); + } + Governor::local_scheduler()->free_task( victim ); +} + +void task::spawn_and_wait_for_all( task_list& list ) { + scheduler* s = Governor::local_scheduler(); + task* t = list.first; + if( t ) { + if( &t->prefix().next!=list.next_ptr ) + s->spawn( *t->prefix().next, *list.next_ptr ); + list.clear(); + } + s->wait_for_all( *this, t ); +} + +/** Defined out of line so that compiler does not replicate task's vtable. + It's pointless to define it inline anyway, because all call sites to it are virtual calls + that the compiler is unlikely to optimize. 
*/ +void task::note_affinity( affinity_id ) { +} + +//------------------------------------------------------------------------ +// task_scheduler_init +//------------------------------------------------------------------------ + +/** Left out-of-line for the sake of the backward binary compatibility **/ +void task_scheduler_init::initialize( int number_of_threads ) { + initialize( number_of_threads, 0 ); +} + +void task_scheduler_init::initialize( int number_of_threads, stack_size_type thread_stack_size ) { + if( number_of_threads!=deferred ) { + __TBB_ASSERT( !my_scheduler, "task_scheduler_init already initialized" ); + __TBB_ASSERT( number_of_threads==-1 || number_of_threads>=1, + "number_of_threads for task_scheduler_init must be -1 or positive" ); + my_scheduler = Governor::init_scheduler( number_of_threads, thread_stack_size ); + } else { + __TBB_ASSERT( !thread_stack_size, "deferred initialization ignores stack size setting" ); + } +} + +void task_scheduler_init::terminate() { + GenericScheduler* s = static_cast(my_scheduler); + my_scheduler = NULL; + __TBB_ASSERT( s, "task_scheduler_init::terminate without corresponding task_scheduler_init::initialize()"); + Governor::terminate_scheduler(s); +} + +int task_scheduler_init::default_num_threads() { + // No memory fence required, because at worst each invoking thread calls NumberOfHardwareThreads. + int n = DefaultNumberOfThreads; + if( !n ) { + DefaultNumberOfThreads = n = DetectNumberOfWorkers(); + } + return n; +} + +#if __TBB_SCHEDULER_OBSERVER +//------------------------------------------------------------------------ +// Methods of observer_proxy +//------------------------------------------------------------------------ +namespace internal { + +#if TBB_USE_ASSERT +static atomic observer_proxy_count; + +struct check_observer_proxy_count { + ~check_observer_proxy_count() { + if( observer_proxy_count!=0 ) { + fprintf(stderr,"warning: leaked %ld observer_proxy objects\n", long(observer_proxy_count)); + } + } +}; + +static check_observer_proxy_count the_check_observer_proxy_count; +#endif /* TBB_USE_ASSERT */ + +observer_proxy::observer_proxy( task_scheduler_observer_v3& tso ) : next(NULL), observer(&tso) { +#if TBB_USE_ASSERT + ++observer_proxy_count; +#endif /* TBB_USE_ASSERT */ + // 1 for observer + gc_ref_count = 1; + { + // Append to the global list + task_scheduler_observer_mutex_scoped_lock lock(the_task_scheduler_observer_mutex.begin()[0],/*is_writer=*/true); + observer_proxy* p = global_last_observer_proxy; + prev = p; + if( p ) + p->next=this; + else + global_first_observer_proxy = this; + global_last_observer_proxy = this; + } +} + +void observer_proxy::remove_from_list() { + // Take myself off the global list. + if( next ) + next->prev = prev; + else + global_last_observer_proxy = prev; + if( prev ) + prev->next = next; + else + global_first_observer_proxy = next; +#if TBB_USE_ASSERT + poison_pointer(prev); + poison_pointer(next); + gc_ref_count = -666; +#endif /* TBB_USE_ASSERT */ +} + +void observer_proxy::remove_ref_slow() { + int r = gc_ref_count; + while(r>1) { + __TBB_ASSERT( r!=0, NULL ); + int r_old = gc_ref_count.compare_and_swap(r-1,r); + if( r_old==r ) { + // Successfully decremented count. 
+ return; + } + r = r_old; + } + __TBB_ASSERT( r==1, NULL ); + // Reference count might go to zero + { + task_scheduler_observer_mutex_scoped_lock lock(the_task_scheduler_observer_mutex.begin()[0],/*is_writer=*/true); + r = --gc_ref_count; + if( !r ) { + remove_from_list(); + } + } + if( !r ) { + __TBB_ASSERT( gc_ref_count == -666, NULL ); +#if TBB_USE_ASSERT + --observer_proxy_count; +#endif /* TBB_USE_ASSERT */ + delete this; + } +} + +observer_proxy* observer_proxy::process_list( observer_proxy* local_last, bool is_worker, bool is_entry ) { + // Pointer p marches though the list. + // If is_entry, start with our previous list position, otherwise start at beginning of list. + observer_proxy* p = is_entry ? local_last : NULL; + for(;;) { + task_scheduler_observer* tso=NULL; + // Hold lock on list only long enough to advance to next proxy in list. + { + task_scheduler_observer_mutex_scoped_lock lock(the_task_scheduler_observer_mutex.begin()[0],/*is_writer=*/false); + do { + if( local_last && local_last->observer ) { + // 2 = 1 for observer and 1 for local_last + __TBB_ASSERT( local_last->gc_ref_count>=2, NULL ); + // Can decrement count quickly, because it cannot become zero here. + --local_last->gc_ref_count; + local_last = NULL; + } else { + // Use slow form of decrementing the reference count, after lock is released. + } + if( p ) { + // We were already processing the list. + if( observer_proxy* q = p->next ) { + // Step to next item in list. + p=q; + } else { + // At end of list. + if( is_entry ) { + // Remember current position in the list, so we can start at on the next call. + ++p->gc_ref_count; + } else { + // Finishin running off the end of the list + p=NULL; + } + goto done; + } + } else { + // Starting pass through the list + p = global_first_observer_proxy; + if( !p ) + goto done; + } + tso = p->observer; + } while( !tso ); + ++p->gc_ref_count; + ++tso->my_busy_count; + } + __TBB_ASSERT( !local_last || p!=local_last, NULL ); + if( local_last ) + local_last->remove_ref_slow(); + // Do not hold any locks on the list while calling user's code. + try { + if( is_entry ) + tso->on_scheduler_entry( is_worker ); + else + tso->on_scheduler_exit( is_worker ); + } catch(...) { + // Suppress exception, because user routines are supposed to be observing, not changing + // behavior of a master or worker thread. +#if TBB_USE_ASSERT + fprintf(stderr,"warning: %s threw exception\n",is_entry?"on_scheduler_entry":"on_scheduler_exit"); +#endif /* __TBB_USE_ASSERT */ + } + intptr bc = --tso->my_busy_count; + __TBB_ASSERT_EX( bc>=0, "my_busy_count underflowed" ); + local_last = p; + } +done: + // Return new value to be used as local_last next time. + if( local_last ) + local_last->remove_ref_slow(); + __TBB_ASSERT( !p || is_entry, NULL ); + return p; +} + +void task_scheduler_observer_v3::observe( bool state ) { + if( state ) { + if( !my_proxy ) { + if( !__TBB_InitOnce::initialization_done() ) + DoOneTimeInitializations(); + my_busy_count = 0; + my_proxy = new observer_proxy(*this); + if( GenericScheduler* s = Governor::local_scheduler() ) { + // Notify newly created observer of its own thread. + // Any other pending observers are notified too. 
+ s->notify_entry_observers(); + } + } + } else { + if( observer_proxy* proxy = my_proxy ) { + my_proxy = NULL; + __TBB_ASSERT( proxy->gc_ref_count>=1, "reference for observer missing" ); + { + task_scheduler_observer_mutex_scoped_lock lock(the_task_scheduler_observer_mutex.begin()[0],/*is_writer=*/true); + proxy->observer = NULL; + } + proxy->remove_ref_slow(); + while( my_busy_count ) { + __TBB_Yield(); + } + } + } +} + +} // namespace internal +#endif /* __TBB_SCHEDULER_OBSERVER */ + +} // namespace tbb + + diff --git a/dep/tbb/src/tbb/tbb_assert_impl.h b/dep/tbb/src/tbb/tbb_assert_impl.h new file mode 100644 index 000000000..2a381f9d0 --- /dev/null +++ b/dep/tbb/src/tbb/tbb_assert_impl.h @@ -0,0 +1,101 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +// IMPORTANT: To use assertion handling in TBB, exactly one of the TBB source files +// should #include tbb_assert_impl.h thus instantiating assertion handling routines. +// The intent of putting it to a separate file is to allow some tests to use it +// as well in order to avoid dependency on the library. + +// include headers for required function declarations +#include <cstdlib> +#include <stdio.h> +#include <string.h> +#include <stdarg.h> +#if _MSC_VER +#include <crtdbg.h> +#define __TBB_USE_DBGBREAK_DLG TBB_USE_DEBUG +#endif + +#if _MSC_VER >= 1400 +#define __TBB_EXPORTED_FUNC __cdecl +#else +#define __TBB_EXPORTED_FUNC +#endif + +using namespace std; + +namespace tbb { + //!
Type for an assertion handler + typedef void(*assertion_handler_type)( const char* filename, int line, const char* expression, const char * comment ); + + static assertion_handler_type assertion_handler; + + assertion_handler_type __TBB_EXPORTED_FUNC set_assertion_handler( assertion_handler_type new_handler ) { + assertion_handler_type old_handler = assertion_handler; + assertion_handler = new_handler; + return old_handler; + } + + void __TBB_EXPORTED_FUNC assertion_failure( const char* filename, int line, const char* expression, const char* comment ) { + if( assertion_handler_type a = assertion_handler ) { + (*a)(filename,line,expression,comment); + } else { + static bool already_failed; + if( !already_failed ) { + already_failed = true; + fprintf( stderr, "Assertion %s failed on line %d of file %s\n", + expression, line, filename ); + if( comment ) + fprintf( stderr, "Detailed description: %s\n", comment ); +#if __TBB_USE_DBGBREAK_DLG + if(1 == _CrtDbgReport(_CRT_ASSERT, filename, line, "tbb_debug.dll", "%s\r\n%s", expression, comment?comment:"")) + _CrtDbgBreak(); +#else + fflush(stderr); + abort(); +#endif + } + } + } + +#if defined(_MSC_VER)&&_MSC_VER<1400 +# define vsnprintf _vsnprintf +#endif + + namespace internal { + //! Report a runtime warning. + void __TBB_EXPORTED_FUNC runtime_warning( const char* format, ... ) + { + char str[1024]; memset(str, 0, 1024); + va_list args; va_start(args, format); + vsnprintf( str, 1024-1, format, args); + fprintf( stderr, "TBB Warning: %s\n", str); + } + } // namespace internal + +} /* namespace tbb */ diff --git a/dep/tbb/src/tbb/tbb_misc.cpp b/dep/tbb/src/tbb/tbb_misc.cpp new file mode 100644 index 000000000..75ba5d582 --- /dev/null +++ b/dep/tbb/src/tbb/tbb_misc.cpp @@ -0,0 +1,157 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +// Source file for miscellaneous entities that are infrequently referenced by +// an executing program. + +#include "tbb/tbb_stddef.h" +// Out-of-line TBB assertion handling routines are instantiated here. 
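The assertion machinery shown above (tbb_assert_impl.h) is instantiated by tbb_misc.cpp through the #include that follows. As a minimal usage sketch only, an application can replace the default handler through the set_assertion_handler() hook defined above; the sketch assumes tbb/tbb_stddef.h declares tbb::assertion_handler_type and tbb::set_assertion_handler (as in stock TBB builds with TBB_USE_ASSERT enabled), and my_assertion_handler / install_logging_handler are hypothetical application-side names:

    #include <cstdio>
    #include "tbb/tbb_stddef.h" // assumed to declare assertion_handler_type / set_assertion_handler

    // Hypothetical handler: log the failed assertion instead of breaking into a debugger or calling abort().
    static void my_assertion_handler( const char* filename, int line,
                                      const char* expression, const char* comment ) {
        std::fprintf( stderr, "TBB assertion '%s' failed at %s:%d%s%s\n",
                      expression, filename, line,
                      comment ? " - " : "", comment ? comment : "" );
    }

    void install_logging_handler() {
        // set_assertion_handler() returns the previously installed handler; NULL means the default was active.
        tbb::set_assertion_handler( my_assertion_handler );
    }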
+#include "tbb_assert_impl.h" + +#include "tbb_misc.h" +#include +#include +#include +#if defined(__EXCEPTIONS) || defined(_CPPUNWIND) || defined(__SUNPRO_CC) + #include "tbb/tbb_exception.h" + #include // std::string is used to construct runtime_error + #include +#endif + +using namespace std; + +#include "tbb/tbb_machine.h" + +namespace tbb { + +namespace internal { + +#if defined(__EXCEPTIONS) || defined(_CPPUNWIND) || defined(__SUNPRO_CC) +// The above preprocessor symbols are defined by compilers when exception handling is enabled. +// However, in some cases it could be disabled for this file. + +void handle_perror( int error_code, const char* what ) { + char buf[128]; + sprintf(buf,"%s: ",what); + char* end = strchr(buf,0); + size_t n = buf+sizeof(buf)-end; + strncpy( end, strerror( error_code ), n ); + // Ensure that buffer ends in terminator. + buf[sizeof(buf)-1] = 0; + throw runtime_error(buf); +} + +void throw_bad_last_alloc_exception_v4() +{ + throw bad_last_alloc(); +} +#endif //__EXCEPTIONS || _CPPUNWIND + +bool GetBoolEnvironmentVariable( const char * name ) { + if( const char* s = getenv(name) ) + return strcmp(s,"0") != 0; + return false; +} + +#include "tbb_version.h" + +/** The leading "\0" is here so that applying "strings" to the binary delivers a clean result. */ +static const char VersionString[] = "\0" TBB_VERSION_STRINGS; + +static bool PrintVersionFlag = false; + +void PrintVersion() { + PrintVersionFlag = true; + fputs(VersionString+1,stderr); +} + +void PrintExtraVersionInfo( const char* category, const char* description ) { + if( PrintVersionFlag ) + fprintf(stderr, "%s: %s\t%s\n", "TBB", category, description ); +} + +void PrintRMLVersionInfo( void* arg, const char* server_info ) +{ + PrintExtraVersionInfo( server_info, (const char *)arg ); +} + +} // namespace internal + +extern "C" int TBB_runtime_interface_version() { + return TBB_INTERFACE_VERSION; +} + +} // namespace tbb + +#if !__TBB_RML_STATIC +#if __TBB_x86_32 + +#include "tbb/atomic.h" + +// in MSVC environment, int64_t defined in tbb::internal namespace only (see tbb_stddef.h) +#if _MSC_VER +using tbb::internal::int64_t; +#endif + +//! Warn about 8-byte store that crosses a cache line. +extern "C" void __TBB_machine_store8_slow_perf_warning( volatile void *ptr ) { + // Report run-time warning unless we have already recently reported warning for that address. + const unsigned n = 4; + static tbb::atomic cache[n]; + static tbb::atomic k; + for( unsigned i=0; i(ptr); + tbb::internal::runtime_warning( "atomic store on misaligned 8-byte location %p is slow", ptr ); +done:; +} + +//! Handle 8-byte store that crosses a cache line. +extern "C" void __TBB_machine_store8_slow( volatile void *ptr, int64_t value ) { + for( tbb::internal::atomic_backoff b;; b.pause() ) { + int64_t tmp = *(int64_t*)ptr; + if( __TBB_machine_cmpswp8(ptr,value,tmp)==tmp ) + break; + } +} + +#endif /* __TBB_x86_32 */ +#endif /* !__TBB_RML_STATIC */ + +#if __TBB_ipf +extern "C" intptr_t __TBB_machine_lockbyte( volatile unsigned char& flag ) { + if ( !__TBB_TryLockByte(flag) ) { + tbb::internal::atomic_backoff b; + do { + b.pause(); + } while ( !__TBB_TryLockByte(flag) ); + } + return 0; +} +#endif diff --git a/dep/tbb/src/tbb/tbb_misc.h b/dep/tbb/src/tbb/tbb_misc.h new file mode 100644 index 000000000..7481899c4 --- /dev/null +++ b/dep/tbb/src/tbb/tbb_misc.h @@ -0,0 +1,132 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. 
+ + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef _TBB_tbb_misc_H +#define _TBB_tbb_misc_H + +#include "tbb/tbb_stddef.h" +#include "tbb/tbb_machine.h" + +#if _WIN32||_WIN64 +#include <windows.h> +#elif defined(__linux__) +#include <sys/sysinfo.h> +#elif defined(__sun) +#include <sys/sysinfo.h> +#include <unistd.h> +#elif defined(__APPLE__) +#include <sys/types.h> +#include <sys/sysctl.h> +#elif defined(__FreeBSD__) +#include <unistd.h> +#endif + +namespace tbb { + +namespace internal { + +#if defined(__TBB_DetectNumberOfWorkers) +static inline int DetectNumberOfWorkers() { + return __TBB_DetectNumberOfWorkers(); +} + +#else + +#if _WIN32||_WIN64 +static inline int DetectNumberOfWorkers() { + SYSTEM_INFO si; + GetSystemInfo(&si); + return static_cast<int>(si.dwNumberOfProcessors); +} + +#elif defined(__linux__) || defined(__APPLE__) || defined(__FreeBSD__) || defined(__sun) +static inline int DetectNumberOfWorkers() { + long number_of_workers; + +#if (defined(__FreeBSD__) || defined(__sun)) && defined(_SC_NPROCESSORS_ONLN) + number_of_workers = sysconf(_SC_NPROCESSORS_ONLN); + +// In theory, sysconf should work everywhere. +// But in practice, system-specific methods are more reliable +#elif defined(__linux__) + number_of_workers = get_nprocs(); +#elif defined(__APPLE__) + int name[2] = {CTL_HW, HW_AVAILCPU}; + int ncpu; + size_t size = sizeof(ncpu); + sysctl( name, 2, &ncpu, &size, NULL, 0 ); + number_of_workers = ncpu; +#else +#error DetectNumberOfWorkers: Method to detect the number of online CPUs is unknown +#endif + +// Fail-safety strap + if ( number_of_workers < 1 ) { + number_of_workers = 1; + } + + return number_of_workers; +} + +#else +#error DetectNumberOfWorkers: OS detection method is unknown + +#endif /* os kind */ + +#endif + +// assertion_failure is declared in tbb/tbb_stddef.h because user code +// needs to see its declaration. + +//! Throw std::runtime_error of form "(what): (strerror of error_code)" +/* The "what" should be fairly short, not more than about 64 characters. + Because we control all the call sites to handle_perror, it is pointless + to bullet-proof it for very long strings. + + Design note: ADR put this routine off to the side in tbb_misc.cpp instead of + Task.cpp because the throw generates a pathetic lot of code, and ADR wanted + this large chunk of code to be placed on a cold page. */ +void __TBB_EXPORTED_FUNC handle_perror( int error_code, const char* what ); + +//!
True if environment variable with given name is set and not 0; otherwise false. +bool GetBoolEnvironmentVariable( const char * name ); + +//! Print TBB version information on stderr +void PrintVersion(); + +//! Print extra TBB version information on stderr +void PrintExtraVersionInfo( const char* category, const char* description ); + +//! A callback routine to print RML version information on stderr +void PrintRMLVersionInfo( void* arg, const char* server_info ); + +} // namespace internal + +} // namespace tbb + +#endif /* _TBB_tbb_misc_H */ diff --git a/dep/tbb/src/tbb/tbb_resource.rc b/dep/tbb/src/tbb/tbb_resource.rc new file mode 100644 index 000000000..d61cac42b --- /dev/null +++ b/dep/tbb/src/tbb/tbb_resource.rc @@ -0,0 +1,126 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. +// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + +// Microsoft Visual C++ generated resource script. +// +#ifdef APSTUDIO_INVOKED +#ifndef APSTUDIO_READONLY_SYMBOLS +#define _APS_NO_MFC 1 +#define _APS_NEXT_RESOURCE_VALUE 102 +#define _APS_NEXT_COMMAND_VALUE 40001 +#define _APS_NEXT_CONTROL_VALUE 1001 +#define _APS_NEXT_SYMED_VALUE 101 +#endif +#endif + +#define APSTUDIO_READONLY_SYMBOLS +///////////////////////////////////////////////////////////////////////////// +// +// Generated from the TEXTINCLUDE 2 resource. 
+// +#include +#define ENDL "\r\n" +#include "tbb_version.h" + +///////////////////////////////////////////////////////////////////////////// +#undef APSTUDIO_READONLY_SYMBOLS + +///////////////////////////////////////////////////////////////////////////// +// Neutral resources + +//#if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_NEU) +#ifdef _WIN32 +LANGUAGE LANG_NEUTRAL, SUBLANG_NEUTRAL +#pragma code_page(1252) +#endif //_WIN32 + +///////////////////////////////////////////////////////////////////////////// +// manifest integration +#ifdef TBB_MANIFEST +#include "winuser.h" +2 RT_MANIFEST tbbmanifest.exe.manifest +#endif + +///////////////////////////////////////////////////////////////////////////// +// +// Version +// + +VS_VERSION_INFO VERSIONINFO + FILEVERSION TBB_VERNUMBERS + PRODUCTVERSION TBB_VERNUMBERS + FILEFLAGSMASK 0x17L +#ifdef _DEBUG + FILEFLAGS 0x1L +#else + FILEFLAGS 0x0L +#endif + FILEOS 0x40004L + FILETYPE 0x2L + FILESUBTYPE 0x0L +BEGIN + BLOCK "StringFileInfo" + BEGIN + BLOCK "000004b0" + BEGIN + VALUE "CompanyName", "Intel Corporation\0" + VALUE "FileDescription", "Threading Building Blocks library\0" + VALUE "FileVersion", TBB_VERSION "\0" +//what is it? VALUE "InternalName", "tbb\0" + VALUE "LegalCopyright", "Copyright 2005-2009 Intel Corporation. All Rights Reserved.\0" + VALUE "LegalTrademarks", "\0" +#ifndef TBB_USE_DEBUG + VALUE "OriginalFilename", "tbb.dll\0" +#else + VALUE "OriginalFilename", "tbb_debug.dll\0" +#endif + VALUE "ProductName", "Intel(R) Threading Building Blocks for Windows\0" + VALUE "ProductVersion", TBB_VERSION "\0" + VALUE "Comments", TBB_VERSION_STRINGS "\0" + VALUE "PrivateBuild", "\0" + VALUE "SpecialBuild", "\0" + END + END + BLOCK "VarFileInfo" + BEGIN + VALUE "Translation", 0x0, 1200 + END +END + +//#endif // Neutral resources +///////////////////////////////////////////////////////////////////////////// + + +#ifndef APSTUDIO_INVOKED +///////////////////////////////////////////////////////////////////////////// +// +// Generated from the TEXTINCLUDE 3 resource. +// + + +///////////////////////////////////////////////////////////////////////////// +#endif // not APSTUDIO_INVOKED + diff --git a/dep/tbb/src/tbb/tbb_thread.cpp b/dep/tbb/src/tbb/tbb_thread.cpp new file mode 100644 index 000000000..bb328e242 --- /dev/null +++ b/dep/tbb/src/tbb/tbb_thread.cpp @@ -0,0 +1,209 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#if _WIN32||_WIN64 +#include <process.h> /* Need _beginthreadex from there */ +#include <stdexcept> /* Need std::runtime_error from there */ +#include <string> /* Need std::string from there */ +#endif // _WIN32||_WIN64 +#include "tbb_misc.h" // for handle_perror +#include "tbb/tbb_stddef.h" +#include "tbb/tbb_thread.h" +#include "tbb/tbb_allocator.h" +#include "tbb/task_scheduler_init.h" /* Need task_scheduler_init::default_num_threads() */ + +namespace tbb { + +namespace internal { + +//! Allocate a closure +void* allocate_closure_v3( size_t size ) +{ + return allocate_via_handler_v3( size ); +} + +//! Free a closure allocated by allocate_closure_v3 +void free_closure_v3( void *ptr ) +{ + deallocate_via_handler_v3( ptr ); +} + +#if _WIN32||_WIN64 +#if defined(__EXCEPTIONS) || defined(_CPPUNWIND) +// The above preprocessor symbols are defined by compilers when exception handling is enabled. + +void handle_win_error( int error_code ) +{ + LPTSTR msg_buf; + + FormatMessage( + FORMAT_MESSAGE_ALLOCATE_BUFFER | + FORMAT_MESSAGE_FROM_SYSTEM | + FORMAT_MESSAGE_IGNORE_INSERTS, + NULL, + error_code, + 0, + (LPTSTR) &msg_buf, + 0, NULL ); + const std::string msg_str(msg_buf); + LocalFree(msg_buf); + throw std::runtime_error(msg_str); +} +#endif //__EXCEPTIONS || _CPPUNWIND +#endif // _WIN32||_WIN64 + +void tbb_thread_v3::join() +{ + __TBB_ASSERT( joinable(), "thread should be joinable when join called" ); +#if _WIN32||_WIN64 + DWORD status = WaitForSingleObject( my_handle, INFINITE ); + if ( status == WAIT_FAILED ) + handle_win_error( GetLastError() ); + BOOL close_stat = CloseHandle( my_handle ); + if ( close_stat == 0 ) + handle_win_error( GetLastError() ); + my_thread_id = 0; +#else + int status = pthread_join( my_handle, NULL ); + if( status ) + handle_perror( status, "pthread_join" ); +#endif // _WIN32||_WIN64 + my_handle = 0; +} + +void tbb_thread_v3::detach() { + __TBB_ASSERT( joinable(), "only joinable thread can be detached" ); +#if _WIN32||_WIN64 + BOOL status = CloseHandle( my_handle ); + if ( status == 0 ) + handle_win_error( GetLastError() ); + my_thread_id = 0; +#else + int status = pthread_detach( my_handle ); + if( status ) + handle_perror( status, "pthread_detach" ); +#endif // _WIN32||_WIN64 + my_handle = 0; +} + +const size_t MB = 1<<20; +#if !defined(__TBB_WORDSIZE) +const size_t ThreadStackSize = 1*MB; +#elif __TBB_WORDSIZE<=4 +const size_t ThreadStackSize = 2*MB; +#else +const size_t ThreadStackSize = 4*MB; +#endif + +void tbb_thread_v3::internal_start( __TBB_NATIVE_THREAD_ROUTINE_PTR(start_routine), + void* closure ) { +#if _WIN32||_WIN64 + unsigned thread_id; + // The return type of _beginthreadex is "uintptr_t" on new MS compilers, + // and 'unsigned long' on old MS compilers. Our uintptr works for both.
+ uintptr status = _beginthreadex( NULL, ThreadStackSize, start_routine, + closure, 0, &thread_id ); + if( status==0 ) + handle_perror(errno,"__beginthreadex"); + else { + my_handle = (HANDLE)status; + my_thread_id = thread_id; + } +#else + pthread_t thread_handle; + int status; + pthread_attr_t stack_size; + status = pthread_attr_init( &stack_size ); + if( status ) + handle_perror( status, "pthread_attr_init" ); + status = pthread_attr_setstacksize( &stack_size, ThreadStackSize ); + if( status ) + handle_perror( status, "pthread_attr_setstacksize" ); + + status = pthread_create( &thread_handle, &stack_size, start_routine, closure ); + if( status ) + handle_perror( status, "pthread_create" ); + + my_handle = thread_handle; +#endif // _WIN32||_WIN64 +} + +unsigned tbb_thread_v3::hardware_concurrency() { + return task_scheduler_init::default_num_threads(); +} + +tbb_thread_v3::id thread_get_id_v3() { +#if _WIN32||_WIN64 + return tbb_thread_v3::id( GetCurrentThreadId() ); +#else + return tbb_thread_v3::id( pthread_self() ); +#endif // _WIN32||_WIN64 +} + +void move_v3( tbb_thread_v3& t1, tbb_thread_v3& t2 ) +{ + if (t1.joinable()) + t1.detach(); + t1.my_handle = t2.my_handle; + t2.my_handle = 0; +#if _WIN32||_WIN64 + t1.my_thread_id = t2.my_thread_id; + t2.my_thread_id = 0; +#endif // _WIN32||_WIN64 +} + +void thread_yield_v3() +{ + __TBB_Yield(); +} + +void thread_sleep_v3(const tick_count::interval_t &i) +{ +#if _WIN32||_WIN64 + tick_count t0 = tick_count::now(); + tick_count t1 = t0; + for(;;) { + double remainder = (i-(t1-t0)).seconds()*1e3; // milliseconds remaining to sleep + if( remainder<=0 ) break; + DWORD t = remainder>=INFINITE ? INFINITE-1 : DWORD(remainder); + Sleep( t ); + t1 = tick_count::now(); + } +#else + struct timespec req; + double sec = i.seconds(); + + req.tv_sec = static_cast<long>(sec); + req.tv_nsec = static_cast<long>( (sec - req.tv_sec)*1e9 ); + nanosleep(&req, NULL); +#endif // _WIN32||_WIN64 +} + +} // internal + +} // tbb diff --git a/dep/tbb/src/tbb/tbb_version.h b/dep/tbb/src/tbb/tbb_version.h new file mode 100644 index 000000000..07a91d6f5 --- /dev/null +++ b/dep/tbb/src/tbb/tbb_version.h @@ -0,0 +1,101 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License.
+*/ + +// Please define version number in the file: +#include "../../include/tbb/tbb_stddef.h" + +// And don't touch anything below +#ifndef ENDL +#define ENDL "\n" +#endif +#include "../../build/vsproject/version_string.tmp" + +#ifndef __TBB_VERSION_STRINGS +#pragma message("Warning: version_string.tmp isn't generated properly by version_info.sh script!") +// here is an example of macros value: +#define __TBB_VERSION_STRINGS \ +"TBB: BUILD_HOST\tUnknown\n" \ +"TBB: BUILD_ARCH\tUnknown\n" \ +"TBB: BUILD_OS\t\tUnknown\n" \ +"TBB: BUILD_CL\t\tUnknown\n" \ +"TBB: BUILD_COMPILER\tUnknown\n" \ +"TBB: BUILD_COMMAND\tUnknown\n" +#endif +#ifndef __TBB_DATETIME +#ifdef RC_INVOKED +#define __TBB_DATETIME "Unknown" +#else +#define __TBB_DATETIME __DATE__ __TIME__ +#endif +#endif + +#define __TBB_VERSION_NUMBER "TBB: VERSION\t\t" __TBB_STRING(TBB_VERSION_MAJOR.TBB_VERSION_MINOR) ENDL +#define __TBB_INTERFACE_VERSION_NUMBER "TBB: INTERFACE VERSION\t" __TBB_STRING(TBB_INTERFACE_VERSION) ENDL +#define __TBB_VERSION_DATETIME "TBB: BUILD_DATE\t\t" __TBB_DATETIME ENDL +#ifndef TBB_USE_DEBUG + #define __TBB_VERSION_USE_DEBUG "TBB: TBB_USE_DEBUG\tundefined" ENDL +#elif TBB_USE_DEBUG==0 + #define __TBB_VERSION_USE_DEBUG "TBB: TBB_USE_DEBUG\t0" ENDL +#elif TBB_USE_DEBUG==1 + #define __TBB_VERSION_USE_DEBUG "TBB: TBB_USE_DEBUG\t1" ENDL +#elif TBB_USE_DEBUG==2 + #define __TBB_VERSION_USE_DEBUG "TBB: TBB_USE_DEBUG\t2" ENDL +#else + #error Unexpected value for TBB_USE_DEBUG +#endif +#ifndef TBB_USE_ASSERT + #define __TBB_VERSION_USE_ASSERT "TBB: TBB_USE_ASSERT\tundefined" ENDL +#elif TBB_USE_ASSERT==0 + #define __TBB_VERSION_USE_ASSERT "TBB: TBB_USE_ASSERT\t0" ENDL +#elif TBB_USE_ASSERT==1 + #define __TBB_VERSION_USE_ASSERT "TBB: TBB_USE_ASSERT\t1" ENDL +#elif TBB_USE_ASSERT==2 + #define __TBB_VERSION_USE_ASSERT "TBB: TBB_USE_ASSERT\t2" ENDL +#else + #error Unexpected value for TBB_USE_ASSERT +#endif +#ifndef DO_ITT_NOTIFY + #define __TBB_VERSION_DO_NOTIFY "TBB: DO_ITT_NOTIFY\tundefined" ENDL +#elif DO_ITT_NOTIFY==1 + #define __TBB_VERSION_DO_NOTIFY "TBB: DO_ITT_NOTIFY\t1" ENDL +#elif DO_ITT_NOTIFY==0 + #define __TBB_VERSION_DO_NOTIFY +#else + #error Unexpected value for DO_ITT_NOTIFY +#endif + +#define TBB_VERSION_STRINGS __TBB_VERSION_NUMBER __TBB_INTERFACE_VERSION_NUMBER __TBB_VERSION_DATETIME __TBB_VERSION_STRINGS __TBB_VERSION_USE_DEBUG __TBB_VERSION_USE_ASSERT __TBB_VERSION_DO_NOTIFY + +// numbers +#ifndef __TBB_VERSION_YMD +#define __TBB_VERSION_YMD 0, 0 +#endif + +#define TBB_VERNUMBERS TBB_VERSION_MAJOR, TBB_VERSION_MINOR, __TBB_VERSION_YMD + +#define TBB_VERSION __TBB_STRING(TBB_VERNUMBERS) diff --git a/dep/tbb/src/tbb/tls.h b/dep/tbb/src/tbb/tls.h new file mode 100644 index 000000000..2e4768c15 --- /dev/null +++ b/dep/tbb/src/tbb/tls.h @@ -0,0 +1,119 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef _TBB_tls_H +#define _TBB_tls_H + +#if USE_PTHREAD +#include <pthread.h> +#else /* assume USE_WINTHREAD */ +#include <windows.h> +#endif + +namespace tbb { + +namespace internal { + +typedef void (*tls_dtor_t)(void*); + +//! Basic cross-platform wrapper class for TLS operations. +template <typename T> +class basic_tls { +#if USE_PTHREAD + typedef pthread_key_t tls_key_t; +public: + int create( tls_dtor_t dtor = NULL ) { + return pthread_key_create(&my_key, dtor); + } + int destroy() { return pthread_key_delete(my_key); } + void set( T value ) { pthread_setspecific(my_key, (void*)value); } + T get() { return (T)pthread_getspecific(my_key); } +#else /* USE_WINTHREAD */ + typedef DWORD tls_key_t; +public: + int create() { + tls_key_t tmp = TlsAlloc(); + if( tmp==TLS_OUT_OF_INDEXES ) + return TLS_OUT_OF_INDEXES; + my_key = tmp; + return 0; + } + int destroy() { TlsFree(my_key); my_key=0; return 0; } + void set( T value ) { TlsSetValue(my_key, (LPVOID)value); } + T get() { return (T)TlsGetValue(my_key); } +#endif +private: + tls_key_t my_key; +}; + +//! More advanced TLS support template class. +/** It supports RAII and to some extent mimic __declspec(thread) variables. */ +template <typename T> +class tls : public basic_tls<T> { + typedef basic_tls<T> base; +public: + tls() { base::create(); } + ~tls() { base::destroy(); } + T operator=(T value) { base::set(value); return value; } + operator T() { return base::get(); } +}; + +template <typename T> +class tls<T*> : basic_tls<T*> { + typedef basic_tls<T*> base; + static void internal_dtor(void* ptr) { + if (ptr) delete (T*)ptr; + } + T* internal_get() { + T* result = base::get(); + if (!result) { + result = new T; + base::set(result); + } + return result; + } +public: + tls() { +#if USE_PTHREAD + base::create( internal_dtor ); +#else + base::create(); +#endif + } + ~tls() { base::destroy(); } + T* operator=(T* value) { base::set(value); return value; } + operator T*() { return internal_get(); } + T* operator->() { return internal_get(); } + T& operator*() { return *internal_get(); } +}; + +} // namespace internal + +} // namespace tbb + +#endif /* _TBB_tls_H */ diff --git a/dep/tbb/src/tbb/tools_api/_config.h b/dep/tbb/src/tbb/tools_api/_config.h new file mode 100644 index 000000000..17c97e53e --- /dev/null +++ b/dep/tbb/src/tbb/tools_api/_config.h @@ -0,0 +1,94 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation.
+ + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef __CONFIG_H_ +#define __CONFIG_H_ + +#ifndef ITT_OS_WIN +# define ITT_OS_WIN 1 +#endif /* ITT_OS_WIN */ + +#ifndef ITT_OS_LINUX +# define ITT_OS_LINUX 2 +#endif /* ITT_OS_LINUX */ + +#ifndef ITT_OS_MAC +# define ITT_OS_MAC 3 +#endif /* ITT_OS_MAC */ + +#ifndef ITT_OS +# if defined WIN32 || defined _WIN32 +# define ITT_OS ITT_OS_WIN +# elif defined( __APPLE__ ) && defined( __MACH__ ) +# define ITT_OS ITT_OS_MAC +# else +# define ITT_OS ITT_OS_LINUX +# endif +#endif /* ITT_OS */ + +#ifndef ITT_ARCH_IA32 +# define ITT_ARCH_IA32 1 +#endif /* ITT_ARCH_IA32 */ + +#ifndef ITT_ARCH_IA32E +# define ITT_ARCH_IA32E 2 +#endif /* ITT_ARCH_IA32E */ + +#ifndef ITT_ARCH_IA64 +# define ITT_ARCH_IA64 3 +#endif /* ITT_ARCH_IA64 */ + + +#ifndef ITT_ARCH +# if defined _M_X64 || defined _M_AMD64 || defined __x86_64__ +# define ITT_ARCH ITT_ARCH_IA32E +# elif defined _M_IA64 || defined __ia64 +# define ITT_ARCH ITT_ARCH_IA64 +# else +# define ITT_ARCH ITT_ARCH_IA32 +# endif +#endif + +#ifndef ITT_PLATFORM_WIN +# define ITT_PLATFORM_WIN 1 +#endif /* ITT_PLATFORM_WIN */ + +#ifndef ITT_PLATFORM_POSIX +# define ITT_PLATFORM_POSIX 2 +#endif /* ITT_PLATFORM_POSIX */ + +#ifndef ITT_PLATFORM +# if ITT_OS==ITT_OS_WIN +# define ITT_PLATFORM ITT_PLATFORM_WIN +# else +# define ITT_PLATFORM ITT_PLATFORM_POSIX +# endif /* _WIN32 */ +#endif /* ITT_PLATFORM */ + +#endif /* __CONFIG_H_ */ + diff --git a/dep/tbb/src/tbb/tools_api/_disable_warnings.h b/dep/tbb/src/tbb/tools_api/_disable_warnings.h new file mode 100644 index 000000000..e32f24ff5 --- /dev/null +++ b/dep/tbb/src/tbb/tools_api/_disable_warnings.h @@ -0,0 +1,42 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "_config.h" + +#if ITT_PLATFORM==ITT_PLATFORM_WIN + +#pragma warning (disable: 593) /* parameter "XXXX" was set but never used */ +#pragma warning (disable: 344) /* typedef name has already been declared (with same type) */ +#pragma warning (disable: 174) /* expression has no effect */ + +#elif defined __INTEL_COMPILER + +#pragma warning (disable: 869) /* parameter "XXXXX" was never referenced */ +#pragma warning (disable: 1418) /* external function definition with no prior declaration */ + +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ diff --git a/dep/tbb/src/tbb/tools_api/_ittnotify_static.h b/dep/tbb/src/tbb/tools_api/_ittnotify_static.h new file mode 100644 index 000000000..9604b4c4f --- /dev/null +++ b/dep/tbb/src/tbb/tools_api/_ittnotify_static.h @@ -0,0 +1,166 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#include "_config.h" + +#ifndef ITT_STUB +#define ITT_STUB ITT_STUBV +#endif /* ITT_STUB */ + +#ifndef ITTAPI_CALL +#define ITTAPI_CALL CDECL +#endif /* ITTAPI_CALL */ + +/* parameters for macro: + type, func_name, arguments, params, func_name_in_dll, group + */ + +ITT_STUBV(void, pause,(void),(), pause, __itt_control_group) + +ITT_STUBV(void, resume,(void),(), resume, __itt_control_group) + +#if ITT_PLATFORM==ITT_PLATFORM_WIN + +ITT_STUB(int, markA,(__itt_mark_type mt, const char *parameter),(mt,parameter), markA, __itt_mark_group) + +ITT_STUB(int, markW,(__itt_mark_type mt, const wchar_t *parameter),(mt,parameter), markW, __itt_mark_group) + +ITT_STUB(int, mark_globalA,(__itt_mark_type mt, const char *parameter),(mt,parameter), mark_globalA, __itt_mark_group) + +ITT_STUB(int, mark_globalW,(__itt_mark_type mt, const wchar_t *parameter),(mt,parameter), mark_globalW, __itt_mark_group) + +ITT_STUBV(void, thread_set_nameA,( const char *name),(name), thread_set_nameA, __itt_thread_group) + +ITT_STUBV(void, thread_set_nameW,( const wchar_t *name),(name), thread_set_nameW, __itt_thread_group) + +ITT_STUBV(void, sync_createA,(void *addr, const char *objtype, const char *objname, int attribute), (addr, objtype, objname, attribute), sync_createA, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, sync_createW,(void *addr, const wchar_t *objtype, const wchar_t *objname, int attribute), (addr, objtype, objname, attribute), sync_createW, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, sync_renameA, (void *addr, const char *name), (addr, name), sync_renameA, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, sync_renameW, (void *addr, const wchar_t *name), (addr, name), sync_renameW, __itt_sync_group | __itt_fsync_group) +#else /* WIN32 */ + +ITT_STUB(int, mark,(__itt_mark_type mt, const char *parameter),(mt,parameter), mark, __itt_mark_group) +ITT_STUB(int, mark_global,(__itt_mark_type mt, const char *parameter),(mt,parameter), mark_global, __itt_mark_group) + +ITT_STUBV(void, sync_set_name,(void *addr, const char *objtype, const char *objname, int attribute),(addr,objtype,objname,attribute), sync_set_name, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, thread_set_name,( const char *name),(name), thread_set_name, __itt_thread_group) + +ITT_STUBV(void, sync_create,(void *addr, const char *objtype, const char *objname, int attribute), (addr, objtype, objname, attribute), sync_create, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, sync_rename, (void *addr, const char *name), (addr, name), sync_rename, __itt_sync_group | __itt_fsync_group) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +ITT_STUBV(void, sync_destroy,(void *addr), (addr), sync_destroy, __itt_sync_group | __itt_fsync_group) + +ITT_STUB(int, mark_off,(__itt_mark_type mt),(mt), mark_off, __itt_mark_group) +ITT_STUB(int, mark_global_off,(__itt_mark_type mt),(mt), mark_global_off, __itt_mark_group) + +ITT_STUBV(void, thread_ignore,(void),(), thread_ignore, __itt_thread_group) + +ITT_STUBV(void, sync_prepare,(void* addr),(addr), sync_prepare, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, sync_cancel,(void *addr),(addr), sync_cancel, __itt_sync_group) + +ITT_STUBV(void, sync_acquired,(void *addr),(addr), sync_acquired, __itt_sync_group) + +ITT_STUBV(void, sync_releasing,(void* addr),(addr), sync_releasing, __itt_sync_group) + +ITT_STUBV(void, sync_released,(void* addr),(addr), sync_released, __itt_sync_group) + +ITT_STUBV(void, memory_read,( void *address, size_t size ), (address, size), 
memory_read, __itt_all_group) +ITT_STUBV(void, memory_write,( void *address, size_t size ), (address, size), memory_write, __itt_all_group) +ITT_STUBV(void, memory_update,( void *address, size_t size ), (address, size), memory_update, __itt_all_group) + +ITT_STUB(int, jit_notify_event,(__itt_jit_jvm_event event_type, void* event_data),(event_type, event_data), jit_notify_event, __itt_jit_group) + +#ifndef NO_ITT_LEGACY + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +ITT_STUB(__itt_mark_type, mark_createA,(const char *name),(name), mark_createA, __itt_mark_group) +ITT_STUB(__itt_mark_type, mark_createW,(const wchar_t *name),(name), mark_createW, __itt_mark_group) +#else /* WIN32 */ +ITT_STUB(__itt_mark_type, mark_create,(const char *name),(name), mark_create, __itt_mark_group) +#endif +ITT_STUBV(void, fsync_prepare,(void* addr),(addr), sync_prepare, __itt_fsync_group) + +ITT_STUBV(void, fsync_cancel,(void *addr),(addr), sync_cancel, __itt_fsync_group) + +ITT_STUBV(void, fsync_acquired,(void *addr),(addr), sync_acquired, __itt_fsync_group) + +ITT_STUBV(void, fsync_releasing,(void* addr),(addr), sync_releasing, __itt_fsync_group) + +ITT_STUBV(void, fsync_released,(void* addr),(addr), sync_released, __itt_fsync_group) + +ITT_STUBV(void, notify_sync_prepare,(void *p),(p), notify_sync_prepare, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, notify_sync_cancel,(void *p),(p), notify_sync_cancel, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, notify_sync_acquired,(void *p),(p), notify_sync_acquired, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, notify_sync_releasing,(void *p),(p), notify_sync_releasing, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, notify_cpath_target,(void),(), notify_cpath_target, __itt_all_group) + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +ITT_STUBV(void, sync_set_nameA,(void *addr, const char *objtype, const char *objname, int attribute),(addr,objtype,objname,attribute), sync_set_nameA, __itt_sync_group | __itt_fsync_group) + +ITT_STUBV(void, sync_set_nameW,(void *addr, const wchar_t *objtype, const wchar_t *objname, int attribute),(addr,objtype,objname,attribute), sync_set_nameW, __itt_sync_group | __itt_fsync_group) + +ITT_STUB (int, thr_name_setA,( char *name, int namelen ),(name,namelen), thr_name_setA, __itt_thread_group) + +ITT_STUB (int, thr_name_setW,( wchar_t *name, int namelen ),(name,namelen), thr_name_setW, __itt_thread_group) + +ITT_STUB (__itt_event, event_createA,( char *name, int namelen ),(name,namelen), event_createA, __itt_mark_group) + +ITT_STUB (__itt_event, event_createW,( wchar_t *name, int namelen ),(name,namelen), event_createW, __itt_mark_group) +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +ITT_STUB (int, thr_name_set,( char *name, int namelen ),(name,namelen), thr_name_set, __itt_thread_group) + +ITT_STUB (__itt_event, event_create,( char *name, int namelen ),(name,namelen), event_create, __itt_mark_group) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +ITT_STUBV(void, thr_ignore,(void),(), thr_ignore, __itt_thread_group) + +ITT_STUB (int, event_start,( __itt_event event ),(event), event_start, __itt_mark_group) + +ITT_STUB (int, event_end,( __itt_event event ),(event), event_end, __itt_mark_group) + +ITT_STUB (__itt_state_t, state_get, (void), (), state_get, __itt_all_group) +ITT_STUB (__itt_state_t, state_set,( __itt_state_t state), (state), state_set, __itt_all_group) +ITT_STUB (__itt_obj_state_t, obj_mode_set, ( __itt_obj_prop_t prop, __itt_obj_state_t state), (prop, state), obj_mode_set, __itt_all_group) +ITT_STUB 
(__itt_thr_state_t, thr_mode_set, (__itt_thr_prop_t prop, __itt_thr_state_t state), (prop, state), thr_mode_set, __itt_all_group) + +ITT_STUB (const char*, api_version,(void),(), api_version, __itt_all_group) +ITT_STUB (unsigned int, jit_get_new_method_id, (void), (), jit_get_new_method_id, __itt_jit_group) + +#endif /* NO_ITT_LEGACY */ + diff --git a/dep/tbb/src/tbb/tools_api/ittnotify.h b/dep/tbb/src/tbb/tools_api/ittnotify.h new file mode 100644 index 000000000..e9ebb0f20 --- /dev/null +++ b/dep/tbb/src/tbb/tools_api/ittnotify.h @@ -0,0 +1,1234 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +/** @mainpage + * Ability to control the collection during runtime. User API can be inserted into the user application. + * Commands include: + - Pause/resume analysis + - Stop analysis and application, view results + - Cancel analysis and application without generating results + - Mark current time in results + * The User API provides ability to control the collection, set marks at the execution of specific user code and + * specify custom synchronization primitives implemented without standard system APIs. + * + * Use case: User inserts API calls to the desired places in her code. The code is then compiled and + * linked with static part of User API library. User can recompile the code with specific macro defined + * to enable API calls. If this macro is not defined there is no run-time overhead and no need to link + * with static part of User API library. During runtime the static library loads and initializes the dynamic part. + * In case of instrumentation-based collection, only a stub library is loaded; otherwise a proxy library is loaded, + * which calls the collector. + * + * User API set is native (C/C++) only (no MRTE support). As amitigation can use JNI or C/C++ function + * call from managed code where needed. If the collector causes significant overhead or data storage, then + * pausing analysis should reduce the overhead to minimal levels. 
+*/ +/** @example example.cpp + * @brief The following example program shows the usage of User API + */ + +#ifndef _ITTNOTIFY_H_ +#define _ITTNOTIFY_H_ +/** @file ittnotify.h + * @brief Header file which contains declaration of user API functions and types + */ + +/** @cond exclude_from_documentation */ +#ifndef ITT_OS_WIN +# define ITT_OS_WIN 1 +#endif /* ITT_OS_WIN */ + +#ifndef ITT_OS_LINUX +# define ITT_OS_LINUX 2 +#endif /* ITT_OS_LINUX */ + +#ifndef ITT_OS_MAC +# define ITT_OS_MAC 3 +#endif /* ITT_OS_MAC */ + +#ifndef ITT_OS +# if defined WIN32 || defined _WIN32 +# define ITT_OS ITT_OS_WIN +# elif defined( __APPLE__ ) && defined( __MACH__ ) +# define ITT_OS ITT_OS_MAC +# else +# define ITT_OS ITT_OS_LINUX +# endif +#endif /* ITT_OS */ + +#ifndef ITT_PLATFORM_WIN +# define ITT_PLATFORM_WIN 1 +#endif /* ITT_PLATFORM_WIN */ + +#ifndef ITT_PLATFORM_POSIX +# define ITT_PLATFORM_POSIX 2 +#endif /* ITT_PLATFORM_POSIX */ + +#ifndef ITT_PLATFORM +# if ITT_OS==ITT_OS_WIN +# define ITT_PLATFORM ITT_PLATFORM_WIN +# else +# define ITT_PLATFORM ITT_PLATFORM_POSIX +# endif /* _WIN32 */ +#endif /* ITT_PLATFORM */ + + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +#include +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +#define ITTAPI_CALL CDECL + +#ifndef CDECL +#if ITT_PLATFORM==ITT_PLATFORM_WIN +# define CDECL __cdecl +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +# define CDECL +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +#endif /* CDECL */ + +/** @endcond */ + +/** @brief user event type */ +typedef int __itt_mark_type; +typedef int __itt_event; +typedef int __itt_state_t; + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +# ifdef UNICODE + typedef wchar_t __itt_char; +# else /* UNICODE */ + typedef char __itt_char; +# endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @brief Typedef for char or wchar_t (if Unicode symbol is allowed) on Windows. + * And typedef for char on Linux. + */ + typedef char __itt_char; +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @cond exclude_from_documentation */ +typedef enum __itt_obj_state { + __itt_obj_state_err = 0, + __itt_obj_state_clr = 1, + __itt_obj_state_set = 2, + __itt_obj_state_use = 3 +} __itt_obj_state_t; + +typedef enum __itt_thr_state { + __itt_thr_state_err = 0, + __itt_thr_state_clr = 1, + __itt_thr_state_set = 2 +} __itt_thr_state_t; + +typedef enum __itt_obj_prop { + __itt_obj_prop_watch = 1, + __itt_obj_prop_ignore = 2, + __itt_obj_prop_sharable = 3 +} __itt_obj_prop_t; + +typedef enum __itt_thr_prop { + __itt_thr_prop_quiet = 1 +} __itt_thr_prop_t; +/** @endcond */ +typedef enum __itt_error_code { + __itt_error_success = 0, /*!< no error */ + __itt_error_no_module = 1, /*!< module can't be loaded */ + __itt_error_no_symbol = 2, /*!< symbol not found */ + __itt_error_unknown_group = 3, /*!< unknown group specified */ + __itt_error_cant_read_env = 4 /*!< variable value too long */ +} __itt_error_code; + +typedef void (__itt_error_notification_t)(__itt_error_code code, const char* msg); + +/******************************************* + * Various constants used by JIT functions * + *******************************************/ + + /*! @enum ___itt_jit_jvm_event + * event notification + */ + typedef enum ___itt_jit_jvm_event + { + + __itt_JVM_EVENT_TYPE_SHUTDOWN = 2, /*!< Shutdown. Program exiting. EventSpecificData NA*/ + __itt_JVM_EVENT_TYPE_METHOD_LOAD_FINISHED=13,/*!< JIT profiling. 
Issued after method code jitted into memory but before code is executed + * event_data is an __itt_JIT_Method_Load */ + __itt_JVM_EVENT_TYPE_METHOD_UNLOAD_START /*!< JIT profiling. Issued before unload. Method code will no longer be executed, but code and info are still in memory. + * The VTune profiler may capture method code only at this point. event_data is __itt_JIT_Method_Id */ + + } __itt_jit_jvm_event; + +/*! @enum ___itt_jit_environment_type + * @brief Enumerator for the environment of methods + */ +typedef enum ___itt_jit_environment_type +{ + __itt_JIT_JITTINGAPI = 2 +} __itt_jit_environment_type; + +/********************************** + * Data structures for the events * + **********************************/ + + /*! @struct ___itt_jit_method_id + * @brief structure for the events: __itt_iJVM_EVENT_TYPE_METHOD_UNLOAD_START + */ +typedef struct ___itt_jit_method_id +{ + /** @brief Id of the method (same as the one passed in the __itt_JIT_Method_Load struct */ + unsigned int method_id; + +} *__itt_pjit_method_id, __itt_jit_method_id; + +/*! @struct ___itt_jit_line_number_info + * @brief structure for the events: __itt_iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED + */ +typedef struct ___itt_jit_line_number_info +{ + /** @brief x86 Offset from the begining of the method */ + unsigned int offset; + /** @brief source line number from the begining of the source file. */ + unsigned int line_number; + +} *__itt_pjit_line_number_info, __itt_jit_line_number_info; +/*! @struct ___itt_jit_method_load + * @brief structure for the events: __itt_iJVM_EVENT_TYPE_METHOD_LOAD_FINISHED + */ +typedef struct ___itt_jit_method_load +{ + /** @brief unique method ID - can be any unique value, (except 0 - 999) */ + unsigned int method_id; + /** @brief method name (can be with or without the class and signature, in any case the class name will be added to it) */ + char* method_name; + /** @brief virtual address of that method - This determines the method range for the iJVM_EVENT_TYPE_ENTER/LEAVE_METHOD_ADDR events */ + void* method_load_address; + /** @brief Size in memory - Must be exact */ + unsigned int method_size; + /** @brief Line Table size in number of entries - Zero if none */ + unsigned int line_number_size; + /** @brief Pointer to the begining of the line numbers info array */ + __itt_pjit_line_number_info line_number_table; + /** @brief unique class ID */ + unsigned int class_id; + /** @brief class file name */ + char* class_file_name; + /** @brief source file name */ + char* source_file_name; + /** @brief bits supplied by the user for saving in the JIT file... */ + void* user_data; + /** @brief the size of the user data buffer */ + unsigned int user_data_size; + /** @note no need to fill this field, it's filled by VTune */ + __itt_jit_environment_type env; +} *__itt_pjit_method_load, __itt_jit_method_load; + +/** + * @brief General behavior: application continues to run, but no profiling information is being collected + + * - Pausing occurs not only for the current thread but for all process as well as spawned processes + * - Intel(R) Parallel Inspector: does not analyze or report errors that involve memory access. + * - Intel(R) Parallel Inspector: Other errors are reported as usual. Pausing data collection in + Intel(R) Parallel Inspector only pauses tracing and analyzing memory access. It does not pause + tracing or analyzing threading APIs. 
+ * - Intel(R) Parallel Amplifier: does continue to record when new threads are started + * - Other effects: possible reduction of runtime overhead + */ +void ITTAPI_CALL __itt_pause(void); + +/** + * @brief General behavior: application continues to run, collector resumes profiling information + * collection for all threads and processes of profiled application + */ +void ITTAPI_CALL __itt_resume(void); + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +__itt_mark_type ITTAPI_CALL __itt_mark_createA(const char *name); +__itt_mark_type ITTAPI_CALL __itt_mark_createW(const wchar_t *name); +#ifdef UNICODE +# define __itt_mark_create __itt_mark_createW +# define __itt_mark_create_ptr __itt_mark_createW_ptr +#else /* UNICODE */ +# define __itt_mark_create __itt_mark_createA +# define __itt_mark_create_ptr __itt_mark_createA_ptr +#endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @brief Creates a user event type (mark) with the specified name using char or Unicode string. + * @param[in] name - name of mark to create + * @return Returns a handle to the mark type + */ +__itt_mark_type ITTAPI_CALL __itt_mark_create(const __itt_char* name); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +int ITTAPI_CALL __itt_markA(__itt_mark_type mt, const char *parameter); +int ITTAPI_CALL __itt_markW(__itt_mark_type mt, const wchar_t *parameter); + +int ITTAPI_CALL __itt_mark_globalA(__itt_mark_type mt, const char *parameter); +int ITTAPI_CALL __itt_mark_globalW(__itt_mark_type mt, const wchar_t *parameter); + +#ifdef UNICODE +# define __itt_mark __itt_markW +# define __itt_mark_ptr __itt_markW_ptr + +# define __itt_mark_global __itt_mark_globalW +# define __itt_mark_global_ptr __itt_mark_globalW_ptr +#else /* UNICODE */ +# define __itt_mark __itt_markA +# define __itt_mark_ptr __itt_markA_ptr + +# define __itt_mark_global __itt_mark_globalA +# define __itt_mark_global_ptr __itt_mark_globalA_ptr +#endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @brief Creates a "discrete" user event type (mark) of the specified type and an optional parameter using char or Unicode string. + + * - The mark of "discrete" type is placed to collection results in case of success. It appears in overtime view(s) as a special tick sign. + * - The call is "synchronous" - function returns after mark is actually added to results. + * - This function is useful, for example, to mark different phases of application (beginning of the next mark automatically meand end of current region). + * - Can be used together with "continuous" marks (see below) at the same collection session + * @param[in] mt - mark, created by __itt_mark_create(const __itt_char* name) function + * @param[in] parameter - string parameter of mark + * @return Returns zero value in case of success, non-zero value otherwise. + */ +int ITTAPI_CALL __itt_mark(__itt_mark_type mt, const __itt_char* parameter); +/** @brief Use this if necessary to create a "discrete" user event type (mark) for process + * rather then for one thread + * @see int ITTAPI_CALL __itt_mark(__itt_mark_type mt, const __itt_char* parameter); + */ +int ITTAPI_CALL __itt_mark_global(__itt_mark_type mt, const __itt_char* parameter); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** + * @brief Creates an "end" point for "continuous" mark with specified name. + + * - Returns zero value in case of success, non-zero value otherwise. 
Also returns non-zero value when preceding "begin" point for the mark with the same name failed to be created or not created. (*) + * - The mark of "continuous" type is placed to collection results in case of success. It appears in overtime view(s) as a special tick sign (different from "discrete" mark) together with line from corresponding "begin" mark to "end" mark. (*) * - Continuous marks can overlap (*) and be nested inside each other. Discrete mark can be nested inside marked region + * + * @param[in] mt - mark, created by __itt_mark_create(const __itt_char* name) function + * + * @return Returns zero value in case of success, non-zero value otherwise. + */ +int ITTAPI_CALL __itt_mark_off(__itt_mark_type mt); +/** @brief Use this if necessary to create an "end" point for mark of process + * @see int ITTAPI_CALL __itt_mark_off(__itt_mark_type mt); + */ +int ITTAPI_CALL __itt_mark_global_off(__itt_mark_type mt); + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +void ITTAPI_CALL __itt_thread_set_nameA(const char *name); +void ITTAPI_CALL __itt_thread_set_nameW(const wchar_t *name); +#ifdef UNICODE +# define __itt_thread_set_name __itt_thread_set_nameW +# define __itt_thread_set_name_ptr __itt_thread_set_nameW_ptr +#else /* UNICODE */ +# define __itt_thread_set_name __itt_thread_set_nameA +# define __itt_thread_set_name_ptr __itt_thread_set_nameA_ptr +#endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @brief Sets thread name using char or Unicode string + * @param[in] name - name of thread + */ +void ITTAPI_CALL __itt_thread_set_name(const __itt_char* name); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +/** @brief Mark current thread as ignored from this point on, for the duration of its existence. */ +void ITTAPI_CALL __itt_thread_ignore(void); +/** @brief Is called when sync object is destroyed (needed to track lifetime of objects) */ +void ITTAPI_CALL __itt_sync_destroy(void *addr); +/** @brief Enter spin loop on user-defined sync object */ +void ITTAPI_CALL __itt_sync_prepare(void* addr); +/** @brief Quit spin loop without acquiring spin object */ +void ITTAPI_CALL __itt_sync_cancel(void *addr); +/** @brief Successful spin loop completion (sync object acquired) */ +void ITTAPI_CALL __itt_sync_acquired(void *addr); +/** @brief Start sync object releasing code. Is called before the lock release call. */ +void ITTAPI_CALL __itt_sync_releasing(void* addr); +/** @brief Sync object released. Is called after the release call */ +void ITTAPI_CALL __itt_sync_released(void* addr); + +/** @brief Fast synchronization which does no require spinning. + + * - This special function is to be used by TBB and OpenMP libraries only when they know + * there is no spin but they need to suppress TC warnings about shared variable modifications. + * - It only has corresponding pointers in static library and does not have corresponding function + * in dynamic library. + * @see void ITTAPI_CALL __itt_sync_prepare(void* addr); +*/ +void ITTAPI_CALL __itt_fsync_prepare(void* addr); +/** @brief Fast synchronization which does no require spinning. + + * - This special function is to be used by TBB and OpenMP libraries only when they know + * there is no spin but they need to suppress TC warnings about shared variable modifications. + * - It only has corresponding pointers in static library and does not have corresponding function + * in dynamic library. 
+ * @see void ITTAPI_CALL __itt_sync_cancel(void *addr); +*/ +void ITTAPI_CALL __itt_fsync_cancel(void *addr); +/** @brief Fast synchronization which does no require spinning. + + * - This special function is to be used by TBB and OpenMP libraries only when they know + * there is no spin but they need to suppress TC warnings about shared variable modifications. + * - It only has corresponding pointers in static library and does not have corresponding function + * in dynamic library. + * @see void ITTAPI_CALL __itt_sync_acquired(void *addr); +*/ +void ITTAPI_CALL __itt_fsync_acquired(void *addr); +/** @brief Fast synchronization which does no require spinning. + + * - This special function is to be used by TBB and OpenMP libraries only when they know + * there is no spin but they need to suppress TC warnings about shared variable modifications. + * - It only has corresponding pointers in static library and does not have corresponding function + * in dynamic library. + * @see void ITTAPI_CALL __itt_sync_releasing(void* addr); +*/ +void ITTAPI_CALL __itt_fsync_releasing(void* addr); +/** @brief Fast synchronization which does no require spinning. + + * - This special function is to be used by TBB and OpenMP libraries only when they know + * there is no spin but they need to suppress TC warnings about shared variable modifications. + * - It only has corresponding pointers in static library and does not have corresponding function + * in dynamic library. + * @see void ITTAPI_CALL __itt_sync_released(void* addr); +*/ +void ITTAPI_CALL __itt_fsync_released(void* addr); + +/** @hideinitializer + * @brief possible value of attribute argument for sync object type + */ +#define __itt_attr_barrier 1 +/** @hideinitializer + * @brief possible value of attribute argument for sync object type + */ +#define __itt_attr_mutex 2 + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +void ITTAPI_CALL __itt_sync_set_nameA(void *addr, const char *objtype, const char *objname, int attribute); +void ITTAPI_CALL __itt_sync_set_nameW(void *addr, const wchar_t *objtype, const wchar_t *objname, int attribute); +#ifdef UNICODE +# define __itt_sync_set_name __itt_sync_set_nameW +# define __itt_sync_set_name_ptr __itt_sync_set_nameW_ptr +#else /* UNICODE */ +# define __itt_sync_set_name __itt_sync_set_nameA +# define __itt_sync_set_name_ptr __itt_sync_set_nameA_ptr +#endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @deprecated Legacy API + * @brief Assign a name to a sync object using char or Unicode string + * @param[in] addr - pointer to the sync object. You should use a real pointer to your object + * to make sure that the values don't clash with other object addresses + * @param[in] objtype - null-terminated object type string. If NULL is passed, the object will + * be assumed to be of generic "User Synchronization" type + * @param[in] objname - null-terminated object name string. If NULL, no name will be assigned + * to the object -- you can use the __itt_sync_rename call later to assign + * the name + * @param[in] attribute - one of [ #__itt_attr_barrier , #__itt_attr_mutex] values which defines the + * exact semantics of how prepare/acquired/releasing calls work. 
+ */ +void ITTAPI_CALL __itt_sync_set_name(void *addr, const __itt_char* objtype, const __itt_char* objname, int attribute); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +void ITTAPI_CALL __itt_sync_createA(void *addr, const char *objtype, const char *objname, int attribute); +void ITTAPI_CALL __itt_sync_createW(void *addr, const wchar_t *objtype, const wchar_t *objname, int attribute); +#ifdef UNICODE +#define __itt_sync_create __itt_sync_createW +# define __itt_sync_create_ptr __itt_sync_createW_ptr +#else /* UNICODE */ +#define __itt_sync_create __itt_sync_createA +# define __itt_sync_create_ptr __itt_sync_createA_ptr +#endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @brief Register the creation of a sync object using char or Unicode string + * @param[in] addr - pointer to the sync object. You should use a real pointer to your object + * to make sure that the values don't clash with other object addresses + * @param[in] objtype - null-terminated object type string. If NULL is passed, the object will + * be assumed to be of generic "User Synchronization" type + * @param[in] objname - null-terminated object name string. If NULL, no name will be assigned + * to the object -- you can use the __itt_sync_rename call later to assign + * the name + * @param[in] attribute - one of [ #__itt_attr_barrier, #__itt_attr_mutex] values which defines the + * exact semantics of how prepare/acquired/releasing calls work. +**/ +void ITTAPI_CALL __itt_sync_create(void *addr, const __itt_char* objtype, const __itt_char* objname, int attribute); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @brief Assign a name to a sync object using char or Unicode string. + + * Sometimes you cannot assign the name to a sync object in the __itt_sync_set_name call because it + * is not yet known there. In this case you should use the rename call which allows to assign the + * name after the creation has been registered. The renaming can be done multiple times. All waits + * after a new name has been assigned will be attributed to the sync object with this name. 
+ * @param[in] addr - pointer to the sync object + * @param[in] name - null-terminated object name string +**/ +#if ITT_PLATFORM==ITT_PLATFORM_WIN +void ITTAPI_CALL __itt_sync_renameA(void *addr, const char *name); +void ITTAPI_CALL __itt_sync_renameW(void *addr, const wchar_t *name); +#ifdef UNICODE +#define __itt_sync_rename __itt_sync_renameW +# define __itt_sync_rename_ptr __itt_sync_renameW_ptr +#else /* UNICODE */ +#define __itt_sync_rename __itt_sync_renameA +# define __itt_sync_rename_ptr __itt_sync_renameA_ptr +#endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +void ITTAPI_CALL __itt_sync_rename(void *addr, const __itt_char* name); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +/** @cond exclude_from_documentaion */ +int __itt_jit_notify_event(__itt_jit_jvm_event event_type, void* event_data); +unsigned int __itt_jit_get_new_method_id(void); +const char* ITTAPI_CALL __itt_api_version(void); +__itt_error_notification_t* __itt_set_error_handler(__itt_error_notification_t*); + +#if ITT_OS == ITT_OS_WIN +#define LIBITTNOTIFY_CC __cdecl +#define LIBITTNOTIFY_EXPORT __declspec(dllexport) +#define LIBITTNOTIFY_IMPORT __declspec(dllimport) +#elif ITT_OS == ITT_OS_MAC || ITT_OS == ITT_OS_LINUX +#define LIBITTNOTIFY_CC /* nothing */ +#define LIBITTNOTIFY_EXPORT /* nothing */ +#define LIBITTNOTIFY_IMPORT /* nothing */ +#else /* ITT_OS == ITT_OS_WIN */ +#error "Unsupported OS" +#endif /* ITT_OS == ITT_OS_WIN */ + +#define LIBITTNOTIFY_API +/** @endcond */ + +/** @deprecated Legacy API + * @brief Hand instrumentation of user synchronization + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_notify_sync_prepare(void *p); +/** @deprecated Legacy API + * @brief Hand instrumentation of user synchronization + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_notify_sync_cancel(void *p); +/** @deprecated Legacy API + * @brief Hand instrumentation of user synchronization + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_notify_sync_acquired(void *p); +/** @deprecated Legacy API + * @brief Hand instrumentation of user synchronization + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_notify_sync_releasing(void *p); +/** @deprecated Legacy API + * @brief itt_notify_cpath_target is handled by Thread Profiler only. + * Inform Thread Profiler that the current thread has recahed a critical path target. + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_notify_cpath_target(void); + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +LIBITTNOTIFY_API int LIBITTNOTIFY_CC __itt_thr_name_setA( char *name, int namelen ); +LIBITTNOTIFY_API int LIBITTNOTIFY_CC __itt_thr_name_setW( wchar_t *name, int namelen ); +# ifdef UNICODE +# define __itt_thr_name_set __itt_thr_name_setW +# define __itt_thr_name_set_ptr __itt_thr_name_setW_ptr +# else +# define __itt_thr_name_set __itt_thr_name_setA +# define __itt_thr_name_set_ptr __itt_thr_name_setA_ptr +# endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @deprecated Legacy API + * @brief Set name to be associated with thread in analysis GUI. + * Return __itt_err upon failure (name or namelen being null,name and namelen mismatched) + */ +LIBITTNOTIFY_API int LIBITTNOTIFY_CC __itt_thr_name_set( __itt_char *name, int namelen ); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +/** @brief Mark current thread as ignored from this point on, for the duration of its existence. 
*/ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_thr_ignore(void); + +/* User event notification */ +#if ITT_PLATFORM==ITT_PLATFORM_WIN +/** @deprecated Legacy API + * @brief User event notification. + * Event create APIs return non-zero event identifier upon success and __itt_err otherwise + * (name or namelen being null/name and namelen not matching, user event feature not enabled) + */ +LIBITTNOTIFY_API __itt_event LIBITTNOTIFY_CC __itt_event_createA( char *name, int namelen ); +LIBITTNOTIFY_API __itt_event LIBITTNOTIFY_CC __itt_event_createW( wchar_t *name, int namelen ); +# ifdef UNICODE +# define __itt_event_create __itt_event_createW +# define __itt_event_create_ptr __itt_event_createW_ptr +# else +# define __itt_event_create __itt_event_createA +# define __itt_event_create_ptr __itt_event_createA_ptr +# endif /* UNICODE */ +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +/** @deprecated Legacy API + * @brief User event notification. + * Event create APIs return non-zero event identifier upon success and __itt_err otherwise + * (name or namelen being null/name and namelen not matching, user event feature not enabled) + */ +LIBITTNOTIFY_API __itt_event LIBITTNOTIFY_CC __itt_event_create( __itt_char *name, int namelen ); +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +/** @deprecated Legacy API + * @brief Record an event occurance. + * These APIs return __itt_err upon failure (invalid event id/user event feature not enabled) + */ +LIBITTNOTIFY_API int LIBITTNOTIFY_CC __itt_event_start( __itt_event event ); +/** @deprecated Legacy API + * @brief Record an event occurance. event_end is optional if events do not have durations. + * These APIs return __itt_err upon failure (invalid event id/user event feature not enabled) + */ +LIBITTNOTIFY_API int LIBITTNOTIFY_CC __itt_event_end( __itt_event event ); /** optional */ + + +/** @deprecated Legacy API + * @brief managing thread and object states + */ +LIBITTNOTIFY_API __itt_state_t LIBITTNOTIFY_CC __itt_state_get(void); +/** @deprecated Legacy API + * @brief managing thread and object states + */ +LIBITTNOTIFY_API __itt_state_t LIBITTNOTIFY_CC __itt_state_set( __itt_state_t ); + +/** @deprecated Legacy API + * @brief managing thread and object modes + */ +LIBITTNOTIFY_API __itt_thr_state_t LIBITTNOTIFY_CC __itt_thr_mode_set( __itt_thr_prop_t, __itt_thr_state_t ); +/** @deprecated Legacy API + * @brief managing thread and object modes + */ +LIBITTNOTIFY_API __itt_obj_state_t LIBITTNOTIFY_CC __itt_obj_mode_set( __itt_obj_prop_t, __itt_obj_state_t ); + +/** @deprecated Non-supported Legacy API + * @brief Inform the tool of memory accesses on reading + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_memory_read( void *address, size_t size ); +/** @deprecated Non-supported Legacy API + * @brief Inform the tool of memory accesses on writing + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_memory_write( void *address, size_t size ); +/** @deprecated Non-supported Legacy API + * @brief Inform the tool of memory accesses on updating + */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_memory_update( void *address, size_t size ); + +/** @cond exclude_from_documentation */ +/* The following 3 are currently for INTERNAL use only */ +/** @internal */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_test_delay( int ); +/** @internal */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_test_seq_init( void *, int ); +/** @internal */ +LIBITTNOTIFY_API void LIBITTNOTIFY_CC __itt_test_seq_wait( void *, int ); +/** @endcond */ + +#ifdef __cplusplus +} +#endif /* __cplusplus 
*/ + + +/* ********************************************************************************* + ********************************************************************************* + ********************************************************************************* */ +/** @cond exclude_from_documentation */ +#define ITT_JOIN_AUX(p,n) p##n +#define ITT_JOIN(p,n) ITT_JOIN_AUX(p,n) + +#ifndef INTEL_ITTNOTIFY_PREFIX +#define INTEL_ITTNOTIFY_PREFIX __itt_ +#endif /* INTEL_ITTNOTIFY_PREFIX */ +#ifndef INTEL_ITTNOTIFY_POSTFIX +# define INTEL_ITTNOTIFY_POSTFIX _ptr_ +#endif /* INTEL_ITTNOTIFY_POSTFIX */ + +#ifndef _ITTNOTIFY_H_MACRO_BODY_ + +#define ____ITTNOTIFY_NAME_(p,n) p##n +#define ___ITTNOTIFY_NAME_(p,n) ____ITTNOTIFY_NAME_(p,n) +#define __ITTNOTIFY_NAME_(n) ___ITTNOTIFY_NAME_(INTEL_ITTNOTIFY_PREFIX,n) +#define _ITTNOTIFY_NAME_(n) __ITTNOTIFY_NAME_(ITT_JOIN(n,INTEL_ITTNOTIFY_POSTFIX)) + +#ifdef ITT_STUBV +#undef ITT_STUBV +#endif +#define ITT_STUBV(type,name,args,params) \ + typedef type (ITTAPI_CALL* ITT_JOIN(_ITTNOTIFY_NAME_(name),_t)) args; \ + extern ITT_JOIN(_ITTNOTIFY_NAME_(name),_t) _ITTNOTIFY_NAME_(name); +#undef ITT_STUB +#define ITT_STUB ITT_STUBV + +#ifdef __cplusplus +extern "C" { +#endif /* __cplusplus */ + +#define __itt_error_handler ITT_JOIN(INTEL_ITTNOTIFY_PREFIX, error_handler) +void __itt_error_handler(__itt_jit_jvm_event event_type, void* event_data); + +extern const __itt_state_t _ITTNOTIFY_NAME_(state_err); +extern const __itt_event _ITTNOTIFY_NAME_(event_err); +extern const int _ITTNOTIFY_NAME_(err); + +#define __itt_state_err _ITTNOTIFY_NAME_(state_err) +#define __itt_event_err _ITTNOTIFY_NAME_(event_err) +#define __itt_err _ITTNOTIFY_NAME_(err) + +ITT_STUBV(void, pause,(void),()) +ITT_STUBV(void, resume,(void),()) + +#if ITT_PLATFORM==ITT_PLATFORM_WIN + +ITT_STUB(__itt_mark_type, mark_createA,(const char *name),(name)) + +ITT_STUB(__itt_mark_type, mark_createW,(const wchar_t *name),(name)) + +ITT_STUB(int, markA,(__itt_mark_type mt, const char *parameter),(mt,parameter)) + +ITT_STUB(int, markW,(__itt_mark_type mt, const wchar_t *parameter),(mt,parameter)) + +ITT_STUB(int, mark_globalA,(__itt_mark_type mt, const char *parameter),(mt,parameter)) + +ITT_STUB(int, mark_globalW,(__itt_mark_type mt, const wchar_t *parameter),(mt,parameter)) + +ITT_STUBV(void, thread_set_nameA,( const char *name),(name)) + +ITT_STUBV(void, thread_set_nameW,( const wchar_t *name),(name)) + +ITT_STUBV(void, sync_createA,(void *addr, const char *objtype, const char *objname, int attribute), (addr, objtype, objname, attribute)) + +ITT_STUBV(void, sync_createW,(void *addr, const wchar_t *objtype, const wchar_t *objname, int attribute), (addr, objtype, objname, attribute)) + +ITT_STUBV(void, sync_renameA, (void *addr, const char *name), (addr, name)) + +ITT_STUBV(void, sync_renameW, (void *addr, const wchar_t *name), (addr, name)) +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +ITT_STUB(__itt_mark_type, mark_create,(const char *name),(name)) + +ITT_STUB(int, mark,(__itt_mark_type mt, const char *parameter),(mt,parameter)) + +ITT_STUB(int, mark_global,(__itt_mark_type mt, const char *parameter),(mt,parameter)) + +ITT_STUBV(void, sync_set_name,(void *addr, const char *objtype, const char *objname, int attribute),(addr,objtype,objname,attribute)) + +ITT_STUBV(void, thread_set_name,( const char *name),(name)) + +ITT_STUBV(void, sync_create,(void *addr, const char *objtype, const char *objname, int attribute), (addr, objtype, objname, attribute)) + +ITT_STUBV(void, sync_rename, (void *addr, const char 
*name), (addr, name)) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +ITT_STUB(int, mark_off,(__itt_mark_type mt),(mt)) + +ITT_STUB(int, mark_global_off,(__itt_mark_type mt),(mt)) + +ITT_STUBV(void, thread_ignore,(void),()) + +ITT_STUBV(void, sync_prepare,(void* addr),(addr)) + +ITT_STUBV(void, sync_cancel,(void *addr),(addr)) + +ITT_STUBV(void, sync_acquired,(void *addr),(addr)) + +ITT_STUBV(void, sync_releasing,(void* addr),(addr)) + +ITT_STUBV(void, sync_released,(void* addr),(addr)) + +ITT_STUBV(void, fsync_prepare,(void* addr),(addr)) + +ITT_STUBV(void, fsync_cancel,(void *addr),(addr)) + +ITT_STUBV(void, fsync_acquired,(void *addr),(addr)) + +ITT_STUBV(void, fsync_releasing,(void* addr),(addr)) + +ITT_STUBV(void, fsync_released,(void* addr),(addr)) + +ITT_STUBV(void, sync_destroy,(void *addr), (addr)) + +ITT_STUBV(void, notify_sync_prepare,(void *p),(p)) + +ITT_STUBV(void, notify_sync_cancel,(void *p),(p)) + +ITT_STUBV(void, notify_sync_acquired,(void *p),(p)) + +ITT_STUBV(void, notify_sync_releasing,(void *p),(p)) + +ITT_STUBV(void, notify_cpath_target,(),()) + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +ITT_STUBV(void, sync_set_nameA,(void *addr, const char *objtype, const char *objname, int attribute),(addr,objtype,objname,attribute)) + +ITT_STUBV(void, sync_set_nameW,(void *addr, const wchar_t *objtype, const wchar_t *objname, int attribute),(addr,objtype,objname,attribute)) + +ITT_STUB (int, thr_name_setA,( char *name, int namelen ),(name,namelen)) + +ITT_STUB (int, thr_name_setW,( wchar_t *name, int namelen ),(name,namelen)) + +ITT_STUB (__itt_event, event_createA,( char *name, int namelen ),(name,namelen)) + +ITT_STUB (__itt_event, event_createW,( wchar_t *name, int namelen ),(name,namelen)) +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +ITT_STUB (int, thr_name_set,( char *name, int namelen ),(name,namelen)) + +ITT_STUB (__itt_event, event_create,( char *name, int namelen ),(name,namelen)) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +ITT_STUBV(void, thr_ignore,(void),()) + +ITT_STUB (int, event_start,( __itt_event event ),(event)) + +ITT_STUB (int, event_end,( __itt_event event ),(event)) + +ITT_STUB (__itt_state_t, state_get, (), ()) +ITT_STUB (__itt_state_t, state_set,( __itt_state_t state), (state)) +ITT_STUB (__itt_obj_state_t, obj_mode_set, ( __itt_obj_prop_t prop, __itt_obj_state_t state), (prop, state)) +ITT_STUB (__itt_thr_state_t, thr_mode_set, (__itt_thr_prop_t prop, __itt_thr_state_t state), (prop, state)) + +ITT_STUB(const char*, api_version,(void),()) + +ITT_STUB(int, jit_notify_event, (__itt_jit_jvm_event event_type, void* event_data), (event_type, event_data)) +ITT_STUB(unsigned int, jit_get_new_method_id, (void), ()) + +ITT_STUBV(void, memory_read,( void *address, size_t size ), (address, size)) +ITT_STUBV(void, memory_write,( void *address, size_t size ), (address, size)) +ITT_STUBV(void, memory_update,( void *address, size_t size ), (address, size)) + +ITT_STUBV(void, test_delay, (int p1), (p1)) +ITT_STUBV(void, test_seq_init, ( void* p1, int p2), (p1, p2)) +ITT_STUBV(void, test_seq_wait, ( void* p1, int p2), (p1, p2)) +#ifdef __cplusplus +} /* extern "C" */ +#endif /* __cplusplus */ + + +#ifndef INTEL_NO_ITTNOTIFY_API + +#define __ITTNOTIFY_VOID_CALL__(n) (!_ITTNOTIFY_NAME_(n)) ? (void)0 : _ITTNOTIFY_NAME_(n) +#define __ITTNOTIFY_DATA_CALL__(n) (!_ITTNOTIFY_NAME_(n)) ? 
0 : _ITTNOTIFY_NAME_(n) + +#define __itt_pause __ITTNOTIFY_VOID_CALL__(pause) +#define __itt_pause_ptr _ITTNOTIFY_NAME_(pause) + +#define __itt_resume __ITTNOTIFY_VOID_CALL__(resume) +#define __itt_resume_ptr _ITTNOTIFY_NAME_(resume) + +#if ITT_PLATFORM==ITT_PLATFORM_WIN + +#define __itt_mark_createA __ITTNOTIFY_DATA_CALL__(mark_createA) +#define __itt_mark_createA_ptr _ITTNOTIFY_NAME_(mark_createA) + +#define __itt_mark_createW __ITTNOTIFY_DATA_CALL__(mark_createW) +#define __itt_mark_createW_ptr _ITTNOTIFY_NAME_(mark_createW) + +#define __itt_markA __ITTNOTIFY_DATA_CALL__(markA) +#define __itt_markA_ptr _ITTNOTIFY_NAME_(markA) + +#define __itt_markW __ITTNOTIFY_DATA_CALL__(markW) +#define __itt_markW_ptr _ITTNOTIFY_NAME_(markW) + +#define __itt_mark_globalA __ITTNOTIFY_DATA_CALL__(mark_globalA) +#define __itt_mark_globalA_ptr _ITTNOTIFY_NAME_(mark_globalA) + +#define __itt_mark_globalW __ITTNOTIFY_DATA_CALL__(mark_globalW) +#define __itt_mark_globalW_ptr _ITTNOTIFY_NAME_(mark_globalW) + +#define __itt_thread_set_nameA __ITTNOTIFY_VOID_CALL__(thread_set_nameA) +#define __itt_thread_set_nameA_ptr _ITTNOTIFY_NAME_(thread_set_nameA) + +#define __itt_thread_set_nameW __ITTNOTIFY_VOID_CALL__(thread_set_nameW) +#define __itt_thread_set_nameW_ptr _ITTNOTIFY_NAME_(thread_set_nameW) + +#define __itt_sync_createA __ITTNOTIFY_VOID_CALL__(sync_createA) +#define __itt_sync_createA_ptr _ITTNOTIFY_NAME_(sync_createA) + +#define __itt_sync_createW __ITTNOTIFY_VOID_CALL__(sync_createW) +#define __itt_sync_createW_ptr _ITTNOTIFY_NAME_(sync_createW) + +#define __itt_sync_renameA __ITTNOTIFY_VOID_CALL__(sync_renameA) +#define __itt_sync_renameA_ptr _ITTNOTIFY_NAME_(sync_renameA) + +#define __itt_sync_renameW __ITTNOTIFY_VOID_CALL__(sync_renameW) +#define __itt_sync_renameW_ptr _ITTNOTIFY_NAME_(sync_renameW) +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#define __itt_mark_create __ITTNOTIFY_DATA_CALL__(mark_create) +#define __itt_mark_create_ptr _ITTNOTIFY_NAME_(mark_create) + +#define __itt_mark __ITTNOTIFY_DATA_CALL__(mark) +#define __itt_mark_ptr _ITTNOTIFY_NAME_(mark) + +#define __itt_mark_global __ITTNOTIFY_DATA_CALL__(mark_global) +#define __itt_mark_global_ptr _ITTNOTIFY_NAME_(mark_global) + +#define __itt_sync_set_name __ITTNOTIFY_VOID_CALL__(sync_set_name) +#define __itt_sync_set_name_ptr _ITTNOTIFY_NAME_(sync_set_name) + +#define __itt_thread_set_name __ITTNOTIFY_VOID_CALL__(thread_set_name) +#define __itt_thread_set_name_ptr _ITTNOTIFY_NAME_(thread_set_name) + +#define __itt_sync_create __ITTNOTIFY_VOID_CALL__(sync_create) +#define __itt_sync_create_ptr _ITTNOTIFY_NAME_(sync_create) + +#define __itt_sync_rename __ITTNOTIFY_VOID_CALL__(sync_rename) +#define __itt_sync_rename_ptr _ITTNOTIFY_NAME_(sync_rename) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#define __itt_mark_off __ITTNOTIFY_DATA_CALL__(mark_off) +#define __itt_mark_off_ptr _ITTNOTIFY_NAME_(mark_off) + +#define __itt_thread_ignore __ITTNOTIFY_VOID_CALL__(thread_ignore) +#define __itt_thread_ignore_ptr _ITTNOTIFY_NAME_(thread_ignore) + +#define __itt_sync_prepare __ITTNOTIFY_VOID_CALL__(sync_prepare) +#define __itt_sync_prepare_ptr _ITTNOTIFY_NAME_(sync_prepare) + +#define __itt_sync_cancel __ITTNOTIFY_VOID_CALL__(sync_cancel) +#define __itt_sync_cancel_ptr _ITTNOTIFY_NAME_(sync_cancel) + +#define __itt_sync_acquired __ITTNOTIFY_VOID_CALL__(sync_acquired) +#define __itt_sync_acquired_ptr _ITTNOTIFY_NAME_(sync_acquired) + +#define __itt_sync_releasing __ITTNOTIFY_VOID_CALL__(sync_releasing) +#define __itt_sync_releasing_ptr 
_ITTNOTIFY_NAME_(sync_releasing) + +#define __itt_sync_released __ITTNOTIFY_VOID_CALL__(sync_released) +#define __itt_sync_released_ptr _ITTNOTIFY_NAME_(sync_released) + +#define __itt_fsync_prepare __ITTNOTIFY_VOID_CALL__(fsync_prepare) +#define __itt_fsync_prepare_ptr _ITTNOTIFY_NAME_(fsync_prepare) + +#define __itt_fsync_cancel __ITTNOTIFY_VOID_CALL__(fsync_cancel) +#define __itt_fsync_cancel_ptr _ITTNOTIFY_NAME_(fsync_cancel) + +#define __itt_fsync_acquired __ITTNOTIFY_VOID_CALL__(fsync_acquired) +#define __itt_fsync_acquired_ptr _ITTNOTIFY_NAME_(fsync_acquired) + +#define __itt_fsync_releasing __ITTNOTIFY_VOID_CALL__(fsync_releasing) +#define __itt_fsync_releasing_ptr _ITTNOTIFY_NAME_(fsync_releasing) + +#define __itt_fsync_released __ITTNOTIFY_VOID_CALL__(fsync_released) +#define __itt_fsync_released_ptr _ITTNOTIFY_NAME_(fsync_released) + +#define __itt_sync_destroy __ITTNOTIFY_VOID_CALL__(sync_destroy) +#define __itt_sync_destroy_ptr _ITTNOTIFY_NAME_(sync_destroy) + +#define __itt_notify_sync_prepare __ITTNOTIFY_VOID_CALL__(notify_sync_prepare) +#define __itt_notify_sync_prepare_ptr _ITTNOTIFY_NAME_(notify_sync_prepare) + +#define __itt_notify_sync_cancel __ITTNOTIFY_VOID_CALL__(notify_sync_cancel) +#define __itt_notify_sync_cancel_ptr _ITTNOTIFY_NAME_(notify_sync_cancel) + +#define __itt_notify_sync_acquired __ITTNOTIFY_VOID_CALL__(notify_sync_acquired) +#define __itt_notify_sync_acquired_ptr _ITTNOTIFY_NAME_(notify_sync_acquired) + +#define __itt_notify_sync_releasing __ITTNOTIFY_VOID_CALL__(notify_sync_releasing) +#define __itt_notify_sync_releasing_ptr _ITTNOTIFY_NAME_(notify_sync_releasing) + +#define __itt_notify_cpath_target __ITTNOTIFY_VOID_CALL__(notify_cpath_target) +#define __itt_notify_cpath_target_ptr _ITTNOTIFY_NAME_(notify_cpath_target) + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +#define __itt_sync_set_nameA __ITTNOTIFY_VOID_CALL__(sync_set_nameA) +#define __itt_sync_set_nameA_ptr _ITTNOTIFY_NAME_(sync_set_nameA) + +#define __itt_sync_set_nameW __ITTNOTIFY_VOID_CALL__(sync_set_nameW) +#define __itt_sync_set_nameW_ptr _ITTNOTIFY_NAME_(sync_set_nameW) + +#define __itt_thr_name_setA __ITTNOTIFY_DATA_CALL__(thr_name_setA) +#define __itt_thr_name_setA_ptr _ITTNOTIFY_NAME_(thr_name_setA) + +#define __itt_thr_name_setW __ITTNOTIFY_DATA_CALL__(thr_name_setW) +#define __itt_thr_name_setW_ptr _ITTNOTIFY_NAME_(thr_name_setW) + +#define __itt_event_createA __ITTNOTIFY_DATA_CALL__(event_createA) +#define __itt_event_createA_ptr _ITTNOTIFY_NAME_(event_createA) + +#define __itt_event_createW __ITTNOTIFY_DATA_CALL__(event_createW) +#define __itt_event_createW_ptr _ITTNOTIFY_NAME_(event_createW) +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +#define __itt_thr_name_set __ITTNOTIFY_DATA_CALL__(thr_name_set) +#define __itt_thr_name_set_ptr _ITTNOTIFY_NAME_(thr_name_set) + +#define __itt_event_create __ITTNOTIFY_DATA_CALL__(event_create) +#define __itt_event_create_ptr _ITTNOTIFY_NAME_(event_create) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#define __itt_thr_ignore __ITTNOTIFY_VOID_CALL__(thr_ignore) +#define __itt_thr_ignore_ptr _ITTNOTIFY_NAME_(thr_ignore) + +#define __itt_event_start __ITTNOTIFY_DATA_CALL__(event_start) +#define __itt_event_start_ptr _ITTNOTIFY_NAME_(event_start) + +#define __itt_event_end __ITTNOTIFY_DATA_CALL__(event_end) +#define __itt_event_end_ptr _ITTNOTIFY_NAME_(event_end) + +#define __itt_state_get __ITTNOTIFY_DATA_CALL__(state_get) +#define __itt_state_get_ptr _ITTNOTIFY_NAME_(state_get) + +#define __itt_state_set __ITTNOTIFY_DATA_CALL__(state_set) +#define 
__itt_state_set_ptr _ITTNOTIFY_NAME_(state_set) + +#define __itt_obj_mode_set __ITTNOTIFY_DATA_CALL__(obj_mode_set) +#define __itt_obj_mode_set_ptr _ITTNOTIFY_NAME_(obj_mode_set) + +#define __itt_thr_mode_set __ITTNOTIFY_DATA_CALL__(thr_mode_set) +#define __itt_thr_mode_set_ptr _ITTNOTIFY_NAME_(thr_mode_set) + +#define __itt_api_version __ITTNOTIFY_DATA_CALL__(api_version) +#define __itt_api_version_ptr _ITTNOTIFY_NAME_(api_version) + +#define __itt_jit_notify_event __ITTNOTIFY_DATA_CALL__(jit_notify_event) +#define __itt_jit_notify_event_ptr _ITTNOTIFY_NAME_(jit_notify_event) + +#define __itt_jit_get_new_method_id __ITTNOTIFY_DATA_CALL__(jit_get_new_method_id) +#define __itt_jit_get_new_method_id_ptr _ITTNOTIFY_NAME_(jit_get_new_method_id) + +#define __itt_memory_read __ITTNOTIFY_VOID_CALL__(memory_read) +#define __itt_memory_read_ptr _ITTNOTIFY_NAME_(memory_read) + +#define __itt_memory_write __ITTNOTIFY_VOID_CALL__(memory_write) +#define __itt_memory_write_ptr _ITTNOTIFY_NAME_(memory_write) + +#define __itt_memory_update __ITTNOTIFY_VOID_CALL__(memory_update) +#define __itt_memory_update_ptr _ITTNOTIFY_NAME_(memory_update) + + +#define __itt_test_delay __ITTNOTIFY_VOID_CALL__(test_delay) +#define __itt_test_delay_ptr _ITTNOTIFY_NAME_(test_delay) + +#define __itt_test_seq_init __ITTNOTIFY_VOID_CALL__(test_seq_init) +#define __itt_test_seq_init_ptr _ITTNOTIFY_NAME_(test_seq_init) + +#define __itt_test_seq_wait __ITTNOTIFY_VOID_CALL__(test_seq_wait) +#define __itt_test_seq_wait_ptr _ITTNOTIFY_NAME_(test_seq_wait) + +#define __itt_set_error_handler ITT_JOIN(INTEL_ITTNOTIFY_PREFIX, set_error_handler) + +#else /* INTEL_NO_ITTNOTIFY_API */ + +#define __itt_pause() +#define __itt_pause_ptr 0 + +#define __itt_resume() +#define __itt_resume_ptr 0 + +#if ITT_PLATFORM==ITT_PLATFORM_WIN + +#define __itt_mark_createA(name) (__itt_mark_type)0 +#define __itt_mark_createA_ptr 0 + +#define __itt_mark_createW(name) (__itt_mark_type)0 +#define __itt_mark_createW_ptr 0 + +#define __itt_markA(mt,parameter) (int)0 +#define __itt_markA_ptr 0 + +#define __itt_markW(mt,parameter) (int)0 +#define __itt_markW_ptr 0 + +#define __itt_mark_globalA(mt,parameter) (int)0 +#define __itt_mark_globalA_ptr 0 + +#define __itt_mark_globalW(mt,parameter) (int)0 +#define __itt_mark_globalW_ptr 0 + +#define __itt_thread_set_nameA(name) +#define __itt_thread_set_nameA_ptr 0 + +#define __itt_thread_set_nameW(name) +#define __itt_thread_set_nameW_ptr 0 + +#define __itt_sync_createA(addr, objtype, objname, attribute) +#define __itt_sync_createA_ptr 0 + +#define __itt_sync_createW(addr, objtype, objname, attribute) +#define __itt_sync_createW_ptr 0 + +#define __itt_sync_renameA(addr, name) +#define __itt_sync_renameA_ptr 0 + +#define __itt_sync_renameW(addr, name) +#define __itt_sync_renameW_ptr 0 +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#define __itt_mark_create(name) (__itt_mark_type)0 +#define __itt_mark_create_ptr 0 + +#define __itt_mark(mt,parameter) (int)0 +#define __itt_mark_ptr 0 + +#define __itt_mark_global(mt,parameter) (int)0 +#define __itt_mark_global_ptr 0 + +#define __itt_sync_set_name(addr,objtype,objname,attribute) +#define __itt_sync_set_name_ptr 0 + +#define __itt_thread_set_name(name) +#define __itt_thread_set_name_ptr 0 + +#define __itt_sync_create(addr, objtype, objname, attribute) +#define __itt_sync_create_ptr 0 + +#define __itt_sync_rename(addr, name) +#define __itt_sync_rename_ptr 0 +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#define __itt_mark_off(mt) (int)0 +#define __itt_mark_off_ptr 0 + +#define 
__itt_thread_ignore() +#define __itt_thread_ignore_ptr 0 + +#define __itt_sync_prepare(addr) +#define __itt_sync_prepare_ptr 0 + +#define __itt_sync_cancel(addr) +#define __itt_sync_cancel_ptr 0 + +#define __itt_sync_acquired(addr) +#define __itt_sync_acquired_ptr 0 + +#define __itt_sync_releasing(addr) +#define __itt_sync_releasing_ptr 0 + +#define __itt_sync_released(addr) +#define __itt_sync_released_ptr 0 + +#define __itt_fsync_prepare(addr) +#define __itt_fsync_prepare_ptr 0 + +#define __itt_fsync_cancel(addr) +#define __itt_fsync_cancel_ptr 0 + +#define __itt_fsync_acquired(addr) +#define __itt_fsync_acquired_ptr 0 + +#define __itt_fsync_releasing(addr) +#define __itt_fsync_releasing_ptr 0 + +#define __itt_fsync_released(addr) +#define __itt_fsync_released_ptr 0 + +#define __itt_sync_destroy(addr) +#define __itt_sync_destroy_ptr 0 + +#define __itt_notify_sync_prepare(p) +#define __itt_notify_sync_prepare_ptr 0 + +#define __itt_notify_sync_cancel(p) +#define __itt_notify_sync_cancel_ptr 0 + +#define __itt_notify_sync_acquired(p) +#define __itt_notify_sync_acquired_ptr 0 + +#define __itt_notify_sync_releasing(p) +#define __itt_notify_sync_releasing_ptr 0 + +#define __itt_notify_cpath_target() +#define __itt_notify_cpath_target_ptr 0 + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +#define __itt_sync_set_nameA(addr,objtype,objname,attribute) +#define __itt_sync_set_nameA_ptr 0 + +#define __itt_sync_set_nameW(addr,objtype,objname,attribute) +#define __itt_sync_set_nameW_ptr 0 + +#define __itt_thr_name_setA(name,namelen) (int)0 +#define __itt_thr_name_setA_ptr 0 + +#define __itt_thr_name_setW(name,namelen) (int)0 +#define __itt_thr_name_setW_ptr 0 + +#define __itt_event_createA(name,namelen) (__itt_event)0 +#define __itt_event_createA_ptr 0 + +#define __itt_event_createW(name,namelen) (__itt_event)0 +#define __itt_event_createW_ptr 0 +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +#define __itt_thr_name_set(name,namelen) (int)0 +#define __itt_thr_name_set_ptr 0 + +#define __itt_event_create(name,namelen) (__itt_event)0 +#define __itt_event_create_ptr 0 +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#define __itt_thr_ignore() +#define __itt_thr_ignore_ptr 0 + +#define __itt_event_start(event) (int)0 +#define __itt_event_start_ptr 0 + +#define __itt_event_end(event) (int)0 +#define __itt_event_end_ptr 0 + +#define __itt_state_get() (__itt_state_t)0 +#define __itt_state_get_ptr 0 + +#define __itt_state_set(state) (__itt_state_t)0 +#define __itt_state_set_ptr 0 + +#define __itt_obj_mode_set(prop, state) (__itt_obj_state_t)0 +#define __itt_obj_mode_set_ptr 0 + +#define __itt_thr_mode_set(prop, state) (__itt_thr_state_t)0 +#define __itt_thr_mode_set_ptr 0 + +#define __itt_api_version() (const char*)0 +#define __itt_api_version_ptr 0 + +#define __itt_jit_notify_event(event_type,event_data) (int)0 +#define __itt_jit_notify_event_ptr 0 + +#define __itt_jit_get_new_method_id() (unsigned int)0 +#define __itt_jit_get_new_method_id_ptr 0 + +#define __itt_memory_read(address, size) +#define __itt_memory_read_ptr 0 + +#define __itt_memory_write(address, size) +#define __itt_memory_write_ptr 0 + +#define __itt_memory_update(address, size) +#define __itt_memory_update_ptr 0 + +#define __itt_test_delay(p1) +#define __itt_test_delay_ptr 0 + +#define __itt_test_seq_init(p1,p2) +#define __itt_test_seq_init_ptr 0 + +#define __itt_test_seq_wait(p1,p2) +#define __itt_test_seq_wait_ptr 0 + +#define __itt_set_error_handler(x) + +#endif /* INTEL_NO_ITTNOTIFY_API */ + +#endif /* _ITTNOTIFY_H_MACRO_BODY_ */ + +#endif /* 
_ITTNOTIFY_H_ */ +/** @endcond */ + diff --git a/dep/tbb/src/tbb/tools_api/ittnotify_static.c b/dep/tbb/src/tbb/tools_api/ittnotify_static.c new file mode 100644 index 000000000..d03758bc6 --- /dev/null +++ b/dep/tbb/src/tbb/tools_api/ittnotify_static.c @@ -0,0 +1,577 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "_config.h" + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +#include +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +#include +#include +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +#include +#include +#include + +#define __ITT_INTERNAL_INCLUDE + +#define _ITTNOTIFY_H_MACRO_BODY_ + +#include "_disable_warnings.h" + +#include "ittnotify.h" + +#ifdef __cplusplus +# define ITT_EXTERN_C extern "C" +#else +# define ITT_EXTERN_C /* nothing */ +#endif /* __cplusplus */ + +#ifndef __itt_init_lib_name +# define __itt_init_lib_name __itt_init_lib +#endif /* __itt_init_lib_name */ + +static int __itt_init_lib(void); + +#ifndef INTEL_ITTNOTIFY_PREFIX +#define INTEL_ITTNOTIFY_PREFIX __itt_ +#endif /* INTEL_ITTNOTIFY_PREFIX */ +#ifndef INTEL_ITTNOTIFY_POSTFIX +# define INTEL_ITTNOTIFY_POSTFIX _ptr_ +#endif /* INTEL_ITTNOTIFY_POSTFIX */ + +#define ___N_(p,n) p##n +#define __N_(p,n) ___N_(p,n) +#define _N_(n) __N_(INTEL_ITTNOTIFY_PREFIX,n) + +/* building pointers to imported funcs */ +#undef ITT_STUBV +#undef ITT_STUB +#define ITT_STUB(type,name,args,params,ptr,group) \ + static type ITTAPI_CALL ITT_JOIN(_N_(name),_init) args; \ + typedef type ITTAPI_CALL name##_t args; \ + ITT_EXTERN_C name##_t* ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX) = ITT_JOIN(_N_(name),_init); \ + static type ITTAPI_CALL ITT_JOIN(_N_(name),_init) args \ + { \ + __itt_init_lib_name(); \ + if(ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX)) \ + return ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX) params; \ + else \ + return (type)0; \ + } + +#define ITT_STUBV(type,name,args,params,ptr,group) \ + static type ITTAPI_CALL ITT_JOIN(_N_(name),_init) args; \ + typedef type ITTAPI_CALL name##_t args; \ + ITT_EXTERN_C name##_t* ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX) = ITT_JOIN(_N_(name),_init); \ + static type ITTAPI_CALL ITT_JOIN(_N_(name),_init) args \ + { \ + __itt_init_lib_name(); \ + if(ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX)) \ + 
ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX) params; \ + else \ + return; \ + } + +const __itt_state_t _N_(state_err) = 0; +const __itt_event _N_(event_err) = 0; +const int _N_(err) = 0; + +#include "_ittnotify_static.h" + +typedef enum ___itt_group_id +{ + __itt_none_group = 0, + __itt_control_group = 1, + __itt_thread_group = 2, + __itt_mark_group = 4, + __itt_sync_group = 8, + __itt_fsync_group = 16, + __itt_jit_group = 32, + __itt_all_group = -1 +} __itt_group_id; + + +#ifndef CDECL +#if ITT_PLATFORM==ITT_PLATFORM_WIN +# define CDECL __cdecl +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +# define CDECL +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +#endif /* CDECL */ + +#ifndef STDCALL +#if ITT_PLATFORM==ITT_PLATFORM_WIN +# define STDCALL __stdcall +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +# define STDCALL +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +#endif /* STDCALL */ + +#if ITT_PLATFORM==ITT_PLATFORM_WIN + typedef FARPROC FPTR; +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + typedef void* FPTR; +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + + +/* OS communication functions */ +#if ITT_PLATFORM==ITT_PLATFORM_WIN +typedef HMODULE lib_t; +typedef CRITICAL_SECTION mutex_t; +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +typedef void* lib_t; +typedef pthread_mutex_t mutex_t; +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +static lib_t ittnotify_lib; + +static __itt_error_notification_t* error_handler = 0; + +#if ITT_OS==ITT_OS_WIN +static const char* ittnotify_lib_name = "libittnotify.dll"; +#elif ITT_OS==ITT_OS_LINUX +static const char* ittnotify_lib_name = "libittnotify.so"; +#elif ITT_OS==ITT_OS_MAC +static const char* ittnotify_lib_name = "libittnotify.dylib"; +#else +#error Unsupported or unknown OS. +#endif + +#ifndef LIB_VAR_NAME +#if ITT_ARCH==ITT_ARCH_IA32 +#define LIB_VAR_NAME INTEL_LIBITTNOTIFY32 +#else +#define LIB_VAR_NAME INTEL_LIBITTNOTIFY64 +#endif +#endif /* LIB_VAR_NAME */ + +#define __TO_STR(x) #x +#define _TO_STR(x) __TO_STR(x) + +static int __itt_fstrcmp(const char* s1, const char* s2) +{ + int i; + + if(!s1 && !s2) + return 0; + else if(!s1 && s2) + return -1; + else if(s1 && !s2) + return 1; + + for(i = 0; s1[i] || s2[i]; i++) + if(s1[i] > s2[i]) + return 1; + else if(s1[i] < s2[i]) + return -1; + return 0; +} + +static const char* __itt_fsplit(const char* s, const char* sep, const char** out, int* len) +{ + int i; + int j; + + if(!s || !sep || !out || !len) + return 0; + + for(i = 0; s[i]; i++) + { + int b = 0; + for(j = 0; sep[j]; j++) + if(s[i] == sep[j]) + { + b = 1; + break; + } + if(!b) + break; + } + + if(!s[i]) + return 0; + + *len = 0; + *out = s + i; + + for(; s[i]; i++, (*len)++) + { + int b = 0; + for(j = 0; sep[j]; j++) + if(s[i] == sep[j]) + { + b = 1; + break; + } + if(b) + break; + } + + for(; s[i]; i++) + { + int b = 0; + for(j = 0; sep[j]; j++) + if(s[i] == sep[j]) + { + b = 1; + break; + } + if(!b) + break; + } + + return s + i; +} + +static char* __itt_fstrcpyn(char* dst, const char* src, int len) +{ + int i; + + if(!src || !dst) + return 0; + + for(i = 0; i < len; i++) + dst[i] = src[i]; + dst[len] = 0; + return dst; +} + +#ifdef ITT_NOTIFY_EXT_REPORT +# define ERROR_HANDLER ITT_JOIN(INTEL_ITTNOTIFY_PREFIX, error_handler) +ITT_EXTERN_C void ERROR_HANDLER(__itt_error_code, const char* msg); +#endif /* ITT_NOTIFY_EXT_REPORT */ + +static void __itt_report_error(__itt_error_code code, const char* msg) +{ + if(error_handler) + error_handler(code, msg); +#ifdef ITT_NOTIFY_EXT_REPORT + ERROR_HANDLER(code, msg); +#endif /* ITT_NOTIFY_EXT_REPORT */ +} + 
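The ITT_STUB/ITT_STUBV machinery above boils down to lazy function-pointer dispatch: each notification entry point starts out pointing at an _init stub that loads the collector library once, then either forwards the call through the resolved pointer or degrades to a no-op. The standalone sketch below (illustrative only, not part of the TBB sources; the demo_* names are hypothetical stand-ins for the entries generated from _ittnotify_static.h and resolved by __itt_init_lib) shows that shape.

// Illustrative sketch of the pattern the ITT_STUBV macro expands to.
// Signature of one ITT-style notification entry point.
typedef void demo_sync_prepare_t(void* addr);

// Filled in by demo_init_lib() when a collector library is found; stays null otherwise.
static demo_sync_prepare_t* demo_sync_prepare_ptr = 0;
static bool demo_initialized = false;

static void demo_init_lib() {
    if (demo_initialized)
        return;
    demo_initialized = true;
    // The real __itt_init_lib() dlopen()s / LoadLibrary()s libittnotify and fills
    // the func_map[] pointer table via dlsym/GetProcAddress; this sketch resolves nothing.
}

// Shape of the wrapper generated for a void entry point:
// lazily initialize, then either forward through the pointer or do nothing.
void demo_sync_prepare(void* addr) {
    demo_init_lib();
    if (demo_sync_prepare_ptr)
        demo_sync_prepare_ptr(addr);
}

int main() {
    int object = 0;
    demo_sync_prepare(&object); // harmless no-op when no collector library is present
    return 0;
}

Once the pointers are resolved, the fast path is a single indirect call, and when no collector is attached the instrumentation costs essentially nothing.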
+static const char* __itt_get_env_var(const char* name) +{ + static char env_value[4096]; +#if ITT_PLATFORM==ITT_PLATFORM_WIN + int i; + DWORD rc; + for(i = 0; i < sizeof(env_value); i++) + env_value[i] = 0; + rc = GetEnvironmentVariableA(name, env_value, sizeof(env_value) - 1); + if(rc >= sizeof(env_value)) + __itt_report_error(__itt_error_cant_read_env, name); + else if(!rc) + return 0; + else + return env_value; +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + char* env = getenv(name); + int i; + for(i = 0; i < sizeof(env_value); i++) + env_value[i] = 0; + if(env) + { + if(strlen(env) >= sizeof(env_value)) + { + __itt_report_error(__itt_error_cant_read_env, name); + return 0; + } + strncpy(env_value, env, sizeof(env_value) - 1); + return env_value; + } +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + return 0; +} + +static const char* __itt_get_lib_name() +{ + const char* lib_name = __itt_get_env_var(_TO_STR(LIB_VAR_NAME)); + if(!lib_name) + lib_name = ittnotify_lib_name; + + return lib_name; +} + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +# define __itt_get_proc(lib, name) GetProcAddress(lib, name) +# define __itt_init_mutex(mutex) InitializeCriticalSection(mutex) +# define __itt_mutex_lock(mutex) EnterCriticalSection(mutex) +# define __itt_mutex_unlock(mutex) LeaveCriticalSection(mutex) +# define __itt_load_lib(name) LoadLibraryA(name) +#else /* ITT_PLATFORM==ITT_PLATFORM_WIN */ +# define __itt_get_proc(lib, name) dlsym(lib, name) +# define __itt_init_mutex(mutex) pthread_mutex_init(mutex, 0) +# define __itt_mutex_lock(mutex) pthread_mutex_lock(mutex) +# define __itt_mutex_unlock(mutex) pthread_mutex_unlock(mutex) +# define __itt_load_lib(name) dlopen(name, RTLD_LAZY) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +#ifndef ITT_SIMPLE_INIT +/* function stubs */ + +#undef ITT_STUBV +#undef ITT_STUB + +#define ITT_STUBV(type,name,args,params,ptr,group) \ +ITT_EXTERN_C type ITTAPI_CALL _N_(name) args \ +{ \ + __itt_init_lib_name(); \ + if(ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX)) \ + ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX) params; \ + else \ + return; \ +} + +#define ITT_STUB(type,name,args,params,ptr,group) \ +ITT_EXTERN_C type ITTAPI_CALL _N_(name) args \ +{ \ + __itt_init_lib_name(); \ + if(ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX)) \ + return ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX) params; \ + else \ + return (type)0; \ +} + +#include "_ittnotify_static.h" + +#endif /* ITT_SIMPLE_INIT */ + +typedef struct ___itt_group_list +{ + __itt_group_id id; + const char* name; +} __itt_group_list; + +static __itt_group_list group_list[] = { + {__itt_control_group, "control"}, + {__itt_thread_group, "thread"}, + {__itt_mark_group, "mark"}, + {__itt_sync_group, "sync"}, + {__itt_fsync_group, "fsync"}, + {__itt_jit_group, "jit"}, + {__itt_all_group, "all"}, + {__itt_none_group, 0} +}; + +typedef struct ___itt_group_alias +{ + const char* env_var; + __itt_group_id groups; +} __itt_group_alias; + +static __itt_group_alias group_alias[] = { + {"KMP_FOR_TPROFILE", (__itt_group_id)(__itt_control_group | __itt_thread_group | __itt_sync_group | __itt_mark_group)}, + {"KMP_FOR_TCHECK", (__itt_group_id)(__itt_control_group | __itt_thread_group | __itt_fsync_group | __itt_mark_group)}, + {0, __itt_none_group} +}; + +typedef struct ___itt_func_map +{ + const char* name; + void** func_ptr; + __itt_group_id group; +} __itt_func_map; + + +#define _P_(name) ITT_JOIN(_N_(name),INTEL_ITTNOTIFY_POSTFIX) + +#define ITT_STRINGIZE_AUX(p) #p +#define ITT_STRINGIZE(p) ITT_STRINGIZE_AUX(p) + +#define 
__ptr_(pname,name,group) {ITT_STRINGIZE(ITT_JOIN(__itt_,pname)), (void**)(void*)&_P_(name), (__itt_group_id)(group)}, + +#undef ITT_STUB +#undef ITT_STUBV + +#define ITT_STUB(type,name,args,params,ptr,group) __ptr_(ptr,name,group) +#define ITT_STUBV ITT_STUB + +static __itt_func_map func_map[] = { +#include "_ittnotify_static.h" + {0, 0, __itt_none_group} +}; + +static __itt_group_id __itt_get_groups() +{ + __itt_group_id res = __itt_none_group; + + const char* group_str = __itt_get_env_var("INTEL_ITTNOTIFY_GROUPS"); + if(group_str) + { + char gr[255]; + const char* chunk; + int len; + while((group_str = __itt_fsplit(group_str, ",; ", &chunk, &len)) != 0) + { + int j; + int group_detected = 0; + __itt_fstrcpyn(gr, chunk, len); + for(j = 0; group_list[j].name; j++) + { + if(!__itt_fstrcmp(gr, group_list[j].name)) + { + res = (__itt_group_id)(res | group_list[j].id); + group_detected = 1; + break; + } + } + + if(!group_detected) + __itt_report_error(__itt_error_unknown_group, gr); + } + return res; + } + else + { + int i; + for(i = 0; group_alias[i].env_var; i++) + if(__itt_get_env_var(group_alias[i].env_var)) + return group_alias[i].groups; + } + + return res; +} + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +#pragma warning(push) +#pragma warning(disable: 4054) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ + +static int __itt_init_lib() +{ + static volatile int init = 0; + static int result = 0; + +#ifndef ITT_SIMPLE_INIT + +#if ITT_PLATFORM==ITT_PLATFORM_POSIX + static mutex_t mutex = PTHREAD_MUTEX_INITIALIZER; +#else + static volatile int mutex_initialized = 0; + static mutex_t mutex; + static LONG inter_counter = 0; +#endif + + if(!init) + { +#if ITT_PLATFORM==ITT_PLATFORM_WIN + if(!mutex_initialized) + { + if(InterlockedIncrement(&inter_counter) == 1) + { + __itt_init_mutex(&mutex); + mutex_initialized = 1; + } + else + while(!mutex_initialized) + SwitchToThread(); + } +#endif + + __itt_mutex_lock(&mutex); +#endif /* ITT_SIMPLE_INIT */ + if(!init) + { + int i; + + __itt_group_id groups = __itt_get_groups(); + + for(i = 0; func_map[i].name; i++) + *func_map[i].func_ptr = 0; + + if(groups != __itt_none_group) + { +#ifdef ITT_COMPLETE_GROUP + __itt_group_id zero_group = __itt_none_group; +#endif /* ITT_COMPLETE_GROUP */ + + ittnotify_lib = __itt_load_lib(__itt_get_lib_name()); + if(ittnotify_lib) + { + for(i = 0; func_map[i].name; i++) + { + if(func_map[i].name && func_map[i].func_ptr && (func_map[i].group & groups)) + { + *func_map[i].func_ptr = (void*)__itt_get_proc(ittnotify_lib, func_map[i].name); + if(!(*func_map[i].func_ptr) && func_map[i].name) + { + __itt_report_error(__itt_error_no_symbol, func_map[i].name); +#ifdef ITT_COMPLETE_GROUP + zero_group = (__itt_group_id)(zero_group | func_map[i].group); +#endif /* ITT_COMPLETE_GROUP */ + } + else + result = 1; + } + } + } + else + { + __itt_report_error(__itt_error_no_module, __itt_get_lib_name()); + } + +#ifdef ITT_COMPLETE_GROUP + for(i = 0; func_map[i].name; i++) + if(func_map[i].group & zero_group) + *func_map[i].func_ptr = 0; + + result = 0; + + for(i = 0; func_map[i].name; i++) /* evaluating if any function ptr is non empty */ + if(*func_map[i].func_ptr) + { + result = 1; + break; + } +#endif /* ITT_COMPLETE_GROUP */ + } + + init = 1; /* first checking of 'init' flag happened out of mutex, that is why setting flag to 1 */ + /* must be after call table is filled (to avoid condition races) */ + } +#ifndef ITT_SIMPLE_INIT + __itt_mutex_unlock(&mutex); + } +#endif /* ITT_SIMPLE_INIT */ + return result; +} + +#define SET_ERROR_HANDLER 
ITT_JOIN(INTEL_ITTNOTIFY_PREFIX, set_error_handler) + +ITT_EXTERN_C __itt_error_notification_t* SET_ERROR_HANDLER(__itt_error_notification_t* handler) +{ + __itt_error_notification_t* prev = error_handler; + error_handler = handler; + return prev; +} + +#if ITT_PLATFORM==ITT_PLATFORM_WIN +#pragma warning(pop) +#endif /* ITT_PLATFORM==ITT_PLATFORM_WIN */ diff --git a/dep/tbb/src/tbb/win32-tbb-export.def b/dep/tbb/src/tbb/win32-tbb-export.def new file mode 100644 index 000000000..d78bf6d6a --- /dev/null +++ b/dep/tbb/src/tbb/win32-tbb-export.def @@ -0,0 +1,261 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. 
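The EXPORTS list that follows names the MSVC-decorated entry points the 32-bit tbb.dll publishes; since a plain module-definition file cannot contain #include or #if directives, the build evidently runs this file through the C preprocessor before handing it to the linker. As a point of orientation, here is a small hypothetical client (illustrative only, not part of the patch) that exercises the cache_aligned_allocator entry points listed below through the public header, which forwards to the exported tbb::internal::NFS_Allocate / NFS_Free symbols.

#include "tbb/cache_aligned_allocator.h"
#include <cstdio>

int main() {
    tbb::cache_aligned_allocator<double> alloc;
    // allocate() forwards to tbb::internal::NFS_Allocate, one of the decorated
    // symbols exported by this .def file; deallocate() forwards to NFS_Free.
    double* p = alloc.allocate(16);
    p[0] = 3.14;
    std::printf("cache-aligned block at %p, first element %g\n", (void*)p, p[0]);
    alloc.deallocate(p, 16);
    return 0;
}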
+ +#include "tbb/tbb_config.h" + +EXPORTS + +; Assembly-language support that is called directly by clients +;__TBB_machine_cmpswp1 +;__TBB_machine_cmpswp2 +;__TBB_machine_cmpswp4 +__TBB_machine_cmpswp8 +;__TBB_machine_fetchadd1 +;__TBB_machine_fetchadd2 +;__TBB_machine_fetchadd4 +__TBB_machine_fetchadd8 +;__TBB_machine_fetchstore1 +;__TBB_machine_fetchstore2 +;__TBB_machine_fetchstore4 +__TBB_machine_fetchstore8 +__TBB_machine_store8 +__TBB_machine_load8 +__TBB_machine_trylockbyte + +; cache_aligned_allocator.cpp +?NFS_Allocate@internal@tbb@@YAPAXIIPAX@Z +?NFS_GetLineSize@internal@tbb@@YAIXZ +?NFS_Free@internal@tbb@@YAXPAX@Z +?allocate_via_handler_v3@internal@tbb@@YAPAXI@Z +?deallocate_via_handler_v3@internal@tbb@@YAXPAX@Z +?is_malloc_used_v3@internal@tbb@@YA_NXZ + +; task.cpp v3 +?allocate@allocate_additional_child_of_proxy@internal@tbb@@QBEAAVtask@3@I@Z +?allocate@allocate_child_proxy@internal@tbb@@QBEAAVtask@3@I@Z +?allocate@allocate_continuation_proxy@internal@tbb@@QBEAAVtask@3@I@Z +?allocate@allocate_root_proxy@internal@tbb@@SAAAVtask@3@I@Z +?destroy@task@tbb@@QAEXAAV12@@Z +?free@allocate_additional_child_of_proxy@internal@tbb@@QBEXAAVtask@3@@Z +?free@allocate_child_proxy@internal@tbb@@QBEXAAVtask@3@@Z +?free@allocate_continuation_proxy@internal@tbb@@QBEXAAVtask@3@@Z +?free@allocate_root_proxy@internal@tbb@@SAXAAVtask@3@@Z +?internal_set_ref_count@task@tbb@@AAEXH@Z +?internal_decrement_ref_count@task@tbb@@AAEHXZ +?is_owned_by_current_thread@task@tbb@@QBE_NXZ +?note_affinity@task@tbb@@UAEXG@Z +?resize@affinity_partitioner_base_v3@internal@tbb@@AAEXI@Z +?self@task@tbb@@SAAAV12@XZ +?spawn_and_wait_for_all@task@tbb@@QAEXAAVtask_list@2@@Z +?default_num_threads@task_scheduler_init@tbb@@SAHXZ +?initialize@task_scheduler_init@tbb@@QAEXHI@Z +?initialize@task_scheduler_init@tbb@@QAEXH@Z +?terminate@task_scheduler_init@tbb@@QAEXXZ +?observe@task_scheduler_observer_v3@internal@tbb@@QAEX_N@Z + +; exception handling support +#if __TBB_EXCEPTIONS +?allocate@allocate_root_with_context_proxy@internal@tbb@@QBEAAVtask@3@I@Z +?free@allocate_root_with_context_proxy@internal@tbb@@QBEXAAVtask@3@@Z +?is_group_execution_cancelled@task_group_context@tbb@@QBE_NXZ +?cancel_group_execution@task_group_context@tbb@@QAE_NXZ +?reset@task_group_context@tbb@@QAEXXZ +?init@task_group_context@tbb@@IAEXXZ +?register_pending_exception@task_group_context@tbb@@QAEXXZ +??1task_group_context@tbb@@QAE@XZ +?name@captured_exception@tbb@@UBEPBDXZ +?what@captured_exception@tbb@@UBEPBDXZ +??1captured_exception@tbb@@UAE@XZ +?move@captured_exception@tbb@@UAEPAV12@XZ +?destroy@captured_exception@tbb@@UAEXXZ +?set@captured_exception@tbb@@QAEXPBD0@Z +?clear@captured_exception@tbb@@QAEXXZ +#endif /* __TBB_EXCEPTIONS */ + +; tbb_misc.cpp +?assertion_failure@tbb@@YAXPBDH00@Z +?get_initial_auto_partitioner_divisor@internal@tbb@@YAIXZ +?handle_perror@internal@tbb@@YAXHPBD@Z +?set_assertion_handler@tbb@@YAP6AXPBDH00@ZP6AX0H00@Z@Z +?runtime_warning@internal@tbb@@YAXPBDZZ +TBB_runtime_interface_version +?throw_bad_last_alloc_exception_v4@internal@tbb@@YAXXZ + +; itt_notify.cpp +?itt_load_pointer_with_acquire_v3@internal@tbb@@YAPAXPBX@Z +?itt_store_pointer_with_release_v3@internal@tbb@@YAXPAX0@Z +?itt_set_sync_name_v3@internal@tbb@@YAXPAXPB_W@Z +?itt_load_pointer_v3@internal@tbb@@YAPAXPBX@Z + +; pipeline.cpp +??0pipeline@tbb@@QAE@XZ +??1filter@tbb@@UAE@XZ +??1pipeline@tbb@@UAE@XZ +??_7pipeline@tbb@@6B@ +?add_filter@pipeline@tbb@@QAEXAAVfilter@2@@Z +?clear@pipeline@tbb@@QAEXXZ +?inject_token@pipeline@tbb@@AAEXAAVtask@2@@Z 
+?run@pipeline@tbb@@QAEXI@Z +#if __TBB_EXCEPTIONS +?run@pipeline@tbb@@QAEXIAAVtask_group_context@2@@Z +#endif +?process_item@thread_bound_filter@tbb@@QAE?AW4result_type@12@XZ +?try_process_item@thread_bound_filter@tbb@@QAE?AW4result_type@12@XZ + +; queuing_rw_mutex.cpp +?internal_construct@queuing_rw_mutex@tbb@@QAEXXZ +?acquire@scoped_lock@queuing_rw_mutex@tbb@@QAEXAAV23@_N@Z +?downgrade_to_reader@scoped_lock@queuing_rw_mutex@tbb@@QAE_NXZ +?release@scoped_lock@queuing_rw_mutex@tbb@@QAEXXZ +?upgrade_to_writer@scoped_lock@queuing_rw_mutex@tbb@@QAE_NXZ +?try_acquire@scoped_lock@queuing_rw_mutex@tbb@@QAE_NAAV23@_N@Z + +#if !TBB_NO_LEGACY +; spin_rw_mutex.cpp v2 +?internal_acquire_reader@spin_rw_mutex@tbb@@CAXPAV12@@Z +?internal_acquire_writer@spin_rw_mutex@tbb@@CA_NPAV12@@Z +?internal_downgrade@spin_rw_mutex@tbb@@CAXPAV12@@Z +?internal_itt_releasing@spin_rw_mutex@tbb@@CAXPAV12@@Z +?internal_release_reader@spin_rw_mutex@tbb@@CAXPAV12@@Z +?internal_release_writer@spin_rw_mutex@tbb@@CAXPAV12@@Z +?internal_upgrade@spin_rw_mutex@tbb@@CA_NPAV12@@Z +?internal_try_acquire_writer@spin_rw_mutex@tbb@@CA_NPAV12@@Z +?internal_try_acquire_reader@spin_rw_mutex@tbb@@CA_NPAV12@@Z +#endif + +; spin_rw_mutex v3 +?internal_construct@spin_rw_mutex_v3@tbb@@AAEXXZ +?internal_upgrade@spin_rw_mutex_v3@tbb@@AAE_NXZ +?internal_downgrade@spin_rw_mutex_v3@tbb@@AAEXXZ +?internal_acquire_reader@spin_rw_mutex_v3@tbb@@AAEXXZ +?internal_acquire_writer@spin_rw_mutex_v3@tbb@@AAE_NXZ +?internal_release_reader@spin_rw_mutex_v3@tbb@@AAEXXZ +?internal_release_writer@spin_rw_mutex_v3@tbb@@AAEXXZ +?internal_try_acquire_reader@spin_rw_mutex_v3@tbb@@AAE_NXZ +?internal_try_acquire_writer@spin_rw_mutex_v3@tbb@@AAE_NXZ + +; spin_mutex.cpp +?internal_construct@spin_mutex@tbb@@QAEXXZ +?internal_acquire@scoped_lock@spin_mutex@tbb@@AAEXAAV23@@Z +?internal_release@scoped_lock@spin_mutex@tbb@@AAEXXZ +?internal_try_acquire@scoped_lock@spin_mutex@tbb@@AAE_NAAV23@@Z + +; mutex.cpp +?internal_acquire@scoped_lock@mutex@tbb@@AAEXAAV23@@Z +?internal_release@scoped_lock@mutex@tbb@@AAEXXZ +?internal_try_acquire@scoped_lock@mutex@tbb@@AAE_NAAV23@@Z +?internal_construct@mutex@tbb@@AAEXXZ +?internal_destroy@mutex@tbb@@AAEXXZ + +; recursive_mutex.cpp +?internal_acquire@scoped_lock@recursive_mutex@tbb@@AAEXAAV23@@Z +?internal_release@scoped_lock@recursive_mutex@tbb@@AAEXXZ +?internal_try_acquire@scoped_lock@recursive_mutex@tbb@@AAE_NAAV23@@Z +?internal_construct@recursive_mutex@tbb@@AAEXXZ +?internal_destroy@recursive_mutex@tbb@@AAEXXZ + +; queuing_mutex.cpp +?internal_construct@queuing_mutex@tbb@@QAEXXZ +?acquire@scoped_lock@queuing_mutex@tbb@@QAEXAAV23@@Z +?release@scoped_lock@queuing_mutex@tbb@@QAEXXZ +?try_acquire@scoped_lock@queuing_mutex@tbb@@QAE_NAAV23@@Z + +#if !TBB_NO_LEGACY +; concurrent_hash_map.cpp +?internal_grow_predicate@hash_map_segment_base@internal@tbb@@QBE_NXZ + +; concurrent_queue.cpp v2 +?advance@concurrent_queue_iterator_base@internal@tbb@@IAEXXZ +?assign@concurrent_queue_iterator_base@internal@tbb@@IAEXABV123@@Z +?internal_size@concurrent_queue_base@internal@tbb@@IBEHXZ +??0concurrent_queue_base@internal@tbb@@IAE@I@Z +??0concurrent_queue_iterator_base@internal@tbb@@IAE@ABVconcurrent_queue_base@12@@Z +??1concurrent_queue_base@internal@tbb@@MAE@XZ +??1concurrent_queue_iterator_base@internal@tbb@@IAE@XZ +?internal_pop@concurrent_queue_base@internal@tbb@@IAEXPAX@Z +?internal_pop_if_present@concurrent_queue_base@internal@tbb@@IAE_NPAX@Z +?internal_push@concurrent_queue_base@internal@tbb@@IAEXPBX@Z 
+?internal_push_if_not_full@concurrent_queue_base@internal@tbb@@IAE_NPBX@Z +?internal_set_capacity@concurrent_queue_base@internal@tbb@@IAEXHI@Z +#endif + +; concurrent_queue v3 +??1concurrent_queue_iterator_base_v3@internal@tbb@@IAE@XZ +??0concurrent_queue_iterator_base_v3@internal@tbb@@IAE@ABVconcurrent_queue_base_v3@12@@Z +?advance@concurrent_queue_iterator_base_v3@internal@tbb@@IAEXXZ +?assign@concurrent_queue_iterator_base_v3@internal@tbb@@IAEXABV123@@Z +??0concurrent_queue_base_v3@internal@tbb@@IAE@I@Z +??1concurrent_queue_base_v3@internal@tbb@@MAE@XZ +?internal_pop@concurrent_queue_base_v3@internal@tbb@@IAEXPAX@Z +?internal_pop_if_present@concurrent_queue_base_v3@internal@tbb@@IAE_NPAX@Z +?internal_push@concurrent_queue_base_v3@internal@tbb@@IAEXPBX@Z +?internal_push_if_not_full@concurrent_queue_base_v3@internal@tbb@@IAE_NPBX@Z +?internal_size@concurrent_queue_base_v3@internal@tbb@@IBEHXZ +?internal_empty@concurrent_queue_base_v3@internal@tbb@@IBE_NXZ +?internal_set_capacity@concurrent_queue_base_v3@internal@tbb@@IAEXHI@Z +?internal_finish_clear@concurrent_queue_base_v3@internal@tbb@@IAEXXZ +?internal_throw_exception@concurrent_queue_base_v3@internal@tbb@@IBEXXZ +?assign@concurrent_queue_base_v3@internal@tbb@@IAEXABV123@@Z + +#if !TBB_NO_LEGACY +; concurrent_vector.cpp v2 +?internal_assign@concurrent_vector_base@internal@tbb@@IAEXABV123@IP6AXPAXI@ZP6AX1PBXI@Z4@Z +?internal_capacity@concurrent_vector_base@internal@tbb@@IBEIXZ +?internal_clear@concurrent_vector_base@internal@tbb@@IAEXP6AXPAXI@Z_N@Z +?internal_copy@concurrent_vector_base@internal@tbb@@IAEXABV123@IP6AXPAXPBXI@Z@Z +?internal_grow_by@concurrent_vector_base@internal@tbb@@IAEIIIP6AXPAXI@Z@Z +?internal_grow_to_at_least@concurrent_vector_base@internal@tbb@@IAEXIIP6AXPAXI@Z@Z +?internal_push_back@concurrent_vector_base@internal@tbb@@IAEPAXIAAI@Z +?internal_reserve@concurrent_vector_base@internal@tbb@@IAEXIII@Z +#endif + +; concurrent_vector v3 +??1concurrent_vector_base_v3@internal@tbb@@IAE@XZ +?internal_assign@concurrent_vector_base_v3@internal@tbb@@IAEXABV123@IP6AXPAXI@ZP6AX1PBXI@Z4@Z +?internal_capacity@concurrent_vector_base_v3@internal@tbb@@IBEIXZ +?internal_clear@concurrent_vector_base_v3@internal@tbb@@IAEIP6AXPAXI@Z@Z +?internal_copy@concurrent_vector_base_v3@internal@tbb@@IAEXABV123@IP6AXPAXPBXI@Z@Z +?internal_grow_by@concurrent_vector_base_v3@internal@tbb@@IAEIIIP6AXPAXPBXI@Z1@Z +?internal_grow_to_at_least@concurrent_vector_base_v3@internal@tbb@@IAEXIIP6AXPAXPBXI@Z1@Z +?internal_push_back@concurrent_vector_base_v3@internal@tbb@@IAEPAXIAAI@Z +?internal_reserve@concurrent_vector_base_v3@internal@tbb@@IAEXIII@Z +?internal_compact@concurrent_vector_base_v3@internal@tbb@@IAEPAXIPAXP6AX0I@ZP6AX0PBXI@Z@Z +?internal_swap@concurrent_vector_base_v3@internal@tbb@@IAEXAAV123@@Z +?internal_throw_exception@concurrent_vector_base_v3@internal@tbb@@IBEXI@Z +?internal_resize@concurrent_vector_base_v3@internal@tbb@@IAEXIIIPBXP6AXPAXI@ZP6AX10I@Z@Z +?internal_grow_to_at_least_with_result@concurrent_vector_base_v3@internal@tbb@@IAEIIIP6AXPAXPBXI@Z1@Z + +; tbb_thread +?join@tbb_thread_v3@internal@tbb@@QAEXXZ +?detach@tbb_thread_v3@internal@tbb@@QAEXXZ +?internal_start@tbb_thread_v3@internal@tbb@@AAEXP6GIPAX@Z0@Z +?allocate_closure_v3@internal@tbb@@YAPAXI@Z +?free_closure_v3@internal@tbb@@YAXPAX@Z +?hardware_concurrency@tbb_thread_v3@internal@tbb@@SAIXZ +?thread_yield_v3@internal@tbb@@YAXXZ +?thread_sleep_v3@internal@tbb@@YAXABVinterval_t@tick_count@2@@Z +?move_v3@internal@tbb@@YAXAAVtbb_thread_v3@12@0@Z 
+?thread_get_id_v3@internal@tbb@@YA?AVid@tbb_thread_v3@12@XZ diff --git a/dep/tbb/src/tbb/win64-tbb-export.def b/dep/tbb/src/tbb/win64-tbb-export.def new file mode 100644 index 000000000..4a3debff9 --- /dev/null +++ b/dep/tbb/src/tbb/win64-tbb-export.def @@ -0,0 +1,257 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. + +; This file is organized with a section for each .cpp file. +; Each of these sections is in alphabetical order. + +#include "tbb/tbb_config.h" + +EXPORTS + +; Assembly-language support that is called directly by clients +__TBB_machine_cmpswp1 +__TBB_machine_fetchadd1 +__TBB_machine_fetchstore1 +__TBB_machine_cmpswp2 +__TBB_machine_fetchadd2 +__TBB_machine_fetchstore2 +__TBB_machine_pause + +; cache_aligned_allocator.cpp +?NFS_Allocate@internal@tbb@@YAPEAX_K0PEAX@Z +?NFS_GetLineSize@internal@tbb@@YA_KXZ +?NFS_Free@internal@tbb@@YAXPEAX@Z +?allocate_via_handler_v3@internal@tbb@@YAPEAX_K@Z +?deallocate_via_handler_v3@internal@tbb@@YAXPEAX@Z +?is_malloc_used_v3@internal@tbb@@YA_NXZ + + +; task.cpp v3 +?resize@affinity_partitioner_base_v3@internal@tbb@@AEAAXI@Z +?allocate@allocate_additional_child_of_proxy@internal@tbb@@QEBAAEAVtask@3@_K@Z +?allocate@allocate_child_proxy@internal@tbb@@QEBAAEAVtask@3@_K@Z +?allocate@allocate_continuation_proxy@internal@tbb@@QEBAAEAVtask@3@_K@Z +?allocate@allocate_root_proxy@internal@tbb@@SAAEAVtask@3@_K@Z +?destroy@task@tbb@@QEAAXAEAV12@@Z +?free@allocate_additional_child_of_proxy@internal@tbb@@QEBAXAEAVtask@3@@Z +?free@allocate_child_proxy@internal@tbb@@QEBAXAEAVtask@3@@Z +?free@allocate_continuation_proxy@internal@tbb@@QEBAXAEAVtask@3@@Z +?free@allocate_root_proxy@internal@tbb@@SAXAEAVtask@3@@Z +?internal_set_ref_count@task@tbb@@AEAAXH@Z +?internal_decrement_ref_count@task@tbb@@AEAA_JXZ +?is_owned_by_current_thread@task@tbb@@QEBA_NXZ +?note_affinity@task@tbb@@UEAAXG@Z +?self@task@tbb@@SAAEAV12@XZ +?spawn_and_wait_for_all@task@tbb@@QEAAXAEAVtask_list@2@@Z +?default_num_threads@task_scheduler_init@tbb@@SAHXZ +?initialize@task_scheduler_init@tbb@@QEAAXH_K@Z +?initialize@task_scheduler_init@tbb@@QEAAXH@Z +?terminate@task_scheduler_init@tbb@@QEAAXXZ +?observe@task_scheduler_observer_v3@internal@tbb@@QEAAX_N@Z + +; exception handling support +#if 
__TBB_EXCEPTIONS +?allocate@allocate_root_with_context_proxy@internal@tbb@@QEBAAEAVtask@3@_K@Z +?free@allocate_root_with_context_proxy@internal@tbb@@QEBAXAEAVtask@3@@Z +?is_group_execution_cancelled@task_group_context@tbb@@QEBA_NXZ +?cancel_group_execution@task_group_context@tbb@@QEAA_NXZ +?reset@task_group_context@tbb@@QEAAXXZ +?init@task_group_context@tbb@@IEAAXXZ +?register_pending_exception@task_group_context@tbb@@QEAAXXZ +??1task_group_context@tbb@@QEAA@XZ +?name@captured_exception@tbb@@UEBAPEBDXZ +?what@captured_exception@tbb@@UEBAPEBDXZ +??1captured_exception@tbb@@UEAA@XZ +?move@captured_exception@tbb@@UEAAPEAV12@XZ +?destroy@captured_exception@tbb@@UEAAXXZ +?set@captured_exception@tbb@@QEAAXPEBD0@Z +?clear@captured_exception@tbb@@QEAAXXZ +#endif /* __TBB_EXCEPTIONS */ + +; tbb_misc.cpp +?assertion_failure@tbb@@YAXPEBDH00@Z +?get_initial_auto_partitioner_divisor@internal@tbb@@YA_KXZ +?handle_perror@internal@tbb@@YAXHPEBD@Z +?set_assertion_handler@tbb@@YAP6AXPEBDH00@ZP6AX0H00@Z@Z +?runtime_warning@internal@tbb@@YAXPEBDZZ +TBB_runtime_interface_version +?throw_bad_last_alloc_exception_v4@internal@tbb@@YAXXZ + +; itt_notify.cpp +?itt_load_pointer_with_acquire_v3@internal@tbb@@YAPEAXPEBX@Z +?itt_store_pointer_with_release_v3@internal@tbb@@YAXPEAX0@Z +?itt_load_pointer_v3@internal@tbb@@YAPEAXPEBX@Z +?itt_set_sync_name_v3@internal@tbb@@YAXPEAXPEB_W@Z + +; pipeline.cpp +??_7pipeline@tbb@@6B@ +??0pipeline@tbb@@QEAA@XZ +??1filter@tbb@@UEAA@XZ +??1pipeline@tbb@@UEAA@XZ +?add_filter@pipeline@tbb@@QEAAXAEAVfilter@2@@Z +?clear@pipeline@tbb@@QEAAXXZ +?inject_token@pipeline@tbb@@AEAAXAEAVtask@2@@Z +?run@pipeline@tbb@@QEAAX_K@Z +#if __TBB_EXCEPTIONS +?run@pipeline@tbb@@QEAAX_KAEAVtask_group_context@2@@Z +#endif +?process_item@thread_bound_filter@tbb@@QEAA?AW4result_type@12@XZ +?try_process_item@thread_bound_filter@tbb@@QEAA?AW4result_type@12@XZ + +; queuing_rw_mutex.cpp +?internal_construct@queuing_rw_mutex@tbb@@QEAAXXZ +?acquire@scoped_lock@queuing_rw_mutex@tbb@@QEAAXAEAV23@_N@Z +?downgrade_to_reader@scoped_lock@queuing_rw_mutex@tbb@@QEAA_NXZ +?release@scoped_lock@queuing_rw_mutex@tbb@@QEAAXXZ +?upgrade_to_writer@scoped_lock@queuing_rw_mutex@tbb@@QEAA_NXZ +?try_acquire@scoped_lock@queuing_rw_mutex@tbb@@QEAA_NAEAV23@_N@Z + +#if !TBB_NO_LEGACY +; spin_rw_mutex.cpp v2 +?internal_itt_releasing@spin_rw_mutex@tbb@@CAXPEAV12@@Z +?internal_acquire_writer@spin_rw_mutex@tbb@@CA_NPEAV12@@Z +?internal_acquire_reader@spin_rw_mutex@tbb@@CAXPEAV12@@Z +?internal_downgrade@spin_rw_mutex@tbb@@CAXPEAV12@@Z +?internal_upgrade@spin_rw_mutex@tbb@@CA_NPEAV12@@Z +?internal_release_reader@spin_rw_mutex@tbb@@CAXPEAV12@@Z +?internal_release_writer@spin_rw_mutex@tbb@@CAXPEAV12@@Z +?internal_try_acquire_writer@spin_rw_mutex@tbb@@CA_NPEAV12@@Z +?internal_try_acquire_reader@spin_rw_mutex@tbb@@CA_NPEAV12@@Z +#endif + +; spin_rw_mutex v3 +?internal_construct@spin_rw_mutex_v3@tbb@@AEAAXXZ +?internal_upgrade@spin_rw_mutex_v3@tbb@@AEAA_NXZ +?internal_downgrade@spin_rw_mutex_v3@tbb@@AEAAXXZ +?internal_acquire_reader@spin_rw_mutex_v3@tbb@@AEAAXXZ +?internal_acquire_writer@spin_rw_mutex_v3@tbb@@AEAA_NXZ +?internal_release_reader@spin_rw_mutex_v3@tbb@@AEAAXXZ +?internal_release_writer@spin_rw_mutex_v3@tbb@@AEAAXXZ +?internal_try_acquire_reader@spin_rw_mutex_v3@tbb@@AEAA_NXZ +?internal_try_acquire_writer@spin_rw_mutex_v3@tbb@@AEAA_NXZ + +; spin_mutex.cpp +?internal_construct@spin_mutex@tbb@@QEAAXXZ +?internal_acquire@scoped_lock@spin_mutex@tbb@@AEAAXAEAV23@@Z +?internal_release@scoped_lock@spin_mutex@tbb@@AEAAXXZ 
+?internal_try_acquire@scoped_lock@spin_mutex@tbb@@AEAA_NAEAV23@@Z + +; mutex.cpp +?internal_acquire@scoped_lock@mutex@tbb@@AEAAXAEAV23@@Z +?internal_release@scoped_lock@mutex@tbb@@AEAAXXZ +?internal_try_acquire@scoped_lock@mutex@tbb@@AEAA_NAEAV23@@Z +?internal_construct@mutex@tbb@@AEAAXXZ +?internal_destroy@mutex@tbb@@AEAAXXZ + +; recursive_mutex.cpp +?internal_construct@recursive_mutex@tbb@@AEAAXXZ +?internal_destroy@recursive_mutex@tbb@@AEAAXXZ +?internal_acquire@scoped_lock@recursive_mutex@tbb@@AEAAXAEAV23@@Z +?internal_try_acquire@scoped_lock@recursive_mutex@tbb@@AEAA_NAEAV23@@Z +?internal_release@scoped_lock@recursive_mutex@tbb@@AEAAXXZ + +; queuing_mutex.cpp +?internal_construct@queuing_mutex@tbb@@QEAAXXZ +?acquire@scoped_lock@queuing_mutex@tbb@@QEAAXAEAV23@@Z +?release@scoped_lock@queuing_mutex@tbb@@QEAAXXZ +?try_acquire@scoped_lock@queuing_mutex@tbb@@QEAA_NAEAV23@@Z + +#if !TBB_NO_LEGACY +; concurrent_hash_map.cpp +?internal_grow_predicate@hash_map_segment_base@internal@tbb@@QEBA_NXZ + +; concurrent_queue.cpp v2 +??0concurrent_queue_base@internal@tbb@@IEAA@_K@Z +??0concurrent_queue_iterator_base@internal@tbb@@IEAA@AEBVconcurrent_queue_base@12@@Z +??1concurrent_queue_base@internal@tbb@@MEAA@XZ +??1concurrent_queue_iterator_base@internal@tbb@@IEAA@XZ +?advance@concurrent_queue_iterator_base@internal@tbb@@IEAAXXZ +?assign@concurrent_queue_iterator_base@internal@tbb@@IEAAXAEBV123@@Z +?internal_pop@concurrent_queue_base@internal@tbb@@IEAAXPEAX@Z +?internal_pop_if_present@concurrent_queue_base@internal@tbb@@IEAA_NPEAX@Z +?internal_push@concurrent_queue_base@internal@tbb@@IEAAXPEBX@Z +?internal_push_if_not_full@concurrent_queue_base@internal@tbb@@IEAA_NPEBX@Z +?internal_set_capacity@concurrent_queue_base@internal@tbb@@IEAAX_J_K@Z +?internal_size@concurrent_queue_base@internal@tbb@@IEBA_JXZ +#endif + +; concurrent_queue v3 +??0concurrent_queue_iterator_base_v3@internal@tbb@@IEAA@AEBVconcurrent_queue_base_v3@12@@Z +??1concurrent_queue_iterator_base_v3@internal@tbb@@IEAA@XZ +?assign@concurrent_queue_iterator_base_v3@internal@tbb@@IEAAXAEBV123@@Z +?advance@concurrent_queue_iterator_base_v3@internal@tbb@@IEAAXXZ +??0concurrent_queue_base_v3@internal@tbb@@IEAA@_K@Z +??1concurrent_queue_base_v3@internal@tbb@@MEAA@XZ +?internal_push@concurrent_queue_base_v3@internal@tbb@@IEAAXPEBX@Z +?internal_push_if_not_full@concurrent_queue_base_v3@internal@tbb@@IEAA_NPEBX@Z +?internal_pop@concurrent_queue_base_v3@internal@tbb@@IEAAXPEAX@Z +?internal_pop_if_present@concurrent_queue_base_v3@internal@tbb@@IEAA_NPEAX@Z +?internal_size@concurrent_queue_base_v3@internal@tbb@@IEBA_JXZ +?internal_empty@concurrent_queue_base_v3@internal@tbb@@IEBA_NXZ +?internal_finish_clear@concurrent_queue_base_v3@internal@tbb@@IEAAXXZ +?internal_set_capacity@concurrent_queue_base_v3@internal@tbb@@IEAAX_J_K@Z +?internal_throw_exception@concurrent_queue_base_v3@internal@tbb@@IEBAXXZ +?assign@concurrent_queue_base_v3@internal@tbb@@IEAAXAEBV123@@Z + +#if !TBB_NO_LEGACY +; concurrent_vector.cpp v2 +?internal_assign@concurrent_vector_base@internal@tbb@@IEAAXAEBV123@_KP6AXPEAX1@ZP6AX2PEBX1@Z5@Z +?internal_capacity@concurrent_vector_base@internal@tbb@@IEBA_KXZ +?internal_clear@concurrent_vector_base@internal@tbb@@IEAAXP6AXPEAX_K@Z_N@Z +?internal_copy@concurrent_vector_base@internal@tbb@@IEAAXAEBV123@_KP6AXPEAXPEBX1@Z@Z +?internal_grow_by@concurrent_vector_base@internal@tbb@@IEAA_K_K0P6AXPEAX0@Z@Z +?internal_grow_to_at_least@concurrent_vector_base@internal@tbb@@IEAAX_K0P6AXPEAX0@Z@Z 
+?internal_push_back@concurrent_vector_base@internal@tbb@@IEAAPEAX_KAEA_K@Z +?internal_reserve@concurrent_vector_base@internal@tbb@@IEAAX_K00@Z +#endif + +; concurrent_vector v3 +??1concurrent_vector_base_v3@internal@tbb@@IEAA@XZ +?internal_assign@concurrent_vector_base_v3@internal@tbb@@IEAAXAEBV123@_KP6AXPEAX1@ZP6AX2PEBX1@Z5@Z +?internal_capacity@concurrent_vector_base_v3@internal@tbb@@IEBA_KXZ +?internal_clear@concurrent_vector_base_v3@internal@tbb@@IEAA_KP6AXPEAX_K@Z@Z +?internal_copy@concurrent_vector_base_v3@internal@tbb@@IEAAXAEBV123@_KP6AXPEAXPEBX1@Z@Z +?internal_grow_by@concurrent_vector_base_v3@internal@tbb@@IEAA_K_K0P6AXPEAXPEBX0@Z2@Z +?internal_grow_to_at_least@concurrent_vector_base_v3@internal@tbb@@IEAAX_K0P6AXPEAXPEBX0@Z2@Z +?internal_push_back@concurrent_vector_base_v3@internal@tbb@@IEAAPEAX_KAEA_K@Z +?internal_reserve@concurrent_vector_base_v3@internal@tbb@@IEAAX_K00@Z +?internal_compact@concurrent_vector_base_v3@internal@tbb@@IEAAPEAX_KPEAXP6AX10@ZP6AX1PEBX0@Z@Z +?internal_swap@concurrent_vector_base_v3@internal@tbb@@IEAAXAEAV123@@Z +?internal_throw_exception@concurrent_vector_base_v3@internal@tbb@@IEBAX_K@Z +?internal_resize@concurrent_vector_base_v3@internal@tbb@@IEAAX_K00PEBXP6AXPEAX0@ZP6AX210@Z@Z +?internal_grow_to_at_least_with_result@concurrent_vector_base_v3@internal@tbb@@IEAA_K_K0P6AXPEAXPEBX0@Z2@Z + +; tbb_thread +?allocate_closure_v3@internal@tbb@@YAPEAX_K@Z +?detach@tbb_thread_v3@internal@tbb@@QEAAXXZ +?free_closure_v3@internal@tbb@@YAXPEAX@Z +?hardware_concurrency@tbb_thread_v3@internal@tbb@@SAIXZ +?internal_start@tbb_thread_v3@internal@tbb@@AEAAXP6AIPEAX@Z0@Z +?join@tbb_thread_v3@internal@tbb@@QEAAXXZ +?move_v3@internal@tbb@@YAXAEAVtbb_thread_v3@12@0@Z +?thread_get_id_v3@internal@tbb@@YA?AVid@tbb_thread_v3@12@XZ +?thread_sleep_v3@internal@tbb@@YAXAEBVinterval_t@tick_count@2@@Z +?thread_yield_v3@internal@tbb@@YAXXZ diff --git a/dep/tbb/src/tbbmalloc/Customize.h b/dep/tbb/src/tbbmalloc/Customize.h new file mode 100644 index 000000000..adc6d4c76 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/Customize.h @@ -0,0 +1,120 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +#ifndef _TBB_malloc_Customize_H_ +#define _TBB_malloc_Customize_H_ + +/* Thread shutdown notification callback */ +/* redefine the name of the callback to meet TBB requirements + for externally visible names of service functions */ +#define mallocThreadShutdownNotification __TBB_mallocThreadShutdownNotification +#define mallocProcessShutdownNotification __TBB_mallocProcessShutdownNotification + +extern "C" void mallocThreadShutdownNotification(void *); +extern "C" void mallocProcessShutdownNotification(void); + +// customizing MALLOC_ASSERT macro +#include "tbb/tbb_stddef.h" +#define MALLOC_ASSERT(assertion, message) __TBB_ASSERT(assertion, message) + +#ifndef MALLOC_DEBUG +#define MALLOC_DEBUG TBB_USE_DEBUG +#endif + +#include "tbb/tbb_machine.h" + +#if DO_ITT_NOTIFY +#include "tbb/itt_notify.h" +#define MALLOC_ITT_SYNC_PREPARE(pointer) ITT_NOTIFY(sync_prepare, (pointer)) +#define MALLOC_ITT_SYNC_ACQUIRED(pointer) ITT_NOTIFY(sync_acquired, (pointer)) +#define MALLOC_ITT_SYNC_RELEASING(pointer) ITT_NOTIFY(sync_releasing, (pointer)) +#define MALLOC_ITT_SYNC_CANCEL(pointer) ITT_NOTIFY(sync_cancel, (pointer)) +#else +#define MALLOC_ITT_SYNC_PREPARE(pointer) ((void)0) +#define MALLOC_ITT_SYNC_ACQUIRED(pointer) ((void)0) +#define MALLOC_ITT_SYNC_RELEASING(pointer) ((void)0) +#define MALLOC_ITT_SYNC_CANCEL(pointer) ((void)0) +#endif + +//! Stripped down version of spin_mutex. +/** Instances of MallocMutex must be declared in memory that is zero-initialized. + There are no constructors. This is a feature that lets it be + used in situations where the mutex might be used while file-scope constructors + are running. + + There are no methods "acquire" or "release". The scoped_lock must be used + in a strict block-scoped locking pattern. Omitting these methods permitted + further simplication. */ +class MallocMutex { + unsigned char value; + + //! Deny assignment + void operator=( MallocMutex& MallocMutex ); +public: + class scoped_lock { + const unsigned char value; + MallocMutex& mutex; + public: + scoped_lock( MallocMutex& m ) : value( __TBB_LockByte(m.value)), mutex(m) {} + ~scoped_lock() { __TBB_store_with_release(mutex.value, value); } + }; + friend class scoped_lock; +}; + +inline intptr_t AtomicIncrement( volatile intptr_t& counter ) { + return __TBB_FetchAndAddW( &counter, 1 )+1; +} + +inline uintptr_t AtomicAdd( volatile uintptr_t& counter, uintptr_t value ) { + return __TBB_FetchAndAddW( &counter, value ); +} + +inline intptr_t AtomicCompareExchange( volatile intptr_t& location, intptr_t new_value, intptr_t comparand) { + return __TBB_CompareAndSwapW( &location, new_value, comparand ); +} + +#define USE_DEFAULT_MEMORY_MAPPING 1 + +// To support malloc replacement with LD_PRELOAD +#include "proxy.h" + +#if MALLOC_LD_PRELOAD +#define malloc_proxy __TBB_malloc_proxy +extern "C" void * __TBB_malloc_proxy(size_t) __attribute__ ((weak)); +#else +const bool malloc_proxy = false; +#endif + +namespace rml { +namespace internal { + void init_tbbmalloc(); +} } // namespaces + +#define MALLOC_EXTRA_INITIALIZATION rml::internal::init_tbbmalloc() + +#endif /* _TBB_malloc_Customize_H_ */ diff --git a/dep/tbb/src/tbbmalloc/LifoQueue.h b/dep/tbb/src/tbbmalloc/LifoQueue.h new file mode 100644 index 000000000..9a81dd9b7 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/LifoQueue.h @@ -0,0 +1,97 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. 
+ + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef _itt_common_malloc_LifoQueue_H_ +#define _itt_common_malloc_LifoQueue_H_ + +#include "TypeDefinitions.h" +#include // for memset() + +//! Checking the synchronization method +/** FINE_GRAIN_LOCKS is the only variant for now; should be defined for LifoQueue */ +#ifndef FINE_GRAIN_LOCKS +#define FINE_GRAIN_LOCKS +#endif + +namespace rml { + +namespace internal { + +class LifoQueue { +public: + inline LifoQueue(); + inline void push(void** ptr); + inline void* pop(void); + +private: + void * top; +#ifdef FINE_GRAIN_LOCKS + MallocMutex lock; +#endif /* FINE_GRAIN_LOCKS */ +}; + +#ifdef FINE_GRAIN_LOCKS +/* LifoQueue assumes zero initialization so a vector of it can be created + * by just allocating some space with no call to constructor. + * On Linux, it seems to be necessary to avoid linking with C++ libraries. + * + * By usage convention there is no race on the initialization. */ +LifoQueue::LifoQueue( ) : top(NULL) +{ + // MallocMutex assumes zero initialization + memset(&lock, 0, sizeof(MallocMutex)); +} + +void LifoQueue::push( void **ptr ) +{ + MallocMutex::scoped_lock scoped_cs(lock); + *ptr = top; + top = ptr; +} + +void * LifoQueue::pop( ) +{ + void **result=NULL; + { + MallocMutex::scoped_lock scoped_cs(lock); + if (!top) goto done; + result = (void **) top; + top = *result; + } + *result = NULL; +done: + return result; +} + +#endif /* FINE_GRAIN_LOCKS */ + +} // namespace internal +} // namespace rml + +#endif /* _itt_common_malloc_LifoQueue_H_ */ + diff --git a/dep/tbb/src/tbbmalloc/MapMemory.h b/dep/tbb/src/tbbmalloc/MapMemory.h new file mode 100644 index 000000000..64bf66b0c --- /dev/null +++ b/dep/tbb/src/tbbmalloc/MapMemory.h @@ -0,0 +1,101 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#ifndef _itt_shared_malloc_MapMemory_H +#define _itt_shared_malloc_MapMemory_H + +#if __linux__ || __APPLE__ || __sun || __FreeBSD__ + +#if __sun && !defined(_XPG4_2) + // To have void* as mmap's 1st argument + #define _XPG4_2 1 + #define XPG4_WAS_DEFINED 1 +#endif + +#include + +#if XPG4_WAS_DEFINED + #undef _XPG4_2 + #undef XPG4_WAS_DEFINED +#endif + +#define MEMORY_MAPPING_USES_MALLOC 0 +void* MapMemory (size_t bytes) +{ + void* result = 0; +#ifndef MAP_ANONYMOUS +// Mac OS* X defines MAP_ANON, which is deprecated in Linux. +#define MAP_ANONYMOUS MAP_ANON +#endif /* MAP_ANONYMOUS */ + result = mmap(result, bytes, (PROT_READ | PROT_WRITE), MAP_PRIVATE|MAP_ANONYMOUS, -1, 0); + return result==MAP_FAILED? 0: result; +} + +int UnmapMemory(void *area, size_t bytes) +{ + return munmap(area, bytes); +} + +#elif _WIN32 || _WIN64 +#include + +#define MEMORY_MAPPING_USES_MALLOC 0 +void* MapMemory (size_t bytes) +{ + /* Is VirtualAlloc thread safe? */ + return VirtualAlloc(NULL, bytes, (MEM_RESERVE | MEM_COMMIT | MEM_TOP_DOWN), PAGE_READWRITE); +} + +int UnmapMemory(void *area, size_t bytes) +{ + BOOL result = VirtualFree(area, 0, MEM_RELEASE); + return !result; +} + +#else +#include + +#define MEMORY_MAPPING_USES_MALLOC 1 +void* MapMemory (size_t bytes) +{ + return malloc( bytes ); +} + +int UnmapMemory(void *area, size_t bytes) +{ + free( area ); + return 0; +} + +#endif /* OS dependent */ + +#if MALLOC_CHECK_RECURSION && MEMORY_MAPPING_USES_MALLOC +#error Impossible to protect against malloc recursion when memory mapping uses malloc. +#endif + +#endif /* _itt_shared_malloc_MapMemory_H */ diff --git a/dep/tbb/src/tbbmalloc/MemoryAllocator.cpp b/dep/tbb/src/tbbmalloc/MemoryAllocator.cpp new file mode 100644 index 000000000..749efda36 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/MemoryAllocator.cpp @@ -0,0 +1,2391 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + + +#include "TypeDefinitions.h" /* Also includes customization layer Customize.h */ + +#if USE_PTHREAD + // Some pthreads documentation says that must be first header. + #include + #define TlsSetValue_func pthread_setspecific + #define TlsGetValue_func pthread_getspecific + typedef pthread_key_t tls_key_t; + #include + inline void do_yield() {sched_yield();} + +#elif USE_WINTHREAD + #define _WIN32_WINNT 0x0400 + #include + #define TlsSetValue_func TlsSetValue + #define TlsGetValue_func TlsGetValue + typedef DWORD tls_key_t; + inline void do_yield() {SwitchToThread();} + +#else + #error Must define USE_PTHREAD or USE_WINTHREAD + +#endif + +#include +#include +#include +#include +#if MALLOC_CHECK_RECURSION +#include /* for placement new */ +#endif /* MALLOC_CHECK_RECURSION */ + +extern "C" { + void * scalable_malloc(size_t size); + void scalable_free(void *object); + void mallocThreadShutdownNotification(void*); +} + +/********* Various compile-time options **************/ + +#define MALLOC_TRACE 0 + +#if MALLOC_TRACE +#define TRACEF(x) printf x +#else +#define TRACEF(x) ((void)0) +#endif /* MALLOC_TRACE */ + +#define ASSERT_TEXT NULL + +//! Define the main synchronization method +/** It should be specified before including LifoQueue.h */ +#define FINE_GRAIN_LOCKS +#include "LifoQueue.h" + +#define COLLECT_STATISTICS MALLOC_DEBUG && defined(MALLOCENV_COLLECT_STATISTICS) +#include "Statistics.h" + +#define FREELIST_NONBLOCKING 1 + +// If USE_MALLOC_FOR_LARGE_OBJECT is nonzero, then large allocations are done via malloc. +// Otherwise large allocations are done using the scalable allocator's block allocator. +// As of 06.Jun.17, using malloc is about 10x faster on Linux. +#if !_WIN32 +#define USE_MALLOC_FOR_LARGE_OBJECT 1 +#endif + +/********* End compile-time options **************/ + +namespace rml { + +namespace internal { + +/******* A helper class to support overriding malloc with scalable_malloc *******/ +#if MALLOC_CHECK_RECURSION + +inline bool isMallocInitialized(); + +class RecursiveMallocCallProtector { + // pointer to an automatic data of holding thread + static void *autoObjPtr; + static MallocMutex rmc_mutex; + static pthread_t owner_thread; +/* Under FreeBSD 8.0 1st call to any pthread function including pthread_self + leads to pthread initialization, that causes malloc calls. As 1st usage of + RecursiveMallocCallProtector can be before pthread initialized, pthread calls + can't be used in 1st instance of RecursiveMallocCallProtector. + RecursiveMallocCallProtector is used 1st time in checkInitialization(), + so there is a guarantee that on 2nd usage pthread is initialized. + No such situation observed with other supported OSes. + */ +#if __FreeBSD__ + static bool canUsePthread; +#else + static const bool canUsePthread = true; +#endif +/* + The variable modified in checkInitialization, + so can be read without memory barriers. 
+ */ + static bool mallocRecursionDetected; + + MallocMutex::scoped_lock* lock_acquired; + char scoped_lock_space[sizeof(MallocMutex::scoped_lock)+1]; + + static uintptr_t absDiffPtr(void *x, void *y) { + uintptr_t xi = (uintptr_t)x, yi = (uintptr_t)y; + return xi > yi ? xi - yi : yi - xi; + } +public: + + RecursiveMallocCallProtector() : lock_acquired(NULL) { + lock_acquired = new (scoped_lock_space) MallocMutex::scoped_lock( rmc_mutex ); + if (canUsePthread) + owner_thread = pthread_self(); + autoObjPtr = &scoped_lock_space; + } + ~RecursiveMallocCallProtector() { + if (lock_acquired) { + autoObjPtr = NULL; + lock_acquired->~scoped_lock(); + } + } + static bool sameThreadActive() { + if (!autoObjPtr) // fast path + return false; + // Some thread has an active recursive call protector; check if the current one. + // Exact pthread_self based test + if (canUsePthread) + if (pthread_equal( owner_thread, pthread_self() )) { + mallocRecursionDetected = true; + return true; + } else + return false; + // inexact stack size based test + const uintptr_t threadStackSz = 2*1024*1024; + int dummy; + return absDiffPtr(autoObjPtr, &dummy)(TlsGetValue_func(Tid_key)); + if( !result ) { + RecursiveMallocCallProtector scoped; + // Thread-local value is zero -> first call from this thread, + // need to initialize with next ID value (IDs start from 1) + result = AtomicIncrement(ThreadIdCount); // returned new value! + TlsSetValue_func( Tid_key, reinterpret_cast(result) ); + } + return result; +} + +static inline void* getThreadMallocTLS() { + void *result; + result = TlsGetValue_func( TLS_pointer_key ); +// The assert below is incorrect: with lazy initialization, it fails on the first call of the function. +// MALLOC_ASSERT( result, "Memory allocator not initialized" ); + return result; +} + +static inline void setThreadMallocTLS( void * newvalue ) { + RecursiveMallocCallProtector scoped; + TlsSetValue_func( TLS_pointer_key, newvalue ); +} + +/*********** End code to provide thread ID and a TLS pointer **********/ + +/* + * The identifier to make sure that memory is allocated by scalable_malloc. + */ +const uint64_t theMallocUniqueID=0xE3C7AF89A1E2D8C1ULL; + +/* + * This number of bins in the TLS that leads to blocks that we can allocate in. + */ +const uint32_t numBlockBinLimit = 32; + + /* + * The number of bins to cache large objects. + */ +const uint32_t numLargeObjectBins = 1024; // for 1024 max cached size is near 8MB + +/********* The data structures and global objects **************/ + +struct FreeObject { + FreeObject *next; +}; + +/* + * The following constant is used to define the size of struct Block, the block header. + * The intent is to have the size of a Block multiple of the cache line size, this allows us to + * get good alignment at the cost of some overhead equal to the amount of padding included in the Block. + */ + +const int blockHeaderAlignment = 64; // a common size of a cache line + +struct Block; + +/* The 'next' field in the block header has to maintain some invariants: + * it needs to be on a 16K boundary and the first field in the block. + * Any value stored there needs to have the lower 14 bits set to 0 + * so that various assert work. This means that if you want to smash this memory + * for debugging purposes you will need to obey this invariant. + * The total size of the header needs to be a power of 2 to simplify + * the alignment requirements. For now it is a 128 byte structure. 
+ * To avoid false sharing, the fields changed only locally are separated + * from the fields changed by foreign threads. + * Changing the size of the block header would require to change + * some bin allocation sizes, in particular "fitting" sizes (see above). + */ + +struct LocalBlockFields { + Block *next; /* This field needs to be on a 16K boundary and the first field in the block + for LIFO lists to work. */ + uint64_t mallocUniqueID; /* The field to identify memory allocated by scalable_malloc */ + Block *previous; /* Use double linked list to speed up removal */ + unsigned int objectSize; + unsigned int owner; + FreeObject *bumpPtr; /* Bump pointer moves from the end to the beginning of a block */ + FreeObject *freeList; + unsigned int allocatedCount; /* Number of objects allocated (obviously by the owning thread) */ + unsigned int isFull; +}; + +struct Block : public LocalBlockFields { + size_t __pad_local_fields[(blockHeaderAlignment-sizeof(LocalBlockFields))/sizeof(size_t)]; + FreeObject *publicFreeList; + Block *nextPrivatizable; + size_t __pad_public_fields[(blockHeaderAlignment-2*sizeof(void*))/sizeof(size_t)]; +}; + +struct Bin { + Block *activeBlk; + Block *mailbox; + MallocMutex mailLock; +}; + +/* + * This is a LIFO linked list that one can init, push or pop from + */ +static LifoQueue freeBlockList; + +/* + * When a block that is not completely free is returned for reuse by other threads + * this is where the block goes. + * + * LifoQueue assumes zero initialization; so below its constructors are omitted, + * to avoid linking with C++ libraries on Linux. + */ +static char globalBinSpace[sizeof(LifoQueue)*numBlockBinLimit]; +static LifoQueue* globalSizeBins = (LifoQueue*)globalBinSpace; + +static struct LargeObjectCacheStat { + uintptr_t age; + size_t cacheSize; +} loCacheStat; + +struct CachedObject { + CachedObject *next, + *prev; + uintptr_t age; + bool fromMapMemory; +}; + +class CachedObjectsList { + CachedObject *first, + *last; + /* age of an oldest object in the list; equal to last->age, if last defined, + used for quick cheching it without acquiring the lock. */ + uintptr_t oldest; + /* currAge when something was excluded out of list because of the age, + not because of cache hit */ + uintptr_t lastCleanedAge; + /* Current threshold value for the objects of a particular size. + Set on cache miss. */ + uintptr_t ageThreshold; + + MallocMutex lock; + /* CachedObjectsList should be placed in zero-initialized memory, + ctor not needed. */ + CachedObjectsList(); +public: + inline void push(void *buf, bool fromMapMemory, uintptr_t currAge); + inline CachedObject* pop(uintptr_t currAge); + void releaseLastIfOld(uintptr_t currAge, size_t size); +}; + +/* + * Array of bins with lists of recently freed objects cached for re-use. + */ +static char globalCachedObjectBinsSpace[sizeof(CachedObjectsList)*numLargeObjectBins]; +static CachedObjectsList* globalCachedObjectBins = (CachedObjectsList*)globalCachedObjectBinsSpace; + +/********* End of the data structures **************/ + +/********** Various numeric parameters controlling allocations ********/ + +/* + * The size of the TLS should be enough to hold numBlockBinLimit bins. + */ +const uint32_t tlsSize = numBlockBinLimit * sizeof(Bin); + +/* + * blockSize - the size of a block, it must be larger than maxSegregatedObjectSize. 
+ * + */ +const uintptr_t blockSize = 16*1024; + +/* + * There are bins for all 8 byte aligned objects less than this segregated size; 8 bins in total + */ +const uint32_t minSmallObjectIndex = 0; +const uint32_t numSmallObjectBins = 8; +const uint32_t maxSmallObjectSize = 64; + +/* + * There are 4 bins between each couple of powers of 2 [64-128-256-...] + * from maxSmallObjectSize till this size; 16 bins in total + */ +const uint32_t minSegregatedObjectIndex = minSmallObjectIndex+numSmallObjectBins; +const uint32_t numSegregatedObjectBins = 16; +const uint32_t maxSegregatedObjectSize = 1024; + +/* + * And there are 5 bins with the following allocation sizes: 1792, 2688, 3968, 5376, 8064. + * They selected to fit 9, 6, 4, 3, and 2 sizes per a block, and also are multiples of 128. + * If sizeof(Block) changes from 128, these sizes require close attention! + */ +const uint32_t minFittingIndex = minSegregatedObjectIndex+numSegregatedObjectBins; +const uint32_t numFittingBins = 5; + +const uint32_t fittingAlignment = 128; + +#define SET_FITTING_SIZE(N) ( (blockSize-sizeof(Block))/N ) & ~(fittingAlignment-1) +const uint32_t fittingSize1 = SET_FITTING_SIZE(9); +const uint32_t fittingSize2 = SET_FITTING_SIZE(6); +const uint32_t fittingSize3 = SET_FITTING_SIZE(4); +const uint32_t fittingSize4 = SET_FITTING_SIZE(3); +const uint32_t fittingSize5 = SET_FITTING_SIZE(2); +#undef SET_FITTING_SIZE + +/* + * The total number of thread-specific Block-based bins + */ +const uint32_t numBlockBins = minFittingIndex+numFittingBins; + +/* + * Objects of this size and larger are considered large objects. + */ +const uint32_t minLargeObjectSize = fittingSize5 + 1; + +/* + * Block::objectSize value used to mark blocks allocated by startupAlloc + */ +const unsigned int startupAllocObjSizeMark = ~(unsigned int)0; + +/* + * Difference between object sizes in large object bins + */ +const uint32_t largeObjectCacheStep = 8*1024; + +/* + * Object cache cleanup frequency. + * It should be power of 2 for the fast checking. 
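+ * (With a power of two, the check "currAge % cacheCleanupFreq == 0" can be
+ * reduced to a simple mask, i.e. "(currAge & (cacheCleanupFreq-1)) == 0".)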
+ */ +const unsigned cacheCleanupFreq = 256; + +/* + * Get virtual memory in pieces of this size: 0x0100000 is 1 megabyte decimal + */ +static size_t mmapRequestSize = 0x0100000; + +/********** End of numeric parameters controlling allocations *********/ + +#if !MALLOC_DEBUG +#if __INTEL_COMPILER || _MSC_VER +#define NOINLINE(decl) __declspec(noinline) decl +#define ALWAYSINLINE(decl) __forceinline decl +#elif __GNUC__ +#define NOINLINE(decl) decl __attribute__ ((noinline)) +#define ALWAYSINLINE(decl) decl __attribute__ ((always_inline)) +#else +#define NOINLINE(decl) decl +#define ALWAYSINLINE(decl) decl +#endif + +static NOINLINE( Block* getPublicFreeListBlock(Bin* bin) ); +static NOINLINE( void moveBlockToBinFront(Block *block) ); +static NOINLINE( void processLessUsedBlock(Block *block) ); + +static ALWAYSINLINE( Bin* getAllocationBin(size_t size) ); +static ALWAYSINLINE( void checkInitialization() ); + +#undef ALWAYSINLINE +#undef NOINLINE +#endif /* !MALLOC_DEBUG */ + +/*********** Code to acquire memory from the OS or other executive ****************/ + +#if USE_DEFAULT_MEMORY_MAPPING +#include "MapMemory.h" +#else +/* assume MapMemory and UnmapMemory are customized */ +#endif + +#if USE_MALLOC_FOR_LARGE_OBJECT + +// (get|free)RawMemory only necessary for the USE_MALLOC_FOR_LARGE_OBJECT case +static inline void* getRawMemory (size_t size, bool alwaysUseMap = false) +{ + void *object; + + if (alwaysUseMap) + object = MapMemory(size); + else +#if MALLOC_CHECK_RECURSION + if (RecursiveMallocCallProtector::noRecursion()) + object = malloc(size); + else if ( rml::internal::original_malloc_found ) + object = (*rml::internal::original_malloc_ptr)(size); + else + object = MapMemory(size); +#else + object = malloc(size); +#endif /* MALLOC_CHECK_RECURSION */ + return object; +} + +static inline void freeRawMemory (void *object, size_t size, bool alwaysUseMap) +{ + if (alwaysUseMap) + UnmapMemory(object, size); + else +#if MALLOC_CHECK_RECURSION + if (RecursiveMallocCallProtector::noRecursion()) + free(object); + else if ( rml::internal::original_malloc_found ) + (*rml::internal::original_free_ptr)(object); + else + UnmapMemory(object, size); +#else + free(object); +#endif /* MALLOC_CHECK_RECURSION */ +} + +#else /* USE_MALLOC_FOR_LARGE_OBJECT */ + +static inline void* getRawMemory (size_t size, bool = false) { return MapMemory(size); } + +static inline void freeRawMemory (void *object, size_t size, bool) { + UnmapMemory(object, size); +} + +#endif /* USE_MALLOC_FOR_LARGE_OBJECT */ + +/********* End memory acquisition code ********************************/ + +/********* Now some rough utility code to deal with indexing the size bins. **************/ + +/* + * Given a number return the highest non-zero bit in it. It is intended to work with 32-bit values only. + * Moreover, on IPF, for sake of simplicity and performance, it is narrowed to only serve for 64 to 1023. + * This is enough for current algorithm of distribution of sizes among bins. 
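+ * For example, highestBitPos(64) is 6 and highestBitPos(1023) is 9, which is
+ * exactly the range of "order" values used for the segregated bins below.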
+ */ +#if _WIN64 && _MSC_VER>=1400 && !__INTEL_COMPILER +extern "C" unsigned char _BitScanReverse( unsigned long* i, unsigned long w ); +#pragma intrinsic(_BitScanReverse) +#endif +static inline unsigned int highestBitPos(unsigned int n) +{ + unsigned int pos; +#if __ARCH_x86_32||__ARCH_x86_64 + +# if __linux__||__APPLE__||__FreeBSD__||__sun||__MINGW32__ + __asm__ ("bsr %1,%0" : "=r"(pos) : "r"(n)); +# elif (_WIN32 && (!_WIN64 || __INTEL_COMPILER)) + __asm + { + bsr eax, n + mov pos, eax + } +# elif _WIN64 && _MSC_VER>=1400 + _BitScanReverse((unsigned long*)&pos, (unsigned long)n); +# else +# error highestBitPos() not implemented for this platform +# endif + +#elif __ARCH_ipf || __ARCH_other + static unsigned int bsr[16] = {0,6,7,7,8,8,8,8,9,9,9,9,9,9,9,9}; + MALLOC_ASSERT( n>=64 && n<1024, ASSERT_TEXT ); + pos = bsr[ n>>6 ]; +#else +# error highestBitPos() not implemented for this platform +#endif /* __ARCH_* */ + return pos; +} + +/* + * Depending on indexRequest, for a given size return either the index into the bin + * for objects of this size, or the actual size of objects in this bin. + */ +template +static unsigned int getIndexOrObjectSize (unsigned int size) +{ + if (size <= maxSmallObjectSize) { // selection from 4/8/16/24/32/40/48/56/64 + /* Index 0 holds up to 8 bytes, Index 1 16 and so forth */ + return indexRequest ? (size - 1) >> 3 : alignUp(size,8); + } + else if (size <= maxSegregatedObjectSize ) { // 80/96/112/128 / 160/192/224/256 / 320/384/448/512 / 640/768/896/1024 + unsigned int order = highestBitPos(size-1); // which group of bin sizes? + MALLOC_ASSERT( 6<=order && order<=9, ASSERT_TEXT ); + if (indexRequest) + return minSegregatedObjectIndex - (4*6) - 4 + (4*order) + ((size-1)>>(order-2)); + else { + unsigned int alignment = 128 >> (9-order); // alignment in the group + MALLOC_ASSERT( alignment==16 || alignment==32 || alignment==64 || alignment==128, ASSERT_TEXT ); + return alignUp(size,alignment); + } + } + else { + if( size <= fittingSize3 ) { + if( size <= fittingSize2 ) { + if( size <= fittingSize1 ) + return indexRequest ? minFittingIndex : fittingSize1; + else + return indexRequest ? minFittingIndex+1 : fittingSize2; + } else + return indexRequest ? minFittingIndex+2 : fittingSize3; + } else { + if( size <= fittingSize5 ) { + if( size <= fittingSize4 ) + return indexRequest ? minFittingIndex+3 : fittingSize4; + else + return indexRequest ? minFittingIndex+4 : fittingSize5; + } else { + MALLOC_ASSERT( 0,ASSERT_TEXT ); // this should not happen + return ~0U; + } + } + } +} + +static unsigned int getIndex (unsigned int size) +{ + return getIndexOrObjectSize(size); +} + +static unsigned int getObjectSize (unsigned int size) +{ + return getIndexOrObjectSize(size); +} + +/* + * Initialization code. + * + */ + +/* + * Big Blocks are the blocks we get from the OS or some similar place using getMemory above. + * They are placed on the freeBlockList once they are acquired. + */ + +static inline void *alignBigBlock(void *unalignedBigBlock) +{ + void *alignedBigBlock; + /* align the entireHeap so all blocks are aligned. */ + alignedBigBlock = alignUp(unalignedBigBlock, blockSize); + return alignedBigBlock; +} + +/* Divide the big block into smaller bigBlocks that hold this many blocks. + * This is done since we really need a lot of blocks on the freeBlockList or there will be + * contention problems. + */ +const unsigned int blocksPerBigBlock = 16; + +/* Returns 0 if unsuccessful, otherwise 1. 
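+ * With the default 1 MB mmapRequestSize and 16K blockSize that is roughly 63
+ * usable blocks, pushed onto freeBlockList in chunks of up to blocksPerBigBlock blocks.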
*/ +static int mallocBigBlock() +{ + void *unalignedBigBlock; + void *alignedBigBlock; + void *bigBlockCeiling; + Block *splitBlock; + void *splitEdge; + size_t bigBlockSplitSize; + + unalignedBigBlock = getRawMemory(mmapRequestSize, /*alwaysUseMap=*/true); + + if (!unalignedBigBlock) { + TRACEF(( "[ScalableMalloc trace] in mallocBigBlock, getMemory returns 0\n" )); + /* We can't get any more memory from the OS or executive so return 0 */ + return 0; + } + + alignedBigBlock = alignBigBlock(unalignedBigBlock); + bigBlockCeiling = (void*)((uintptr_t)unalignedBigBlock + mmapRequestSize); + + bigBlockSplitSize = blocksPerBigBlock * blockSize; + + splitBlock = (Block*)alignedBigBlock; + + while ( ((uintptr_t)splitBlock + blockSize) <= (uintptr_t)bigBlockCeiling ) { + splitEdge = (void*)((uintptr_t)splitBlock + bigBlockSplitSize); + if( splitEdge > bigBlockCeiling) { + splitEdge = alignDown(bigBlockCeiling, blockSize); + } + splitBlock->bumpPtr = (FreeObject*)splitEdge; + freeBlockList.push((void**) splitBlock); + splitBlock = (Block*)splitEdge; + } + + TRACEF(( "[ScalableMalloc trace] in mallocBigBlock returning 1\n" )); + return 1; +} + +/* + * The malloc routines themselves need to be able to occasionally malloc some space, + * in order to set up the structures used by the thread local structures. This + * routine preforms that fuctions. + */ + +/* + * Forward Refs + */ +static void initEmptyBlock(Block *block, size_t size); +static Block *getEmptyBlock(size_t size); + +static MallocMutex bootStrapLock; + +static Block *bootStrapBlock = NULL; +static Block *bootStrapBlockUsed = NULL; +static FreeObject *bootStrapObjectList = NULL; + +static void *bootStrapMalloc(size_t size) +{ + FreeObject *result; + + MALLOC_ASSERT( size == tlsSize, ASSERT_TEXT ); + + { // Lock with acquire + MallocMutex::scoped_lock scoped_cs(bootStrapLock); + + if( bootStrapObjectList) { + result = bootStrapObjectList; + bootStrapObjectList = bootStrapObjectList->next; + } else { + if (!bootStrapBlock) { + bootStrapBlock = getEmptyBlock(size); + if (!bootStrapBlock) return NULL; + } + result = bootStrapBlock->bumpPtr; + bootStrapBlock->bumpPtr = (FreeObject *)((uintptr_t)bootStrapBlock->bumpPtr - bootStrapBlock->objectSize); + if ((uintptr_t)bootStrapBlock->bumpPtr < (uintptr_t)bootStrapBlock+sizeof(Block)) { + bootStrapBlock->bumpPtr = NULL; + bootStrapBlock->next = bootStrapBlockUsed; + bootStrapBlockUsed = bootStrapBlock; + bootStrapBlock = NULL; + } + } + } // Unlock with release + + memset (result, 0, size); + return (void*)result; +} + +static void bootStrapFree(void* ptr) +{ + MALLOC_ASSERT( ptr, ASSERT_TEXT ); + { // Lock with acquire + MallocMutex::scoped_lock scoped_cs(bootStrapLock); + ((FreeObject*)ptr)->next = bootStrapObjectList; + bootStrapObjectList = (FreeObject*)ptr; + } // Unlock with release +} + +/********* End rough utility code **************/ + +/********* Thread and block related code *************/ + +#if MALLOC_DEBUG>1 +/* The debug version verifies the TLSBin as needed */ +static void verifyTLSBin (Bin* bin, size_t size) +{ + Block* temp; + Bin* tls; + uint32_t index = getIndex(size); + uint32_t objSize = getObjectSize(size); + + tls = (Bin*)getThreadMallocTLS(); + MALLOC_ASSERT( bin == tls+index, ASSERT_TEXT ); + + if (tls[index].activeBlk) { + MALLOC_ASSERT( tls[index].activeBlk->mallocUniqueID==theMallocUniqueID, ASSERT_TEXT ); + MALLOC_ASSERT( tls[index].activeBlk->owner == getThreadId(), ASSERT_TEXT ); + MALLOC_ASSERT( tls[index].activeBlk->objectSize == objSize, ASSERT_TEXT ); + + for 
(temp = tls[index].activeBlk->next; temp; temp=temp->next) { + MALLOC_ASSERT( temp!=tls[index].activeBlk, ASSERT_TEXT ); + MALLOC_ASSERT( temp->mallocUniqueID==theMallocUniqueID, ASSERT_TEXT ); + MALLOC_ASSERT( temp->owner == getThreadId(), ASSERT_TEXT ); + MALLOC_ASSERT( temp->objectSize == objSize, ASSERT_TEXT ); + MALLOC_ASSERT( temp->previous->next == temp, ASSERT_TEXT ); + if (temp->next) { + MALLOC_ASSERT( temp->next->previous == temp, ASSERT_TEXT ); + } + } + for (temp = tls[index].activeBlk->previous; temp; temp=temp->previous) { + MALLOC_ASSERT( temp!=tls[index].activeBlk, ASSERT_TEXT ); + MALLOC_ASSERT( temp->mallocUniqueID==theMallocUniqueID, ASSERT_TEXT ); + MALLOC_ASSERT( temp->owner == getThreadId(), ASSERT_TEXT ); + MALLOC_ASSERT( temp->objectSize == objSize, ASSERT_TEXT ); + MALLOC_ASSERT( temp->next->previous == temp, ASSERT_TEXT ); + if (temp->previous) { + MALLOC_ASSERT( temp->previous->next == temp, ASSERT_TEXT ); + } + } + } +} +#else +inline static void verifyTLSBin (Bin*, size_t) {} +#endif /* MALLOC_DEBUG>1 */ + +/* + * Add a block to the start of this tls bin list. + */ +static void pushTLSBin (Bin* bin, Block* block) +{ + /* The objectSize should be defined and not a parameter + because the function is applied to partially filled blocks as well */ + unsigned int size = block->objectSize; + Block* activeBlk; + + MALLOC_ASSERT( block->owner == getThreadId(), ASSERT_TEXT ); + MALLOC_ASSERT( block->objectSize != 0, ASSERT_TEXT ); + MALLOC_ASSERT( block->next == NULL, ASSERT_TEXT ); + MALLOC_ASSERT( block->previous == NULL, ASSERT_TEXT ); + + MALLOC_ASSERT( bin, ASSERT_TEXT ); + verifyTLSBin(bin, size); + activeBlk = bin->activeBlk; + + block->next = activeBlk; + if( activeBlk ) { + block->previous = activeBlk->previous; + activeBlk->previous = block; + if( block->previous ) + block->previous->next = block; + } else { + bin->activeBlk = block; + } + + verifyTLSBin(bin, size); +} + +/* + * Take a block out of its tls bin (e.g. before removal). + */ +static void outofTLSBin (Bin* bin, Block* block) +{ + unsigned int size = block->objectSize; + + MALLOC_ASSERT( block->owner == getThreadId(), ASSERT_TEXT ); + MALLOC_ASSERT( block->objectSize != 0, ASSERT_TEXT ); + + MALLOC_ASSERT( bin, ASSERT_TEXT ); + verifyTLSBin(bin, size); + + if (block == bin->activeBlk) { + bin->activeBlk = block->previous? block->previous : block->next; + } + /* Delink the block */ + if (block->previous) { + MALLOC_ASSERT( block->previous->next == block, ASSERT_TEXT ); + block->previous->next = block->next; + } + if (block->next) { + MALLOC_ASSERT( block->next->previous == block, ASSERT_TEXT ); + block->next->previous = block->previous; + } + block->next = NULL; + block->previous = NULL; + + verifyTLSBin(bin, size); +} + +/* + * Return the bin for the given size. If the TLS bin structure is absent, create it. 
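+ * The TLS array comes from bootStrapMalloc and is zero filled, so every bin starts
+ * with empty activeBlk and mailbox; NULL is returned only if that allocation fails.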
+ */ +static Bin* getAllocationBin(size_t size) +{ + Bin* tls = (Bin*)getThreadMallocTLS(); + if( !tls ) { + MALLOC_ASSERT( tlsSize >= sizeof(Bin) * numBlockBins, ASSERT_TEXT ); + tls = (Bin*) bootStrapMalloc(tlsSize); + if ( !tls ) return NULL; + /* the block contains zeroes after bootStrapMalloc, so bins are initialized */ +#if MALLOC_DEBUG + for (int i = 0; i < numBlockBinLimit; i++) { + MALLOC_ASSERT( tls[i].activeBlk == 0, ASSERT_TEXT ); + MALLOC_ASSERT( tls[i].mailbox == 0, ASSERT_TEXT ); + } +#endif + setThreadMallocTLS(tls); + } + MALLOC_ASSERT( tls, ASSERT_TEXT ); + return tls+getIndex(size); +} + +const float emptyEnoughRatio = 1.0 / 4.0; /* "Reactivate" a block if this share of its objects is free. */ + +static unsigned int emptyEnoughToUse (Block *mallocBlock) +{ + const float threshold = (blockSize - sizeof(Block)) * (1-emptyEnoughRatio); + + if (mallocBlock->bumpPtr) { + /* If we are still using a bump ptr for this block it is empty enough to use. */ + STAT_increment(mallocBlock->owner, getIndex(mallocBlock->objectSize), examineEmptyEnough); + mallocBlock->isFull = 0; + return 1; + } + + /* allocatedCount shows how many objects in the block are in use; however it still counts + blocks freed by other threads; so prior call to privatizePublicFreeList() is recommended */ + mallocBlock->isFull = (mallocBlock->allocatedCount*mallocBlock->objectSize > threshold)? 1: 0; +#if COLLECT_STATISTICS + if (mallocBlock->isFull) + STAT_increment(mallocBlock->owner, getIndex(mallocBlock->objectSize), examineNotEmpty); + else + STAT_increment(mallocBlock->owner, getIndex(mallocBlock->objectSize), examineEmptyEnough); +#endif + return 1-mallocBlock->isFull; +} + +/* Restore the bump pointer for an empty block that is planned to use */ +static void restoreBumpPtr (Block *block) +{ + MALLOC_ASSERT( block->allocatedCount == 0, ASSERT_TEXT ); + MALLOC_ASSERT( block->publicFreeList == NULL, ASSERT_TEXT ); + STAT_increment(block->owner, getIndex(block->objectSize), freeRestoreBumpPtr); + block->bumpPtr = (FreeObject *)((uintptr_t)block + blockSize - block->objectSize); + block->freeList = NULL; + block->isFull = 0; +} + +#if !(FREELIST_NONBLOCKING) +static MallocMutex publicFreeListLock; // lock for changes of publicFreeList +#endif + +const uintptr_t UNUSABLE = 0x1; +inline bool isSolidPtr( void* ptr ) +{ + return (UNUSABLE|(uintptr_t)ptr)!=UNUSABLE; +} +inline bool isNotForUse( void* ptr ) +{ + return (uintptr_t)ptr==UNUSABLE; +} + +static void freePublicObject (Block *block, FreeObject *objectToFree) +{ + Bin* theBin; + FreeObject *publicFreeList; + +#if FREELIST_NONBLOCKING + FreeObject *temp = block->publicFreeList; + MALLOC_ITT_SYNC_RELEASING(&block->publicFreeList); + do { + publicFreeList = objectToFree->next = temp; + temp = (FreeObject*)AtomicCompareExchange( + (intptr_t&)block->publicFreeList, + (intptr_t)objectToFree, (intptr_t)publicFreeList ); + // no backoff necessary because trying to make change, not waiting for a change + } while( temp != publicFreeList ); +#else + STAT_increment(getThreadId(), ThreadCommonCounters, lockPublicFreeList); + { + MallocMutex::scoped_lock scoped_cs(publicFreeListLock); + publicFreeList = objectToFree->next = block->publicFreeList; + block->publicFreeList = objectToFree; + } +#endif + + if( publicFreeList==NULL ) { + // if the block is abandoned, its nextPrivatizable pointer should be UNUSABLE + // otherwise, it should point to the bin the block belongs to. 
+ // reading nextPrivatizable is thread-safe below, because: + // 1) the executing thread atomically got publicFreeList==NULL and changed it to non-NULL; + // 2) only owning thread can change it back to NULL, + // 3) but it can not be done until the block is put to the mailbox + // So the executing thread is now the only one that can change nextPrivatizable + if( !isNotForUse(block->nextPrivatizable) ) { + MALLOC_ASSERT( block->nextPrivatizable!=NULL, ASSERT_TEXT ); + MALLOC_ASSERT( block->owner!=0, ASSERT_TEXT ); + theBin = (Bin*) block->nextPrivatizable; + MallocMutex::scoped_lock scoped_cs(theBin->mailLock); + block->nextPrivatizable = theBin->mailbox; + theBin->mailbox = block; + } else { + MALLOC_ASSERT( block->owner==0, ASSERT_TEXT ); + } + } + STAT_increment(getThreadId(), ThreadCommonCounters, freeToOtherThread); + STAT_increment(block->owner, getIndex(block->objectSize), freeByOtherThread); +} + +static void privatizePublicFreeList (Block *mallocBlock) +{ + FreeObject *temp, *publicFreeList; + + MALLOC_ASSERT( mallocBlock->owner == getThreadId(), ASSERT_TEXT ); +#if FREELIST_NONBLOCKING + temp = mallocBlock->publicFreeList; + do { + publicFreeList = temp; + temp = (FreeObject*)AtomicCompareExchange( + (intptr_t&)mallocBlock->publicFreeList, + 0, (intptr_t)publicFreeList); + // no backoff necessary because trying to make change, not waiting for a change + } while( temp != publicFreeList ); + MALLOC_ITT_SYNC_ACQUIRED(&mallocBlock->publicFreeList); +#else + STAT_increment(mallocBlock->owner, ThreadCommonCounters, lockPublicFreeList); + { + MallocMutex::scoped_lock scoped_cs(publicFreeListLock); + publicFreeList = mallocBlock->publicFreeList; + mallocBlock->publicFreeList = NULL; + } + temp = publicFreeList; +#endif + + MALLOC_ASSERT( publicFreeList && publicFreeList==temp, ASSERT_TEXT ); // there should be something in publicFreeList! 
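+ // temp now heads the chain just detached from publicFreeList; unless it is the
+ // UNUSABLE marker, the branch below decrements allocatedCount once per privatized
+ // object and splices the whole chain onto the local freeList.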
+ if( !isNotForUse(temp) ) { // return/getPartialBlock could set it to UNUSABLE + MALLOC_ASSERT( mallocBlock->allocatedCount <= (blockSize-sizeof(Block))/mallocBlock->objectSize, ASSERT_TEXT ); + /* other threads did not change the counter freeing our blocks */ + mallocBlock->allocatedCount--; + while( isSolidPtr(temp->next) ){ // the list will end with either NULL or UNUSABLE + temp = temp->next; + mallocBlock->allocatedCount--; + } + MALLOC_ASSERT( mallocBlock->allocatedCount < (blockSize-sizeof(Block))/mallocBlock->objectSize, ASSERT_TEXT ); + /* merge with local freeList */ + temp->next = mallocBlock->freeList; + mallocBlock->freeList = publicFreeList; + STAT_increment(mallocBlock->owner, getIndex(mallocBlock->objectSize), allocPrivatized); + } +} + +static Block* getPublicFreeListBlock (Bin* bin) +{ + Block* block; + MALLOC_ASSERT( bin, ASSERT_TEXT ); +// the counter should be changed STAT_increment(getThreadId(), ThreadCommonCounters, lockPublicFreeList); + { + MallocMutex::scoped_lock scoped_cs(bin->mailLock); + block = bin->mailbox; + if( block ) { + MALLOC_ASSERT( block->owner == getThreadId(), ASSERT_TEXT ); + MALLOC_ASSERT( !isNotForUse(block->nextPrivatizable), ASSERT_TEXT ); + bin->mailbox = block->nextPrivatizable; + block->nextPrivatizable = (Block*) bin; + } + } + if( block ) { + MALLOC_ASSERT( isSolidPtr(block->publicFreeList), ASSERT_TEXT ); + privatizePublicFreeList(block); + } + return block; +} + +static Block *getPartialBlock(Bin* bin, unsigned int size) +{ + Block *result; + MALLOC_ASSERT( bin, ASSERT_TEXT ); + unsigned int index = getIndex(size); + result = (Block *) globalSizeBins[index].pop(); + if (result) { + MALLOC_ASSERT( result->mallocUniqueID==theMallocUniqueID, ASSERT_TEXT ); + result->next = NULL; + result->previous = NULL; + MALLOC_ASSERT( result->publicFreeList!=NULL, ASSERT_TEXT ); + /* There is not a race here since no other thread owns this block */ + MALLOC_ASSERT( result->owner == 0, ASSERT_TEXT ); + result->owner = getThreadId(); + // It is safe to change nextPrivatizable, as publicFreeList is not null + MALLOC_ASSERT( isNotForUse(result->nextPrivatizable), ASSERT_TEXT ); + result->nextPrivatizable = (Block*)bin; + // the next call is required to change publicFreeList to 0 + privatizePublicFreeList(result); + if( result->allocatedCount ) { + // check its fullness and set result->isFull + emptyEnoughToUse(result); + } else { + restoreBumpPtr(result); + } + MALLOC_ASSERT( !isNotForUse(result->publicFreeList), ASSERT_TEXT ); + STAT_increment(result->owner, index, allocBlockPublic); + } + return result; +} + +static void returnPartialBlock(Bin* bin, Block *block) +{ + unsigned int index = getIndex(block->objectSize); + MALLOC_ASSERT( bin, ASSERT_TEXT ); + MALLOC_ASSERT( block->owner==getThreadId(), ASSERT_TEXT ); + STAT_increment(block->owner, index, freeBlockPublic); + // need to set publicFreeList to non-zero, so other threads + // will not change nextPrivatizable and it can be zeroed. + if ((intptr_t)block->nextPrivatizable==(intptr_t)bin) { + void* oldval; +#if FREELIST_NONBLOCKING + oldval = (void*)AtomicCompareExchange((intptr_t&)block->publicFreeList, (intptr_t)UNUSABLE, 0); +#else + STAT_increment(block->owner, ThreadCommonCounters, lockPublicFreeList); + { + MallocMutex::scoped_lock scoped_cs(publicFreeListLock); + if ( (oldval=block->publicFreeList)==NULL ) + (uintptr_t&)(block->publicFreeList) = UNUSABLE; + } +#endif + if ( oldval!=NULL ) { + // another thread freed an object; we need to wait until it finishes. 
+ // I believe there is no need for exponential backoff, as the wait here is not for a lock; + // but need to yield, so the thread we wait has a chance to run. + int count = 256; + while( (intptr_t)const_cast(block->nextPrivatizable)==(intptr_t)bin ) { + if (--count==0) { + do_yield(); + count = 256; + } + } + } + } else { + MALLOC_ASSERT( isSolidPtr(block->publicFreeList), ASSERT_TEXT ); + } + MALLOC_ASSERT( block->publicFreeList!=NULL, ASSERT_TEXT ); + // now it is safe to change our data + block->previous = NULL; + block->owner = 0; + // it is caller responsibility to ensure that the list of blocks + // formed by nextPrivatizable pointers is kept consistent if required. + // if only called from thread shutdown code, it does not matter. + (uintptr_t&)(block->nextPrivatizable) = UNUSABLE; + globalSizeBins[index].push((void **)block); +} + +static void cleanBlockHeader(Block *block) +{ +#if MALLOC_DEBUG + memset (block, 0x0e5, blockSize); +#endif + block->next = NULL; + block->previous = NULL; + block->freeList = NULL; + block->allocatedCount = 0; + block->isFull = 0; + + block->publicFreeList = NULL; +} + +static void initEmptyBlock(Block *block, size_t size) +{ + // Having getIndex and getObjectSize called next to each other + // allows better compiler optimization as they basically share the code. + unsigned int index = getIndex(size); + unsigned int objectSize = getObjectSize(size); + Bin* tls = (Bin*)getThreadMallocTLS(); + + cleanBlockHeader(block); + block->mallocUniqueID = theMallocUniqueID; + block->objectSize = objectSize; + block->owner = getThreadId(); + // bump pointer should be prepared for first allocation - thus mode it down to objectSize + block->bumpPtr = (FreeObject *)((uintptr_t)block + blockSize - objectSize); + + // each block should have the address where the head of the list of "privatizable" blocks is kept + // the only exception is a block for boot strap which is initialized when TLS is yet NULL + block->nextPrivatizable = tls? (Block*)(tls + index) : NULL; + TRACEF(( "[ScalableMalloc trace] Empty block %p is initialized, owner is %d, objectSize is %d, bumpPtr is %p\n", + block, block->owner, block->objectSize, block->bumpPtr )); + } + +/* Return an empty uninitialized block in a non-blocking fashion. */ +static Block *getRawBlock() +{ + Block *result; + Block *bigBlock; + + result = NULL; + + bigBlock = (Block *) freeBlockList.pop(); + + while (!bigBlock) { + /* We are out of blocks so go to the OS and get another one */ + if (!mallocBigBlock()) { + return NULL; + } + bigBlock = (Block *) freeBlockList.pop(); + } + + // check alignment + MALLOC_ASSERT( isAligned( bigBlock, blockSize ), ASSERT_TEXT ); + MALLOC_ASSERT( isAligned( bigBlock->bumpPtr, blockSize ), ASSERT_TEXT ); + // block should be at least as big as blockSize; otherwise the previous block can be damaged. + MALLOC_ASSERT( (uintptr_t)bigBlock->bumpPtr >= (uintptr_t)bigBlock + blockSize, ASSERT_TEXT ); + bigBlock->bumpPtr = (FreeObject *)((uintptr_t)bigBlock->bumpPtr - blockSize); + result = (Block *)bigBlock->bumpPtr; + if ( result!=bigBlock ) { + TRACEF(( "[ScalableMalloc trace] Pushing partial rest of block back on.\n" )); + freeBlockList.push((void **)bigBlock); + } + return result; +} + +/* Return an empty uninitialized block in a non-blocking fashion. 
*/ +static Block *getEmptyBlock(size_t size) +{ + Block *result = getRawBlock(); + + if (result) { + initEmptyBlock(result, size); + STAT_increment(result->owner, getIndex(result->objectSize), allocBlockNew); + } + + return result; +} + +/* We have a block give it back to the malloc block manager */ +static void returnEmptyBlock (Block *block, bool keepTheBin = true) +{ + // it is caller's responsibility to ensure no data is lost before calling this + MALLOC_ASSERT( block->allocatedCount==0, ASSERT_TEXT ); + MALLOC_ASSERT( block->publicFreeList==NULL, ASSERT_TEXT ); + if (keepTheBin) { + /* We should keep the TLS bin structure */ + MALLOC_ASSERT( block->next == NULL, ASSERT_TEXT ); + MALLOC_ASSERT( block->previous == NULL, ASSERT_TEXT ); + } + STAT_increment(block->owner, getIndex(block->objectSize), freeBlockBack); + + cleanBlockHeader(block); + + block->nextPrivatizable = NULL; + + block->mallocUniqueID=0; + block->objectSize = 0; + block->owner = (unsigned)-1; + // for an empty block, bump pointer should point right after the end of the block + block->bumpPtr = (FreeObject *)((uintptr_t)block + blockSize); + freeBlockList.push((void **)block); +} + +inline static Block* getActiveBlock( Bin* bin ) +{ + MALLOC_ASSERT( bin, ASSERT_TEXT ); + return bin->activeBlk; +} + +inline static void setActiveBlock (Bin* bin, Block *block) +{ + MALLOC_ASSERT( bin, ASSERT_TEXT ); + MALLOC_ASSERT( block->owner == getThreadId(), ASSERT_TEXT ); + // it is the caller responsibility to keep bin consistence (i.e. ensure this block is in the bin list) + bin->activeBlk = block; +} + +inline static Block* setPreviousBlockActive( Bin* bin ) +{ + MALLOC_ASSERT( bin && bin->activeBlk, ASSERT_TEXT ); + Block* temp = bin->activeBlk->previous; + if( temp ) { + MALLOC_ASSERT( temp->isFull == 0, ASSERT_TEXT ); + bin->activeBlk = temp; + } + return temp; +} + +#if MALLOC_CHECK_RECURSION + +/* + * It's a special kind of allocation that can be used when malloc is + * not available (either during startup or when malloc was already called and + * we are, say, inside pthread_setspecific's call). + * Block can contain objects of different sizes, + * allocations are performed by moving bump pointer and increasing of object counter, + * releasing is done via counter of objects allocated in the block + * or moving bump pointer if releasing object is on a bound. + */ + +struct StartupBlock : public Block { + size_t availableSize() { + return blockSize - ((uintptr_t)bumpPtr - (uintptr_t)this); + } +}; + +static MallocMutex startupMallocLock; +static StartupBlock *firstStartupBlock; + +static StartupBlock *getNewStartupBlock() +{ + StartupBlock *block = (StartupBlock *)getRawBlock(); + + if (!block) return NULL; + + cleanBlockHeader(block); + block->mallocUniqueID = theMallocUniqueID; + // use startupAllocObjSizeMark to mark objects from startup block marker + block->objectSize = startupAllocObjSizeMark; + block->bumpPtr = (FreeObject *)((uintptr_t)block + sizeof(StartupBlock)); + return block; +} + +/* TODO: Function is called when malloc nested call is detected, so simultaneous + usage from different threads are unprobable, so block pre-allocation + can be not useful, and the code might be simplified. */ +static FreeObject *startupAlloc(size_t size) +{ + FreeObject *result; + StartupBlock *newBlock = NULL; + bool newBlockUnused = false; + + /* Objects must be aligned on their natural bounds, + and objects bigger than word on word's bound. */ + size = alignUp(size, sizeof(size_t)); + // We need size of an object to implement msize. 
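+ // The size is stored in a size_t slot directly before the pointer returned to
+ // the caller; startupMsize() and startupFree() read it back via ((size_t*)ptr - 1).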
+ size_t reqSize = size + sizeof(size_t); + // speculatively allocates newBlock to later use or return it as unused + if (!firstStartupBlock || firstStartupBlock->availableSize() < reqSize) + if (!(newBlock = getNewStartupBlock())) + return NULL; + + { + MallocMutex::scoped_lock scoped_cs(startupMallocLock); + + if (!firstStartupBlock || firstStartupBlock->availableSize() < reqSize) { + if (!newBlock && !(newBlock = getNewStartupBlock())) + return NULL; + newBlock->next = (Block*)firstStartupBlock; + if (firstStartupBlock) + firstStartupBlock->previous = (Block*)newBlock; + firstStartupBlock = newBlock; + } else + newBlockUnused = true; + result = firstStartupBlock->bumpPtr; + firstStartupBlock->allocatedCount++; + firstStartupBlock->bumpPtr = + (FreeObject *)((uintptr_t)firstStartupBlock->bumpPtr + reqSize); + } + if (newBlock && newBlockUnused) + returnEmptyBlock(newBlock); + + // keep object size at the negative offset + *((size_t*)result) = size; + return (FreeObject*)((size_t*)result+1); +} + +static size_t startupMsize(void *ptr) { return *((size_t*)ptr - 1); } + +static void startupFree(StartupBlock *block, void *ptr) +{ + Block* blockToRelease = NULL; + { + MallocMutex::scoped_lock scoped_cs(startupMallocLock); + + MALLOC_ASSERT(firstStartupBlock, ASSERT_TEXT); + MALLOC_ASSERT(startupAllocObjSizeMark==block->objectSize + && block->allocatedCount>0, ASSERT_TEXT); + MALLOC_ASSERT((uintptr_t)ptr>=(uintptr_t)block+sizeof(StartupBlock) + && (uintptr_t)ptr+startupMsize(ptr)<=(uintptr_t)block+blockSize, + ASSERT_TEXT); + if (0 == --block->allocatedCount) { + if (block == firstStartupBlock) + firstStartupBlock = (StartupBlock*)firstStartupBlock->next; + if (block->previous) + block->previous->next = block->next; + if (block->next) + block->next->previous = block->previous; + blockToRelease = block; + } else if ((uintptr_t)ptr + startupMsize(ptr) == (uintptr_t)block->bumpPtr) { + // last object in the block released + FreeObject *newBump = (FreeObject*)((size_t*)ptr - 1); + MALLOC_ASSERT((uintptr_t)newBump>(uintptr_t)block+sizeof(StartupBlock), + ASSERT_TEXT); + block->bumpPtr = newBump; + } + } + if (blockToRelease) { + blockToRelease->previous = blockToRelease->next = NULL; + returnEmptyBlock(blockToRelease); + } +} + +#endif /* MALLOC_CHECK_RECURSION */ + +/********* End thread related code *************/ + +/********* Library initialization *************/ + +//! Value indicating the state of initialization. +/* 0 = initialization not started. + * 1 = initialization started but not finished. + * 2 = initialization finished. + * In theory, we only need values 0 and 2. But value 1 is nonetheless + * useful for detecting errors in the double-check pattern. + */ +static int mallocInitialized; // implicitly initialized to 0 +static MallocMutex initAndShutMutex; + +inline bool isMallocInitialized() { return 2 == mallocInitialized; } + +/* + * Allocator initialization routine; + * it is called lazily on the very first scalable_malloc call. 
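+ * (checkInitialization() below wraps it in a double-checked locking pattern
+ * guarded by initAndShutMutex.)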
+ */ +static void initMemoryManager() +{ + TRACEF(( "[ScalableMalloc trace] sizeof(Block) is %d (expected 128); sizeof(uintptr_t) is %d\n", + sizeof(Block), sizeof(uintptr_t) )); + MALLOC_ASSERT( 2*blockHeaderAlignment == sizeof(Block), ASSERT_TEXT ); + +// Create keys for thread-local storage and for thread id +// TODO: add error handling, and on error do something better than exit(1) +#if USE_WINTHREAD + TLS_pointer_key = TlsAlloc(); + Tid_key = TlsAlloc(); +#else + int status1 = pthread_key_create( &TLS_pointer_key, mallocThreadShutdownNotification ); + int status2 = pthread_key_create( &Tid_key, NULL ); + if ( status1 || status2 ) { + fprintf (stderr, "The memory manager cannot create tls key during initialization; exiting \n"); + exit(1); + } +#endif /* USE_WINTHREAD */ +#if COLLECT_STATISTICS + initStatisticsCollection(); +#endif + + TRACEF(( "[ScalableMalloc trace] Asking for a mallocBigBlock\n" )); + if (!mallocBigBlock()) { + fprintf (stderr, "The memory manager cannot access sufficient memory to initialize; exiting \n"); + exit(1); + } +} + +//! Ensures that initMemoryManager() is called once and only once. +/** Does not return until initMemoryManager() has been completed by a thread. + There is no need to call this routine if mallocInitialized==2 . */ +static void checkInitialization() +{ + if (mallocInitialized==2) return; + MallocMutex::scoped_lock lock( initAndShutMutex ); + if (mallocInitialized!=2) { + MALLOC_ASSERT( mallocInitialized==0, ASSERT_TEXT ); + mallocInitialized = 1; + RecursiveMallocCallProtector scoped; + initMemoryManager(); +#ifdef MALLOC_EXTRA_INITIALIZATION + MALLOC_EXTRA_INITIALIZATION; +#endif +#if MALLOC_CHECK_RECURSION + RecursiveMallocCallProtector::detectNaiveOverload(); +#endif + MALLOC_ASSERT( mallocInitialized==1, ASSERT_TEXT ); + mallocInitialized = 2; + } + MALLOC_ASSERT( mallocInitialized==2, ASSERT_TEXT ); /* It can't be 0 or I would have initialized it */ +} + +/********* End library initialization *************/ + +/********* The malloc show begins *************/ + + +/********* Allocation of large objects ************/ + +/* + * The program wants a large object that we are not prepared to deal with. + * so we pass the problem on to the OS. Large Objects are the only objects in + * the system that begin on a 16K byte boundary since the blocks used for smaller + * objects have the Block structure at each 16K boundary. 
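+ * A LargeObjectHeader sits in the bytes immediately preceding the aligned address
+ * returned to the caller, and isLargeObject() recognizes such objects purely by
+ * their blockSize alignment.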
+ * + */ + +struct LargeObjectHeader { + void *unalignedResult; /* The base of the memory returned from getMemory, this is what is used to return this to the OS */ + size_t unalignedSize; /* The size that was requested from getMemory */ + uint64_t mallocUniqueID; /* The field to check whether the memory was allocated by scalable_malloc */ + size_t objectSize; /* The size originally requested by a client */ + bool fromMapMemory; /* Memory allocated when MapMemory usage is forced */ +}; + +void CachedObjectsList::push(void *buf, bool fromMapMemory, uintptr_t currAge) +{ + CachedObject *ptr = (CachedObject*)buf; + ptr->prev = NULL; + ptr->age = currAge; + ptr->fromMapMemory = fromMapMemory; + + MallocMutex::scoped_lock scoped_cs(lock); + ptr->next = first; + first = ptr; + if (ptr->next) ptr->next->prev = ptr; + if (!last) { + MALLOC_ASSERT(0 == oldest, ASSERT_TEXT); + oldest = currAge; + last = ptr; + } +} + +CachedObject *CachedObjectsList::pop(uintptr_t currAge) +{ + CachedObject *result=NULL; + { + MallocMutex::scoped_lock scoped_cs(lock); + if (first) { + result = first; + first = result->next; + if (first) + first->prev = NULL; + else { + last = NULL; + oldest = 0; + } + } else { + /* If cache miss occured, set ageThreshold to twice the difference + between current time and last time cache was cleaned. */ + ageThreshold = 2*(currAge - lastCleanedAge); + } + } + return result; +} + +void CachedObjectsList::releaseLastIfOld(uintptr_t currAge, size_t size) +{ + CachedObject *toRelease = NULL; + + /* oldest may be more recent then age, that's why cast to signed type + was used. age overflow is also processed correctly. */ + if (last && (intptr_t)(currAge - oldest) > ageThreshold) { + MallocMutex::scoped_lock scoped_cs(lock); + // double check + if (last && (intptr_t)(currAge - last->age) > ageThreshold) { + do { + last = last->prev; + } while (last && (intptr_t)(currAge - last->age) > ageThreshold); + if (last) { + toRelease = last->next; + oldest = last->age; + last->next = NULL; + } else { + toRelease = first; + first = NULL; + oldest = 0; + } + MALLOC_ASSERT( toRelease, ASSERT_TEXT ); + lastCleanedAge = toRelease->age; + } + else + return; + } + while ( toRelease ) { + CachedObject *helper = toRelease->next; + freeRawMemory(toRelease, size, toRelease->fromMapMemory); + toRelease = helper; + } +} + +/* A predicate checks whether an object starts on blockSize boundary */ +static inline unsigned int isLargeObject(void *object) +{ + return isAligned(object, blockSize); +} + +static uintptr_t cleanupCacheIfNeed () +{ + /* loCacheStat.age overflow is OK, as we only want difference between + * its current value and some recent. + * + * Both malloc and free should increment loCacheStat.age, as in + * a different case mulitiple cache object would have same age, + * and accuracy of predictors suffers. + */ + uintptr_t currAge = (uintptr_t)AtomicIncrement((intptr_t&)loCacheStat.age); + + if ( 0 == currAge % cacheCleanupFreq ) { + size_t objSize; + int i; + + for (i = numLargeObjectBins-1, + objSize = (numLargeObjectBins-1)*largeObjectCacheStep+blockSize; + i >= 0; + i--, objSize-=largeObjectCacheStep) { + /* cached object size on iteration is + * i*largeObjectCacheStep+blockSize, it seems iterative + * computation of it improves performance. 
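+ * For the largest bin (i == numLargeObjectBins-1) this gives 1023*8K + 16K,
+ * i.e. about 8 MB, matching the "near 8MB" note where numLargeObjectBins is defined.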
+ */ + // release from cache objects that are older then ageThreshold + globalCachedObjectBins[i].releaseLastIfOld(currAge, objSize); + } + } + return currAge; +} + +static CachedObject* allocateCachedLargeObject (size_t size) +{ + MALLOC_ASSERT( size%largeObjectCacheStep==0, ASSERT_TEXT ); + CachedObject *block = NULL; + // blockSize is the minimal alignment and thus the minimal size of a large object. + size_t idx = (size-blockSize)/largeObjectCacheStep; + if (idxfromMapMemory; + unalignedArea = cachedObj; + } else { + unalignedArea = getRawMemory(allocationSize); + if (!unalignedArea) + return NULL; + STAT_increment(getThreadId(), ThreadCommonCounters, allocNewLargeObj); + } + } + void *alignedArea = (void*)alignUp((uintptr_t)unalignedArea+sizeof(LargeObjectHeader), alignment); + LargeObjectHeader *header = (LargeObjectHeader*)((uintptr_t)alignedArea-sizeof(LargeObjectHeader)); + header->unalignedResult = unalignedArea; + header->mallocUniqueID=theMallocUniqueID; + header->unalignedSize = allocationSize; + header->objectSize = size; + header->fromMapMemory = startupAlloc || blockFromMapMemory; + MALLOC_ASSERT( isLargeObject(alignedArea), ASSERT_TEXT ); + return alignedArea; +} + +static bool freeLargeObjectToCache (LargeObjectHeader* header) +{ + size_t size = header->unalignedSize; + size_t idx = (size-blockSize)/largeObjectCacheStep; + if (idxunalignedResult, + header->fromMapMemory, currAge); + + STAT_increment(getThreadId(), ThreadCommonCounters, cacheLargeObj); + return true; + } + return false; +} + +static inline void freeLargeObject (void *object) +{ + LargeObjectHeader *header; + header = (LargeObjectHeader *)((uintptr_t)object - sizeof(LargeObjectHeader)); + header->mallocUniqueID = 0; + if (!freeLargeObjectToCache(header)) { + freeRawMemory(header->unalignedResult, header->unalignedSize, + /*alwaysUseMap=*/ header->fromMapMemory); + STAT_increment(getThreadId(), ThreadCommonCounters, freeLargeObj); + } +} + +/*********** End allocation of large objects **********/ + + +static FreeObject *allocateFromFreeList(Block *mallocBlock) +{ + FreeObject *result; + + if (!mallocBlock->freeList) { + return NULL; + } + + result = mallocBlock->freeList; + MALLOC_ASSERT( result, ASSERT_TEXT ); + + mallocBlock->freeList = result->next; + MALLOC_ASSERT( mallocBlock->allocatedCount < (blockSize-sizeof(Block))/mallocBlock->objectSize, ASSERT_TEXT ); + mallocBlock->allocatedCount++; + STAT_increment(mallocBlock->owner, getIndex(mallocBlock->objectSize), allocFreeListUsed); + + return result; +} + +static FreeObject *allocateFromBumpPtr(Block *mallocBlock) +{ + FreeObject *result = mallocBlock->bumpPtr; + if (result) { + mallocBlock->bumpPtr = + (FreeObject *) ((uintptr_t) mallocBlock->bumpPtr - mallocBlock->objectSize); + if ( (uintptr_t)mallocBlock->bumpPtr < (uintptr_t)mallocBlock+sizeof(Block) ) { + mallocBlock->bumpPtr = NULL; + } + MALLOC_ASSERT( mallocBlock->allocatedCount < (blockSize-sizeof(Block))/mallocBlock->objectSize, ASSERT_TEXT ); + mallocBlock->allocatedCount++; + STAT_increment(mallocBlock->owner, getIndex(mallocBlock->objectSize), allocBumpPtrUsed); + } + return result; +} + +inline static FreeObject* allocateFromBlock( Block *mallocBlock ) +{ + FreeObject *result; + + MALLOC_ASSERT( mallocBlock->owner == getThreadId(), ASSERT_TEXT ); + + /* for better cache locality, first looking in the free list. 
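+ * Then the bump pointer area is tried; when both are exhausted the block is marked full.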
*/ + if ( (result = allocateFromFreeList(mallocBlock)) ) { + return result; + } + MALLOC_ASSERT( !mallocBlock->freeList, ASSERT_TEXT ); + + /* if free list is empty, try thread local bump pointer allocation. */ + if ( (result = allocateFromBumpPtr(mallocBlock)) ) { + return result; + } + MALLOC_ASSERT( !mallocBlock->bumpPtr, ASSERT_TEXT ); + + /* the block is considered full. */ + mallocBlock->isFull = 1; + return NULL; +} + +static void moveBlockToBinFront(Block *block) +{ + Bin* bin = getAllocationBin(block->objectSize); + /* move the block to the front of the bin */ + outofTLSBin(bin, block); + pushTLSBin(bin, block); +} + +static void processLessUsedBlock(Block *block) +{ + Bin* bin = getAllocationBin(block->objectSize); + if (block != getActiveBlock(bin) && block != getActiveBlock(bin)->previous ) { + /* We are not actively using this block; return it to the general block pool */ + outofTLSBin(bin, block); + returnEmptyBlock(block); + } else { + /* all objects are free - let's restore the bump pointer */ + restoreBumpPtr(block); + } +} + +/* + * All aligned allocations fall into one of the following categories: + * 1. if both request size and alignment are <= maxSegregatedObjectSize, + * we just align the size up, and request this amount, because for every size + * aligned to some power of 2, the allocated object is at least that aligned. + * 2. for bigger size, check if already guaranteed fittingAlignment is enough. + * 3. if size+alignmentalignment? blockSize: alignment); + } + + MALLOC_ASSERT( isAligned(result, alignment), ASSERT_TEXT ); + return result; +} + +static void *reallocAligned(void *ptr, size_t size, size_t alignment = 0) +{ + void *result; + size_t copySize; + + if (isLargeObject(ptr)) { + LargeObjectHeader* loh = (LargeObjectHeader *)((uintptr_t)ptr - sizeof(LargeObjectHeader)); + MALLOC_ASSERT( loh->mallocUniqueID==theMallocUniqueID, ASSERT_TEXT ); + copySize = loh->unalignedSize-((uintptr_t)ptr-(uintptr_t)loh->unalignedResult); + if (size <= copySize && (0==alignment || isAligned(ptr, alignment))) { + loh->objectSize = size; + return ptr; + } else { + copySize = loh->objectSize; + result = alignment ? allocateAligned(size, alignment) : scalable_malloc(size); + } + } else { + Block* block = (Block *)alignDown(ptr, blockSize); + MALLOC_ASSERT( block->mallocUniqueID==theMallocUniqueID, ASSERT_TEXT ); + copySize = block->objectSize; + if (size <= copySize && (0==alignment || isAligned(ptr, alignment))) { + return ptr; + } else { + result = alignment ? allocateAligned(size, alignment) : scalable_malloc(size); + } + } + if (result) { + memcpy(result, ptr, copySizeobjectSize; +} + +/* Finds the real object inside the block */ +static inline FreeObject *findAllocatedObject(const void *address, const Block *block) +{ + // calculate offset from the end of the block space + uintptr_t offset = (uintptr_t)block + blockSize - (uintptr_t)address; + MALLOC_ASSERT( offsetobjectSize; + // and move the address down to where the real object starts. + return (FreeObject*)((uintptr_t)address - (offset? block->objectSize-offset: 0)); +} + +} // namespace internal +} // namespace rml + +using namespace rml::internal; + +/* + * When a thread is shutting down this routine should be called to remove all the thread ids + * from the malloc blocks and replace them with a NULL thread id. + * + */ +#if MALLOC_TRACE +static unsigned int threadGoingDownCount = 0; +#endif + +/* + * for pthreads, the function is set as a callback in pthread_key_create for TLS bin. 
+ * it will be automatically called at thread exit with the key value as the argument. + * + * for Windows, it should be called directly e.g. from DllMain; the argument can be NULL + * one should include "TypeDefinitions.h" for the declaration of this function. +*/ +extern "C" void mallocThreadShutdownNotification(void* arg) +{ + Bin *tls; + Block *threadBlock; + Block *threadlessBlock; + unsigned int index; + + { + MallocMutex::scoped_lock lock( initAndShutMutex ); + if ( mallocInitialized == 0 ) return; + } + + TRACEF(( "[ScalableMalloc trace] Thread id %d blocks return start %d\n", + getThreadId(), threadGoingDownCount++ )); +#ifdef USE_WINTHREAD + tls = (Bin*)getThreadMallocTLS(); +#else + tls = (Bin*)arg; +#endif + if (tls) { + for (index = 0; index < numBlockBins; index++) { + if (tls[index].activeBlk==NULL) + continue; + threadlessBlock = tls[index].activeBlk->previous; + while (threadlessBlock) { + threadBlock = threadlessBlock->previous; + if (threadlessBlock->allocatedCount==0 && threadlessBlock->publicFreeList==NULL) { + /* we destroy the thread, no need to keep its TLS bin -> the second param is false */ + returnEmptyBlock(threadlessBlock, false); + } else { + returnPartialBlock(tls+index, threadlessBlock); + } + threadlessBlock = threadBlock; + } + threadlessBlock = tls[index].activeBlk; + while (threadlessBlock) { + threadBlock = threadlessBlock->next; + if (threadlessBlock->allocatedCount==0 && threadlessBlock->publicFreeList==NULL) { + /* we destroy the thread, no need to keep its TLS bin -> the second param is false */ + returnEmptyBlock(threadlessBlock, false); + } else { + returnPartialBlock(tls+index, threadlessBlock); + } + threadlessBlock = threadBlock; + } + tls[index].activeBlk = 0; + } + bootStrapFree((void*)tls); + setThreadMallocTLS(NULL); + } + + TRACEF(( "[ScalableMalloc trace] Thread id %d blocks return end\n", getThreadId() )); +} + +extern "C" void mallocProcessShutdownNotification(void) +{ + // for now, this function is only necessary for dumping statistics +#if COLLECT_STATISTICS + ThreadId nThreads = ThreadIdCount; + for( int i=1; i<=nThreads && imallocUniqueID + : &((Block *)alignDown(ptr, blockSize))->mallocUniqueID + ); + return id == theMallocUniqueID; +} + +/********* The malloc code *************/ + +extern "C" void * scalable_malloc(size_t size) +{ + Bin* bin; + Block * mallocBlock; + FreeObject *result; + + if (!size) size = sizeof(size_t); + +#if MALLOC_CHECK_RECURSION + if (RecursiveMallocCallProtector::sameThreadActive()) { + result = size= minLargeObjectSize) { + result = (FreeObject*)mallocLargeObject(size, blockSize); + if (!result) errno = ENOMEM; + return result; + } + + /* + * Get an element in thread-local array corresponding to the given size; + * It keeps ptr to the active block for allocations of this size + */ + bin = getAllocationBin(size); + if ( !bin ) { + errno = ENOMEM; + return NULL; + } + + /* Get the block of you want to try to allocate in. 
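+ * The fallback order below is: the active block and its predecessors, a block with
+ * publicly freed objects from the bin mailbox, a partial block abandoned by another
+ * thread, and finally a brand new empty block; if all of these fail, NULL is
+ * returned with errno set to ENOMEM.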
*/ + mallocBlock = getActiveBlock(bin); + + if (mallocBlock) { + do { + if( (result = allocateFromBlock(mallocBlock)) ) { + return result; + } + // the previous block, if any, should be empty enough + } while( (mallocBlock = setPreviousBlockActive(bin)) ); + } + MALLOC_ASSERT( !(bin->activeBlk) || bin->activeBlk->isFull==1, ASSERT_TEXT ); + + /* + * else privatize publicly freed objects in some block and allocate from it + */ + mallocBlock = getPublicFreeListBlock( bin ); + if (mallocBlock) { + if (emptyEnoughToUse(mallocBlock)) { + /* move the block to the front of the bin */ + outofTLSBin(bin, mallocBlock); + pushTLSBin(bin, mallocBlock); + } + MALLOC_ASSERT( mallocBlock->freeList, ASSERT_TEXT ); + if ( (result = allocateFromFreeList(mallocBlock)) ) { + return result; + } + /* Else something strange happened, need to retry from the beginning; */ + TRACEF(( "[ScalableMalloc trace] Something is wrong: no objects in public free list; reentering.\n" )); + return scalable_malloc(size); + } + + /* + * no suitable own blocks, try to get a partial block that some other thread has discarded. + */ + mallocBlock = getPartialBlock(bin, size); + while (mallocBlock) { + pushTLSBin(bin, mallocBlock); +// guaranteed by pushTLSBin: MALLOC_ASSERT( *bin==mallocBlock || (*bin)->previous==mallocBlock, ASSERT_TEXT ); + setActiveBlock(bin, mallocBlock); + if( (result = allocateFromBlock(mallocBlock)) ) { + return result; + } + mallocBlock = getPartialBlock(bin, size); + } + + /* + * else try to get a new empty block + */ + mallocBlock = getEmptyBlock(size); + if (mallocBlock) { + pushTLSBin(bin, mallocBlock); +// guaranteed by pushTLSBin: MALLOC_ASSERT( *bin==mallocBlock || (*bin)->previous==mallocBlock, ASSERT_TEXT ); + setActiveBlock(bin, mallocBlock); + if( (result = allocateFromBlock(mallocBlock)) ) { + return result; + } + /* Else something strange happened, need to retry from the beginning; */ + TRACEF(( "[ScalableMalloc trace] Something is wrong: no objects in empty block; reentering.\n" )); + return scalable_malloc(size); + } + /* + * else nothing works so return NULL + */ + TRACEF(( "[ScalableMalloc trace] No memory found, returning NULL.\n" )); + errno = ENOMEM; + return NULL; +} + +/********* End the malloc code *************/ + +/********* The free code *************/ + +extern "C" void scalable_free (void *object) { + Block *block; + ThreadId myTid; + FreeObject *objectToFree; + + if (!object) { + return; + } + + if (isLargeObject(object)) { + freeLargeObject(object); + return; + } + + block = (Block *)alignDown(object, blockSize);/* mask low bits to get the block */ + MALLOC_ASSERT( block->mallocUniqueID == theMallocUniqueID, ASSERT_TEXT ); + MALLOC_ASSERT( block->allocatedCount, ASSERT_TEXT ); + +#if MALLOC_CHECK_RECURSION + if (block->objectSize == startupAllocObjSizeMark) { + startupFree((StartupBlock *)block, object); + return; + } +#endif + + myTid = getThreadId(); + + // Due to aligned allocations, a pointer passed to scalable_free + // might differ from the address of internally allocated object. + // Small objects however should always be fine. + if (block->objectSize <= maxSegregatedObjectSize) + objectToFree = (FreeObject*)object; + // "Fitting size" allocations are suspicious if aligned higher than naturally + else { + if ( ! isAligned(object,2*fittingAlignment) ) + // TODO: the above check is questionable - it gives false negatives in ~50% cases, + // so might even be slower in average than unconditional use of findAllocatedObject. 
+ // here it should be a "real" object + objectToFree = (FreeObject*)object; + else + // here object can be an aligned address, so applying additional checks + objectToFree = findAllocatedObject(object, block); + MALLOC_ASSERT( isAligned(objectToFree,fittingAlignment), ASSERT_TEXT ); + } + MALLOC_ASSERT( isProperlyPlaced(objectToFree, block), ASSERT_TEXT ); + + if (myTid == block->owner) { + objectToFree->next = block->freeList; + block->freeList = objectToFree; + block->allocatedCount--; + MALLOC_ASSERT( block->allocatedCount < (blockSize-sizeof(Block))/block->objectSize, ASSERT_TEXT ); +#if COLLECT_STATISTICS + if (getActiveBlock(getAllocationBin(block->objectSize)) != block) + STAT_increment(myTid, getIndex(block->objectSize), freeToInactiveBlock); + else + STAT_increment(myTid, getIndex(block->objectSize), freeToActiveBlock); +#endif + if (block->isFull) { + if (emptyEnoughToUse(block)) + moveBlockToBinFront(block); + } else { + if (block->allocatedCount==0 && block->publicFreeList==NULL) + processLessUsedBlock(block); + } + } else { /* Slower path to add to the shared list, the allocatedCount is updated by the owner thread in malloc. */ + freePublicObject (block, objectToFree); + } +} + +/* + * A variant that provides additional memory safety, by checking whether the given address + * was obtained with this allocator, and if not redirecting to the provided alternative call. + */ +extern "C" void safer_scalable_free (void *object, void (*original_free)(void*)) +{ + if (!object) + return; + + if (isRecognized(object)) + scalable_free(object); + else if (original_free) + original_free(object); +} + +/********* End the free code *************/ + +/********* Code for scalable_realloc ***********/ + +/* + * From K&R + * "realloc changes the size of the object pointed to by p to size. The contents will + * be unchanged up to the minimum of the old and the new sizes. If the new size is larger, + * the new space is uninitialized. realloc returns a pointer to the new space, or + * NULL if the request cannot be satisfied, in which case *p is unchanged." + * + */ +extern "C" void* scalable_realloc(void* ptr, size_t size) +{ + /* corner cases left out of reallocAligned to not deal with errno there */ + if (!ptr) { + return scalable_malloc(size); + } + if (!size) { + scalable_free(ptr); + return NULL; + } + void* tmp = reallocAligned(ptr, size, 0); + if (!tmp) errno = ENOMEM; + return tmp; +} + +/* + * A variant that provides additional memory safety, by checking whether the given address + * was obtained with this allocator, and if not redirecting to the provided alternative call. 
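+ * On Windows the alternative is passed as a struct of function pointers; a foreign
+ * pointer is then resized via orig_msize, scalable_malloc and memcpy. Elsewhere the
+ * original realloc is simply called for foreign pointers.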
+ */ +extern "C" void* safer_scalable_realloc (void* ptr, size_t sz, void* original_realloc) +{ + if (!ptr) { + return scalable_malloc(sz); + } + if (isRecognized(ptr)) { + if (!sz) { + scalable_free(ptr); + return NULL; + } + void* tmp = reallocAligned(ptr, sz, 0); + if (!tmp) errno = ENOMEM; + return tmp; + } +#if USE_WINTHREAD + else if (original_realloc && sz) { + orig_ptrs *original_ptrs = static_cast(original_realloc); + if ( original_ptrs->orig_msize ){ + size_t oldSize = original_ptrs->orig_msize(ptr); + void *newBuf = scalable_malloc(sz); + if (newBuf) { + memcpy(newBuf, ptr, szorig_free ){ + original_ptrs->orig_free( ptr ); + } + } + return newBuf; + } + } +#else + else if (original_realloc) { + typedef void* (*realloc_ptr_t)(void*,size_t); + realloc_ptr_t original_realloc_ptr; + (void *&)original_realloc_ptr = original_realloc; + return original_realloc_ptr(ptr,sz); + } +#endif + return NULL; +} + +/********* End code for scalable_realloc ***********/ + +/********* Code for scalable_calloc ***********/ + +/* + * From K&R + * calloc returns a pointer to space for an array of nobj objects, + * each of size size, or NULL if the request cannot be satisfied. + * The space is initialized to zero bytes. + * + */ + +extern "C" void * scalable_calloc(size_t nobj, size_t size) +{ + size_t arraySize = nobj * size; + void* result = scalable_malloc(arraySize); + if (result) + memset(result, 0, arraySize); + return result; +} + +/********* End code for scalable_calloc ***********/ + +/********* Code for aligned allocation API **********/ + +extern "C" int scalable_posix_memalign(void **memptr, size_t alignment, size_t size) +{ + if ( !isPowerOfTwoMultiple(alignment, sizeof(void*)) ) + return EINVAL; + void *result = allocateAligned(size, alignment); + if (!result) + return ENOMEM; + *memptr = result; + return 0; +} + +extern "C" void * scalable_aligned_malloc(size_t size, size_t alignment) +{ + if (!isPowerOfTwo(alignment) || 0==size) { + errno = EINVAL; + return NULL; + } + void* tmp = allocateAligned(size, alignment); + if (!tmp) + errno = ENOMEM; + return tmp; +} + +extern "C" void * scalable_aligned_realloc(void *ptr, size_t size, size_t alignment) +{ + /* corner cases left out of reallocAligned to not deal with errno there */ + if (!isPowerOfTwo(alignment)) { + errno = EINVAL; + return NULL; + } + if (!ptr) { + return allocateAligned(size, alignment); + } + if (!size) { + scalable_free(ptr); + return NULL; + } + + void* tmp = reallocAligned(ptr, size, alignment); + if (!tmp) errno = ENOMEM; + return tmp; +} + +extern "C" void * safer_scalable_aligned_realloc(void *ptr, size_t size, size_t alignment, void* orig_function) +{ + /* corner cases left out of reallocAligned to not deal with errno there */ + if (!isPowerOfTwo(alignment)) { + errno = EINVAL; + return NULL; + } + if (!ptr) { + return allocateAligned(size, alignment); + } + if (isRecognized(ptr)) { + if (!size) { + scalable_free(ptr); + return NULL; + } + void* tmp = reallocAligned(ptr, size, alignment); + if (!tmp) errno = ENOMEM; + return tmp; + } +#if USE_WINTHREAD + else { + orig_ptrs *original_ptrs = static_cast(orig_function); + if (size) { + if ( original_ptrs->orig_msize ){ + size_t oldSize = original_ptrs->orig_msize(ptr); + void *newBuf = allocateAligned(size, alignment); + if (newBuf) { + memcpy(newBuf, ptr, sizeorig_free ){ + original_ptrs->orig_free( ptr ); + } + } + return newBuf; + }else{ + //We can't do anything with this. 
Just keeping old pointer + return NULL; + } + } else { + if ( original_ptrs->orig_free ){ + original_ptrs->orig_free( ptr ); + } + return NULL; + } + } +#endif + return NULL; +} + +extern "C" void scalable_aligned_free(void *ptr) +{ + scalable_free(ptr); +} + +/********* end code for aligned allocation API **********/ + +/********* Code for scalable_msize ***********/ + +/* + * Returns the size of a memory block allocated in the heap. + */ +extern "C" size_t scalable_msize(void* ptr) +{ + if (ptr) { + if (isLargeObject(ptr)) { + LargeObjectHeader* loh = (LargeObjectHeader*)((uintptr_t)ptr - sizeof(LargeObjectHeader)); + if (loh->mallocUniqueID==theMallocUniqueID) + return loh->unalignedSize-((uintptr_t)ptr-(uintptr_t)loh->unalignedResult); + } else { + Block* block = (Block *)alignDown(ptr, blockSize); + if (block->mallocUniqueID==theMallocUniqueID) { +#if MALLOC_CHECK_RECURSION + size_t size = block->objectSize? block->objectSize : startupMsize(ptr); +#else + size_t size = block->objectSize; +#endif + MALLOC_ASSERT(size>0 && size // size_t +#if _MSC_VER +typedef unsigned __int32 uint32_t; +typedef unsigned __int64 uint64_t; +#else +#include +#endif + +namespace rml { +namespace internal { + +extern bool original_malloc_found; +extern void* (*original_malloc_ptr)(size_t); +extern void (*original_free_ptr)(void*); + +} } // namespaces + +//! PROVIDE YOUR OWN Customize.h IF YOU FEEL NECESSARY +#include "Customize.h" + +/* + * Functions to align an integer down or up to the given power of two, + * and test for such an alignment, and for power of two. + */ +template +static inline T alignDown(T arg, uintptr_t alignment) { + return T( (uintptr_t)arg & ~(alignment-1)); +} +template +static inline T alignUp (T arg, uintptr_t alignment) { + return T(((uintptr_t)arg+(alignment-1)) & ~(alignment-1)); + // /*is this better?*/ return (((uintptr_t)arg-1) | (alignment-1)) + 1; +} +template +static inline bool isAligned(T arg, uintptr_t alignment) { + return 0==((uintptr_t)arg & (alignment-1)); +} +static inline bool isPowerOfTwo(uintptr_t arg) { + return arg && (0==(arg & (arg-1))); +} +static inline bool isPowerOfTwoMultiple(uintptr_t arg, uintptr_t divisor) { + // Divisor is assumed to be a power of two (which is valid for current uses). + MALLOC_ASSERT( isPowerOfTwo(divisor), "Divisor should be a power of two" ); + return arg && (0==(arg & (arg-divisor))); +} + +#endif /* _itt_shared_malloc_TypeDefinitions_H_ */ diff --git a/dep/tbb/src/tbbmalloc/lin-tbbmalloc-export.def b/dep/tbb/src/tbbmalloc/lin-tbbmalloc-export.def new file mode 100644 index 000000000..49a4590a9 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/lin-tbbmalloc-export.def @@ -0,0 +1,70 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. 
+ + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +{ +global: + +scalable_calloc; +scalable_free; +scalable_malloc; +scalable_realloc; +scalable_posix_memalign; +scalable_aligned_malloc; +scalable_aligned_realloc; +scalable_aligned_free; +__TBB_internal_calloc; +__TBB_internal_free; +__TBB_internal_malloc; +__TBB_internal_realloc; +__TBB_internal_posix_memalign; +scalable_msize; + +local: + +/* TBB symbols */ +*3rml8internal*; +*3tbb*; +*__TBB*; +__itt_*; +ITT_DoOneTimeInitialization; +TBB_runtime_interface_version; + +/* Intel Compiler (libirc) symbols */ +__intel_*; +_intel_*; +get_memcpy_largest_cachelinesize; +get_memcpy_largest_cache_size; +get_mem_ops_method; +init_mem_ops_method; +irc__get_msg; +irc__print; +override_mem_ops_method; +set_memcpy_largest_cachelinesize; +set_memcpy_largest_cache_size; + +}; diff --git a/dep/tbb/src/tbbmalloc/lin32-proxy-export.def b/dep/tbb/src/tbbmalloc/lin32-proxy-export.def new file mode 100644 index 000000000..16411ce47 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/lin32-proxy-export.def @@ -0,0 +1,59 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. 
+*/ + +{ +global: +calloc; +free; +malloc; +realloc; +posix_memalign; +memalign; +valloc; +pvalloc; +mallinfo; +mallopt; +__TBB_malloc_proxy; +__TBB_internal_find_original_malloc; +_ZdaPv; /* next ones are new/delete */ +_ZdaPvRKSt9nothrow_t; +_ZdlPv; +_ZdlPvRKSt9nothrow_t; +_Znaj; +_ZnajRKSt9nothrow_t; +_Znwj; +_ZnwjRKSt9nothrow_t; + +local: + +/* TBB symbols */ +*3rml8internal*; +*3tbb*; +*__TBB*; + +}; diff --git a/dep/tbb/src/tbbmalloc/lin64-proxy-export.def b/dep/tbb/src/tbbmalloc/lin64-proxy-export.def new file mode 100644 index 000000000..21a0f0832 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/lin64-proxy-export.def @@ -0,0 +1,59 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +{ +global: +calloc; +free; +malloc; +realloc; +posix_memalign; +memalign; +valloc; +pvalloc; +mallinfo; +mallopt; +__TBB_malloc_proxy; +__TBB_internal_find_original_malloc; +_ZdaPv; /* next ones are new/delete */ +_ZdaPvRKSt9nothrow_t; +_ZdlPv; +_ZdlPvRKSt9nothrow_t; +_Znam; +_ZnamRKSt9nothrow_t; +_Znwm; +_ZnwmRKSt9nothrow_t; + +local: + +/* TBB symbols */ +*3rml8internal*; +*3tbb*; +*__TBB*; + +}; diff --git a/dep/tbb/src/tbbmalloc/lin64ipf-proxy-export.def b/dep/tbb/src/tbbmalloc/lin64ipf-proxy-export.def new file mode 100644 index 000000000..21a0f0832 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/lin64ipf-proxy-export.def @@ -0,0 +1,59 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. 
Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +{ +global: +calloc; +free; +malloc; +realloc; +posix_memalign; +memalign; +valloc; +pvalloc; +mallinfo; +mallopt; +__TBB_malloc_proxy; +__TBB_internal_find_original_malloc; +_ZdaPv; /* next ones are new/delete */ +_ZdaPvRKSt9nothrow_t; +_ZdlPv; +_ZdlPvRKSt9nothrow_t; +_Znam; +_ZnamRKSt9nothrow_t; +_Znwm; +_ZnwmRKSt9nothrow_t; + +local: + +/* TBB symbols */ +*3rml8internal*; +*3tbb*; +*__TBB*; + +}; diff --git a/dep/tbb/src/tbbmalloc/mac32-tbbmalloc-export.def b/dep/tbb/src/tbbmalloc/mac32-tbbmalloc-export.def new file mode 100644 index 000000000..c211ce52c --- /dev/null +++ b/dep/tbb/src/tbbmalloc/mac32-tbbmalloc-export.def @@ -0,0 +1,36 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. +# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +# MemoryAllocator.cpp +_scalable_calloc +_scalable_free +_scalable_malloc +_scalable_realloc +_scalable_posix_memalign +_scalable_aligned_malloc +_scalable_aligned_realloc +_scalable_aligned_free +_scalable_msize diff --git a/dep/tbb/src/tbbmalloc/mac64-tbbmalloc-export.def b/dep/tbb/src/tbbmalloc/mac64-tbbmalloc-export.def new file mode 100644 index 000000000..c211ce52c --- /dev/null +++ b/dep/tbb/src/tbbmalloc/mac64-tbbmalloc-export.def @@ -0,0 +1,36 @@ +# Copyright 2005-2009 Intel Corporation. All Rights Reserved. +# +# This file is part of Threading Building Blocks. +# +# Threading Building Blocks is free software; you can redistribute it +# and/or modify it under the terms of the GNU General Public License +# version 2 as published by the Free Software Foundation. +# +# Threading Building Blocks is distributed in the hope that it will be +# useful, but WITHOUT ANY WARRANTY; without even the implied warranty +# of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +# GNU General Public License for more details. 
+# +# You should have received a copy of the GNU General Public License +# along with Threading Building Blocks; if not, write to the Free Software +# Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +# +# As a special exception, you may use this file as part of a free software +# library without restriction. Specifically, if other files instantiate +# templates or use macros or inline functions from this file, or you compile +# this file and link it with other files to produce an executable, this +# file does not by itself cause the resulting executable to be covered by +# the GNU General Public License. This exception does not however +# invalidate any other reasons why the executable file might be covered by +# the GNU General Public License. + +# MemoryAllocator.cpp +_scalable_calloc +_scalable_free +_scalable_malloc +_scalable_realloc +_scalable_posix_memalign +_scalable_aligned_malloc +_scalable_aligned_realloc +_scalable_aligned_free +_scalable_msize diff --git a/dep/tbb/src/tbbmalloc/proxy.cpp b/dep/tbb/src/tbbmalloc/proxy.cpp new file mode 100644 index 000000000..5f27d3cf3 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/proxy.cpp @@ -0,0 +1,434 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +#include "proxy.h" + +#if MALLOC_LD_PRELOAD + +/*** service functions and variables ***/ + +#include // for sysconf +#include + +static long memoryPageSize; + +static inline void initPageSize() +{ + memoryPageSize = sysconf(_SC_PAGESIZE); +} + +/* For the expected behaviour (i.e., finding malloc/free/etc from libc.so, + not from ld-linux.so) dlsym(RTLD_NEXT) should be called from + a LD_PRELOADed library, not another dynamic library. + So we have to put find_original_malloc here. 
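+   It resolves each requested name with dlsym(RTLD_NEXT, ...) and reports failure
+   if any of the symbols cannot be found.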
+ */ +extern "C" bool __TBB_internal_find_original_malloc(int num, const char *names[], + void *ptrs[]) +{ + for (int i=0; i +#include // for memset + +extern "C" struct mallinfo mallinfo() __THROW +{ + struct mallinfo m; + memset(&m, 0, sizeof(struct mallinfo)); + + return m; +} +#endif /* __linux__ */ + +/*** replacements for global operators new and delete ***/ + +#include + +void * operator new(size_t sz) throw (std::bad_alloc) { + void *res = scalable_malloc(sz); + if (NULL == res) throw std::bad_alloc(); + return res; +} +void* operator new[](size_t sz) throw (std::bad_alloc) { + void *res = scalable_malloc(sz); + if (NULL == res) throw std::bad_alloc(); + return res; +} +void operator delete(void* ptr) throw() { + scalable_free(ptr); +} +void operator delete[](void* ptr) throw() { + scalable_free(ptr); +} +void* operator new(size_t sz, const std::nothrow_t&) throw() { + return scalable_malloc(sz); +} +void* operator new[](std::size_t sz, const std::nothrow_t&) throw() { + return scalable_malloc(sz); +} +void operator delete(void* ptr, const std::nothrow_t&) throw() { + scalable_free(ptr); +} +void operator delete[](void* ptr, const std::nothrow_t&) throw() { + scalable_free(ptr); +} + +#endif /* MALLOC_LD_PRELOAD */ + + +#ifdef _WIN32 +#include + +#include +#include "tbb_function_replacement.h" + +void safer_scalable_free2( void *ptr) +{ + safer_scalable_free( ptr, NULL ); +} + +// we do not support _expand(); +void* safer_expand( void *, size_t ) +{ + return NULL; +} + +#define __TBB_QV(EXP) #EXP +#define __TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(CRTLIB)\ +void (*orig_free_##CRTLIB)(void*); \ +void safer_scalable_free_##CRTLIB( void *ptr) \ +{ \ + safer_scalable_free( ptr, orig_free_##CRTLIB ); \ +} \ + \ +size_t (*orig_msize_##CRTLIB)(void*); \ +size_t safer_scalable_msize_##CRTLIB( void *ptr) \ +{ \ + return safer_scalable_msize( ptr, orig_msize_##CRTLIB ); \ +} \ + \ +void* safer_scalable_realloc_##CRTLIB( void *ptr, size_t size ) \ +{ \ + orig_ptrs func_ptrs = {orig_free_##CRTLIB, orig_msize_##CRTLIB}; \ + return safer_scalable_realloc( ptr, size, &func_ptrs ); \ +} \ + \ +void* safer_scalable_aligned_realloc_##CRTLIB( void *ptr, size_t size, size_t aligment ) \ +{ \ + orig_ptrs func_ptrs = {orig_free_##CRTLIB, orig_msize_##CRTLIB}; \ + return safer_scalable_aligned_realloc( ptr, size, aligment, &func_ptrs ); \ +} + +#if _WIN64 +#define __TBB_ORIG_ALLOCATOR_REPLACEMENT_CALL(CRT_VER)\ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "free", (FUNCPTR)safer_scalable_free_ ## CRT_VER ## d, 9, (FUNCPTR*)&orig_free_ ## CRT_VER ## d ); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "free", (FUNCPTR)safer_scalable_free_ ## CRT_VER, 0, NULL ); \ + orig_free_ ## CRT_VER = NULL; \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "_msize",(FUNCPTR)safer_scalable_msize_ ## CRT_VER ## d, 9, (FUNCPTR*)&orig_msize_ ## CRT_VER ## d ); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "_msize",(FUNCPTR)safer_scalable_msize_ ## CRT_VER, 7, (FUNCPTR*)&orig_msize_ ## CRT_VER ); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "realloc", (FUNCPTR)safer_scalable_realloc_ ## CRT_VER ## d, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "realloc", (FUNCPTR)safer_scalable_realloc_ ## CRT_VER, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "_aligned_free", (FUNCPTR)safer_scalable_free_ ## CRT_VER ## d, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "_aligned_free", (FUNCPTR)safer_scalable_free_ ## CRT_VER, 0, 
NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "_aligned_realloc",(FUNCPTR)safer_scalable_aligned_realloc_ ## CRT_VER ## d, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "_aligned_realloc",(FUNCPTR)safer_scalable_aligned_realloc_ ## CRT_VER, 0, NULL); +#else +#define __TBB_ORIG_ALLOCATOR_REPLACEMENT_CALL(CRT_VER)\ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "free", (FUNCPTR)safer_scalable_free_ ## CRT_VER ## d, 5, (FUNCPTR*)&orig_free_ ## CRT_VER ## d ); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "free", (FUNCPTR)safer_scalable_free_ ## CRT_VER, 7, (FUNCPTR*)&orig_free_ ## CRT_VER ); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "_msize",(FUNCPTR)safer_scalable_msize_ ## CRT_VER ## d, 5, (FUNCPTR*)&orig_msize_ ## CRT_VER ## d ); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "_msize",(FUNCPTR)safer_scalable_msize_ ## CRT_VER, 7, (FUNCPTR*)&orig_msize_ ## CRT_VER ); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "realloc", (FUNCPTR)safer_scalable_realloc_ ## CRT_VER ## d, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "realloc", (FUNCPTR)safer_scalable_realloc_ ## CRT_VER, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "_aligned_free", (FUNCPTR)safer_scalable_free_ ## CRT_VER ## d, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "_aligned_free", (FUNCPTR)safer_scalable_free_ ## CRT_VER, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ## d.dll), "_aligned_realloc",(FUNCPTR)safer_scalable_aligned_realloc_ ## CRT_VER ## d, 0, NULL); \ + ReplaceFunctionWithStore( __TBB_QV(CRT_VER ##.dll), "_aligned_realloc",(FUNCPTR)safer_scalable_aligned_realloc_ ## CRT_VER, 0, NULL); +#endif + +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr70d); +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr70); +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr71d); +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr71); +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr80d); +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr80); +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr90d); +__TBB_ORIG_ALLOCATOR_REPLACEMENT_WRAPPER(msvcr90); + + +/*** replacements for global operators new and delete ***/ + +#include + +#if _MSC_VER && !defined(__INTEL_COMPILER) +#pragma warning( push ) +#pragma warning( disable : 4290 ) +#endif + +void * operator_new(size_t sz) throw (std::bad_alloc) { + void *res = scalable_malloc(sz); + if (NULL == res) throw std::bad_alloc(); + return res; +} +void* operator_new_arr(size_t sz) throw (std::bad_alloc) { + void *res = scalable_malloc(sz); + if (NULL == res) throw std::bad_alloc(); + return res; +} +void operator_delete(void* ptr) throw() { + safer_scalable_free2(ptr); +} +#if _MSC_VER && !defined(__INTEL_COMPILER) +#pragma warning( pop ) +#endif + +void operator_delete_arr(void* ptr) throw() { + safer_scalable_free2(ptr); +} +void* operator_new_t(size_t sz, const std::nothrow_t&) throw() { + return scalable_malloc(sz); +} +void* operator_new_arr_t(std::size_t sz, const std::nothrow_t&) throw() { + return scalable_malloc(sz); +} +void operator_delete_t(void* ptr, const std::nothrow_t&) throw() { + safer_scalable_free2(ptr); +} +void operator_delete_arr_t(void* ptr, const std::nothrow_t&) throw() { + safer_scalable_free2(ptr); +} + +const char* modules_to_replace[] = { + "msvcr80d.dll", + "msvcr80.dll", + "msvcr90d.dll", + "msvcr90.dll", + "msvcr70d.dll", + "msvcr70.dll", + "msvcr71d.dll", + "msvcr71.dll", + }; + +/* +We need to replace following 
functions: +malloc +calloc +_aligned_malloc +_expand (by dummy implementation) +??2@YAPAXI@Z operator new (ia32) +??_U@YAPAXI@Z void * operator new[] (size_t size) (ia32) +??3@YAXPAX@Z operator delete (ia32) +??_V@YAXPAX@Z operator delete[] (ia32) +??2@YAPEAX_K@Z void * operator new(unsigned __int64) (intel64) +??_V@YAXPEAX@Z void * operator new[](unsigned __int64) (intel64) +??3@YAXPEAX@Z operator delete (intel64) +??_V@YAXPEAX@Z operator delete[] (intel64) +??2@YAPAXIABUnothrow_t@std@@@Z void * operator new (size_t sz, const std::nothrow_t&) throw() (optional) +??_U@YAPAXIABUnothrow_t@std@@@Z void * operator new[] (size_t sz, const std::nothrow_t&) throw() (optional) + +and these functions have runtime-specific replacement: +realloc +free +_msize +_aligned_realloc +_aligned_free +*/ + +typedef struct FRData_t { + //char *_module; + const char *_func; + FUNCPTR _fptr; + FRR_ON_ERROR _on_error; +} FRDATA; + +FRDATA routines_to_replace[] = { + { "malloc", (FUNCPTR)scalable_malloc, FRR_FAIL }, + { "calloc", (FUNCPTR)scalable_calloc, FRR_FAIL }, + { "_aligned_malloc", (FUNCPTR)scalable_aligned_malloc, FRR_FAIL }, + { "_expand", (FUNCPTR)safer_expand, FRR_IGNORE }, +#if _WIN64 + { "??2@YAPEAX_K@Z", (FUNCPTR)operator_new, FRR_FAIL }, + { "??_U@YAPEAX_K@Z", (FUNCPTR)operator_new_arr, FRR_FAIL }, + { "??3@YAXPEAX@Z", (FUNCPTR)operator_delete, FRR_FAIL }, + { "??_V@YAXPEAX@Z", (FUNCPTR)operator_delete_arr, FRR_FAIL }, +#else + { "??2@YAPAXI@Z", (FUNCPTR)operator_new, FRR_FAIL }, + { "??_U@YAPAXI@Z", (FUNCPTR)operator_new_arr, FRR_FAIL }, + { "??3@YAXPAX@Z", (FUNCPTR)operator_delete, FRR_FAIL }, + { "??_V@YAXPAX@Z", (FUNCPTR)operator_delete_arr, FRR_FAIL }, +#endif + { "??2@YAPAXIABUnothrow_t@std@@@Z", (FUNCPTR)operator_new_t, FRR_IGNORE }, + { "??_U@YAPAXIABUnothrow_t@std@@@Z", (FUNCPTR)operator_new_arr_t, FRR_IGNORE } +}; + +#ifndef UNICODE +void ReplaceFunctionWithStore( const char*dllName, const char *funcName, FUNCPTR newFunc, UINT opcodesNumber, FUNCPTR* origFunc ) +#else +void ReplaceFunctionWithStore( const wchar_t *dllName, const char *funcName, FUNCPTR newFunc, UINT opcodesNumber, FUNCPTR* origFunc ) +#endif +{ + FRR_TYPE type = ReplaceFunction( dllName, funcName, newFunc, opcodesNumber, origFunc ); + if (type == FRR_NODLL) return; + if ( type != FRR_OK ) + { + fprintf(stderr, "Failed to replace function %s in module %s\n", + funcName, dllName); + exit(1); + } +} + +void doMallocReplacement() +{ + int i,j; + + // Replace functions without storing original code + int modules_to_replace_count = sizeof(modules_to_replace) / sizeof(modules_to_replace[0]); + int routines_to_replace_count = sizeof(routines_to_replace) / sizeof(routines_to_replace[0]); + for ( j=0; j + +extern "C" { + void * scalable_malloc(size_t size); + void * scalable_calloc(size_t nobj, size_t size); + void scalable_free(void *ptr); + void * scalable_realloc(void* ptr, size_t size); + void * scalable_aligned_malloc(size_t size, size_t alignment); + void * scalable_aligned_realloc(void* ptr, size_t size, size_t alignment); + int scalable_posix_memalign(void **memptr, size_t alignment, size_t size); + size_t scalable_msize(void *ptr); + void safer_scalable_free( void *ptr, void (*original_free)(void*)); + void * safer_scalable_realloc( void *ptr, size_t, void* ); + void * safer_scalable_aligned_realloc( void *ptr, size_t, size_t, void* ); + size_t safer_scalable_msize( void *ptr, size_t (*orig_msize_crt80d)(void*)); + + void * __TBB_internal_malloc(size_t size); + void * __TBB_internal_calloc(size_t num, size_t size); + void 
__TBB_internal_free(void *ptr); + void * __TBB_internal_realloc(void* ptr, size_t sz); + int __TBB_internal_posix_memalign(void **memptr, size_t alignment, size_t size); + + bool __TBB_internal_find_original_malloc(int num, const char *names[], void *table[]); +} // extern "C" + +// Struct with original free() and _msize() pointers +struct orig_ptrs { + void (*orig_free) (void*); + size_t (*orig_msize)(void*); +}; + +#endif /* _TBB_malloc_proxy_H_ */ diff --git a/dep/tbb/src/tbbmalloc/tbb_function_replacement.cpp b/dep/tbb/src/tbbmalloc/tbb_function_replacement.cpp new file mode 100644 index 000000000..f4b0d92a9 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/tbb_function_replacement.cpp @@ -0,0 +1,396 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + This file is part of Threading Building Blocks. + + Threading Building Blocks is free software; you can redistribute it + and/or modify it under the terms of the GNU General Public License + version 2 as published by the Free Software Foundation. + + Threading Building Blocks is distributed in the hope that it will be + useful, but WITHOUT ANY WARRANTY; without even the implied warranty + of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License + along with Threading Building Blocks; if not, write to the Free Software + Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA + + As a special exception, you may use this file as part of a free software + library without restriction. Specifically, if other files instantiate + templates or use macros or inline functions from this file, or you compile + this file and link it with other files to produce an executable, this + file does not by itself cause the resulting executable to be covered by + the GNU General Public License. This exception does not however + invalidate any other reasons why the executable file might be covered by + the GNU General Public License. +*/ + +//We works on windows only +#ifdef _WIN32 +#define _CRT_SECURE_NO_DEPRECATE 1 + +#include +#include +#include "tbb_function_replacement.h" + +inline UINT_PTR Ptr2Addrint(LPVOID ptr) +{ + Int2Ptr i2p; + i2p.lpv = ptr; + return i2p.uip; +} + +inline LPVOID Addrint2Ptr(UINT_PTR ptr) +{ + Int2Ptr i2p; + i2p.uip = ptr; + return i2p.lpv; +} + +// Is the distance between addr1 and addr2 smaller than dist +inline bool IsInDistance(UINT_PTR addr1, UINT_PTR addr2, __int64 dist) +{ + __int64 diff = addr1>addr2 ? addr1-addr2 : addr2-addr1; + return diff= m_allocSize) + { + // Found a free region, try to allocate a page in this region + void *newPage = VirtualAlloc(newAddr, m_allocSize, MEM_COMMIT|MEM_RESERVE, PAGE_READWRITE); + if (!newPage) + break; + + // Add the new page to the pages database + MemoryBuffer *pBuff = new (m_lastBuffer) MemoryBuffer(newPage, m_allocSize); + ++m_lastBuffer; + return pBuff; + } + } + + // Failed to find a buffer in the distance + return 0; + } + +public: + MemoryProvider() + { + SYSTEM_INFO sysInfo; + GetSystemInfo(&sysInfo); + m_allocSize = sysInfo.dwAllocationGranularity; + m_lastBuffer = &m_pages[0]; + } + + // We can't free the pages in the destructor because the trampolines + // are using these memory locations and a replaced function might be called + // after the destructor was called. 
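+    // As a consequence the trampoline pages are intentionally leaked; the OS
+    // reclaims them only when the process exits.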
+ ~MemoryProvider() + { + } + + // Return a memory location in distance less than 2^31 from input address + UINT_PTR GetLocation(UINT_PTR addr) + { + MemoryBuffer *pBuff = m_pages; + for (; pBuffm_next, addr, MAX_DISTANCE); ++pBuff) + { + if (pBuff->m_next < pBuff->m_base + pBuff->m_size) + { + UINT_PTR loc = pBuff->m_next; + pBuff->m_next += MAX_PROBE_SIZE; + return loc; + } + } + + pBuff = CreateBuffer(addr); + if(!pBuff) + return 0; + + UINT_PTR loc = pBuff->m_next; + pBuff->m_next += MAX_PROBE_SIZE; + return loc; + } + +private: + MemoryBuffer m_pages[MAX_NUM_BUFFERS]; + MemoryBuffer *m_lastBuffer; + DWORD m_allocSize; +}; + +static MemoryProvider memProvider; + +// Insert jump relative instruction to the input address +// RETURN: the size of the trampoline or 0 on failure +static DWORD InsertTrampoline32(void *inpAddr, void *targetAddr, UINT opcodesNumber, FUNCPTR* storedAddr) +{ + UINT_PTR srcAddr = Ptr2Addrint(inpAddr); + UINT_PTR tgtAddr = Ptr2Addrint(targetAddr); + // Check that the target fits in 32 bits + if (!IsInDistance(srcAddr, tgtAddr, MAX_DISTANCE)) + return 0; + + UINT_PTR offset; + UINT offset32; + UCHAR *codePtr = (UCHAR *)inpAddr; + + // If requested, store original function code + if ( storedAddr ){ + UINT_PTR strdAddr = memProvider.GetLocation(srcAddr); + if (!strdAddr) + return 0; + *storedAddr = (FUNCPTR)Addrint2Ptr(strdAddr); + // Set 'executable' flag for original instructions in the new place + DWORD pageFlags = PAGE_EXECUTE_READWRITE; + if (!VirtualProtect(*storedAddr, MAX_PROBE_SIZE, pageFlags, &pageFlags)) return 0; + // Copy original instructions to the new place + memcpy(*storedAddr, codePtr, opcodesNumber); + // Set jump to the code after replacement + offset = srcAddr - strdAddr - SIZE_OF_RELJUMP; + offset32 = (UINT)((offset & 0xFFFFFFFF)); + *((UCHAR*)*storedAddr+opcodesNumber) = 0xE9; + memcpy(((UCHAR*)*storedAddr+opcodesNumber+1), &offset32, sizeof(offset32)); + } + + // The following will work correctly even if srcAddr>tgtAddr, as long as + // address difference is less than 2^31, which is guaranteed by IsInDistance. + offset = tgtAddr - srcAddr - SIZE_OF_RELJUMP; + offset32 = (UINT)(offset & 0xFFFFFFFF); + // Insert the jump to the new code + *codePtr = 0xE9; + memcpy(codePtr+1, &offset32, sizeof(offset32)); + + // Fill the rest with NOPs to correctly see disassembler of old code in debugger. + for( unsigned i=SIZE_OF_RELJUMP; i +#include +#include +#if __sun +#include /* for memset */ +#include +#endif + +#if MALLOC_LD_PRELOAD + +extern "C" { + +void safer_scalable_free( void*, void (*)(void*) ); +void * safer_scalable_realloc( void*, size_t, void* ); + +bool __TBB_internal_find_original_malloc(int num, const char *names[], void *table[]) __attribute__ ((weak)); + +} + +#endif /* MALLOC_LD_PRELOAD */ +#endif /* MALLOC_CHECK_RECURSION */ + +namespace rml { +namespace internal { + +#if MALLOC_CHECK_RECURSION + +void* (*original_malloc_ptr)(size_t) = 0; +void (*original_free_ptr)(void*) = 0; +static void* (*original_calloc_ptr)(size_t,size_t) = 0; +static void* (*original_realloc_ptr)(void*,size_t) = 0; + +#endif /* MALLOC_CHECK_RECURSION */ + +#if __TBB_NEW_ITT_NOTIFY +extern "C" +#endif +void ITT_DoOneTimeInitialization() {} // required for itt_notify.cpp to work + +#if DO_ITT_NOTIFY +/** Caller is responsible for ensuring this routine is called exactly once. */ +void MallocInitializeITT() { +#if __TBB_NEW_ITT_NOTIFY + tbb::internal::__TBB_load_ittnotify(); +#else + bool success = false; + // Check if we are running under control of VTune. 
+ if( GetBoolEnvironmentVariable("KMP_FOR_TCHECK") || GetBoolEnvironmentVariable("KMP_FOR_TPROFILE") ) { + // Yes, we are under control of VTune. Check for libittnotify library. + success = dynamic_link( LIBITTNOTIFY_NAME, ITT_HandlerTable, 5 ); + } + if (!success){ + for (int i = 0; i < 5; i++) + *ITT_HandlerTable[i].handler = NULL; + } +#endif /* !__TBB_NEW_ITT_NOTIFY */ +} +#endif /* DO_ITT_NOTIFY */ + +void init_tbbmalloc() { +#if MALLOC_LD_PRELOAD + if (malloc_proxy && __TBB_internal_find_original_malloc) { + const char *alloc_names[] = { "malloc", "free", "realloc", "calloc"}; + void *orig_alloc_ptrs[4]; + + if (__TBB_internal_find_original_malloc(4, alloc_names, orig_alloc_ptrs)) { + (void *&)original_malloc_ptr = orig_alloc_ptrs[0]; + (void *&)original_free_ptr = orig_alloc_ptrs[1]; + (void *&)original_realloc_ptr = orig_alloc_ptrs[2]; + (void *&)original_calloc_ptr = orig_alloc_ptrs[3]; + MALLOC_ASSERT( original_malloc_ptr!=malloc_proxy, + "standard malloc not found" ); +/* It's workaround for a bug in GNU Libc 2.9 (as it shipped with Fedora 10). + 1st call to libc's malloc should be not from threaded code. + */ + original_free_ptr(original_malloc_ptr(1024)); + original_malloc_found = 1; + } + } +#endif /* MALLOC_LD_PRELOAD */ + +#if DO_ITT_NOTIFY + MallocInitializeITT(); +#endif +} + +#if !(_WIN32||_WIN64) +struct RegisterProcessShutdownNotification { + ~RegisterProcessShutdownNotification() { + mallocProcessShutdownNotification(); + } +}; + +static RegisterProcessShutdownNotification reg; +#endif + +#if MALLOC_CHECK_RECURSION + +bool original_malloc_found; + +#if MALLOC_LD_PRELOAD + +extern "C" { + +void * __TBB_internal_malloc(size_t size) +{ + return scalable_malloc(size); +} + +void * __TBB_internal_calloc(size_t num, size_t size) +{ + return scalable_calloc(num, size); +} + +int __TBB_internal_posix_memalign(void **memptr, size_t alignment, size_t size) +{ + return scalable_posix_memalign(memptr, alignment, size); +} + +void* __TBB_internal_realloc(void* ptr, size_t sz) +{ + return safer_scalable_realloc(ptr, sz, (void*&)original_realloc_ptr); +} + +void __TBB_internal_free(void *object) +{ + safer_scalable_free(object, original_free_ptr); +} + +} /* extern "C" */ + +#endif /* MALLOC_LD_PRELOAD */ +#endif /* MALLOC_CHECK_RECURSION */ + +} } // namespaces + +#ifdef _WIN32 +#include + +extern "C" BOOL WINAPI DllMain( HINSTANCE hInst, DWORD callReason, LPVOID ) +{ + + if (callReason==DLL_THREAD_DETACH) + { + mallocThreadShutdownNotification(NULL); + } + else if (callReason==DLL_PROCESS_DETACH) + { + mallocProcessShutdownNotification(); + } + return TRUE; +} + +#endif //_WIN32 + diff --git a/dep/tbb/src/tbbmalloc/tbbmalloc.rc b/dep/tbb/src/tbbmalloc/tbbmalloc.rc new file mode 100644 index 000000000..4e8a2ed0b --- /dev/null +++ b/dep/tbb/src/tbbmalloc/tbbmalloc.rc @@ -0,0 +1,129 @@ +// Copyright 2005-2009 Intel Corporation. All Rights Reserved. +// +// This file is part of Threading Building Blocks. +// +// Threading Building Blocks is free software; you can redistribute it +// and/or modify it under the terms of the GNU General Public License +// version 2 as published by the Free Software Foundation. +// +// Threading Building Blocks is distributed in the hope that it will be +// useful, but WITHOUT ANY WARRANTY; without even the implied warranty +// of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU General Public License for more details. 
+// +// You should have received a copy of the GNU General Public License +// along with Threading Building Blocks; if not, write to the Free Software +// Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +// +// As a special exception, you may use this file as part of a free software +// library without restriction. Specifically, if other files instantiate +// templates or use macros or inline functions from this file, or you compile +// this file and link it with other files to produce an executable, this +// file does not by itself cause the resulting executable to be covered by +// the GNU General Public License. This exception does not however +// invalidate any other reasons why the executable file might be covered by +// the GNU General Public License. + +// Microsoft Visual C++ generated resource script. +// +#ifdef APSTUDIO_INVOKED +#ifndef APSTUDIO_READONLY_SYMBOLS +#define _APS_NO_MFC 1 +#define _APS_NEXT_RESOURCE_VALUE 102 +#define _APS_NEXT_COMMAND_VALUE 40001 +#define _APS_NEXT_CONTROL_VALUE 1001 +#define _APS_NEXT_SYMED_VALUE 101 +#endif +#endif + +#define APSTUDIO_READONLY_SYMBOLS +///////////////////////////////////////////////////////////////////////////// +// +// Generated from the TEXTINCLUDE 2 resource. +// +#include +#define ENDL "\r\n" +#include "../tbb/tbb_version.h" + +#define TBBMALLOC_VERNUMBERS TBB_VERSION_MAJOR, TBB_VERSION_MINOR, __TBB_VERSION_YMD +#define TBBMALLOC_VERSION __TBB_STRING(TBBMALLOC_VERNUMBERS) + +///////////////////////////////////////////////////////////////////////////// +#undef APSTUDIO_READONLY_SYMBOLS + +///////////////////////////////////////////////////////////////////////////// +// Neutral resources + +#if !defined(AFX_RESOURCE_DLL) || defined(AFX_TARG_NEU) +#ifdef _WIN32 +LANGUAGE LANG_NEUTRAL, SUBLANG_NEUTRAL +#pragma code_page(1252) +#endif //_WIN32 + +///////////////////////////////////////////////////////////////////////////// +// manifest integration +#ifdef TBB_MANIFEST +#include "winuser.h" +2 RT_MANIFEST tbbmanifest.exe.manifest +#endif + +///////////////////////////////////////////////////////////////////////////// +// +// Version +// + +VS_VERSION_INFO VERSIONINFO + FILEVERSION TBBMALLOC_VERNUMBERS + PRODUCTVERSION TBB_VERNUMBERS + FILEFLAGSMASK 0x17L +#ifdef _DEBUG + FILEFLAGS 0x1L +#else + FILEFLAGS 0x0L +#endif + FILEOS 0x40004L + FILETYPE 0x2L + FILESUBTYPE 0x0L +BEGIN + BLOCK "StringFileInfo" + BEGIN + BLOCK "000004b0" + BEGIN + VALUE "CompanyName", "Intel Corporation\0" + VALUE "FileDescription", "Scalable Allocator library\0" + VALUE "FileVersion", TBBMALLOC_VERSION "\0" +//what is it? VALUE "InternalName", "tbbmalloc\0" + VALUE "LegalCopyright", "Copyright 2005-2009 Intel Corporation. All Rights Reserved.\0" + VALUE "LegalTrademarks", "\0" +#ifndef TBB_USE_DEBUG + VALUE "OriginalFilename", "tbbmalloc.dll\0" +#else + VALUE "OriginalFilename", "tbbmalloc_debug.dll\0" +#endif + VALUE "ProductName", "Intel(R) Threading Building Blocks for Windows\0" + VALUE "ProductVersion", TBB_VERSION "\0" + VALUE "Comments", TBB_VERSION_STRINGS "\0" + VALUE "PrivateBuild", "\0" + VALUE "SpecialBuild", "\0" + END + END + BLOCK "VarFileInfo" + BEGIN + VALUE "Translation", 0x0, 1200 + END +END + +#endif // Neutral resources +///////////////////////////////////////////////////////////////////////////// + + +#ifndef APSTUDIO_INVOKED +///////////////////////////////////////////////////////////////////////////// +// +// Generated from the TEXTINCLUDE 3 resource. 
+// + + +///////////////////////////////////////////////////////////////////////////// +#endif // not APSTUDIO_INVOKED + diff --git a/dep/tbb/src/tbbmalloc/win-gcc-tbbmalloc-export.def b/dep/tbb/src/tbbmalloc/win-gcc-tbbmalloc-export.def new file mode 100644 index 000000000..0e55b4dfc --- /dev/null +++ b/dep/tbb/src/tbbmalloc/win-gcc-tbbmalloc-export.def @@ -0,0 +1,37 @@ +/* + Copyright 2005-2009 Intel Corporation. All Rights Reserved. + + The source code contained or described herein and all documents related + to the source code ("Material") are owned by Intel Corporation or its + suppliers or licensors. Title to the Material remains with Intel + Corporation or its suppliers and licensors. The Material is protected + by worldwide copyright laws and treaty provisions. No part of the + Material may be used, copied, reproduced, modified, published, uploaded, + posted, transmitted, distributed, or disclosed in any way without + Intel's prior express written permission. + + No license under any patent, copyright, trade secret or other + intellectual property right is granted to or conferred upon you by + disclosure or delivery of the Materials, either expressly, by + implication, inducement, estoppel or otherwise. Any license under such + intellectual property rights must be express and approved by Intel in + writing. +*/ + +{ +global: +scalable_calloc; +scalable_free; +scalable_malloc; +scalable_realloc; +scalable_posix_memalign; +scalable_aligned_malloc; +scalable_aligned_realloc; +scalable_aligned_free; +safer_scalable_free; +safer_scalable_realloc; +scalable_msize; +safer_scalable_msize; +safer_scalable_aligned_realloc; +local:*; +}; diff --git a/dep/tbb/src/tbbmalloc/win32-tbbmalloc-export.def b/dep/tbb/src/tbbmalloc/win32-tbbmalloc-export.def new file mode 100644 index 000000000..e04026398 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/win32-tbbmalloc-export.def @@ -0,0 +1,42 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. 
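For reference only, and not part of the patch: a minimal stand-alone consumer of the symbols exported below. It is a sketch that reuses the prototypes shown earlier in src/tbbmalloc/proxy.h; the file name check_tbbmalloc.cpp is an illustrative choice, and it assumes the program is linked against the tbbmalloc library built by this commit.

// check_tbbmalloc.cpp -- illustrative sketch, not shipped with this patch.
// Exercises the exported C entry points of the scalable allocator.
#include <cassert>
#include <cstddef>
#include <stdint.h>

extern "C" void*  scalable_malloc(size_t size);
extern "C" void   scalable_free(void* ptr);
extern "C" void*  scalable_aligned_malloc(size_t size, size_t alignment);
extern "C" void   scalable_aligned_free(void* ptr);
extern "C" size_t scalable_msize(void* ptr);

int main()
{
    void* p = scalable_malloc(100);
    assert(p != NULL);
    assert(scalable_msize(p) >= 100);          // usable size covers the request
    scalable_free(p);

    void* q = scalable_aligned_malloc(100, 64);
    assert(q != NULL);
    assert(((uintptr_t)q & 63) == 0);          // 64-byte alignment honoured
    scalable_aligned_free(q);
    return 0;
}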
+ +EXPORTS + +; MemoryAllocator.cpp +scalable_calloc +scalable_free +scalable_malloc +scalable_realloc +scalable_posix_memalign +scalable_aligned_malloc +scalable_aligned_realloc +scalable_aligned_free +safer_scalable_free +safer_scalable_realloc +scalable_msize +safer_scalable_msize +safer_scalable_aligned_realloc diff --git a/dep/tbb/src/tbbmalloc/win64-tbbmalloc-export.def b/dep/tbb/src/tbbmalloc/win64-tbbmalloc-export.def new file mode 100644 index 000000000..e04026398 --- /dev/null +++ b/dep/tbb/src/tbbmalloc/win64-tbbmalloc-export.def @@ -0,0 +1,42 @@ +; Copyright 2005-2009 Intel Corporation. All Rights Reserved. +; +; This file is part of Threading Building Blocks. +; +; Threading Building Blocks is free software; you can redistribute it +; and/or modify it under the terms of the GNU General Public License +; version 2 as published by the Free Software Foundation. +; +; Threading Building Blocks is distributed in the hope that it will be +; useful, but WITHOUT ANY WARRANTY; without even the implied warranty +; of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +; GNU General Public License for more details. +; +; You should have received a copy of the GNU General Public License +; along with Threading Building Blocks; if not, write to the Free Software +; Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA +; +; As a special exception, you may use this file as part of a free software +; library without restriction. Specifically, if other files instantiate +; templates or use macros or inline functions from this file, or you compile +; this file and link it with other files to produce an executable, this +; file does not by itself cause the resulting executable to be covered by +; the GNU General Public License. This exception does not however +; invalidate any other reasons why the executable file might be covered by +; the GNU General Public License. + +EXPORTS + +; MemoryAllocator.cpp +scalable_calloc +scalable_free +scalable_malloc +scalable_realloc +scalable_posix_memalign +scalable_aligned_malloc +scalable_aligned_realloc +scalable_aligned_free +safer_scalable_free +safer_scalable_realloc +scalable_msize +safer_scalable_msize +safer_scalable_aligned_realloc diff --git a/src/framework/Makefile.am b/src/framework/Makefile.am index 748d5325e..9a1e8bb06 100644 --- a/src/framework/Makefile.am +++ b/src/framework/Makefile.am @@ -47,6 +47,7 @@ EXTRA_DIST = \ Platform/Define.h \ Policies/CreationPolicy.h \ Policies/ObjectLifeTime.h \ + Policies/MemoryManagement.cpp \ Policies/Singleton.h \ Policies/SingletonImp.h \ Policies/ThreadingModel.h \ diff --git a/src/framework/Policies/MemoryManagement.cpp b/src/framework/Policies/MemoryManagement.cpp new file mode 100644 index 000000000..e9555e6ef --- /dev/null +++ b/src/framework/Policies/MemoryManagement.cpp @@ -0,0 +1,69 @@ +/* +* Copyright (C) 2009 MaNGOS +* +* This program is free software; you can redistribute it and/or modify +* it under the terms of the GNU General Public License as published by +* the Free Software Foundation; either version 2 of the License, or +* (at your option) any later version. +* +* This program is distributed in the hope that it will be useful, +* but WITHOUT ANY WARRANTY; without even the implied warranty of +* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +* GNU General Public License for more details. 
+* +* You should have received a copy of the GNU General Public License +* along with this program; if not, write to the Free Software +* Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA +*/ + +//lets use Intel scalable_allocator by default and +//switch to OS specific allocator only when _STANDARD_MALLOC is defined +#ifndef USE_STANDARD_MALLOC + +#include "../../dep/tbb/include/tbb/scalable_allocator.h" + +void * operator new(size_t sz) throw (std::bad_alloc) +{ + void *res = scalable_malloc(sz); + if (NULL == res) throw std::bad_alloc(); + return res; +} + +void* operator new[](size_t sz) throw (std::bad_alloc) +{ + void *res = scalable_malloc(sz); + if (NULL == res) throw std::bad_alloc(); + return res; +} + +void operator delete(void* ptr) throw() +{ + scalable_free(ptr); +} + +void operator delete[](void* ptr) throw() +{ + scalable_free(ptr); +} + +void* operator new(size_t sz, const std::nothrow_t&) throw() +{ + return scalable_malloc(sz); +} + +void* operator new[](size_t sz, const std::nothrow_t&) throw() +{ + return scalable_malloc(sz); +} + +void operator delete(void* ptr, const std::nothrow_t&) throw() +{ + scalable_free(ptr); +} + +void operator delete[](void* ptr, const std::nothrow_t&) throw() +{ + scalable_free(ptr); +} + +#endif diff --git a/src/mangosd/Makefile.am b/src/mangosd/Makefile.am index 3fd406888..608d0a1ba 100644 --- a/src/mangosd/Makefile.am +++ b/src/mangosd/Makefile.am @@ -43,9 +43,10 @@ mangos_worldd_LDADD = \ ../shared/vmap/libmangosvmaps.a \ ../framework/libmangosframework.a \ ../../dep/src/sockets/libmangossockets.a \ - ../../dep/src/g3dlite/libg3dlite.a + ../../dep/src/g3dlite/libg3dlite.a \ + ../../dep/tbb/libtbbmalloc.so -mangos_worldd_LDFLAGS = -L../../dep/src/sockets -L../../dep/src/g3dlite -L../bindings/universal/ -L$(libdir) $(MANGOS_LIBS) -export-dynamic +mangos_worldd_LDFLAGS = -L../../dep/src/sockets -L../../dep/src/g3dlite -L../bindings/universal/ -L../../dep/tbb -L$(libdir) $(MANGOS_LIBS) -export-dynamic ## Additional files to include when running 'make dist' # Include world daemon configuration diff --git a/src/realmd/Makefile.am b/src/realmd/Makefile.am index 6aa09c392..4969c4082 100644 --- a/src/realmd/Makefile.am +++ b/src/realmd/Makefile.am @@ -36,9 +36,10 @@ mangos_realmd_LDADD = \ ../shared/Auth/libmangosauth.a \ ../shared/libmangosshared.a \ ../framework/libmangosframework.a \ - ../../dep/src/sockets/libmangossockets.a + ../../dep/src/sockets/libmangossockets.a \ + ../../dep/tbb/libtbbmalloc.so -mangos_realmd_LDFLAGS = -L../../dep/src/sockets -L$(libdir) $(MANGOS_LIBS) +mangos_realmd_LDFLAGS = -L../../dep/src/sockets -L../../dep/tbb -L$(libdir) $(MANGOS_LIBS) ## Additional files to include when running 'make dist' # Include realm list daemon configuration diff --git a/src/shared/revision_nr.h b/src/shared/revision_nr.h index 4b1ffe600..1dc76f1c5 100644 --- a/src/shared/revision_nr.h +++ b/src/shared/revision_nr.h @@ -1,4 +1,4 @@ #ifndef __REVISION_NR_H__ #define __REVISION_NR_H__ - #define REVISION_NR "8734" + #define REVISION_NR "8735" #endif // __REVISION_NR_H__ diff --git a/win/VC100/framework.vcxproj b/win/VC100/framework.vcxproj index 89142fa08..1eafcd3bd 100644 --- a/win/VC100/framework.vcxproj +++ b/win/VC100/framework.vcxproj @@ -1,4 +1,5 @@ - + + Debug_NoPCH @@ -142,7 +143,7 @@ /Zl /MP %(AdditionalOptions) Disabled ..\..\src\framework;..\..\dep\ACE_wrappers;%(AdditionalIncludeDirectories) - WIN32;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) + 
WIN32;USE_STANDARD_MALLOC;;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) false EnableFastChecks MultiThreadedDebugDLL @@ -169,7 +170,7 @@ /Zl /MP %(AdditionalOptions) Disabled ..\..\src\framework;..\..\dep\ACE_wrappers;%(AdditionalIncludeDirectories) - WIN32;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) + WIN32;USE_STANDARD_MALLOC;;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) false EnableFastChecks MultiThreadedDebugDLL @@ -193,7 +194,7 @@ /Zl /MP %(AdditionalOptions) OnlyExplicitInline ..\..\src\framework;..\..\dep\ACE_wrappers;%(AdditionalIncludeDirectories) - WIN32;NDEBUG;_LIB;%(PreprocessorDefinitions) + WIN32;USE_STANDARD_MALLOC;;NDEBUG;_LIB;%(PreprocessorDefinitions) true false MultiThreadedDLL @@ -220,7 +221,7 @@ /Zl /MP %(AdditionalOptions) OnlyExplicitInline ..\..\src\framework;..\..\dep\ACE_wrappers;%(AdditionalIncludeDirectories) - WIN32;NDEBUG;_LIB;%(PreprocessorDefinitions) + WIN32;USE_STANDARD_MALLOC;;NDEBUG;_LIB;%(PreprocessorDefinitions) true false MultiThreadedDLL @@ -244,7 +245,7 @@ /Zl /MP %(AdditionalOptions) Disabled ..\..\src\framework;..\..\dep\ACE_wrappers;%(AdditionalIncludeDirectories) - WIN32;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) + WIN32;USE_STANDARD_MALLOC;;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) false EnableFastChecks MultiThreadedDebugDLL @@ -271,7 +272,7 @@ /Zl /MP %(AdditionalOptions) Disabled ..\..\src\framework;..\..\dep\ACE_wrappers;%(AdditionalIncludeDirectories) - WIN32;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) + WIN32;USE_STANDARD_MALLOC;;_DEBUG;MANGOS_DEBUG;_LIB;%(PreprocessorDefinitions) false EnableFastChecks MultiThreadedDebugDLL @@ -322,6 +323,7 @@ + diff --git a/win/VC100/mangosd.vcxproj b/win/VC100/mangosd.vcxproj index 77325676c..3176231fa 100644 --- a/win/VC100/mangosd.vcxproj +++ b/win/VC100/mangosd.vcxproj @@ -174,9 +174,9 @@ /MACHINE:I386 %(AdditionalOptions) - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRT.LIB;msvcrt.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;framework.lib;msvcrt.lib;%(AdditionalDependencies) true - ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) + ..\..\dep\lib\$(Platform)_$(Configuration);.\framework__$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true ..\..\bin\$(Platform)_$(Configuration)\mangosd.pdb true @@ -224,9 +224,9 @@ 0x0409 - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRT.LIB;msvcrt.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;framework.lib;msvcrt.lib;%(AdditionalDependencies) true - ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) + ..\..\dep\lib\$(Platform)_$(Configuration);.\framework__$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true ..\..\bin\$(Platform)_$(Configuration)\mangosd.pdb true @@ -274,11 +274,11 @@ /MACHINE:I386 %(AdditionalOptions) - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;framework.lib;msvcrtd.lib;%(AdditionalDependencies) true - ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) + 
..\..\dep\lib\$(Platform)_$(Configuration);.\framework__$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true ..\..\bin\$(Platform)_$(Configuration)\mangosd.pdb true @@ -325,11 +325,11 @@ 0x0409 - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;framework.lib;msvcrtd.lib;%(AdditionalDependencies) true - ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) + ..\..\dep\lib\$(Platform)_$(Configuration);.\framework__$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true ..\..\bin\$(Platform)_$(Configuration)\mangosd.pdb true @@ -376,11 +376,11 @@ /MACHINE:I386 %(AdditionalOptions) - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;framework.lib;msvcrtd.lib;%(AdditionalDependencies) true - ..\..\dep\lib\$(Platform)_debug;%(AdditionalLibraryDirectories) + ..\..\dep\lib\$(Platform)_$(Configuration);.\framework__$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true ..\..\bin\$(Platform)_$(Configuration)\mangosd.pdb true @@ -427,11 +427,11 @@ 0x0409 - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;framework.lib;msvcrtd.lib;%(AdditionalDependencies) true - ..\..\dep\lib\$(Platform)_debug;%(AdditionalLibraryDirectories) + ..\..\dep\lib\$(Platform)_$(Configuration);.\framework__$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true ..\..\bin\$(Platform)_$(Configuration)\mangosd.pdb true diff --git a/win/VC100/realmd.vcxproj b/win/VC100/realmd.vcxproj index 3a62163a4..c69da23e0 100644 --- a/win/VC100/realmd.vcxproj +++ b/win/VC100/realmd.vcxproj @@ -173,7 +173,7 @@ /MACHINE:I386 %(AdditionalOptions) - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRT.LIB;msvcrt.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;%(AdditionalDependencies) true ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true @@ -217,7 +217,7 @@ 0x0409 - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRT.LIB;msvcrt.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;%(AdditionalDependencies) true ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true @@ -261,7 +261,7 @@ /MACHINE:I386 %(AdditionalOptions) - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;%(AdditionalDependencies) true ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true @@ -306,7 +306,7 @@ 0x0409 - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + 
libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;%(AdditionalDependencies) true ..\..\dep\lib\$(Platform)_$(Configuration);%(AdditionalLibraryDirectories) true @@ -351,7 +351,7 @@ /MACHINE:I386 %(AdditionalOptions) - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;%(AdditionalDependencies) true ..\..\dep\lib\$(Platform)_debug;%(AdditionalLibraryDirectories) true @@ -396,7 +396,7 @@ 0x0409 - libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;MSVCPRTD.LIB;msvcrtd.lib;%(AdditionalDependencies) + libmySQL.lib;libeay32.lib;ws2_32.lib;winmm.lib;odbc32.lib;odbccp32.lib;advapi32.lib;dbghelp.lib;%(AdditionalDependencies) true ..\..\dep\lib\$(Platform)_debug;%(AdditionalLibraryDirectories) true diff --git a/win/VC100/tbb.vcxproj b/win/VC100/tbb.vcxproj new file mode 100644 index 000000000..44a51aa7a --- /dev/null +++ b/win/VC100/tbb.vcxproj @@ -0,0 +1,473 @@ + + + + Debug_NoPCH + Win32 + + + Debug_NoPCH + Win32 + + + Debug_NoPCH + x64 + + + Debug_NoPCH + x64 + + + Debug + Win32 + + + Debug + Win32 + + + Debug + x64 + + + Debug + x64 + + + Release + Win32 + + + Release + Win32 + + + Release + x64 + + + Release + x64 + + + + {F62787DD-1327-448B-9818-030062BCFAA5} + tbb + Win32Proj + + + + DynamicLibrary + NotSet + + + DynamicLibrary + NotSet + true + + + DynamicLibrary + NotSet + + + DynamicLibrary + NotSet + true + + + DynamicLibrary + NotSet + + + DynamicLibrary + NotSet + + + + + + + + + + + <_ProjectFileVersion>10.0.20506.1 + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbb__$(Platform)_$(Configuration)\ + tbb_debug + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbb__$(Platform)_$(Configuration)\ + tbb_debug + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbb__$(Platform)_$(Configuration)\ + tbb + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbb__$(Platform)_$(Configuration)\ + tbb + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + $(Configuration)\ + tbb_debug + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + $(Platform)\$(Configuration)\ + tbb_debug + .dll + false + + + + /c /MDd /Od /Ob0 /Zi /EHsc /GR /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /DDO_ITT_ANNOTATE /D_USE_RTM_VERSION /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /W4 /Wp64 /I../../src /I../../include %(AdditionalOptions) + Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + true + EnableFastChecks + MultiThreadedDebugDLL + + + Level3 + ProgramDatabase + + + /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbb.def %(AdditionalOptions) + true + Windows + false + + + MachineX86 + + + + + X64 + + + /c /MDd /Od /Ob0 /Zi /EHsc /GR /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /DDO_ITT_ANNOTATE /D_USE_RTM_VERSION /GS- /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /W4 /Wp64 /I../../src /I../../include %(AdditionalOptions) + Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + true + EnableFastChecks + MultiThreadedDebugDLL + + + Level3 + ProgramDatabase + 
false + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbb.def %(AdditionalOptions) + true + Windows + false + + + MachineX64 + + + + + /c /MD /O2 /Zi /EHsc /GR /Zc:forScope /Zc:wchar_t /Oy /D_USE_RTM_VERSION /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /W4 /Wp64 /I../../src /I../../include %(AdditionalOptions) + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + MultiThreadedDLL + + + Level3 + ProgramDatabase + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbb.def %(AdditionalOptions) + true + Windows + true + true + false + + + MachineX86 + + + + + X64 + + + /c /MD /O2 /Zi /EHsc /GR /Zc:forScope /Zc:wchar_t /D_USE_RTM_VERSION /GS- /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /W4 /Wp64 /I../../src /I../../include %(AdditionalOptions) + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + MultiThreadedDLL + + + Level3 + ProgramDatabase + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbb.def %(AdditionalOptions) + true + Windows + true + true + false + + + MachineX64 + + + + + /c /MDd /Od /Ob0 /Zi /EHsc /GR /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /DDO_ITT_ANNOTATE /D_USE_RTM_VERSION /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /W4 /Wp64 /I../../src /I../../include %(AdditionalOptions) + Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + true + EnableFastChecks + MultiThreadedDebugDLL + + + Level3 + ProgramDatabase + + + /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbb.def %(AdditionalOptions) + true + Windows + false + + + MachineX86 + + + + + X64 + + + /c /MDd /Od /Ob0 /Zi /EHsc /GR /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /DDO_ITT_ANNOTATE /D_USE_RTM_VERSION /GS- /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /W4 /Wp64 /I../../src /I../../include %(AdditionalOptions) + Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + true + EnableFastChecks + MultiThreadedDebugDLL + + + Level3 + ProgramDatabase + false + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbb.def %(AdditionalOptions) + true + Windows + false + + + MachineX64 + + + + + /coff /Zi + true + /coff /Zi + true + /coff /Zi + true + /coff /Zi + + + true + building atomic_support.obj + ml64 /Fo"..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj" /DUSE_FRAME_POINTER /DEM64T=1 /c /Zi ../../dep/tbb/src/tbb/intel64-masm/atomic_support.asm + + ..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj;%(Outputs) + true + building atomic_support.obj + ml64 /Fo"..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj" /DEM64T=1 /c /Zi ../../dep/tbb/src/tbb/intel64-masm/atomic_support.asm + + ..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj;%(Outputs) + true + building atomic_support.obj + ml64 /Fo..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj" /DUSE_FRAME_POINTER /DEM64T=1 /c /Zi ../../dep/tbb/src/tbb/intel64-masm/atomic_support.asm + + 
..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj;%(Outputs) + + + /coff /Zi + true + /coff /Zi + true + /coff /Zi + true + /coff /Zi + + + + + generating tbb.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbb/win32-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../dep/tbb/src /I../../dep/tbb/include >$(IntDir)tbb.def + + .\tbb__$(Platform)_$(Configuration)\tbb.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win32-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbb.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbb/win32-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../dep/tbb/src /I../../dep/tbb/include >$(IntDir)tbb.def + + .\tbb__$(Platform)_$(Configuration)\tbb.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win32-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbb.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbb/win32-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../dep/tbb/src /I../../dep/tbb/include >$(IntDir)tbb.def + + .\tbb__$(Platform)_$(Configuration)\tbb.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win32-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + + + + + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbb.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../dep/tbb/src /I../../dep/tbb/include >$(IntDir)tbb.def + + ..\..\bin\$(Platform)_$(Configuration)\tbb.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbb.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../dep/tbb/src /I../../dep/tbb/include >$(IntDir)tbb.def + + ..\..\bin\$(Platform)_$(Configuration)\tbb.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbb.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../dep/tbb/src /I../../dep/tbb/include 
>$(IntDir)tbb.def + + ..\..\bin\$(Platform)_$(Configuration)\tbb.def;%(Outputs) + + + + + /I../../src /I../../include /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 %(AdditionalOptions) + /I../../src /I../../include /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 %(AdditionalOptions) + /I../../src /I../../include /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 %(AdditionalOptions) + /I../../src /I../../include /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 %(AdditionalOptions) + /I../../src /I../../include /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 %(AdditionalOptions) + /I../../src /I../../include /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 %(AdditionalOptions) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/win/VC100/tbbmalloc.vcxproj b/win/VC100/tbbmalloc.vcxproj new file mode 100644 index 000000000..6adbf0bea --- /dev/null +++ b/win/VC100/tbbmalloc.vcxproj @@ -0,0 +1,449 @@ + + + + Debug_NoPCH + Win32 + + + Debug_NoPCH + Win32 + + + Debug_NoPCH + x64 + + + Debug_NoPCH + x64 + + + Debug + Win32 + + + Debug + Win32 + + + Debug + x64 + + + Debug + x64 + + + Release + Win32 + + + Release + Win32 + + + Release + x64 + + + Release + x64 + + + + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} + tbbmalloc + Win32Proj + + + + DynamicLibrary + NotSet + + + DynamicLibrary + NotSet + true + + + DynamicLibrary + NotSet + + + DynamicLibrary + NotSet + true + + + DynamicLibrary + NotSet + + + DynamicLibrary + NotSet + + + + + + + + + + + <_ProjectFileVersion>10.0.20506.1 + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbbmalloc__$(Platform)_$(Configuration)\ + tbbmalloc_debug + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbbmalloc__$(Platform)_$(Configuration)\ + tbbmalloc_debug + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbbmalloc__$(Platform)_$(Configuration)\ + tbbmalloc + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbbmalloc__$(Platform)_$(Configuration)\ + tbbmalloc + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbbmalloc__$(Platform)_$(Configuration)\ + tbbmalloc_debug + .dll + false + ..\..\dep\lib\$(Platform)_$(Configuration)\ + .\tbbmalloc__$(Platform)_$(Configuration)\ + tbbmalloc_debug + .dll + false + + + + /c /MDd /Od /Ob0 /Zi /EHs- /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../src /I../../include /I../../src/tbbmalloc /I../../src/tbbmalloc %(AdditionalOptions) + Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + true + + + + + MultiThreadedDebugDLL + + + Level3 + false + ProgramDatabase + 4244;4267;%(DisableSpecificWarnings) + + + /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbbmalloc.def %(AdditionalOptions) + true + Windows + MachineX86 + + + + + X64 + + + /c /MDd /Od /Ob0 /Zi /EHs- /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /GS- /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../src /I../../include /I../../src/tbbmalloc /I../../src/tbbmalloc %(AdditionalOptions) + 
Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + false + + + + + MultiThreadedDebugDLL + true + + + Level3 + false + ProgramDatabase + 4244;4267;%(DisableSpecificWarnings) + false + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbbmalloc.def %(AdditionalOptions) + true + Windows + MachineX64 + + + + + /c /MD /O2 /Zi /EHs- /Zc:forScope /Zc:wchar_t /Oy /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../src /I../../include /I../../src/tbbmalloc /I../../src/tbbmalloc %(AdditionalOptions) + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + + + MultiThreadedDLL + + + Level3 + false + ProgramDatabase + 4244;4267;%(DisableSpecificWarnings) + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbbmalloc.def %(AdditionalOptions) + true + Windows + true + true + MachineX86 + + + + + X64 + + + /c /MD /O2 /Zi /EHs- /Zc:forScope /Zc:wchar_t /GS- /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../src /I../../include /I../../src/tbbmalloc /I../../src/tbbmalloc %(AdditionalOptions) + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + + + MultiThreadedDLL + + + Level3 + false + ProgramDatabase + 4244;4267;%(DisableSpecificWarnings) + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbbmalloc.def %(AdditionalOptions) + true + Windows + true + true + MachineX64 + + + + + /c /MDd /Od /Ob0 /Zi /EHs- /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../src /I../../include /I../../src/tbbmalloc /I../../src/tbbmalloc %(AdditionalOptions) + Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + %(PreprocessorDefinitions) + true + + + + + MultiThreadedDebugDLL + + + Level3 + false + ProgramDatabase + 4244;4267;%(DisableSpecificWarnings) + + + /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbbmalloc.def %(AdditionalOptions) + true + Windows + MachineX86 + + + + + X64 + + + /c /MDd /Od /Ob0 /Zi /EHs- /Zc:forScope /Zc:wchar_t /DTBB_USE_DEBUG /GS- /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 /I../../src /I../../include /I../../src/tbbmalloc /I../../src/tbbmalloc %(AdditionalOptions) + Disabled + ..\..\dep\tbb\include;..\..\dep\tbb\src;..\..\dep\tbb\build;..\..\dep\tbb\build\vsproject;%(AdditionalIncludeDirectories) + false + + + + + MultiThreadedDebugDLL + true + + + Level3 + false + ProgramDatabase + 4244;4267;%(DisableSpecificWarnings) + false + + + /nologo /DLL /MAP /DEBUG /fixed:no /INCREMENTAL:NO /DEF:$(IntDir)tbbmalloc.def %(AdditionalOptions) + true + Windows + MachineX64 + + + + + true + building atomic_support.obj + ml64 /Fo"..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj" /DUSE_FRAME_POINTER /DEM64T=1 /c /Zi ../../dep/tbb/src/tbb/intel64-masm/atomic_support.asm + + ..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj;%(Outputs) + true + building atomic_support.obj + ml64 /Fo"..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj" /DEM64T=1 /c /Zi ../../dep/tbb/src/tbb/intel64-masm/atomic_support.asm + + 
..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj;%(Outputs) + true + building atomic_support.obj + ml64 /Fo"..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj" /DUSE_FRAME_POINTER /DEM64T=1 /c /Zi ../../dep/tbb/src/tbb/intel64-masm/atomic_support.asm + + ..\..\bin\$(Platform)_$(Configuration)\atomic_support.obj;%(Outputs) + + + + + /coff /Zi + true + /coff /Zi + true + /coff /Zi + true + /coff /Zi + + + /coff /Zi + true + /coff /Zi + true + /coff /Zi + true + /coff /Zi + + + + + generating tbbmalloc.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbbmalloc/win32-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + .\tbbmalloc__$(Platform)_$(Configuration)\tbbmalloc.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbbmalloc/win32-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + $(IntDir)tbbmalloc.def;%(Outputs) + generating tbbmalloc.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbbmalloc/win32-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + .\tbbmalloc__$(Platform)_$(Configuration)\tbbmalloc.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win32-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbbmalloc.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbbmalloc/win32-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + .\tbbmalloc__$(Platform)_$(Configuration)\tbbmalloc.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbbmalloc/win32-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + $(IntDir)tbbmalloc.def;%(Outputs) + + + + + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbbmalloc.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbbmalloc/win64-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + ..\..\bin\$(Platform)_$(Configuration)\tbbmalloc.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbbmalloc.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbbmalloc/win64-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + ..\..\bin\$(Platform)_$(Configuration)\tbbmalloc.def;%(Outputs) + true + generating tbb.def file + cl /nologo /TC /EP ../../src/tbb/win64-tbb-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE 
/D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbb.def + + $(IntDir)tbb.def;%(Outputs) + generating tbbmalloc.def file + cl /nologo /TC /EP ../../dep/tbb/src/tbbmalloc/win64-tbbmalloc-export.def /DTBB_USE_DEBUG /DDO_ITT_NOTIFY /DUSE_WINTHREAD /D_CRT_SECURE_NO_DEPRECATE /D_WIN32_WINNT=0x0400 /D__TBB_BUILD=1 >$(IntDir)tbbmalloc.def + + ..\..\bin\$(Platform)_$(Configuration)\tbbmalloc.def;%(Outputs) + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/win/VC80/framework.vcproj b/win/VC80/framework.vcproj index 833262ea8..60c60b7c5 100644 --- a/win/VC80/framework.vcproj +++ b/win/VC80/framework.vcproj @@ -526,6 +526,10 @@ RelativePath="..\..\src\framework\Policies\CreationPolicy.h" > + + @@ -566,10 +570,6 @@ RelativePath="..\..\src\framework\Utilities\EventProcessor.h" > - - @@ -578,6 +578,10 @@ RelativePath="..\..\src\framework\Utilities\TypeList.h" > + + diff --git a/win/VC80/genrevision.vcproj b/win/VC80/genrevision.vcproj index 2faca2dd6..43683025d 100644 --- a/win/VC80/genrevision.vcproj +++ b/win/VC80/genrevision.vcproj @@ -44,7 +44,6 @@ Name="VCCLCompilerTool" Optimization="0" PreprocessorDefinitions="WIN32;_DEBUG;_CONSOLE" - MinimalRebuild="true" BasicRuntimeChecks="3" RuntimeLibrary="3" UsePrecompiledHeader="0" @@ -85,6 +84,83 @@ + + + + + + + + + + + + + + + + + + + + + @@ -115,7 +191,6 @@ Name="VCCLCompilerTool" Optimization="0" PreprocessorDefinitions="WIN32;_DEBUG;_CONSOLE" - MinimalRebuild="true" BasicRuntimeChecks="3" RuntimeLibrary="3" UsePrecompiledHeader="0" @@ -156,6 +231,83 @@ + + + + + + + + + + + + + + + + + + + + + @@ -231,148 +383,7 @@ Name="VCAppVerifierTool" /> - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + diff --git a/win/VC80/mangosd.vcproj b/win/VC80/mangosd.vcproj index 02c78396f..0e1a61c4a 100644 --- a/win/VC80/mangosd.vcproj +++ b/win/VC80/mangosd.vcproj @@ -78,11 +78,11 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/win/VC80/tbbmalloc.vcproj b/win/VC80/tbbmalloc.vcproj new file mode 100644 index 000000000..1a890c291 --- /dev/null +++ b/win/VC80/tbbmalloc.vcproj @@ -0,0 +1,1056 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + 
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/win/VC90/ACE_vc9.vcproj b/win/VC90/ACE_vc9.vcproj index 5fbd7008c..70a9f1429 100644 --- a/win/VC90/ACE_vc9.vcproj +++ b/win/VC90/ACE_vc9.vcproj @@ -6,6 +6,7 @@ ProjectGUID="{BD537C9A-FECA-1BAD-6757-8A6348EA12C8}" RootNamespace="ACE" Keyword="Win32Proj" + TargetFrameworkVersion="0" > + + diff --git a/win/VC90/mangosd.vcproj b/win/VC90/mangosd.vcproj index 8098e9919..2f2c6f5ea 100644 --- a/win/VC90/mangosd.vcproj +++ b/win/VC90/mangosd.vcproj @@ -80,11 +80,11 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/win/VC90/tbbmalloc.vcproj b/win/VC90/tbbmalloc.vcproj new file mode 100644 index 000000000..54f0968c5 --- /dev/null +++ b/win/VC90/tbbmalloc.vcproj @@ -0,0 +1,1051 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + diff --git a/win/mangosdVC100.sln b/win/mangosdVC100.sln index e2052abf3..8c5c46a57 100644 --- a/win/mangosdVC100.sln +++ b/win/mangosdVC100.sln @@ -12,6 +12,7 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "shared", "VC100\shared.vcxp {803F488E-4C5A-4866-8D5C-1E6C03C007C2} = {803F488E-4C5A-4866-8D5C-1E6C03C007C2} {BD537C9A-FECA-1BAD-6757-8A6348EA12C8} = {BD537C9A-FECA-1BAD-6757-8A6348EA12C8} {8072769E-CF10-48BF-B9E1-12752A5DAC6E} = {8072769E-CF10-48BF-B9E1-12752A5DAC6E} + {F62787DD-1327-448B-9818-030062BCFAA5} = {F62787DD-1327-448B-9818-030062BCFAA5} EndProjectSection EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "mangosd", "VC100\mangosd.vcxproj", "{A3A04E47-43A2-4C08-90B3-029CEF558594}" @@ -25,6 +26,9 @@ EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "zlib", "VC100\zlib.vcxproj", "{8F1DEA42-6A5B-4B62-839D-C141A7BFACF2}" EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "framework", "VC100\framework.vcxproj", "{BF6F5D0E-33A5-4E23-9E7D-DD481B7B5B9E}" + ProjectSection(ProjectDependencies) = postProject + 
{B15F131E-328A-4D42-ADC2-9FF4CA6306D8} = {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} + EndProjectSection EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "realmd", "VC100\realmd.vcxproj", "{563E9905-3657-460C-AE63-0AC39D162E23}" ProjectSection(ProjectDependencies) = postProject @@ -45,6 +49,13 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "genrevision", "VC100\genrev EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ACE_Wrappers", "VC100\ACE_vc10.vcxproj", "{BD537C9A-FECA-1BAD-6757-8A6348EA12C8}" EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbbmalloc", "VC100\tbbmalloc.vcxproj", "{B15F131E-328A-4D42-ADC2-9FF4CA6306D8}" + ProjectSection(ProjectDependencies) = postProject + {F62787DD-1327-448B-9818-030062BCFAA5} = {F62787DD-1327-448B-9818-030062BCFAA5} + EndProjectSection +EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbb", "VC100\tbb.vcxproj", "{F62787DD-1327-448B-9818-030062BCFAA5}" +EndProject Global GlobalSection(SolutionConfigurationPlatforms) = preSolution Debug_NoPCH|Win32 = Debug_NoPCH|Win32 @@ -187,6 +198,30 @@ Global {BD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|Win32.Build.0 = Release|Win32 {BD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|x64.ActiveCfg = Release|X64 {BD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|x64.Build.0 = Release|X64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|Win32.ActiveCfg = Debug_NoPCH|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|Win32.Build.0 = Debug_NoPCH|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|x64.ActiveCfg = Debug_NoPCH|X64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|x64.Build.0 = Debug_NoPCH|X64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.ActiveCfg = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.Build.0 = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.ActiveCfg = Debug|X64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.Build.0 = Debug|X64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.ActiveCfg = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.Build.0 = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.ActiveCfg = Release|X64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.Build.0 = Release|X64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|Win32.ActiveCfg = Debug_NoPCH|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|Win32.Build.0 = Debug_NoPCH|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|x64.ActiveCfg = Debug_NoPCH|X64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|x64.Build.0 = Debug_NoPCH|X64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.ActiveCfg = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.Build.0 = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.ActiveCfg = Debug|X64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.Build.0 = Debug|X64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.ActiveCfg = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.Build.0 = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.ActiveCfg = Release|X64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.Build.0 = Release|X64 EndGlobalSection GlobalSection(SolutionProperties) = preSolution HideSolutionNode = FALSE diff --git a/win/mangosdVC80.sln b/win/mangosdVC80.sln index 67c2d9734..b177a1e00 100644 --- a/win/mangosdVC80.sln +++ b/win/mangosdVC80.sln @@ -7,19 +7,20 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "game", 
"VC80\game.vcproj", EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "shared", "VC80\shared.vcproj", "{90297C34-F231-4DF4-848E-A74BCC0E40ED}" ProjectSection(ProjectDependencies) = postProject - {BF6F5D0E-33A5-4E23-9E7D-DD481B7B5B9E} = {BF6F5D0E-33A5-4E23-9E7D-DD481B7B5B9E} - {AD537C9A-FECA-1BAD-6757-8A6348EA12C8} = {AD537C9A-FECA-1BAD-6757-8A6348EA12C8} - {8072769E-CF10-48BF-B9E1-12752A5DAC6E} = {8072769E-CF10-48BF-B9E1-12752A5DAC6E} + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} = {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} {803F488E-4C5A-4866-8D5C-1E6C03C007C2} = {803F488E-4C5A-4866-8D5C-1E6C03C007C2} + {8072769E-CF10-48BF-B9E1-12752A5DAC6E} = {8072769E-CF10-48BF-B9E1-12752A5DAC6E} + {AD537C9A-FECA-1BAD-6757-8A6348EA12C8} = {AD537C9A-FECA-1BAD-6757-8A6348EA12C8} + {BF6F5D0E-33A5-4E23-9E7D-DD481B7B5B9E} = {BF6F5D0E-33A5-4E23-9E7D-DD481B7B5B9E} {8F1DEA42-6A5B-4B62-839D-C141A7BFACF2} = {8F1DEA42-6A5B-4B62-839D-C141A7BFACF2} EndProjectSection EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "mangosd", "VC80\mangosd.vcproj", "{A3A04E47-43A2-4C08-90B3-029CEF558594}" ProjectSection(ProjectDependencies) = postProject - {90297C34-F231-4DF4-848E-A74BCC0E40ED} = {90297C34-F231-4DF4-848E-A74BCC0E40ED} - {1DC6C4DA-A028-41F3-877D-D5400C594F88} = {1DC6C4DA-A028-41F3-877D-D5400C594F88} - {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} = {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} {563E9905-3657-460C-AE63-0AC39D162E23} = {563E9905-3657-460C-AE63-0AC39D162E23} + {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} = {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} + {1DC6C4DA-A028-41F3-877D-D5400C594F88} = {1DC6C4DA-A028-41F3-877D-D5400C594F88} + {90297C34-F231-4DF4-848E-A74BCC0E40ED} = {90297C34-F231-4DF4-848E-A74BCC0E40ED} EndProjectSection EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "zlib", "VC80\zlib.vcproj", "{8F1DEA42-6A5B-4B62-839D-C141A7BFACF2}" @@ -28,8 +29,8 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "framework", "VC80\framework EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "realmd", "VC80\realmd.vcproj", "{563E9905-3657-460C-AE63-0AC39D162E23}" ProjectSection(ProjectDependencies) = postProject - {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} = {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} {90297C34-F231-4DF4-848E-A74BCC0E40ED} = {90297C34-F231-4DF4-848E-A74BCC0E40ED} + {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} = {04BAF755-0D67-46F8-B1C6-77AE5368F3CB} EndProjectSection EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "script", "VC80\script.vcproj", "{4205C8A9-79B7-4354-8064-F05FB9CA0C96}" @@ -45,6 +46,13 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "genrevision", "VC80\genrevi EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ACE_Wrappers", "VC80\ACE_vc8.vcproj", "{AD537C9A-FECA-1BAD-6757-8A6348EA12C8}" EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbb", "VC80\tbb.vcproj", "{F62787DD-1327-448B-9818-030062BCFAA5}" +EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbbmalloc", "VC80\tbbmalloc.vcproj", "{B15F131E-328A-4D42-ADC2-9FF4CA6306D8}" + ProjectSection(ProjectDependencies) = postProject + {F62787DD-1327-448B-9818-030062BCFAA5} = {F62787DD-1327-448B-9818-030062BCFAA5} + EndProjectSection +EndProject Global GlobalSection(SolutionConfigurationPlatforms) = preSolution Debug_NoPCH|Win32 = Debug_NoPCH|Win32 @@ -173,8 +181,8 @@ Global {803F488E-4C5A-4866-8D5C-1E6C03C007C2}.Debug|x64.Build.0 = Debug|Win32 {803F488E-4C5A-4866-8D5C-1E6C03C007C2}.Release|Win32.ActiveCfg = Release|Win32 
{803F488E-4C5A-4866-8D5C-1E6C03C007C2}.Release|Win32.Build.0 = Release|Win32 - {803F488E-4C5A-4866-8D5C-1E6C03C007C2}.Release|x64.ActiveCfg = Release|Win32 - {803F488E-4C5A-4866-8D5C-1E6C03C007C2}.Release|x64.Build.0 = Release|Win32 + {803F488E-4C5A-4866-8D5C-1E6C03C007C2}.Release|x64.ActiveCfg = Release|x64 + {803F488E-4C5A-4866-8D5C-1E6C03C007C2}.Release|x64.Build.0 = Release|x64 {AD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Debug_NoPCH|Win32.ActiveCfg = Debug_NoPCH|Win32 {AD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Debug_NoPCH|Win32.Build.0 = Debug_NoPCH|Win32 {AD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Debug_NoPCH|x64.ActiveCfg = Debug_NoPCH|x64 @@ -187,6 +195,30 @@ Global {AD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|Win32.Build.0 = Release|Win32 {AD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|x64.ActiveCfg = Release|x64 {AD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|x64.Build.0 = Release|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|Win32.ActiveCfg = Debug_NoPCH|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|Win32.Build.0 = Debug_NoPCH|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|x64.ActiveCfg = Debug_NoPCH|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|x64.Build.0 = Debug_NoPCH|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.ActiveCfg = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.Build.0 = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.ActiveCfg = Debug|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.Build.0 = Debug|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.ActiveCfg = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.Build.0 = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.ActiveCfg = Release|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.Build.0 = Release|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|Win32.ActiveCfg = Debug_NoPCH|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|Win32.Build.0 = Debug_NoPCH|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|x64.ActiveCfg = Debug_NoPCH|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|x64.Build.0 = Debug_NoPCH|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.ActiveCfg = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.Build.0 = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.ActiveCfg = Debug|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.Build.0 = Debug|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.ActiveCfg = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.Build.0 = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.ActiveCfg = Release|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.Build.0 = Release|x64 EndGlobalSection GlobalSection(SolutionProperties) = preSolution HideSolutionNode = FALSE diff --git a/win/mangosdVC90.sln b/win/mangosdVC90.sln index 266b2100d..b271296fd 100644 --- a/win/mangosdVC90.sln +++ b/win/mangosdVC90.sln @@ -25,6 +25,9 @@ EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "zlib", "VC90\zlib.vcproj", "{8F1DEA42-6A5B-4B62-839D-C141A7BFACF2}" EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "framework", "VC90\framework.vcproj", "{BF6F5D0E-33A5-4E23-9E7D-DD481B7B5B9E}" + ProjectSection(ProjectDependencies) = postProject + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} = {B15F131E-328A-4D42-ADC2-9FF4CA6306D8} + EndProjectSection EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "realmd", 
"VC90\realmd.vcproj", "{563E9905-3657-460C-AE63-0AC39D162E23}" ProjectSection(ProjectDependencies) = postProject @@ -45,6 +48,13 @@ Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "genrevision", "VC90\genrevi EndProject Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "ACE_Wrappers", "VC90\ACE_vc9.vcproj", "{BD537C9A-FECA-1BAD-6757-8A6348EA12C8}" EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbb", "VC90\tbb.vcproj", "{F62787DD-1327-448B-9818-030062BCFAA5}" +EndProject +Project("{8BC9CEB8-8B4A-11D0-8D11-00A0C91BC942}") = "tbbmalloc", "VC90\tbbmalloc.vcproj", "{B15F131E-328A-4D42-ADC2-9FF4CA6306D8}" + ProjectSection(ProjectDependencies) = postProject + {F62787DD-1327-448B-9818-030062BCFAA5} = {F62787DD-1327-448B-9818-030062BCFAA5} + EndProjectSection +EndProject Global GlobalSection(SolutionConfigurationPlatforms) = preSolution Debug_NoPCH|Win32 = Debug_NoPCH|Win32 @@ -187,6 +197,30 @@ Global {BD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|Win32.Build.0 = Release|Win32 {BD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|x64.ActiveCfg = Release|x64 {BD537C9A-FECA-1BAD-6757-8A6348EA12C8}.Release|x64.Build.0 = Release|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|Win32.ActiveCfg = Debug_NoPCH|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|Win32.Build.0 = Debug_NoPCH|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|x64.ActiveCfg = Debug_NoPCH|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug_NoPCH|x64.Build.0 = Debug_NoPCH|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.ActiveCfg = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|Win32.Build.0 = Debug|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.ActiveCfg = Debug|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Debug|x64.Build.0 = Debug|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.ActiveCfg = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|Win32.Build.0 = Release|Win32 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.ActiveCfg = Release|x64 + {F62787DD-1327-448B-9818-030062BCFAA5}.Release|x64.Build.0 = Release|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|Win32.ActiveCfg = Debug_NoPCH|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|Win32.Build.0 = Debug_NoPCH|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|x64.ActiveCfg = Debug_NoPCH|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug_NoPCH|x64.Build.0 = Debug_NoPCH|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.ActiveCfg = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|Win32.Build.0 = Debug|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.ActiveCfg = Debug|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Debug|x64.Build.0 = Debug|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.ActiveCfg = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|Win32.Build.0 = Release|Win32 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.ActiveCfg = Release|x64 + {B15F131E-328A-4D42-ADC2-9FF4CA6306D8}.Release|x64.Build.0 = Release|x64 EndGlobalSection GlobalSection(SolutionProperties) = preSolution HideSolutionNode = FALSE