Add batching parameter to parallel_for_each

parallel_for_each currently requires each thread to process at least
10 elements.  However, when indexing, it's fine for a thread to handle
just a single CU.  This patch parameterizes this, and updates the one
user.



3 files changed