veggiemonk / batch

Split an array/slice into n evenly sized chunks. Spread load evenly across workers.

License: Apache License 2.0
Latest commit: Julien Bisconti, "Add links and doc" (46ebb45)

Batch

Split an array/slice into n evenly sized chunks.

Inspired by Paul Di Gian's blog post: Split a slice or array in a defined number of chunks in golang

Note: you might be better off just copying the function into your codebase. It is only a few lines of code.

See Go Proverbs for more details.

A little copying is better than a little dependency.

This library isn't really meant to be imported. Just copy the one function and adapt it to your needs. Look at the tests for edge cases. The benchmarks and fuzzing are just for me to learn and have a playground to try things out.

Installation

Requires Go 1.18 or later.

Just copy the function from batch.go into your codebase.

Usage

package main

import (
	"fmt"

	"github.com/veggiemonk/batch"
)

func main() {
	s := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

	// Split the slice into 3 even parts
	chunks := batch.Slice(s, 3)

	fmt.Println(chunks)
	// length      3       3        4
	// output: [[1 2 3] [4 5 6] [7 8 9 10]]
	// batch sizes differ by at most 1 item,
	// which spreads the load evenly among workers
}

Usage with Cloud Run Jobs

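batchID := uuid.New().String()

// Cloud Run Jobs sets both variables for every task of a job execution.
taskCount, err := strconv.Atoi(os.Getenv("CLOUD_RUN_TASK_COUNT"))
if err != nil {
	return fmt.Errorf("invalid CLOUD_RUN_TASK_COUNT (id:%s): %w", batchID, err)
}
taskIndex, err := strconv.Atoi(os.Getenv("CLOUD_RUN_TASK_INDEX"))
if err != nil {
	return fmt.Errorf("invalid CLOUD_RUN_TASK_INDEX (id:%s): %w", batchID, err)
}

// requestToTasks, process and ErrTaskIndexOutOfBounds come from your own code;
// requestToTasks is assumed to return an error as its second value.
tt, err := requestToTasks(request)
if err != nil {
	return fmt.Errorf("failed to build tasks (id:%s): %w", batchID, err)
}

// One batch per task: each task only processes the batch at its own index.
batches := batch.Slice(tt, taskCount)
if taskIndex >= len(batches) || taskIndex < 0 {
	return fmt.Errorf("index (%d) out of bounds (max: %d), (id:%s): %w", taskIndex, len(batches), batchID, ErrTaskIndexOutOfBounds)
}

b := batches[taskIndex]
if err := process(b); err != nil {
	return fmt.Errorf("failed to process batch (id:%s): %w", batchID, err)
}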

Rationale

Having batches of (almost) the same size is useful when you want to distribute the workload evenly across multiple workers.

Instead of defining the size of each batch, we define the number of batches we want to have.

Here is a counterexample:

actions := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
batchSize := 4 // fixed batch size chosen to get 3 batches for 3 workers
batches := make([][]int, 0, (len(actions)+batchSize-1)/batchSize)

// Cut off one full batch at a time; the remainder becomes the last batch.
for batchSize < len(actions) {
	actions, batches = actions[batchSize:], append(batches, actions[0:batchSize:batchSize])
}
batches = append(batches, actions)
fmt.Println(batches)
// length       4    |    4    |  2
// output: [[1 2 3 4] [5 6 7 8] [9 10]]
// 2 workers will do double the work of the last worker.
// --> Not what we want.

This is not ideal when you want to distribute the workload evenly across multiple workers.

The counterexample code above was taken from the Go wiki: SliceTricks.

Links