# Batch

Split an array/slice into `n` evenly sized chunks.

Inspired by the blog post by [Paul Di Gian](https://github.com/PaulDiGian):
[Split a slice or array in a defined number of chunks in golang](https://pauldigian.com/split-a-slice-or-array-in-a-defined-number-of-chunks-in-golang-but-any-language-really)

<!-- TOC -->
* [Batch](#batch)
  * [Installation](#installation)
  * [Usage](#usage)
  * [Usage with Cloud Run Jobs](#usage-with-cloud-run-jobs)
  * [Rationale](#rationale)
<!-- TOC -->

## Installation

Requires Go 1.18 or later.

Add `github.com/veggiemonk/batch` to your `go.mod` file, then run the following command:

```bash
go mod tidy
```

## Usage

**Note**: you might be better off just copying the function into your codebase.
It is less than 10 lines of code.

See [Go Proverbs](https://go-proverbs.github.io/) for more details.

> A little copying is better than a little dependency.


```go
package main

import (
	"fmt"

	"github.com/veggiemonk/batch"
)

func main() {
	s := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}

	// Split the slice into 3 even parts
	chunks := batch.BatchSlice(s, 3)

	// Print the chunks
	fmt.Println(chunks)
	// length      3       3        4
	// output: [[1 2 3] [4 5 6] [7 8 9 10]]
	// batch sizes differ by at most 1 item,
	// which spreads the load evenly amongst workers
}
```

## Usage with Cloud Run Jobs

```go
// Cloud Run Jobs sets CLOUD_RUN_TASK_COUNT and CLOUD_RUN_TASK_INDEX for each task.
// Parse errors are ignored here for brevity.
batchID := uuid.New().String()
taskCount, _ := strconv.Atoi(os.Getenv("CLOUD_RUN_TASK_COUNT"))
taskIndex, _ := strconv.Atoi(os.Getenv("CLOUD_RUN_TASK_INDEX"))

// Build the full list of work items for this job execution.
tt, err := requestToTasks(request)
if err != nil {
	return fmt.Errorf("failed to get list of tasks (id:%s): %w", batchID, err)
}

if len(tt) == 0 {
	return fmt.Errorf("no tasks found (id:%s): %w", batchID, ErrNoTaskFound)
}

// Split the work into one batch per Cloud Run task and pick this task's batch.
batches := batch.BatchSlice(tt, taskCount)
if taskIndex >= len(batches) || taskIndex < 0 {
	return fmt.Errorf("index (%d) out of bounds (max: %d), (id:%s): %w", taskIndex, len(batches), batchID, ErrTaskIndexOutOfBounds)
}

b := batches[taskIndex]

err = process(b)
if err != nil {
	return fmt.Errorf("failed to process batch (id:%s): %w", batchID, err)
}
```

## Rationale

Having evenly sized batches is useful when you want to distribute the workload evenly across multiple workers.

As opposed to defining the _size of each batch_, we define the _number of batches we want_ to have.

Here is a **counter-example**:

```go
package main

import "fmt"

func main() {
	array := []int{1, 2, 3, 4, 5, 6, 7, 8, 9, 10}
	// For 3 workers, the chunk size has to be rounded up: ceil(10 / 3) = 4.
	chunkSize := 4
	var result [][]int

	for i := 0; i < len(array); i += chunkSize {
		end := i + chunkSize

		// The last chunk takes whatever is left.
		if end > len(array) {
			end = len(array)
		}

		result = append(result, array[i:end])
	}

	fmt.Println(result)
	// length       4    |    4    |  2
	// output: [[1 2 3 4] [5 6 7 8] [9 10]]
	// 2 workers will do double the work of the last worker.
}
```

This is not ideal when you want to distribute the workload evenly across multiple workers.



[//]: # (can be played with here: https://go.dev/play/p/-ULiql4tOTc)