๐Ÿ“ฆ BurntSushi / aho-corasick

A fast implementation of Aho-Corasick in Rust.

โ˜… 1.2k stars โ‘‚ 108 forks ๐Ÿ‘ 1.2k watching โš–๏ธ The Unlicense
aho-corasickfinite-state-machinesearchsubstring-matchingtext-processing
๐Ÿ“ฅ Clone https://github.com/BurntSushi/aho-corasick.git
HTTPS git clone https://github.com/BurntSushi/aho-corasick.git
SSH git clone git@github.com:BurntSushi/aho-corasick.git
CLI gh repo clone BurntSushi/aho-corasick
Andrew Gallant Andrew Gallant ci: use older version of `log` on pinned build 88e4966 2 months ago ๐Ÿ“ History
๐Ÿ“‚ master View all commits โ†’
๐Ÿ“ .github
๐Ÿ“ .vim
๐Ÿ“ aho-corasick-debug
๐Ÿ“ benchmarks
๐Ÿ“ fuzz
๐Ÿ“ src
๐Ÿ“„ .gitignore
๐Ÿ“„ Cargo.toml
๐Ÿ“„ COPYING
๐Ÿ“„ DESIGN.md
๐Ÿ“„ LICENSE-MIT
๐Ÿ“„ README.md
๐Ÿ“„ rustfmt.toml
๐Ÿ“„ UNLICENSE
๐Ÿ“„ README.md

aho-corasick ============ A library for finding occurrences of many patterns at once with SIMD acceleration in some cases. This library provides multiple pattern search principally through an implementation of the Aho-Corasick algorithm, which builds a finite state machine for executing searches in linear time. Features include case insensitive matching, overlapping matches, fast searching via SIMD and optional full DFA construction and search & replace in streams.

Build status crates.io

Dual-licensed under MIT or the UNLICENSE.

Documentation

https://docs.rs/aho-corasick

Usage

Run cargo add aho-corasick to automatically add this crate as a dependency in your Cargo.toml file.

Example: basic searching

This example shows how to search for occurrences of multiple patterns simultaneously. Each match includes the pattern that matched along with the byte offsets of the match.

use aho_corasick::{AhoCorasick, PatternID};

let patterns = &["apple", "maple", "Snapple"];
let haystack = "Nobody likes maple in their apple flavored Snapple.";

let ac = AhoCorasick::new(patterns).unwrap();
let mut matches = vec![];
for mat in ac.find_iter(haystack) {
    matches.push((mat.pattern(), mat.start(), mat.end()));
}
assert_eq!(matches, vec![
    (PatternID::must(1), 13, 18),
    (PatternID::must(0), 28, 33),
    (PatternID::must(2), 43, 50),
]);

Example: ASCII case insensitivity

This is like the previous example, but matches Snapple case insensitively using AhoCorasickBuilder:

use aho_corasick::{AhoCorasick, PatternID};

let patterns = &["apple", "maple", "snapple"];
let haystack = "Nobody likes maple in their apple flavored Snapple.";

let ac = AhoCorasick::builder()
    .ascii_case_insensitive(true)
    .build(patterns)
    .unwrap();
let mut matches = vec![];
for mat in ac.find_iter(haystack) {
    matches.push((mat.pattern(), mat.start(), mat.end()));
}
assert_eq!(matches, vec![
    (PatternID::must(1), 13, 18),
    (PatternID::must(0), 28, 33),
    (PatternID::must(2), 43, 50),
]);

Example: replacing matches in a stream

This example shows how to execute a search and replace on a stream without loading the entire stream into memory first.

``rust,ignore use aho_corasick::AhoCorasick; let patterns = &["fox", "brown", "quick"]; let replace_with = &["sloth", "grey", "slow"]; // In a real example, these might be std::fs::Files instead. All you need to // do is supply a pair of std::io::Read and std::io::Write implementations. let rdr = "The quick brown fox."; let mut wtr = vec![]; let ac = AhoCorasick::new(patterns).unwrap(); ac.stream_replace_all(rdr.as_bytes(), &mut wtr, replace_with) .expect("stream_replace_all failed"); assert_eq!(b"The slow grey sloth.".to_vec(), wtr); %%CODEBLOCK2%%rust use aho_corasick::AhoCorasick; let patterns = &["Samwise", "Sam"]; let haystack = "Samwise"; let ac = AhoCorasick::new(patterns).unwrap(); let mat = ac.find(haystack).expect("should have a match"); assert_eq!("Sam", &haystack[mat.start()..mat.end()]); %%CODEBLOCK3%%rust use aho_corasick::{AhoCorasick, MatchKind}; let patterns = &["Samwise", "Sam"]; let haystack = "Samwise"; let ac = AhoCorasick::builder() .match_kind(MatchKind::LeftmostFirst) .build(patterns) .unwrap(); let mat = ac.find(haystack).expect("should have a match"); assert_eq!("Samwise", &haystack[mat.start()..mat.end()]); ` In addition to leftmost-first semantics, this library also supports leftmost-longest semantics, which match the POSIX behavior of a regular expression alternation. See MatchKind in the docs for more details. ### Minimum Rust version policy This crate's minimum supported rustc version is 1.60.0. The current policy is that the minimum Rust version required to use this crate can be increased in minor version updates. For example, if crate 1.0 requires Rust 1.20.0, then crate 1.0.z for all values of z will also require Rust 1.20.0 or newer. However, crate 1.y for y > 0` may require a newer minimum version of Rust.

In general, this crate will be conservative with respect to the minimum supported version of Rust.

FFI bindings

is a Python wrapper for this library. wrapper for this library.