There is a binary serialization format called Borsh. In its Rust implementation, when the alloc feature is enabled, borsh::to_vec allocates a Vec with an initial capacity of 1024 bytes before each serialization. I thought to myself: I can do better by allocating the exact number of bytes up front, avoiding both wasted capacity and reallocations for bigger structs.
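For reference, borsh::to_vec with alloc boils down to something like this (paraphrased from the borsh crate, not my code):

    // Roughly what borsh::to_vec does internally:
    pub fn to_vec<T: BorshSerialize + ?Sized>(value: &T) -> borsh::io::Result<Vec<u8>> {
        // Always starts at 1024 bytes, regardless of how big the value actually is.
        let mut result = Vec::with_capacity(1024);
        value.serialize(&mut result)?;
        Ok(result)
    }

Here is my attempt: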
    use borsh::BorshSerialize;

    pub trait FastBorshSerialize: BorshSerialize + BorshSize {
        fn fast_serialize(&self) -> Vec<u8> {
            // Allocate exactly the number of bytes this value serializes to,
            // then serialize into that buffer. No resizing should ever happen.
            let mut buf = Vec::with_capacity(self.borsh_size());
            self.serialize(&mut buf).expect("Serialization must not fail");
            buf
        }
    }

    // Blanket impl: every type that is BorshSerialize + BorshSize
    // gets fast_serialize for free.
    impl<T> FastBorshSerialize for T where T: BorshSerialize + BorshSize {}

    pub trait BorshSize {
        /// Returns the exact number of bytes this value occupies in Borsh encoding.
        fn borsh_size(&self) -> usize;
    }
The idea is that, on top of this, I implement BorshSize for all the base types (u8, u16, [u8; 32], Vec<T>, etc.), and then write a derive macro that implements BorshSize for a struct by calling borsh_size on each of its fields and summing the results. In my opinion the compiler should optimize this heavily, since the calls will most likely be inlined and the sizes of basic types are just constants (u32 => 4). I already did this and it is in the repo; I am not pasting it here because I think it is irrelevant, but a sketch of what the impls look like follows.
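To give an idea of the shape (simplified sketches, not the exact code from the repo):

    impl BorshSize for u32 {
        fn borsh_size(&self) -> usize {
            4 // u32 is a fixed 4-byte little-endian integer in Borsh
        }
    }

    impl BorshSize for u64 {
        fn borsh_size(&self) -> usize {
            8 // fixed 8 bytes
        }
    }

    impl<T: BorshSize, const N: usize> BorshSize for [T; N] {
        fn borsh_size(&self) -> usize {
            // Fixed-size arrays are encoded element by element, with no length prefix.
            self.iter().map(|item| item.borsh_size()).sum()
        }
    }

    impl<T: BorshSize> BorshSize for Vec<T> {
        fn borsh_size(&self) -> usize {
            // Borsh encodes a Vec as a u32 length prefix followed by its elements.
            4 + self.iter().map(|item| item.borsh_size()).sum::<usize>()
        }
    }

And the derive macro, applied to the Strukt from the benchmark below, would generate roughly:

    impl BorshSize for Strukt {
        fn borsh_size(&self) -> usize {
            self.a.borsh_size()       // 4
                + self.b.borsh_size() // 8
                + self.c.borsh_size() // 32
                + self.d.borsh_size() // 4 + self.d.len()
        }
    }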
I wrote a micro-benchmark comparing my implementation against the regular borsh::to_vec implementation and ran it with cargo test --release:
    #[derive(Default, BorshSerialize, BorshSize)]
    struct Strukt {
        a: u32,
        b: u64,
        c: [u8; 32],
        d: Vec<u8>,
    }

    #[test]
    fn t() {
        let mut s = Strukt::default();
        s.d = vec![5; 900];

        // Sanity check: both serializers must produce identical bytes.
        let r1 = borsh::to_vec(&s).unwrap();
        let r2 = s.fast_serialize();
        assert_eq!(r1, r2);
        dbg!(r1.len());
        dbg!(r1.capacity());
        dbg!(r2.len());
        dbg!(r2.capacity());

        let n = 100000;

        let start = std::time::Instant::now();
        for _ in 0..n {
            s.fast_serialize();
        }
        println!("elapsed fast: {}", start.elapsed().as_micros());

        let start = std::time::Instant::now();
        for _ in 0..n {
            borsh::to_vec(&s).unwrap();
        }
        println!("elapsed borsh: {}", start.elapsed().as_micros());
    }
Here is the output I get:
    ---- t stdout ----
    [fastborsh/tests/serialize.rs:19:5] r1.len() = 948
    [fastborsh/tests/serialize.rs:20:5] r1.capacity() = 1024
    [fastborsh/tests/serialize.rs:21:5] r2.len() = 948
    [fastborsh/tests/serialize.rs:22:5] r2.capacity() = 948
    elapsed fast: 4864
    elapsed borsh: 2683
I calculated the required capacity correctly and allocated exactly that many bytes, yet I still got almost 2x worse results. I know this is only a micro-benchmark, but I still wouldn't expect a consistently 2x worse result when the fast version allocates less. I am quite sure the rest of the serialization logic is identical, so the allocation itself is the only difference I can see.
I then went and changed the line s.d = vec![5; 900]; to s.d = vec![5; 1000];. The new output is:
    [fastborsh/tests/serialize.rs:19:5] r1.len() = 1048
    [fastborsh/tests/serialize.rs:20:5] r1.capacity() = 2048
    [fastborsh/tests/serialize.rs:21:5] r2.len() = 1048
    [fastborsh/tests/serialize.rs:22:5] r2.capacity() = 1048
    elapsed fast: 2917
    elapsed borsh: 10089
Not only does the fast version now beat the original borsh implementation (as expected, since borsh::to_vec has to grow its buffer from 1024 to 2048 bytes), it also beats its own earlier run, even though it now allocates 1048 bytes instead of 948.
Does anyone have any idea why this could be the case? :)
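In case it helps with the discussion: one way to take the serialization logic out of the picture is to time the allocations alone. A minimal sketch (my own diagnostic idea, not part of the repo; std::hint::black_box stops the compiler from eliding the unused Vec):

    use std::hint::black_box;
    use std::time::Instant;

    fn time_allocs(cap: usize, n: usize) -> u128 {
        let start = Instant::now();
        for _ in 0..n {
            // Allocate and immediately drop a Vec of the given capacity.
            black_box(Vec::<u8>::with_capacity(black_box(cap)));
        }
        start.elapsed().as_micros()
    }

Comparing time_allocs(948, 100000) against time_allocs(1024, 100000) should show whether the allocator alone accounts for the gap.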
If you want to replicate the benchmark, check out this.
I tested on a MacBook with an M3 chip and 16 GB RAM.