Arithmetic coding in C
by @nclack.
Implementation after Amir Said's Algorithms 22-29(1). This was mostly a learning exercise. It hasn't been optimized.
Encoding/decoding to/from variable symbol alphabets.
Implicit use of a STOP symbol means that you don't need to know the number of symbols in the decoded message in order to decode something.
Can encoded messages stored as signed or unsigned chars, shorts, longs or long longs. Keep in mind, however, that the number of encodable symbols may be limiting. You can't encode 2^64 different integers, sorry.
Can encode to streams of variable symbol width; either 1,4,8, or 16 bits. There is are two tradeoffs here.
void encode()
{ unsigned char *input, *output=0; // input and output buffer
size_t countof_input, countof_output; // number of symbols in input and output buffer
float *cdf=0;
size_t nsym; // number of symbols in the input alphabet
// somehow load the data into input array
cdf_build(&cdf,&nsym,input,countof_input);
encode_u1_u8( // encode unsigned chars to a string of bits (1 bit per output symbol)
(void**)&out,&countof_output,
input, countof_input,
cdf, nsym);
// do something with the output
free(out);
free(cdf);
}
Said, A. "Introduction to Arithmetic Coding - Theory and Practice."
Hewlett Packard Laboratories Report: 2004-2076.