The associated Journal of Open Source Software paper is now published and should be used as the reference.
calibrate_LLR() has a new argument order to make it more compatible with the tidy approach: the data to calibrate is now the first argument and the calibration dataset is the second.
The calibration dataset is now optional for calibrate_LLR(). If absent, the calibration is done leave-one-out. This is useful to see the LLRs when testing a method on a training dataset.
contentmask() also accepts sentence-tokenised corpora (the output of tokenize_sents()) as input; this option also allows parallel processing.
chunk_texts() now outputs texts that keep the same spaces present in the original and no longer outputs texts with spaces around the punctuation marks.
chunk_texts() now also accepts tokens objects as input, in which case it returns chunks of sentences whose total length is equal to or greater than the one specified.
posterior() now optionally accepts prior probabilities as input from the user.
vectorize() and two functions that call it (delta() and ngram_tracing()) now have a new argument called 'cross_boundaries'. If FALSE, n-grams do not cross sentence boundaries (which was the default behaviour in previous versions). This change simply means that the user can now choose to cross sentence boundaries when making n-grams if they wish. The behaviour of these functions is therefore also now clearer.
the progress bar is now optional for all authorship analysis functions (the default is TRUE).
minor bug fixes
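
To picture what the 'cross_boundaries' argument of vectorize() controls, here is a minimal base R sketch using word bigrams. It does not call the package: make_ngrams() is a hypothetical helper written only for this illustration, and the package's own implementation may differ in its details.

```r
# Hypothetical helper (not part of the package): build word n-grams
# from a character vector of tokens.
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Two toy sentences, already tokenised into words.
sentences <- list(
  c("the", "cat", "sat"),
  c("it", "purred")
)

# Analogue of cross_boundaries = FALSE: n-grams are built inside each sentence.
within <- unlist(lapply(sentences, make_ngrams, n = 2))

# Analogue of cross_boundaries = TRUE: sentences are joined first,
# so one bigram spans the boundary between them.
across <- make_ngrams(unlist(sentences), n = 2)

print(within)
print(across)
print(setdiff(across, within))  # the n-grams that exist only when crossing
```

The only difference between the two results is the bigram that straddles the sentence boundary, which is the kind of feature the new argument lets the user include or exclude.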
contentmask() no longer has the option to replace ASCII; removed dependency on textclean package.
contentmask() used with the "frames" algorithm now adopts the Universal POS-tags, making it more compatible with other languages.
create_corpus() tests for the correct syntax of the file names and returns an error if not correct (plus showing which file names are incorrect).
create_corpus() includes an argument to specify the encoding of the texts.
minor bug fixes
concordance() can now take sentences as input and also shows sentence boundaries
lambdaG_visualize() can now display the text heatmap with sentences ordered either by lambdaG values (default) or by their original order in the text
lambdaG_visualize() can now visualize negative lambdaG values in an HTML file
ngram_tracing() contained a major bug when performing tests with multiple known authors which would lead to anomalously high and incorrect performance statistics. This has been fixed.
performance() now has an optional progress bar
performance() can run leave-one-out by author rather than just by text
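
The difference between leave-one-out by text and by author can be sketched in base R as follows. This is only an illustration of the two fold structures under the assumption that "by author" means holding out all texts of one author at a time; the toy data frame and its column names are hypothetical and this is not how performance() is necessarily implemented.

```r
# Toy metadata: five texts written by three authors (hypothetical data).
texts <- data.frame(
  text_id = c("t1", "t2", "t3", "t4", "t5"),
  author  = c("A", "A", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Leave-one-out by text: each fold holds out exactly one text.
folds_by_text <- lapply(seq_len(nrow(texts)), function(i) texts$text_id[i])

# Leave-one-out by author: each fold holds out every text by one author.
folds_by_author <- split(texts$text_id, texts$author)

print(length(folds_by_text))    # one fold per text
print(length(folds_by_author))  # one fold per author
print(folds_by_author[["A"]])   # all texts held out together for author A
```

Holding out whole authors gives fewer folds, and no text by the held-out author remains in the rest of the data for that fold.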