A trainer learns a function f(x) = y, that is, the weights W, of the following form to predict a label y from a feature vector x:

y = f(x) = Wx

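Without a bias clause (or regularization), f(x) cannot form a hyperplane that separates (1,1) and (2,2), because the hyperplane f(x) = 0 always passes through the origin (0,0). Since (2,2) = 2 * (1,1), f(2,2) = 2 * f(1,1), so both points always fall on the same side.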

With a bias clause b, a trainer learns the following f(x):

f(x) = Wx + b

The learned model then accounts for the bias present in the dataset, and the resulting hyperplane does not have to pass through the origin.
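The two cases above can be checked numerically. The following is a minimal sketch in plain Python, not Hivemall code: the `score` helper, the handful of candidate weight vectors, and the choice W = (1, 1), b = -3 are all illustrative assumptions picked to match the points (1,1) and (2,2) used in the text.

```python
# Illustrative only: checks the two points from the text against linear scorers.

def score(w, x, b=0.0):
    """Return f(x) = w . x + b for plain Python sequences."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

p1, p2 = (1.0, 1.0), (2.0, 2.0)

# Without a bias: p2 = 2 * p1, so f(p2) = 2 * f(p1) and both scores share a sign.
# These candidate weights are arbitrary samples, not an exhaustive proof.
candidates = [(1.0, 1.0), (-1.0, 3.0), (0.5, -2.0), (-4.0, -0.25)]
same_side = all(score(w, p1) * score(w, p2) >= 0 for w in candidates)

# With a bias: W = (1, 1), b = -3 puts the boundary x1 + x2 = 3 between the points.
w, b = (1.0, 1.0), -3.0
with_bias = (score(w, p1, b), score(w, p2, b))

print(same_side)   # True
print(with_bias)   # (-1.0, 1.0)
```

The opposite signs in the second result show the two points landing on different sides of the hyperplane once b is allowed to be non-zero.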

Hivemall's add_bias() adds a bias to a feature vector. To enable a bias clause, apply add_bias() to both (important!) the training and the test data, as follows. By default, the bias b is the feature "0" ("-1" before v0.3). See AddBiasUDF for details.

Note that the bias is expressed as a feature that is found in all training and test examples.
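To illustrate why the bias must be added to both training and test data, here is a small Python sketch of the "bias as a constant feature" idea. It is not Hivemall's implementation: `with_bias_feature` and `predict` are hypothetical helpers, the feature names `x1` and `x2` and the toy weights are made up, and the bias value of 1.0 is an assumption. Only the feature name "0" comes from the text.

```python
# Hypothetical stand-in for the idea behind add_bias(); not Hivemall's implementation.
BIAS_FEATURE = "0"  # feature name taken from the text above

def with_bias_feature(features, bias_value=1.0):
    """Return a copy of a {name: value} map plus the constant bias feature.

    bias_value=1.0 is an assumption made for this sketch.
    """
    out = dict(features)
    out[BIAS_FEATURE] = bias_value
    return out

def predict(weights, features):
    """Sum weight * value over the features present in the example."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())

# Toy model: the weight learned for feature "0" plays the role of b.
weights = {"x1": 1.0, "x2": 1.0, BIAS_FEATURE: -3.0}
example = {"x1": 2.0, "x2": 2.0}

print(predict(weights, with_bias_feature(example)))  # 1.0 (Wx + b)
print(predict(weights, example))                     # 4.0 (bias weight never applied)
```

If the test example lacks the bias feature, the weight learned for it is silently skipped and the prediction shifts by b, which is why the text stresses applying it on both sides.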

# Adding a bias clause to test data

```sql
create table e2006tfidf_test_exploded as
select
  rowid,
  target,
  split(feature,":")[0] as feature,
  cast(split(feature,":")[1] as float) as value
  -- extract_feature(feature) as feature, -- hivemall v0.3.1 or later
  -- extract_weight(feature) as value     -- hivemall v0.3.1 or later
from
  e2006tfidf_test LATERAL VIEW explode(add_bias(features)) t AS feature;
```


# Adding a bias clause to training data

create table e2006tfidf_pa1a_model as
select
feature,
avg(weight) as weight
from
(select