A trainer learns a function f(x)=y, that is, the weights W, of the following form to predict a label y, where x is a feature vector.
y=f(x)=Wx
Without a bias clause, f(x) cannot form a hyperplane that separates (1,1) and (2,2), because the hyperplane f(x)=0 always passes through the origin (0,0).
With a bias clause b, a trainer learns the following f(x).
f(x)=Wx+b
The learned model then accounts for the bias present in the dataset, and the learned hyperplane does not have to pass through the origin.
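The effect of the bias clause on the two points above can be checked numerically. The following is a minimal Python sketch, not Hivemall code: the weights and the bias are hand-picked for illustration, and the function name `f` simply mirrors the notation used here.

```python
# Illustrative sketch only: hand-picked numbers, not output of a Hivemall trainer.

def f(w, x, b=0.0):
    """Linear model f(x) = Wx + b for a single weight vector w."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

p1 = (1.0, 1.0)
p2 = (2.0, 2.0)

# Without a bias, f(p2) = 2 * f(p1) for any w, so both points always
# get the same sign and fall on the same side of the hyperplane.
for w in [(1.0, 1.0), (-3.0, 0.5), (0.5, -4.0)]:
    assert f(w, p2) == 2 * f(w, p1)

# With a bias, the hyperplane x1 + x2 - 3 = 0 separates the two points.
w, b = (1.0, 1.0), -3.0
print(f(w, p1, b))  # -1.0
print(f(w, p2, b))  # 1.0
```

Because (2,2) is a positive multiple of (1,1), no choice of W alone can give the two points opposite signs; only the offset b makes that possible.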
Hivemall's add_bias() adds a bias to a feature vector. To enable a bias clause, use add_bias() for both (important!) training and test data as follows. The bias b is a feature named "0" by default ("-1" before v0.3). See AddBiasUDF for details.
Note that the bias is expressed as a feature that is found in all training/testing examples.
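One way to picture this: if every example carries the same constant feature, the weight learned for that feature plays the role of b. The Python sketch below illustrates the idea only. `with_bias` and `predict` are hypothetical helpers written for this illustration, not Hivemall's AddBiasUDF; the constant value 1.0 attached to the bias feature and the model weights are assumptions.

```python
# Illustrative sketch only: with_bias() and predict() are hypothetical helpers,
# and the bias value 1.0 is an assumption, not taken from AddBiasUDF.

def with_bias(features, bias_name="0"):
    """Return a copy of the features with a constant bias feature appended."""
    return list(features) + [f"{bias_name}:1.0"]

def predict(features, weights):
    """Sum weight * value over "name:value" features; unseen features weigh 0."""
    total = 0.0
    for feat in features:
        name, value = feat.split(":")
        total += weights.get(name, 0.0) * float(value)
    return total

# Hypothetical model: the weight stored under "0" acts as b.
weights = {"1": 1.0, "2": 1.0, "0": -3.0}

row = ["1:2.0", "2:2.0"]
print(with_bias(row))                    # ['1:2.0', '2:2.0', '0:1.0']
print(predict(with_bias(row), weights))  # 1.0
print(predict(row, weights))             # 4.0
```

The last line shows why the bias has to be added to the test data as well: without the constant feature, the weight learned for the bias never contributes to the prediction.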
Adding a bias clause to test data
create table e2006tfidf_test_exploded as
select
  rowid,
  target,
  split(feature,":")[0] as feature,
  cast(split(feature,":")[1] as float) as value
  -- extract_feature(feature) as feature, -- hivemall v0.3.1 or later
  -- extract_weight(feature) as value     -- hivemall v0.3.1 or later
from
  e2006tfidf_test LATERAL VIEW explode(add_bias(features)) t AS feature;
Adding a bias clause to training data
create table e2006tfidf_pa1a_model as
select
  feature,
  avg(weight) as weight
from
  (select
     pa1a_regress(add_bias(features),target) as (feature,weight)
   from
     e2006tfidf_train_x3
  ) t
group by feature;