—————————
Learning Flux, Part 1
—————————
Model-Building Basics
<Differentiation (Gradients)>
    using Flux.Tracker

    f(x) = 3x^2 + 2x + 1

    f´(x) = Tracker.gradient(f, x; nest = true)[1]

    f´(2)
    Out: 14.0 (tracked)

    f´´(x) = Tracker.gradient(f´, x; nest = true)[1]

    f´´(1)
    Out: 6.0 (tracked)
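Here "(tracked)" presumably means that the value is being tracked, i.e. recorded by Tracker so that it can be differentiated.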
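The two results can be cross-checked without Flux. The following is a minimal sketch in plain Julia that approximates the first and second derivatives of the same f with central finite differences. The helper names central_diff and second_diff are made up for this illustration and are not part of Flux.

```julia
# Plain Julia, no Flux required.
f(x) = 3x^2 + 2x + 1

# Hypothetical helpers (not part of Flux): central finite differences.
central_diff(g, x; h = 1e-3) = (g(x + h) - g(x - h)) / (2h)
second_diff(g, x; h = 1e-3) = (g(x + h) - 2g(x) + g(x - h)) / h^2

d1 = central_diff(f, 2.0)   # approximates f'(2)  = 6*2 + 2
d2 = second_diff(f, 1.0)    # approximates f''(1) = 6

println(d1)
println(d2)
```

Because f is a quadratic, both approximations agree with the analytic values 14 and 6 up to floating-point rounding.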
With multiple parameters:
    f(W, b, x) = W * x + b

    Tracker.gradient(f, 2, 3, 4)
    Out: (4.0 (tracked), 1.0 (tracked), 2.0 (tracked))
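This returns the partial derivatives of f with respect to W, b, and x, evaluated at W = 2, b = 3, x = 4 (the arguments are passed in the order W, b, x).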
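The three values can be verified by hand. With f(W, b, x) = W x + b and the arguments passed in the order (W, b, x) = (2, 3, 4), ordinary calculus (not Flux output) gives:

\begin{eqnarray}\frac{\partial f}{\partial W} = x = 4, \qquad \frac{\partial f}{\partial b} = 1, \qquad \frac{\partial f}{\partial x} = W = 2\end{eqnarray}

which matches the returned tuple (4.0, 1.0, 2.0).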
When the number of parameters grows, wrap each one with param and collect them with params:
    using Flux

    W = param(2)
    Out: 2.0 (tracked)

    b = param(3)
    Out: 3.0 (tracked)

    f(x) = W * x + b;

    grads = Tracker.gradient(() -> f(4), params(W, b));

    grads[W]
    Out: 4.0 (tracked)

    grads[b]
    Out: 1.0 (tracked)
Simple Models
A simple linear regression model
    W = rand(2, 5)
    Out: 2×5 Array{Float64,2}:
     0.368175  0.345883  0.210067  0.209612  0.652851
     0.618021  0.960951  0.615957  0.603506  0.0434167

    b = rand(2)
    Out: 2-element Array{Float64,1}:
     0.8313598055168152
     0.20773406571335795

    predict(x) = W*x .+ b
\begin{eqnarray}\begin{pmatrix} y_{1} \\ y_{2}\end{pmatrix} =\begin{pmatrix} 0.368175 & 0.345883 & 0.210067 & 0.209612 & 0.652851 \\ 0.618021 & 0.960951 & 0.615957 & 0.603506 & 0.0434167\end{pmatrix}\begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5}\end{pmatrix} +\begin{pmatrix} 0.8313598055168152 \\ 0.20773406571335795\end{pmatrix}\end{eqnarray}
    function loss(x, y)   # define the loss function loss(x, y)
      ŷ = predict(x)
      sum((y .- ŷ).^2)
    end

    x, y = rand(5), rand(2) # Dummy data
    Out: ([0.4900917806716063, 0.049676987051323396, 0.7784717370467997, 0.6835206026694831, 0.596004491003387], [0.7969464514327647, 0.4078909566193709])

    ŷ = W*x .+ b
    Out: 2-element Array{Float64,1}:
     1.724889563681388
     1.4762491451321187

    sum((y .- ŷ).^2)
    Out: 2.002467638531901

    loss(x, y)
    Out: 2.002467638531901

    W = param(W)
    Out: Tracked 2×5 Array{Float64,2}:
     0.368175  0.345883  0.210067  0.209612  0.652851
     0.618021  0.960951  0.615957  0.603506  0.0434167

    b = param(b)
    Out: Tracked 2-element Array{Float64,1}:
     0.8313598055168152
     0.20773406571335795

    gs = Tracker.gradient(() -> loss(x, y), Params([W, b]))

    using Flux.Tracker: update!

    # Separate names for the two gradients, so that each parameter
    # is updated with its own gradient.
    ΔW = gs[W]   # gradient of the loss with respect to W
    Out: Tracked 2×5 Array{Float64,2}:
     0.909555  0.0921948  1.44475  1.26854  1.10612
     1.04719   0.106146   1.66337  1.46049  1.27349

    Δb = gs[b]   # gradient of the loss with respect to b
    Out: Tracked 2-element Array{Float64,1}:
     1.8558862244972465
     2.1367163770254956

    update!(W, -0.1ΔW)
    Out: Tracked 2×5 Array{Float64,2}:
     0.27722   0.336663  0.0655918  0.0827582   0.542239
     0.513303  0.950337  0.44962    0.457457   -0.0839325

    update!(b, -0.1Δb)
    Out: Tracked 2-element Array{Float64,1}:
      0.6457711830670906
     -0.00593757198919162

    ŷ = W*x .+ b
    Out: Tracked 2-element Array{Float64,1}:
     1.229164208216012
     0.9055113102647928

    y
    Out: 2-element Array{Float64,1}:
     0.7969464514327647
     0.4078909566193709

    sum((y .- ŷ).^2)
    Out: 0.43443820564093705 (tracked)

    loss(x, y)
    Out: 0.43443820564093705 (tracked)
\begin{eqnarray}\begin{pmatrix} 0.7969464514327647 \\ 0.4078909566193709\end{pmatrix} \simeq\begin{pmatrix} 1.229164208216012 \\ 0.9055113102647928\end{pmatrix} =\begin{pmatrix} 0.27722 & 0.336663 & 0.0655918 & 0.0827582 & 0.542239 \\ 0.513303 & 0.950337 & 0.44962 & 0.457457 & -0.0839325\end{pmatrix}\begin{pmatrix} 0.4900917806716063 \\ 0.049676987051323396 \\ 0.7784717370467997 \\ 0.6835206026694831 \\ 0.596004491003387\end{pmatrix} +\begin{pmatrix} 0.6457711830670906 \\ -0.00593757198919162\end{pmatrix}\end{eqnarray}
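The same kind of update step can be written out by hand. The sketch below is plain Julia without Flux: it uses the analytic gradients of the squared-error loss, 2(ŷ - y)xᵀ for W and 2(ŷ - y) for b, on small made-up data (not the random values from the run above). All names in it (W_new, b_new, η, and so on) are illustrative only.

```julia
# Plain Julia, no Flux: one gradient-descent step for
# loss = sum((y - (W*x + b)).^2), with hand-derived gradients.
W = [0.1 0.2; 0.3 0.4]
b = [0.5, 0.6]
x = [1.0, 2.0]          # made-up input
y = [0.0, 1.0]          # made-up target

predict(W, b, x) = W * x .+ b
loss(W, b, x, y) = sum((y .- predict(W, b, x)) .^ 2)

r  = predict(W, b, x) .- y   # residual ŷ - y
∇W = (2 .* r) * x'           # ∂loss/∂W = 2 (ŷ - y) xᵀ
∇b = 2 .* r                  # ∂loss/∂b = 2 (ŷ - y)

η = 0.1                      # learning rate, same value as in the text
W_new = W .- η .* ∇W
b_new = b .- η .* ∇b

loss_before = loss(W, b, x, y)
loss_after  = loss(W_new, b_new, x, y)
println((loss_before, loss_after))
```

With this data the loss drops after a single step, just as it did in the Flux run, although a fixed step of 0.1 is not guaranteed to reduce the loss for every input.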
Building Layers
    using Flux

    W1 = param(rand(3, 5))
    Out: Tracked 3×5 Array{Float64,2}:
     0.773465   0.714468  0.568461  0.982047  0.0679719
     0.865974   0.233583  0.455581  0.225429  0.967652
     0.0708245  0.665327  0.263582  0.522675  0.95801

    b1 = param(rand(3))
    Out: Tracked 3-element Array{Float64,1}:
     0.7294227441407048
     0.41228416005560464
     0.788171998794877

    layer1(x) = W1 * x .+ b1
\begin{eqnarray}\begin{pmatrix} y_{1} \\ y_{2} \\ y_{3}\end{pmatrix} =\begin{pmatrix} 0.773465 & 0.714468 & 0.568461 & 0.982047 & 0.0679719 \\ 0.865974 & 0.233583 & 0.455581 & 0.225429 & 0.967652 \\ 0.0708245 & 0.665327 & 0.263582 & 0.522675 & 0.95801 \end{pmatrix}\begin{pmatrix} x_{1} \\ x_{2} \\ x_{3} \\ x_{4} \\ x_{5}\end{pmatrix} +\begin{pmatrix} 0.7294227441407048 \\ 0.41228416005560464 \\ 0.788171998794877 \end{pmatrix}\end{eqnarray}
Next, try a two-layer model:
    W2 = param(rand(2, 3))
    b2 = param(rand(2))

    layer2(x) = W2 * x .+ b2

    model(x) = layer2(σ.(layer1(x)))

    model(rand(5))
    Out: Tracked 2-element Array{Float64,1}:
     2.632974806180786
     2.2360280611397636
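Written as a formula, the two-layer model composes the two affine maps with the logistic sigmoid σ applied elementwise in between. Here W_1 is 3×5, b_1 has 3 elements, W_2 is 2×3, and b_2 has 2 elements, so a 5-element input yields a 2-element output:

\begin{eqnarray}\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \mathrm{model}(x) = W_{2}\,\sigma(W_{1} x + b_{1}) + b_{2}\end{eqnarray}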
The two-layer model can be generalized further with a function that builds and returns a layer:
    function linear(in, out)
      W = param(randn(out, in))
      b = param(randn(out))
      x -> W * x .+ b
    end

    linear1 = linear(5, 3)
    linear2 = linear(3, 2)

    model(x) = linear2(σ.(linear1(x)))

    model(rand(5))
    Out: Tracked 2-element Array{Float64,1}:
     -0.5030427334239609
      0.40162437099458065
Affine layer: reorganized along the lines of a fully connected layer, this becomes:
    struct Affine
      W
      b
    end

    Affine(in::Integer, out::Integer) =
      Affine(param(randn(out, in)), param(randn(out)))

    # Make Affine objects callable like a function
    (m::Affine)(x) = m.W * x .+ m.b

    a = Affine(10, 5)
    Out: Affine([-0.31227747893202246 1.292685236954316 … 1.0304461118143955 -1.503329047910319; 0.6161861992585083 -0.609953709706361 … 0.07759425892843008 2.4881415237364073; … ; -2.3251472373791984 1.448069304707333 … -1.6988902800087287 0.2268416911182631; -0.052827431559308115 0.8805940136326247 … -0.8753138959973339 -0.0033639097265230804] (tracked), [1.0373442618857796, -0.4047970525546537, -0.6542470412309978, 1.334938583025174, -2.5248120646544416] (tracked))

    a(rand(10))
    Out: Tracked 5-element Array{Float64,1}:
      0.10161522017079772
      0.43170782669591345
      3.06508614044054
      1.2698903577416822
     -3.22528259736402

    x
    Out: 5-element Array{Float64,1}:
     0.4900917806716063
     0.049676987051323396
     0.7784717370467997
     0.6835206026694831
     0.596004491003387

    a.W
    Out: Tracked 5×10 Array{Float64,2}:
     -0.312277    1.29269   -1.34575   …  -1.80024    1.03045    -1.50333
      0.616186   -0.609954  -1.0504        1.03372    0.0775943   2.48814
      0.905837    1.01937    0.425142      1.45138   -0.968728    0.162431
     -2.32515     1.44807   -0.941748     -0.371327  -1.69889     0.226842
     -0.0528274   0.880594  -1.17656      -0.165127  -0.875314   -0.00336391

    a.b
    Out: Tracked 5-element Array{Float64,1}:
      1.0373442618857796
     -0.4047970525546537
     -0.6542470412309978
      1.334938583025174
     -2.5248120646544416
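The key trick in Affine is that a struct becomes callable once a method is defined on its instances. The sketch below shows that pattern in plain Julia without Flux, using fixed weights so the result can be checked by hand. PlainAffine is a hypothetical name chosen for this illustration only.

```julia
# Plain Julia, no Flux: a callable struct with fixed weights.
struct PlainAffine            # hypothetical name, not a Flux type
    W::Matrix{Float64}
    b::Vector{Float64}
end

# Defining a method on instances makes the object usable as a function.
(m::PlainAffine)(x) = m.W * x .+ m.b

layer = PlainAffine([1.0 2.0; 3.0 4.0; 5.0 6.0], [0.5, 0.5, 0.5])
out = layer([1.0, 1.0])       # maps a 2-element input to a 3-element output
println(out)
```

Each output element is the row sum of W plus 0.5, which gives 3.5, 7.5, and 11.5.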
Flux.Dense (Type)
Dense(in::Integer, out::Integer, σ = identity)
Creates a traditional Dense layer with parameters W and b.
    using Flux

    layers = [Dense(10, 5, σ), Dense(5, 2), softmax]
    Out: 3-element Array{Any,1}:
     Dense(10, 5, σ)
     Dense(5, 2)
     NNlib.softmax

    model(x) = foldl((x, m) -> m(x), layers, init = x)

    model(rand(10))
    Out: Tracked 2-element Array{Float32,1}:
     0.3659178f0
     0.6340822f0
    model2 = Chain(
      Dense(10, 5, σ),
      Dense(5, 2),
      softmax)
    Out: Chain(Dense(10, 5, σ), Dense(5, 2), softmax)

    model2(rand(10))
    Out: Tracked 2-element Array{Float32,1}:
     0.82144165f0
     0.17855836f0

    # Layers can also be combined with the function composition operator ∘
    m = Dense(5, 2) ∘ Dense(10, 5, σ)

    m(rand(10))
    Out: Tracked 2-element Array{Float32,1}:
     0.43607265f0
     0.34412792f0
Here, Chain() calls the given functions in sequence, passing each output on as the input of the next:
    m = Chain(x -> x^2, x -> x+1)

    m(5) # => 26
    Out: 26

    m = Chain(x -> x^3 , x -> x^2, x -> x+1)

    m(2) # 2 => 8 -> 64 -> 65
    Out: 65
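The sequential behaviour can be imitated in plain Julia without Flux. The sketch below applies the same three functions in order, once with foldl (the approach used earlier for the layers array) and once with the composition operator ∘. The helper name apply_all is made up for this illustration and is not how Chain is actually implemented.

```julia
# Plain Julia, no Flux: apply a list of functions left to right.
steps = [x -> x^3, x -> x^2, x -> x + 1]

# Hypothetical helper: thread the value through each function in turn.
apply_all(fs, x) = foldl((acc, f) -> f(acc), fs; init = x)

# The same pipeline with ∘, which composes right to left,
# so the first function to run is written last.
composed = (x -> x + 1) ∘ (x -> x^2) ∘ (x -> x^3)

r1 = apply_all(steps, 2)   # 2 -> 8 -> 64 -> 65
r2 = composed(2)
println((r1, r2))
```

Both forms return 65 for the input 2, the same value as the three-function Chain example.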