Influence function

http://arxiv.org/abs/2402.04333

  1. Purpose

    Given just a handful of examples embodying a specific capability, how can we effectively select relevant fine-tuning data from a large collection of instruction datasets?

  2. Influence function with Adam (LESS)

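    • Taylor expansion

      First-order expansion of the loss on a validation example $z'$ around $\theta_t$, combined with one SGD step on a training example $z$:

      $$\ell(z'; \theta_{t+1}) \approx \ell(z'; \theta_t) + \langle \nabla \ell(z'; \theta_t),\, \theta_{t+1} - \theta_t \rangle$$
      $$\theta_{t+1} - \theta_t = -\eta_t \nabla \ell(z; \theta_t)$$
      $$\ell(z'; \theta_{t+1}) - \ell(z'; \theta_t) \approx -\eta_t \langle \nabla \ell(z; \theta_t),\, \nabla \ell(z'; \theta_t) \rangle$$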
    • Trajectory influence

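      $$\mathrm{Inf}_{\mathrm{SGD}}(z, z') \triangleq \sum_{i=1}^{N} \bar{\eta}_i \langle \nabla \ell(z'; \theta_i),\, \nabla \ell(z; \theta_i) \rangle$$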
    • Extension to Adam

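      $$\theta_{t+1} - \theta_t = -\eta_t \Gamma(z, \theta_t), \qquad \Gamma(z, \theta_t) \triangleq \frac{m_{t+1}}{\sqrt{v_{t+1}} + \epsilon}$$
      $$m_{t+1} = \frac{\beta_1 m_t + (1 - \beta_1) \nabla \ell(z; \theta_t)}{1 - \beta_1^t}$$
      $$v_{t+1} = \frac{\beta_2 v_t + (1 - \beta_2) \nabla \ell(z; \theta_t)^2}{1 - \beta_2^t}$$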
    • Adam Influence

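      $$\mathrm{Inf}_{\mathrm{Adam}}(z, z') \triangleq \sum_{i=1}^{N} \bar{\eta}_i \frac{\langle \nabla \ell(z'; \theta_i),\, \Gamma(z, \theta_i) \rangle}{\|\nabla \ell(z'; \theta_i)\| \, \|\Gamma(z, \theta_i)\|}$$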
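
The two influence scores in this section can be sketched in NumPy as follows. This is a minimal illustration, not the LESS implementation: the function names, the toy gradients, and the zero-initialised optimizer states are all assumptions made for the example. In practice the gradients and Adam moments would come from saved model checkpoints.

```python
import numpy as np

def inf_sgd(train_grads, val_grads, lrs):
    """Trajectory influence under SGD: sum_i lr_i * <grad_val_i, grad_train_i>."""
    return float(sum(lr * np.dot(gv, gz)
                     for lr, gz, gv in zip(lrs, train_grads, val_grads)))

def adam_update(grad, m, v, step, beta1=0.9, beta2=0.999, eps=1e-8):
    """Gamma(z, theta_t) = m_{t+1} / (sqrt(v_{t+1}) + eps).

    Follows the moment formulas in these notes; `step` must be >= 1,
    otherwise the bias-correction denominators are zero.
    """
    m_next = (beta1 * m + (1 - beta1) * grad) / (1 - beta1 ** step)
    v_next = (beta2 * v + (1 - beta2) * grad ** 2) / (1 - beta2 ** step)
    return m_next / (np.sqrt(v_next) + eps)

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def inf_adam(train_updates, val_grads, lrs):
    """Trajectory influence under Adam: sum_i lr_i * cos(grad_val_i, Gamma_i)."""
    return float(sum(lr * cosine(gv, gamma)
                     for lr, gamma, gv in zip(lrs, train_updates, val_grads)))

# Toy data: 3 hypothetical checkpoints, 8-dimensional gradients.
rng = np.random.default_rng(0)
n_ckpt, dim = 3, 8
lrs = [1e-3, 5e-4, 1e-4]  # assumed average learning rate per checkpoint
val_grads = [rng.standard_normal(dim) for _ in range(n_ckpt)]
train_grads = [rng.standard_normal(dim) for _ in range(n_ckpt)]
# Assumption: zero Adam moments, purely for illustration.
gammas = [adam_update(g, np.zeros(dim), np.zeros(dim), step=i + 1)
          for i, g in enumerate(train_grads)]

print("Inf_SGD :", inf_sgd(train_grads, val_grads, lrs))
print("Inf_Adam:", inf_adam(gammas, val_grads, lrs))
```

Because the Adam variant uses a cosine rather than a raw inner product, each checkpoint contributes at most its learning rate in absolute value, so the score does not depend on the gradient norms.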
  3. Problems & Improvements

    • Random Projection (JL lemma)

    • LoRA

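
The random projection idea can be sketched as follows: per-example gradients are multiplied by a fixed random matrix so that inner products are approximately preserved in a much lower dimension, which is what the JL lemma guarantees with high probability. The dimensions (2000 to 512), the Rademacher (±1) entries, and the function name below are assumptions for illustration only. Taking gradients with respect to LoRA parameters alone would shrink the input dimension before this step.

```python
import numpy as np

def rademacher_projection(dim_in, dim_out, seed=0):
    """Random +/-1 matrix scaled by 1/sqrt(dim_out), so that
    E[<aP, bP>] = <a, b> for any fixed vectors a and b."""
    rng = np.random.default_rng(seed)
    signs = rng.integers(0, 2, size=(dim_in, dim_out)) * 2 - 1  # values in {-1, +1}
    return signs / np.sqrt(dim_out)

# Toy gradients (hypothetical): a training gradient and a correlated validation gradient.
rng = np.random.default_rng(1)
dim_in, dim_out = 2000, 512
g_train = rng.standard_normal(dim_in)
g_val = g_train + 0.5 * rng.standard_normal(dim_in)

P = rademacher_projection(dim_in, dim_out)
full = float(g_train @ g_val)               # inner product in the original space
proj = float((g_train @ P) @ (g_val @ P))   # inner product after projection

print("full inner product     :", full)
print("projected inner product:", proj)
print("relative error         :", abs(proj - full) / abs(full))
```

The projected features can be stored once per training example and reused for any new validation set, since the projection matrix is fixed by its seed.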