<HTML>
<HEAD>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html; charset=windows-1252">
<META NAME="Generator" CONTENT="Microsoft FrontPage 6.0">
<TITLE>svm_light_faq</TITLE>
<META NAME="Version" CONTENT="8.0.3514">
<META NAME="Date" CONTENT="11/26/96">
<META NAME="Template" CONTENT="C:\Programme\Microsoft Office\Office\HTML.DOT">
</HEAD>
<BODY LINK="#0000ff" VLINK="#800080" BGCOLOR="#ffffff">

<TABLE CELLSPACING=0 BORDER=0 CELLPADDING=5>
<TR><TD WIDTH="14%" VALIGN="TOP">
<H2><a TARGET="_top" HREF="http://www-ai.cs.uni-dortmund.de/"><IMG SRC="eier_graybg.gif" BORDER=0 WIDTH=100 HEIGHT=81></A></H2></TD>
<TD WIDTH="75%" VALIGN="TOP">
<H1 ALIGN="CENTER">SVM<I><SUP>light</SUP> &amp;</I> SVM<sup><i>struct</i></sup><I>
</I> </H1>
<H1 ALIGN="CENTER">FAQ</H1>
<FONT COLOR="#000000"><P ALIGN="CENTER">Author: </FONT><a TARGET="_top" HREF="http://www.joachims.org/">Thorsten Joachims</A><FONT COLOR="#000000"> &lt;</FONT><A HREF="mailto:thorsten@joachims.org">thorsten@joachims.org</A><FONT COLOR="#000000">&gt; <BR>
</FONT><a TARGET="_top" HREF="http://www.cornell.edu/">Cornell University</A><FONT COLOR="#000000"> <BR>
</FONT><a TARGET="_top" HREF="http://www.cs.cornell.edu/">Department of Computer Science</A><FONT COLOR="#000000"> </P>
<P ALIGN="CENTER">Developed at: <BR>
</FONT><a TARGET="_top" HREF="http://www.uni-dortmund.de/">University of Dortmund</A><FONT COLOR="#000000">, </FONT><a TARGET="_top" HREF="http://www.informatik.uni-dortmund.de/">Informatik</A><FONT COLOR="#000000">, </FONT><a TARGET="_top" HREF="http://www-ai.informatik.uni-dortmund.de/">AI-Unit</A><FONT COLOR="#000000"> <BR>
</FONT><a TARGET="_top" HREF="http://www.sfb475.uni-dortmund.de/">Collaborative Research Center on 'Complexity Reduction in Multivariate Data' (SFB475)</A><FONT COLOR="#000000"> </FONT> </P>
<P ALIGN="CENTER"><FONT COLOR="#000000"> Date: May 25, 2009</FONT></P></TD>
<TD WIDTH="11%" VALIGN="TOP">
<H2><IMG SRC="http://www.joachims.org/images/culogo_125.gif" WIDTH=80 HEIGHT=80></H2></TD>
</TR>
</TABLE>

<H2>How do I compile on a Windows PC?</H2>
<UL>
<LI>The following instructions apply to SVM<I><SUP>light</SUP></I>, SVM<sup><i>struct</i></sup>, SVM<sup><i>perf</i></sup>, SVM<sup><i>cfg</i></sup>, SVM<sup><i>multiclass</i></sup>,
and SVM<sup><i>hmm</i></sup>. </LI>
<LI>The easiest way is to install <a href="http://www.cygwin.com">Cygwin</a> and
use the gcc compiler that comes with the Cygwin distribution. Just open a
Cygwin command window, change to the directory that contains the file "Makefile", and type
<br>
<TT>make</TT><br>
at the command prompt. This creates executables that run under Cygwin.</LI>
<LI>If you want executables that do not need Cygwin to run, follow the same
instructions as above, but make sure that you download the MinGW package as
part of Cygwin. Then use the command <br>
<TT>make 'SFLAGS=-mno-cygwin'</TT><br>
to create the executables.</LI>
<LI>You can also use the Visual C compiler, but it will be much more of a
hassle than using Cygwin. To use Visual C, you need to create a project file that mirrors the build rules in the provided <TT>"Makefile"</TT>.
Make sure you do not include the file "svm_loqo.c" in your project, unless you
want to use the PR_LOQO optimizer instead of the built-in optimizer.</LI>
</UL>

<H2>How do I compile on a PowerMac using CodeWarrior?</H2>

<UL>
<LI>This applies to SVM<I><SUP>light</SUP></I>, but should also work for SVM<sup><i>struct</i></sup>, SVM<sup><i>perf</i></sup>, SVM<sup><i>cfg</i></sup>, SVM<sup><i>multiclass</i></sup>,
and SVM<sup><i>hmm</i></sup>. </LI>
<LI>You need to modify the source code a little (as suggested by Jewgeni Starikow). Use <TT>#include "console.h"</TT> to emulate a UNIX shell. Then add <TT>argc=ccommand(&amp;argv);</TT> as the first statement of each <TT>main()</TT>. </LI>
<LI>CPU timing might cause another problem. The timing routines are used to calculate the runtime of the program. If you do not need this feature, replace the body of <TT>get_runtime()</TT> in <TT>svm_common.c</TT> with a statement that simply returns 0. Otherwise, replace the body with the appropriate Mac routines from <TT>'time.h'</TT>. </LI></UL>
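<P>If the same source files should keep compiling with other compilers, one option is to guard the <TT>ccommand()</TT> call so that it is only used under CodeWarrior. The sketch below assumes that CodeWarrior predefines the <TT>__MWERKS__</TT> macro and that <TT>console.h</TT> declares <TT>int ccommand(char ***)</TT>; the helper <TT>get_command_line()</TT> is hypothetical and not part of SVM<I><SUP>light</SUP></I>.</P>

```c
#ifdef __MWERKS__
#include "console.h" /* CodeWarrior console library; assumed to declare ccommand() */
#endif

/* Hypothetical helper (not part of SVM-light). Under CodeWarrior,
   ccommand() prompts for a command line, fills in argv itself, and returns
   the new argument count. With any other compiler, the original arguments
   are returned unchanged. */
static int get_command_line(int argc, char ***argv)
{
#ifdef __MWERKS__
    (void)argc; /* ccommand() supplies its own argument count */
    return ccommand(argv);
#else
    (void)argv; /* nothing to do outside CodeWarrior */
    return argc;
#endif
}
```

<P>With this helper, the first statement of each <TT>main()</TT> becomes <TT>argc = get_command_line(argc, &amp;argv);</TT>, and the rest of <TT>main()</TT> stays unchanged.</P>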
<H2>How do I integrate SVM<I><SUP>light</SUP></I> into C++ code?</H2>

<UL>
<LI>Compile <TT>"svm_learn.c"</TT>, <TT>"svm_common.c"</TT>, and <TT>"svm_hideo.c"</TT> as C code.</LI>
<LI>The C++ program from which you want to call <TT>svm_learn/8</TT> and <TT>classify_example/2</TT> (or <TT>classify_example_linear/2</TT>) needs to include the headers as follows:<BR>
<FONT FACE="Courier New" SIZE=2>extern "C" {<BR>
# include "svm_common.h"<BR>
# include "svm_learn.h"<BR>
} </FONT> </LI>
<LI>Link <TT>"svm_learn.o"</TT>, <TT>"svm_common.o"</TT>, and <TT>"svm_hideo.o"</TT> into your program.</LI></UL>
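<P>Instead of wrapping the includes in every C++ file, a header can carry its own <TT>extern "C"</TT> guard, so that it can be included unchanged from both C and C++ code. A minimal sketch of that pattern, using a hypothetical wrapper function <TT>svm_decision_value()</TT> that is not part of SVM<I><SUP>light</SUP></I> (header and source are shown in one listing for brevity):</P>

```c
/* svm_wrapper.h: hypothetical header, not shipped with SVM-light */
#ifndef SVM_WRAPPER_H
#define SVM_WRAPPER_H

#ifdef __cplusplus
extern "C" { /* give the declarations C linkage when compiled as C++ */
#endif

/* Decision value w*x - b of a linear model with n dense features. */
double svm_decision_value(const double *w, const double *x, int n, double b);

#ifdef __cplusplus
}
#endif

#endif /* SVM_WRAPPER_H */

/* svm_wrapper.c: compiled as C code, like svm_learn.c and svm_common.c */
double svm_decision_value(const double *w, const double *x, int n, double b)
{
    double sum = 0.0;
    int i;

    for (i = 0; i < n; i++)
        sum += w[i] * x[i];
    return sum - b;
}
```

<P>Compiled as C and linked like the other object files, the function can then be called from C++ after a plain <TT>#include "svm_wrapper.h"</TT>.</P>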

<h2>Is there an option for doing multi-class classification?</h2>
<ul>
<li>Not in SVM<I><SUP>light</SUP></I>. If you do not need a kernel, you can
use <a href="svm_multiclass.html">SVM<sup><i>multiclass</i></sup></a> to
do multi-class classification. Otherwise, I recommend splitting the task
into multiple binary classification tasks for multi-class or multi-label classification. For
example, you can do one-against-the-rest classification or pairwise classification. Refer to Chapter 2 in [<a href="index.html#References">Joachims,
2002a</a>].</li>
</ul>
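<P>A minimal sketch of one-against-the-rest, assuming you train one binary model per class on relabeled data (+1 for the class, -1 for all others) and then run each model on the same test example. The function names are hypothetical; the sketch only shows the relabeling and the final decision, which picks the class whose model outputs the largest decision value.</P>

```c
/* Label of an example with class y (1..K) in the binary task
   "class k against the rest". */
static int one_vs_rest_label(int y, int k)
{
    return (y == k) ? +1 : -1;
}

/* Predicted class (1..K): the class whose binary model outputs the largest
   decision value. scores[k-1] holds the decision value that the model
   trained for class k assigns to the test example. */
static int one_vs_rest_predict(const double *scores, int num_classes)
{
    int best = 0, k;

    for (k = 1; k < num_classes; k++)
        if (scores[k] > scores[best])
            best = k;
    return best + 1;
}
```

<P>For K classes this means K training runs; pairwise classification instead trains one model per pair of classes, K(K-1)/2 in total, and predicts by majority vote.</P>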

<h2>Is there an option for doing cross-validation?</h2>
<ul>
<li>Yes, SVM<I><SUP>light</SUP></I> has an option for leave-one-out cross-validation; SVM<sup><i>struct</i></sup> does not.
By setting the option "-x 1", SVM<I><SUP>light</SUP></I> computes the
leave-one-out estimates of the prediction error, recall, precision, and F1. By
default, it also outputs the XiAlpha-estimates of these quantities. They are
strictly conservative approximations to the leave-one-out estimates (i.e., they never
underestimate the leave-one-out error). Refer to Chapter 5 in [<a href="index.html#References">Joachims,
2002a</a>].</li>
</ul>
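<P>The XiAlpha-estimates need no retraining; they are computed from the solution of a single training run. A minimal sketch of the error estimate, assuming the form of the estimator as we understand it: an example is counted as a potential leave-one-out error if rho * alpha_i * R<sup>2</sup> + xi_i &gt;= 1, where alpha_i &gt;= 0 is its dual variable, xi_i its slack, and R<sup>2</sup> a bound on the squared radius of the data in feature space. The function name and the way rho and R<sup>2</sup> are passed in are assumptions; SVM<I><SUP>light</SUP></I>'s own implementation may differ in detail, and the precision, recall, and F1 estimates are not covered.</P>

```c
/* Sketch of a XiAlpha-style estimate of the leave-one-out error rate.
   alpha[i] >= 0 : dual variable of training example i (without the label)
   xi[i]    >= 0 : slack of training example i
   r2            : bound on the squared radius of the data in feature space
   rho           : scaling constant of the estimator
   Example i counts as a potential leave-one-out error if
   rho * alpha[i] * r2 + xi[i] >= 1. */
static double xi_alpha_error(const double *alpha, const double *xi, int n,
                             double r2, double rho)
{
    int i, d = 0;

    for (i = 0; i < n; i++)
        if (rho * alpha[i] * r2 + xi[i] >= 1.0)
            d++;
    return (n > 0) ? (double)d / (double)n : 0.0;
}
```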
<h2>How can I output the dual variables at the solution?</h2>
<ul>
<li>Use the option -a &lt;filename&gt;. The file contains the values of the
dual variables (multiplied by the label y) for each training example, in the
order in which the examples appeared in the training file.</li>
</ul>
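<P>Because each value is alpha_i*y_i, a zero means that the example is not a support vector, the sign gives its label, and the absolute value gives alpha_i. A minimal sketch that reads such a file and counts the support vectors, assuming one number per line; the function name is hypothetical.</P>

```c
#include <stdio.h>

/* Hypothetical helper: reads the values written by "-a <filename>"
   (assumed: one number per line, alpha_i * y_i, in training-file order)
   and counts the support vectors, i.e. the examples whose dual variable is
   non-zero. Returns the number of values read, or -1 if f is NULL. */
static long count_support_vectors(FILE *f, long *num_sv)
{
    double ay;
    long n = 0;

    if (f == NULL)
        return -1;
    *num_sv = 0;
    while (fscanf(f, "%lf", &ay) == 1) {
        n++;
        if (ay != 0.0) /* alpha_i = |ay| and y_i = sign(ay) */
            (*num_sv)++;
    }
    return n;
}
```

<P>Comparing the returned count with the number of training examples is a quick check that the file belongs to the training set you think it does.</P>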
<h2>How can I get the weight vector of the hyperplane for a linear SVM?</h2>
<ul>
<li>There is a small <a href="svm2weight.pl.txt">Perl script to
compute the weight vector</a> (<a href="svm2weight.py.txt">similar script in
Python by Ori Cohen</a>) based on the model file output by svm_learn
and svm_perf_learn.
Of course, it works only for linear models and not for non-linear kernels. All the
script does is compute the weighted sum of the support vectors (the first
element in each line is alpha*y; what follows is the feature vector). For
further information on the format, see the comments in the model file.
</li>
</ul>
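<P>The computation the scripts perform, w = sum<sub>i</sub> (alpha_i*y_i) x_i, can also be sketched in C. This assumes each support-vector line has the form <TT>alpha*y fid:val fid:val ... #</TT> with feature numbers starting at 1, and that <TT>w</TT> is a dense array with room for the largest feature number; <TT>add_sv_line()</TT> is a hypothetical helper, not a port of either script.</P>

```c
#include <stdlib.h>

/* Hypothetical helper: adds (alpha_i * y_i) * x_i to the dense weight vector
   w, where line is one support-vector line of a linear model file of the
   form "alpha*y fid:val fid:val ... # comment". Parsing stops at '#' or at
   the end of the line; feature numbers outside 1..max_fid are skipped.
   Returns 0 on success and -1 if the line cannot be parsed. */
static int add_sv_line(const char *line, double *w, long max_fid)
{
    char *end;
    const char *p;
    double coef = strtod(line, &end); /* first element: alpha_i * y_i */

    if (end == line)
        return -1;
    p = end;
    for (;;) {
        long fid;
        double val;

        while (*p == ' ' || *p == '\t')
            p++;
        if (*p == '\0' || *p == '#' || *p == '\n' || *p == '\r')
            return 0;
        fid = strtol(p, &end, 10);
        if (end == p || *end != ':')
            return -1; /* expected "fid:val" */
        p = end + 1;
        val = strtod(p, &end);
        if (end == p)
            return -1;
        p = end;
        if (fid >= 1 && fid <= max_fid)
            w[fid] += coef * val; /* w[0] is unused */
    }
}
```

<P>Calling it once for every line after the threshold b accumulates w in <TT>w[1..max_fid]</TT>; the classifier is then sign(w*x - b).</P>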

<h2>What is the format of the learned model file?</h2>
<ul>
<li>The model files produced by svm_learn and svm_perf_learn are in ASCII and
easy to read. The first few lines contain parameters, and the comments in the
model file are self-explanatory. The last of these lines is the threshold b of the
learned hyperplane classifier sign(w*x - b). Each following line contains
a support vector, and the first element of the line is its coefficient (alpha_i
* y_i) in the kernel expansion. The support vectors are listed in random
order. Note that the term support vector has a different meaning in SVM<sup><i>perf</i></sup>
compared to SVM<I><SUP>light</SUP></I>.</li>
</ul>
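<P>Putting the pieces of the model file together, a new example x is classified by the sign of f(x) = sum<sub>i</sub> (alpha_i*y_i) K(x_i, x) - b, summing over the support vectors x_i. A minimal sketch with dense vectors and a caller-supplied kernel, assuming the model file has already been parsed into arrays; the type and function names are hypothetical, and reading the file is left out.</P>

```c
/* Kernel function on dense vectors of length n. */
typedef double (*kernel_fn)(const double *a, const double *b, int n);

/* Linear kernel: the ordinary dot product. */
static double linear_kernel(const double *a, const double *b, int n)
{
    double s = 0.0;
    int i;

    for (i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

/* f(x) = sum_i coef[i] * K(sv_i, x) - b, where coef[i] = alpha_i * y_i is
   the first number on the i-th support-vector line and sv stores num_sv
   dense support vectors with n features each, row by row. */
static double decision_value(const double *coef, const double *sv, int num_sv,
                             int n, double b, const double *x, kernel_fn k)
{
    double f = 0.0;
    int i;

    for (i = 0; i < num_sv; i++)
        f += coef[i] * k(sv + (long)i * n, x, n);
    return f - b;
}
```

<P>Passing a different <TT>kernel_fn</TT> evaluates the same expansion for a non-linear kernel; only for the linear kernel does it collapse to w*x - b.</P>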

<h2>How can I implement my own kernel?</h2>
<ul>
<li>You can write your own kernel by extending the function "double
custom_kernel(KERNEL_PARM *kernel_parm, SVECTOR *a, SVECTOR *b)" in "kernel.h"
and then selecting it via the "-t 4" option. The arguments "a" and "b" are pointers to the
two examples for which the kernel value is computed.
<ul>
<li>If the data you are working with is vectorial, then you can use the
existing sparse vector operations defined in svm_common.c. For example,<br>
<TT>pow(sprod_ss(a->words,b->words),2.0)</TT><br>
implements the homogeneous polynomial kernel of degree 2.</li>
<li>If the data is non-vectorial (e.g., strings), then you can use the
following functionality. In the training and test data files, whatever is
written after the "#" is copied into the "userdefined" field of the
internal data structure "SVECTOR" as a string. So, for the line<br>
<TT>+1 #abcdefg</TT><br>
you will find "abcdefg" in "svector->userdefined".</li>
</ul>
</li>
</ul>
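<P>A self-contained sketch of both cases. The type <TT>sparse_word</TT> is a simplified, hypothetical stand-in for the sparse vectors inside <TT>SVECTOR</TT> (in this sketch: pairs of feature number and value, sorted by feature number and terminated by feature number 0); inside <TT>custom_kernel()</TT> you would use the real fields instead. The string kernel, the inner product of the two strings' character histograms, is only one simple example that is guaranteed to be a valid (positive semidefinite) kernel; it is not part of SVM<I><SUP>light</SUP></I>.</P>

```c
/* Simplified stand-in for a sparse vector: (feature number, value) pairs
   sorted by increasing feature number and terminated by fnum == 0. */
typedef struct {
    long fnum;
    double value;
} sparse_word;

/* Inner product of two sorted sparse vectors (merge-style walk). */
static double sparse_dot(const sparse_word *a, const sparse_word *b)
{
    double sum = 0.0;

    while (a->fnum != 0 && b->fnum != 0) {
        if (a->fnum == b->fnum) {
            sum += a->value * b->value;
            a++;
            b++;
        } else if (a->fnum < b->fnum) {
            a++;
        } else {
            b++;
        }
    }
    return sum;
}

/* Homogeneous polynomial kernel of degree 2: (a*b)^2. */
static double poly2_kernel(const sparse_word *a, const sparse_word *b)
{
    double s = sparse_dot(a, b);
    return s * s; /* same value as pow(s, 2.0) */
}

/* String kernel: inner product of the byte histograms of the two strings. */
static double char_histogram_kernel(const char *a, const char *b)
{
    double ha[256] = {0.0}, hb[256] = {0.0}, sum = 0.0;
    int c;

    for (; *a != '\0'; a++)
        ha[(unsigned char)*a] += 1.0;
    for (; *b != '\0'; b++)
        hb[(unsigned char)*b] += 1.0;
    for (c = 0; c < 256; c++)
        sum += ha[c] * hb[c];
    return sum;
}
```

<P>Inside <TT>custom_kernel()</TT>, the string version would be applied to <TT>a->userdefined</TT> and <TT>b->userdefined</TT>.</P>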


<H2>Error messages and known problems:</H2>

<UL>
<LI><TT>ERROR: terminating optimizer - choldc failed, matrix not positive definite</TT>

<UL>
<LI>If the program terminates after this message, get the latest version of PR_LOQO and SVM<I><SUP>light</SUP></I> V2.01 (or later). </LI>
<LI>If the program continues after this error message, don't worry :-) </LI></UL></LI>

<LI><I>The CPU time is negative or looks bogus:</I>

<UL>
<LI>To be compatible with Windows, I used the <TT>clock/0</TT> function to measure CPU time. It returns a <TT>clock_t</TT> value, and on systems where this is a signed 32-bit integer and <TT>CLOCKS_PER_SEC</TT> is 1,000,000 (i.e., it counts microseconds), it wraps around after 2<sup>31</sup> microseconds, which is about 36 minutes of CPU time. You could use timing routines more appropriate for your system. For example, on Solaris, you might want to use the routines from <TT>'sys/times.h'</TT>.</LI></UL></LI>
</UL>
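<P>Following the Solaris suggestion above, a possible replacement timer for POSIX systems uses <TT>times()</TT> from <TT>sys/times.h</TT> together with <TT>sysconf(_SC_CLK_TCK)</TT>. Since it counts clock ticks (often 100 per second) instead of microseconds, it takes far longer to wrap around than <TT>clock()</TT>. The name <TT>get_runtime_posix()</TT> is hypothetical, and the sketch assumes the caller wants hundredths of a second as a <TT>double</TT>; check how <TT>get_runtime()</TT> is declared in your version of <TT>svm_common.h</TT> before substituting it.</P>

```c
#include <sys/times.h>
#include <unistd.h>

/* Hypothetical replacement timer: CPU time (user + system) consumed by this
   process so far, in hundredths of a second. Returns -1.0 on error. */
static double get_runtime_posix(void)
{
    struct tms t;
    long ticks_per_sec = sysconf(_SC_CLK_TCK); /* clock ticks per second */

    if (ticks_per_sec <= 0 || times(&t) == (clock_t)-1)
        return -1.0;
    return 100.0 * (double)(t.tms_utime + t.tms_stime) / (double)ticks_per_sec;
}
```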

<H2>The program hangs when ...</H2>

<UL>
<LI><I>... reading in the examples.</I>

<UL>
<LI>Get version 3.02 or later. </LI></UL></LI>
</UL>

<H2>Convergence during learning is very slow!</H2>

<UL>
<LI><I>In verbose mode 2, I observe that <TT>max violation</TT> bounces around and does not really converge.</I>

<UL>
<LI>Use a smaller value for the option -n (default n=q). This makes sure that only n new variables enter the working set in each iteration, which can prevent zig-zagging behavior.</LI>
<LI>Use a smaller or larger value for the size of the working set (option -q). Reasonable values are between 2 and 50.</LI>
<LI>You might be using an excessively large value of C (option -c) in relation to your data. C should typically be less than 1000 divided by the maximum squared Euclidean length of your feature vectors, i.e., C &lt; 1000 / max<sub>i</sub> ||x<sub>i</sub>||<sup>2</sup>. </LI>
<LI>You have weird data, and convergence simply IS very slow :-)
Sorry, not much you can do about it. </LI></UL></LI>

<LI><I>Nearly all my training examples end up as support vectors.</I>

<UL>
<LI>Use a "stiffer" kernel (e.g., a lower value of gamma for the RBF kernel). If you pick a kernel that is very far away from the optimum, you will not get good generalization performance anyway. </LI>
<LI>Your data is really difficult to separate. Think about a better representation of your data. For example, it is a bad idea for different features to have values of very different orders of magnitude. You might want to normalize all features to the range [-1,+1]. </LI></UL></LI>
</UL>
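<P>To check the rule of thumb for C above, you need the squared Euclidean lengths of your feature vectors. A minimal sketch for dense data (the function names are hypothetical): it reports the maximum squared length, from which the suggested bound 1000 / max ||x||<sup>2</sup> follows, and the average squared length, whose inverse is the default value of C that svm_learn uses when -c is not given.</P>

```c
/* Squared Euclidean length x*x of a dense vector with n features. */
static double squared_length(const double *x, int n)
{
    double s = 0.0;
    int j;

    for (j = 0; j < n; j++)
        s += x[j] * x[j];
    return s;
}

/* Hypothetical helper: scans m dense vectors with n features each (stored
   row by row in X) and reports the maximum and the average squared length.
   Requires m > 0. */
static void length_stats(const double *X, int m, int n,
                         double *max_sq, double *avg_sq)
{
    double sum = 0.0;
    int i;

    *max_sq = 0.0;
    for (i = 0; i < m; i++) {
        double s = squared_length(X + (long)i * n, n);
        sum += s;
        if (s > *max_sq)
            *max_sq = s;
    }
    *avg_sq = sum / m;
}
```

<P>For example, for the two vectors (3,4) and (1,0), the maximum squared length is 25 and the average is 13, so C should typically stay below 1000/25 = 40, while the default C would be 1/13.</P>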

<H2>It does not converge!</H2>

<UL>
<LI>If you are using the built-in HIDEO optimizer, get version 3.50 or later. The old HIDEO optimizer had problems with low-dimensional data sets.</LI>
<LI>It should always converge, unless there are numerical problems :-) </LI>
<LI>Numerical problems:

<UL>
<LI>Make sure your data is properly scaled. For example, it is a bad idea for different features to have values of different orders of magnitude. You might want to normalize all features to the range [-1,+1], especially for problems with more than 100 features. Or, even better, normalize all feature vectors to Euclidean length 1.</LI></UL></LI>
</UL>
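<P>A minimal sketch of the first suggestion for dense data: the minimum and maximum of each feature are taken from the training set, and each feature is then mapped linearly so that this range becomes [-1,+1]. The function names are hypothetical. The same ranges should be reused for the test data, so that training and test examples are transformed identically.</P>

```c
/* Records the minimum and maximum of each of the n features over the m
   training vectors in X (stored row by row). Requires m > 0. */
static void feature_ranges(const double *X, int m, int n,
                           double *lo, double *hi)
{
    int i, j;

    for (j = 0; j < n; j++)
        lo[j] = hi[j] = X[j];
    for (i = 1; i < m; i++)
        for (j = 0; j < n; j++) {
            double v = X[(long)i * n + j];
            if (v < lo[j])
                lo[j] = v;
            if (v > hi[j])
                hi[j] = v;
        }
}

/* Maps each feature linearly so that the training range [lo, hi] becomes
   [-1, +1]. Constant features are set to 0. Test values outside the
   training range end up outside [-1, +1]. */
static void scale_features(double *X, int m, int n,
                           const double *lo, const double *hi)
{
    int i, j;

    for (i = 0; i < m; i++)
        for (j = 0; j < n; j++) {
            double *v = &X[(long)i * n + j];
            if (hi[j] > lo[j])
                *v = 2.0 * (*v - lo[j]) / (hi[j] - lo[j]) - 1.0;
            else
                *v = 0.0;
        }
}
```

<P>Note that this mapping generally turns zero entries into non-zero values, which makes sparse data dense; normalizing each vector to Euclidean length 1 (dividing it by its length) keeps zeros at zero.</P>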

<h2>It crashes!</h2>
<ul>
<li><i>It crashes during learning when using a non-linear kernel.</i>
<ul>
<li>Get version 5.00 or later. There was an initialization bug in the
kernel cache.</li>
</ul>
</li>
</ul>
<h2>The results look bogus!</h2>
<ul>
<li><i>When doing transductive learning, for some training set sizes it does
not obey the -p parameter and the results look bogus.</i>
<ul>
<li>Get version 5.00 or later. There was a bug introduced in version 4.00.</li>
</ul>
</li>
</ul>

<FONT COLOR="#000000">
<P>Last modified July 1st, 2007 by </FONT><a TARGET="_top" HREF="http://www.joachims.org">Thorsten Joachims</A><FONT COLOR="#000000"> &lt;</FONT><a HREF="mailto:thorsten@joachims.org">thorsten@joachims.org</a><FONT COLOR="#000000">&gt;</P></FONT></BODY>
</HTML>