function featureVector = rfFeatureSelection(data, labels, numFeatures, iterMethod, numTrees, featureVector)
% rfFeatureSelection(data, labels, numFeatures, iterMethod, numTrees)
%
% Uses random forests to perform feature selection on a given data set.
% data has size (x,y), where x is the number of observations (one label
% per observation) and y the number of features.
% labels is the set of labels for the data.
% numFeatures is the dimension of the output vector (default 5).
% iterMethod is the method by which the features are cut down:
%   * 'onePass' simply selects the top numFeatures features and
%     reports them.
%   * 'cutX' iteratively cuts the bottom X percent of features, then
%     reruns random forest feature selection on the reduced set, until
%     the desired number of features remains (e.g. 'cut20' removes the
%     bottom 20 percent on each pass).
%   * 'oobErr' will rank features using the out-of-bag error, but this
%     has not been implemented yet (it falls back to 'onePass').
%   * 'featureDeltaErr' will rank features using the feature importance
%     prediction error, but this has not been implemented yet (it falls
%     back to 'onePass'). The OOBPermutedVarDeltaError property is a
%     numeric array of size 1-by-Nvars containing a measure of importance
%     for each predictor variable (feature). For any variable, the
%     measure is the increase in prediction error if the values of that
%     variable are permuted across the out-of-bag observations. This
%     measure is computed for every tree, then averaged over the entire
%     ensemble and divided by the standard deviation over the entire
%     ensemble.
% featureVector is the list of candidate feature indices; it is supplied
% on recursive calls and defaults to all features.
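% A minimal usage sketch (illustrative random data; assumes TreeBagger
% from the Statistics and Machine Learning Toolbox is available):
%
%   X = rand(100, 20);                 % 100 observations, 20 candidate features
%   y = randi([0 1], 100, 1);          % one binary label per observation
%   top5 = rfFeatureSelection(X, y, 5, 'cut20', 100);  % indices of 5 best features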
if(nargin < 2)
    error('must pass data and labels into function')
end
if(length(labels) ~= size(data,1))
    error('labels and data do not match up');
end
if(nargin < 3)
    numFeatures = 5;
end
if(nargin < 4)
    iterMethod = 'onePass';
end
if(nargin < 5)
    numTrees = 200;
end
if(nargin < 6)
    featureVector = 1:size(data,2);
end

if(length(featureVector) > numFeatures)
    options = statset('UseParallel', true);
    b = TreeBagger(numTrees, data(:,featureVector), labels, 'OOBVarImp', 'On', ...
        'SampleWithReplacement', 'Off', 'FBoot', 0.632, 'Options', options);
    % sort candidate features by decreasing out-of-bag permuted importance
    [FI, I] = sort(b.OOBPermutedVarDeltaError, 'descend');
    featureVector = featureVector(I);

    if(strcmp(iterMethod,'onePass'))
        disp('onePass')
        featureVector = featureVector(1:numFeatures);
    elseif(strncmp(iterMethod,'cut',3))
        disp(iterMethod)
        cutPercentage = str2double(iterMethod(4:end));
        cutSize = max(floor(length(featureVector)*cutPercentage/100), 1);
        % never cut below the requested number of features
        if(length(featureVector) - cutSize < numFeatures)
            cutSize = length(featureVector) - numFeatures;
        end
        featureVector = featureVector(1:end-cutSize);
        featureVector = rfFeatureSelection(data, labels, numFeatures, iterMethod, numTrees, featureVector);
    elseif(strcmp(iterMethod,'oobErr'))
        warning('This method has not been implemented yet, using onePass to return results')
        featureVector = featureVector(1:numFeatures);
    elseif(strcmp(iterMethod,'featureDeltaErr'))
        warning('This method has not been implemented yet, using onePass to return results')
        % this will use variable FI
        featureVector = featureVector(1:numFeatures);
    end
end
end