annotate toolboxes/MIRtoolbox1.3.2/somtoolbox/som_read_data.m @ 0:cc4b1211e677 tip

initial commit to HG from Changeset: 646 (e263d8a21543) added further path and more save "camirversion.m"
author Daniel Wolff
date Fri, 19 Aug 2016 13:07:06 +0200
rev   line source
Daniel@0 1 function sData = som_read_data(filename, varargin)
Daniel@0 2
Daniel@0 3 %SOM_READ_DATA Read data from an ascii file in SOM_PAK format.
Daniel@0 4 %
Daniel@0 5 % sD = som_read_data(filename, dim, [missing])
Daniel@0 6 % sD = som_read_data(filename, [missing])
Daniel@0 7 %
Daniel@0 8 % sD = som_read_data('system.data');
Daniel@0 9 % sD = som_read_data('system.data',10);
Daniel@0 10 % sD = som_read_data('system.data','*');
Daniel@0 11 % sD = som_read_data('system.data',10,'*');
Daniel@0 12 %
Daniel@0 13 % Input and output arguments ([]'s are optional):
Daniel@0 14 % filename (string) input file name
Daniel@0 15 % [dim] (scalar) input space dimension
Daniel@0 16 % [missing] (string) string which indicates a missing component
Daniel@0 17 % value, 'NaN' by default
Daniel@0 18 %
Daniel@0 19 % sD (struct) data struct
Daniel@0 20 %
Daniel@0 21 % Reads data from an ascii file. The file must be in SOM_PAK format,
Daniel@0 22 % except that it may lack the input space dimension from the first
Daniel@0 23 % line.
Daniel@0 24 %
Daniel@0 25 % For more help, try 'type som_read_data' or check out online documentation.
Daniel@0 26 % See also SOM_WRITE_DATA, SOM_READ_COD, SOM_WRITE_COD, SOM_DATA_STRUCT.
Daniel@0 27
Daniel@0 28 %%%%%%%%%%%%% DETAILED DESCRIPTION %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Daniel@0 29 %
Daniel@0 30 % som_read_data
Daniel@0 31 %
Daniel@0 32 % PURPOSE
Daniel@0 33 %
Daniel@0 34 % Reads data from an ascii file in SOM_PAK format.
Daniel@0 35 %
Daniel@0 36 % SYNTAX
Daniel@0 37 %
Daniel@0 38 % sD = som_read_data(filename)
Daniel@0 39 % sD = som_read_data(..., dim)
Daniel@0 40 % sD = som_read_data(..., 'missing')
Daniel@0 41 % sD = som_read_data(..., dim, 'missing')
Daniel@0 42 %
Daniel@0 43 % DESCRIPTION
Daniel@0 44 %
Daniel@0 45 % This function is offered for compatibility with SOM_PAK, a SOM software
Daniel@0 46 % package in C. It reads data from a file in SOM_PAK format.
Daniel@0 47 %
Daniel@0 48 % The SOM_PAK data file format is as follows. The first line must
Daniel@0 49 % contain the input space dimension and nothing else. The following
Daniel@0 50 % lines are comment lines, empty lines or data lines. Unlike programs
Daniel@0 51 % in SOM_PAK, this function can also determine the input dimension
Daniel@0 52 % from the first data lines, if the input space dimension line is
Daniel@0 53 % missing. Note that the SOM_PAK format is not fully supported: data
Daniel@0 54 % vector 'weight' and 'fixed' properties are ignored (they are treated
Daniel@0 55 % as labels).
Daniel@0 56 %
Daniel@0 57 % Each data line contains one data vector and its labels. From the beginning
Daniel@0 58 % of the line, first are values of the vector components separated by
Daniel@0 59 % whitespaces, then labels also separated by whitespaces. If there are
Daniel@0 60 % missing values in the vector, the missing value marker needs to be
Daniel@0 61 % specified as the last input argument ('NaN' by default). The missing
Daniel@0 62 % values are stored as NaNs in the data struct.
Daniel@0 63 %
Daniel@0 64 % Comment lines start with '#'. Comment lines and empty lines are
Daniel@0 65 % ignored, except for comment lines that start with '#n' or '#l'. Such a
Daniel@0 66 % line should contain the names of the vector components ('#n') or the
Daniel@0 67 % label names ('#l'), separated by whitespaces.
Daniel@0 68 %
Daniel@0 69 % NOTE: The largest finite value Matlab can represent (realmax)
Daniel@0 70 % should not appear in the input file. This is because function sscanf is
Daniel@0 71 % not able to read NaNs: in the read phase the missing value markers are
Daniel@0 72 % first converted to realmax, and all realmax values are later set to NaN.
Daniel@0 73 %
Daniel@0 74 % REQUIRED INPUT ARGUMENTS
Daniel@0 75 %
Daniel@0 76 % filename (string) input filename
Daniel@0 77 %
Daniel@0 78 % OPTIONAL INPUT ARGUMENTS
Daniel@0 79 %
Daniel@0 80 % dim (scalar) input space dimension
Daniel@0 81 % missing (string) string used to denote missing components (NaNs);
Daniel@0 82 % default is 'NaN'
Daniel@0 83 %
Daniel@0 84 % OUTPUT ARGUMENTS
Daniel@0 85 %
Daniel@0 86 % sD (struct) the resulting data struct
Daniel@0 87 %
Daniel@0 88 % EXAMPLES
Daniel@0 89 %
Daniel@0 90 % The basic usage is:
Daniel@0 91 % sD = som_read_data('system.data');
Daniel@0 92 %
Daniel@0 93 % If you know the input space dimension beforehand, and the file does
Daniel@0 94 % not contain it on the first line, it helps if you specify it as the
Daniel@0 95 % second argument:
Daniel@0 96 % sD = som_read_data('system.data',9);
Daniel@0 97 %
Daniel@0 98 % If the missing components in the data are marked with a string other
Daniel@0 99 % than 'NaN', specify that string as the last argument:
Daniel@0 100 % sD = som_read_data('system.data',9,'*')
Daniel@0 101 % sD = som_read_data('system.data','NaN')
Daniel@0 102 %
Daniel@0 103 % Here's an example data file:
Daniel@0 104 %
Daniel@0 105 % 5
Daniel@0 106 % #n one two three four five
Daniel@0 107 % #l ID
Daniel@0 108 % 10 2 3 4 5 1stline label
Daniel@0 109 % 0.4 0.3 0.2 0.5 0.1 2ndline label1 label2
Daniel@0 110 % # comment line: missing components are indicated by 'x':s
Daniel@0 111 % 1 x 1 x 1 3rdline missing_components
Daniel@0 112 % x 1 2 2 2
Daniel@0 113 % x x x x x 5thline emptyline
Daniel@0 114 %
Daniel@0 115 % SEE ALSO
Daniel@0 116 %
Daniel@0 117 % som_write_data Writes data structs/matrices to a file in SOM_PAK format.
Daniel@0 118 % som_read_cod Reads a map from a file in SOM_PAK format.
Daniel@0 119 % som_write_cod Writes a map struct to a file in SOM_PAK format.
Daniel@0 120 % som_data_struct Creates data structs.
Daniel@0 121
Daniel@0 122 % Copyright (c) 1997-2000 by the SOM toolbox programming team.
Daniel@0 123 % http://www.cis.hut.fi/projects/somtoolbox/
Daniel@0 124
Daniel@0 125 % Version 1.0beta ecco 221097
Daniel@0 126 % Version 2.0beta ecco 060899, juuso 151199
Daniel@0 127
Daniel@0 128 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Daniel@0 129 %% check arguments
Daniel@0 130
Daniel@0 131 error(nargchk(1, 3, nargin)) % check no. of input args is correct
Daniel@0 132
Daniel@0 133 dont_care = 'NaN'; % default don't care string
Daniel@0 134 comment_start = '#'; % the char a SOM_PAK comment line starts with
Daniel@0 135 comp_name_line = '#n'; % string denoting a special comment line,
Daniel@0 136 % which contains names of each component
Daniel@0 137 label_name_line = '#l'; % string denoting a special comment line,
Daniel@0 138 % which contains names of each label
Daniel@0 139 block_size = 1000; % block size used in file read
Daniel@0 140
Daniel@0 141 kludge = num2str(realmax, 100); % used in sscanf
Daniel@0 142
Daniel@0 143
Daniel@0 144 % open input file
Daniel@0 145
Daniel@0 146 fid = fopen(filename);
Daniel@0 147 if fid < 0
Daniel@0 148 error(['Cannot open ' filename]);
Daniel@0 149 end
Daniel@0 150
Daniel@0 151 % process input arguments
Daniel@0 152
Daniel@0 153 if nargin == 2
Daniel@0 154 if ischar(varargin{1})
Daniel@0 155 dont_care = varargin{1};
Daniel@0 156 else
Daniel@0 157 dim = varargin{1};
Daniel@0 158 end
Daniel@0 159 elseif nargin == 3
Daniel@0 160 dim = varargin{1};
Daniel@0 161 dont_care = varargin{2};
Daniel@0 162 end
Daniel@0 163
Daniel@0 164 % if the data dimension is not specified, find out what it is
Daniel@0 165
Daniel@0 166 if nargin == 1 || (nargin == 2 && ischar(varargin{1}))
Daniel@0 167
Daniel@0 168 fpos1 = ftell(fid); c1 = 0; % read first non-comment line
Daniel@0 169 while c1 == 0,
Daniel@0 170 line1 = strrep(fgetl(fid), dont_care, kludge);
Daniel@0 171 [l1, c1] = sscanf(line1, '%f ');
Daniel@0 172 end
Daniel@0 173
Daniel@0 174 fpos2 = ftell(fid); c2 = 0; % read second non-comment line
Daniel@0 175 while c2 == 0,
Daniel@0 176 line2 = strrep(fgetl(fid), dont_care, kludge);
Daniel@0 177 [l2, c2] = sscanf(line2, '%f ');
Daniel@0 178 end
Daniel@0 179
Daniel@0 180 if (c1 == 1 && c2 ~= 1) || (c1 == c2 && c1 == 1 && l1 == 1)
Daniel@0 181 dim = l1;
Daniel@0 182 fseek(fid, fpos2, -1);
Daniel@0 183 elseif (c1 == c2)
Daniel@0 184 dim = c1;
Daniel@0 185 fseek(fid, fpos1, -1);
Daniel@0 186 warning on
Daniel@0 187 warning(['Automatically determined data dimension is ' ...
Daniel@0 188 num2str(dim) '. Is it correct?']);
Daniel@0 189 else
Daniel@0 190 error(['Invalid header line: ' line1]);
Daniel@0 191 end
Daniel@0 192 end
Daniel@0 193
Daniel@0 194 % check the dimension is valid
Daniel@0 195
Daniel@0 196 if dim < 1 | dim ~= round(dim)
Daniel@0 197 error(['Illegal data dimension: ' num2str(dim)]);
Daniel@0 198 end
Daniel@0 199
Daniel@0 200 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Daniel@0 201 %% read data
Daniel@0 202
Daniel@0 203 sData = som_data_struct(zeros(1, dim), 'name', filename);
Daniel@0 204 lnum = 0; % data vector counter
Daniel@0 205 data_temp = zeros(block_size, dim);
Daniel@0 206 labs_temp = cell(block_size, 1);
Daniel@0 207 comp_names = sData.comp_names;
Daniel@0 208 label_names = sData.label_names;
Daniel@0 209 form = [repmat('%g',[1 dim-1]) '%g%[^ \t]'];
Daniel@0 210
Daniel@0 211 limit = block_size;
Daniel@0 212 while 1,
Daniel@0 213 li = fgetl(fid); % read next line
Daniel@0 214 if ~ischar(li), break, end; % is this the end of file?
Daniel@0 215
Daniel@0 216 % all missing components are replaced by value realmax because
Daniel@0 217 % sscanf is not able to read NaNs
Daniel@0 218 li = strrep(li, dont_care, kludge);
Daniel@0 219 [data, c, err, n] = sscanf(li, form);
Daniel@0 220 if c < dim % if there were fewer numbers than dim on the input file line
Daniel@0 221 if c == 0
Daniel@0 222 if strncmp(li, comp_name_line, 2) % component name line?
Daniel@0 223 li = strrep(li(3:end), kludge, dont_care); i = 0; c = 1;
Daniel@0 224 while c
Daniel@0 225 [s, c, e, n] = sscanf(li, '%s%[^ \t]');
Daniel@0 226 if ~isempty(s), i = i + 1; comp_names{i} = s; li = li(n:end); end
Daniel@0 227 end
Daniel@0 228
Daniel@0 229 if i ~= dim
Daniel@0 230 error(['Illegal number of component names: ' num2str(i) ...
Daniel@0 231 ' (dimension is ' num2str(dim) ')']);
Daniel@0 232 end
Daniel@0 233 elseif strncmp(li, label_name_line, 2) % label name line?
Daniel@0 234 li = strrep(li(3:end), kludge, dont_care); i = 0; c = 1;
Daniel@0 235 while c
Daniel@0 236 [s, c, e, n] = sscanf(li, '%s%[^ \t]');
Daniel@0 237 if ~isempty(s), i = i + 1; label_names{i} = s; li = li(n:end); end
Daniel@0 238 end
Daniel@0 239 elseif ~strncmp(li, comment_start, 1) % not a comment, is it error?
Daniel@0 240 [s, c, e, n] = sscanf(li, '%s%[^ \t]');
Daniel@0 241 if c
Daniel@0 242 error(['Invalid vector on input file data line ' ...
Daniel@0 243 num2str(lnum+1) ': [' deblank(li) ']']),
Daniel@0 244 end
Daniel@0 245 end
Daniel@0 246 else
Daniel@0 247 error(['Only ' num2str(c) ' vector components on input file data line ' ...
Daniel@0 248 num2str(lnum+1) ' (dimension is ' num2str(dim) ')']);
Daniel@0 249 end
Daniel@0 250
Daniel@0 251 else
Daniel@0 252
Daniel@0 253 lnum = lnum + 1; % this was a line containing data vector
Daniel@0 254 data_temp(lnum, 1:dim) = data'; % add data to struct
Daniel@0 255
Daniel@0 256 if lnum == limit % reserve more memory if necessary
Daniel@0 257 data_temp(lnum+1:lnum+block_size, 1:dim) = zeros(block_size, dim);
Daniel@0 258 [dummy nl] = size(labs_temp);
Daniel@0 259 labs_temp(lnum+1:lnum+block_size,1:nl) = cell(block_size, nl);
Daniel@0 260 limit = limit + block_size;
Daniel@0 261 end
Daniel@0 262
Daniel@0 263 % read labels
Daniel@0 264
Daniel@0 265 if n < length(li)
Daniel@0 266 li = strrep(li(n:end), kludge, dont_care); i = 0; n = 1; c = 1;
Daniel@0 267 while c
Daniel@0 268 [s, c, e, n_new] = sscanf(li(n:end), '%s%[^ \t]');
Daniel@0 269 if c, i = i + 1; labs_temp{lnum, i} = s; n = n + n_new - 1; end
Daniel@0 270 end
Daniel@0 271 end
Daniel@0 272 end
Daniel@0 273 end
Daniel@0 274
Daniel@0 275 % close input file
Daniel@0 276 if fclose(fid) < 0, error(['Cannot close file ' filename]);
Daniel@0 277 else fprintf(2, '\rdata read ok \n'); end
Daniel@0 278
Daniel@0 279 % set values
Daniel@0 280 data_temp(data_temp == realmax) = NaN;
Daniel@0 281 sData.data = data_temp(1:lnum,:);
Daniel@0 282 sData.labels = labs_temp(1:lnum,:);
Daniel@0 283 sData.comp_names = comp_names;
Daniel@0 284 sData.label_names = label_names;
Daniel@0 285
Daniel@0 286 return;
Daniel@0 287
Daniel@0 288 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
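
The NOTE in the help text describes reading missing values through a realmax sentinel. The following is a minimal sketch of that technique on its own, outside the toolbox. The marker 'x', the sample text and all variable names are assumptions chosen for illustration; only strrep, num2str, sscanf and realmax are used, as in the listing.

```matlab
% Sketch only: reads one data line whose missing components are marked
% with a string, using realmax as a temporary numeric sentinel.
% The marker, the sample text and the variable names are illustrative
% assumptions, not part of som_read_data.
missing  = 'x';                      % assumed missing-value marker
sentinel = num2str(realmax, 100);    % realmax written as a decimal string
txt      = '1 x 1 x 1';              % one data line, dimension 5, no labels

% Replace the marker with the sentinel so that '%g' can parse every field.
vals = sscanf(strrep(txt, missing, sentinel), '%g');

% Map the sentinel back to NaN once parsing is done.
vals(vals == realmax) = NaN;
vals = vals';                        % row vector, one entry per component
```

Because strrep replaces every occurrence of the marker, a marker that also appears inside a label is replaced as well. The listing handles this by applying the reverse strrep to the label part of each line before storing the labels.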