# encoding: utf-8
module CodeRay
  module Scanners

    # Clojure scanner by Licenser.
    class Clojure < Scanner

      register_for :clojure
      file_extension 'clj'

      SPECIAL_FORMS = %w[
        def if do let quote var fn loop recur throw try catch monitor-enter monitor-exit .
        new
      ]  # :nodoc:

      CORE_FORMS = %w[
        + - -> ->> .. / * <= < = == >= > accessor aclone add-classpath add-watch
        agent agent-error agent-errors aget alength alias all-ns alter alter-meta!
        alter-var-root amap ancestors and apply areduce array-map aset aset-boolean
        aset-byte aset-char aset-double aset-float aset-int aset-long aset-short
        assert assoc assoc! assoc-in associative? atom await await-for bases bean
        bigdec bigint binding bit-and bit-and-not bit-clear bit-flip bit-not bit-or
        bit-set bit-shift-left bit-shift-right bit-test bit-xor boolean boolean-array
        booleans bound-fn bound-fn* bound? butlast byte byte-array bytes case cast char
        char-array char-escape-string char-name-string char? chars class class?
        clear-agent-errors clojure-version coll? comment commute comp comparator
        compare compare-and-set! compile complement concat cond condp conj conj!
        cons constantly construct-proxy contains? count counted? create-ns
        create-struct cycle dec decimal? declare definline defmacro defmethod defmulti
        defn defn- defonce defprotocol defrecord defstruct deftype delay delay?
        deliver denominator deref derive descendants disj disj! dissoc dissoc!
        distinct distinct? doall doc dorun doseq dosync dotimes doto double
        double-array doubles drop drop-last drop-while empty empty? ensure
        enumeration-seq error-handler error-mode eval even? every? extend
        extend-protocol extend-type extenders extends? false? ffirst file-seq
        filter find find-doc find-ns find-var first float float-array float?
        floats flush fn fn? fnext for force format future future-call future-cancel
        future-cancelled? future-done? future? gen-class gen-interface gensym get
        get-in get-method get-proxy-class get-thread-bindings get-validator hash
        hash-map hash-set identical? identity if-let if-not ifn? import in-ns
        inc init-proxy instance? int int-array integer? interleave intern
        interpose into into-array ints io! isa? iterate iterator-seq juxt key
        keys keyword keyword? last lazy-cat lazy-seq let letfn line-seq list list*
        list? load load-file load-reader load-string loaded-libs locking long
        long-array longs loop macroexpand macroexpand-1 make-array make-hierarchy
        map map? mapcat max max-key memfn memoize merge merge-with meta methods
        min min-key mod name namespace neg? newline next nfirst nil? nnext not
        not-any? not-empty not-every? not= ns ns-aliases ns-imports ns-interns
        ns-map ns-name ns-publics ns-refers ns-resolve ns-unalias ns-unmap nth
        nthnext num number? numerator object-array odd? or parents partial
        partition pcalls peek persistent! pmap pop pop! pop-thread-bindings
        pos? pr pr-str prefer-method prefers print print-namespace-doc
        print-str printf println println-str prn prn-str promise proxy
        proxy-mappings proxy-super push-thread-bindings pvalues quot rand
        rand-int range ratio? rationalize re-find re-groups re-matcher
        re-matches re-pattern re-seq read read-line read-string reduce ref
        ref-history-count ref-max-history ref-min-history ref-set refer
        refer-clojure reify release-pending-sends rem remove remove-all-methods
        remove-method remove-ns remove-watch repeat repeatedly replace replicate
        require reset! reset-meta! resolve rest restart-agent resultset-seq
        reverse reversible? rseq rsubseq satisfies? second select-keys send
        send-off seq seq? seque sequence sequential? set set-error-handler!
        set-error-mode! set-validator! set? short short-array shorts
        shutdown-agents slurp some sort sort-by sorted-map sorted-map-by
        sorted-set sorted-set-by sorted? special-form-anchor special-symbol?
        split-at split-with str string? struct struct-map subs subseq subvec
        supers swap! symbol symbol? sync syntax-symbol-anchor take take-last
        take-nth take-while test the-ns thread-bound? time to-array to-array-2d
        trampoline transient tree-seq true? type unchecked-add unchecked-dec
        unchecked-divide unchecked-inc unchecked-multiply unchecked-negate
        unchecked-remainder unchecked-subtract underive update-in update-proxy
        use val vals var-get var-set var? vary-meta vec vector vector-of vector?
        when when-first when-let when-not while with-bindings with-bindings*
        with-in-str with-local-vars with-meta with-open with-out-str
        with-precision xml-seq zero? zipmap
      ]  # :nodoc:

      PREDEFINED_CONSTANTS = %w[
        true false nil *1 *2 *3 *agent* *clojure-version* *command-line-args*
        *compile-files* *compile-path* *e *err* *file* *flush-on-newline*
        *in* *ns* *out* *print-dup* *print-length* *print-level* *print-meta*
        *print-readably* *read-eval* *warn-on-reflection*
      ]  # :nodoc:

      IDENT_KIND = WordList.new(:ident).
        add(SPECIAL_FORMS, :keyword).
        add(CORE_FORMS, :keyword).
        add(PREDEFINED_CONSTANTS, :predefined_constant)

      KEYWORD_NEXT_TOKEN_KIND = WordList.new(nil).
        add(%w[ def defn defn- definline defmacro defmulti defmethod defstruct defonce declare ], :function).
        add(%w[ ns ], :namespace).
        add(%w[ defprotocol defrecord ], :class)

      BASIC_IDENTIFIER = /[a-zA-Z$%*\/_+!?&<>\-=]=?[a-zA-Z0-9$&*+!\/_?<>\-\#]*/
      IDENTIFIER = /(?!-\d)(?:(?:#{BASIC_IDENTIFIER}\.)*#{BASIC_IDENTIFIER}(?:\/#{BASIC_IDENTIFIER})?\.?)|\.\.?/
      SYMBOL = /::?#{IDENTIFIER}/o
      DIGIT = /\d/
      DIGIT10 = DIGIT
      DIGIT16 = /[0-9a-f]/i
      DIGIT8 = /[0-7]/
      DIGIT2 = /[01]/
      RADIX16 = /\#x/i
      RADIX8 = /\#o/i
      RADIX2 = /\#b/i
      RADIX10 = /\#d/i
      EXACTNESS = /#i|#e/i
      SIGN = /[\+-]?/
      EXP_MARK = /[esfdl]/i
      EXP = /#{EXP_MARK}#{SIGN}#{DIGIT}+/
      SUFFIX = /#{EXP}?/
      PREFIX10 = /#{RADIX10}?#{EXACTNESS}?|#{EXACTNESS}?#{RADIX10}?/
      PREFIX16 = /#{RADIX16}#{EXACTNESS}?|#{EXACTNESS}?#{RADIX16}/
      PREFIX8 = /#{RADIX8}#{EXACTNESS}?|#{EXACTNESS}?#{RADIX8}/
      PREFIX2 = /#{RADIX2}#{EXACTNESS}?|#{EXACTNESS}?#{RADIX2}/
      UINT10 = /#{DIGIT10}+#*/
      UINT16 = /#{DIGIT16}+#*/
      UINT8 = /#{DIGIT8}+#*/
      UINT2 = /#{DIGIT2}+#*/
      DECIMAL = /#{DIGIT10}+#+\.#*#{SUFFIX}|#{DIGIT10}+\.#{DIGIT10}*#*#{SUFFIX}|\.#{DIGIT10}+#*#{SUFFIX}|#{UINT10}#{EXP}/
      UREAL10 = /#{UINT10}\/#{UINT10}|#{DECIMAL}|#{UINT10}/
      UREAL16 = /#{UINT16}\/#{UINT16}|#{UINT16}/
      UREAL8 = /#{UINT8}\/#{UINT8}|#{UINT8}/
      UREAL2 = /#{UINT2}\/#{UINT2}|#{UINT2}/
      REAL10 = /#{SIGN}#{UREAL10}/
      REAL16 = /#{SIGN}#{UREAL16}/
      REAL8 = /#{SIGN}#{UREAL8}/
      REAL2 = /#{SIGN}#{UREAL2}/
      IMAG10 = /i|#{UREAL10}i/
      IMAG16 = /i|#{UREAL16}i/
      IMAG8 = /i|#{UREAL8}i/
      IMAG2 = /i|#{UREAL2}i/
      COMPLEX10 = /#{REAL10}@#{REAL10}|#{REAL10}\+#{IMAG10}|#{REAL10}-#{IMAG10}|\+#{IMAG10}|-#{IMAG10}|#{REAL10}/
      COMPLEX16 = /#{REAL16}@#{REAL16}|#{REAL16}\+#{IMAG16}|#{REAL16}-#{IMAG16}|\+#{IMAG16}|-#{IMAG16}|#{REAL16}/
      COMPLEX8 = /#{REAL8}@#{REAL8}|#{REAL8}\+#{IMAG8}|#{REAL8}-#{IMAG8}|\+#{IMAG8}|-#{IMAG8}|#{REAL8}/
      COMPLEX2 = /#{REAL2}@#{REAL2}|#{REAL2}\+#{IMAG2}|#{REAL2}-#{IMAG2}|\+#{IMAG2}|-#{IMAG2}|#{REAL2}/
      NUM10 = /#{PREFIX10}?#{COMPLEX10}/
      NUM16 = /#{PREFIX16}#{COMPLEX16}/
      NUM8 = /#{PREFIX8}#{COMPLEX8}/
      NUM2 = /#{PREFIX2}#{COMPLEX2}/
      NUM = /#{NUM10}|#{NUM16}|#{NUM8}|#{NUM2}/

    protected

      def scan_tokens encoder, options

        state = :initial
        kind = nil

        until eos?

          case state
          when :initial
            if match = scan(/ \s+ | \\\n | , /x)
              encoder.text_token match, :space
            elsif match = scan(/['`\(\[\)\]\{\}]|\#[({]|~@?|[@\^]/)
              encoder.text_token match, :operator
            elsif match = scan(/;.*/)
              encoder.text_token match, :comment  # TODO: recognize (comment ...) too
            elsif match = scan(/\#?\\(?:newline|space|.?)/)
              encoder.text_token match, :char
            elsif match = scan(/\#[ft]/)
              encoder.text_token match, :predefined_constant
            elsif match = scan(/#{IDENTIFIER}/o)
              kind = IDENT_KIND[match]
              encoder.text_token match, kind
              if rest? && kind == :keyword
                if kind = KEYWORD_NEXT_TOKEN_KIND[match]
                  encoder.text_token match, :space if match = scan(/\s+/o)
                  encoder.text_token match, kind if match = scan(/#{IDENTIFIER}/o)
                end
              end
            elsif match = scan(/#{SYMBOL}/o)
              encoder.text_token match, :symbol
            elsif match = scan(/\./)
              encoder.text_token match, :operator
            elsif match = scan(/ \# \^ #{IDENTIFIER} /ox)
              encoder.text_token match, :type
            elsif match = scan(/ (\#)? " /x)
              state = self[1] ? :regexp : :string
              encoder.begin_group state
              encoder.text_token match, :delimiter
            elsif match = scan(/#{NUM}/o) and not matched.empty?
              encoder.text_token match, match[/[.e\/]/i] ? :float : :integer
            else
              encoder.text_token getch, :error
            end

          when :string, :regexp
            if match = scan(/[^"\\]+|\\.?/)
              encoder.text_token match, :content
            elsif match = scan(/"/)
              encoder.text_token match, :delimiter
              encoder.end_group state
              state = :initial
            else
              raise_inspect "else case \" reached; %p not handled." % peek(1),
                encoder, state
            end

          else
            raise 'else case reached'

          end

        end

        if [:string, :regexp].include? state
          encoder.end_group state
        end

        encoder

      end
    end
  end
end