Find text in office files .docx .xlsx .odt .ods: sfk ofind

Find text in office files like .docx .xlsx .odt .ods on the command line with the free Open Source tool Swiss File Knife.

Download the free Swiss File Knife Base from Sourceforge.
Open the Windows CMD command line, Mac OS X Terminal or Linux shell.
OS X : type mv sfk-mac-64.exe sfk and chmod +x sfk, then ./sfk
Linux: type mv sfk-linux-64.exe sfk and chmod +x sfk, then ./sfk
OS X and Linux syntax may differ from the examples shown here; check the help within the tool.

sfk ofind singleDirName "/searchtext/"
sfk ofind singleFileName "/searchtext/" [options]
sfk ofind -dir mydir -file .docx .xlsx -text "/from/[totext/]"

search in office files like .docx .xlsx .ods .odt
and in plain text files using wildcards * and ?
as well as SFK Simple Expressions in brackets [].

the search text must be surrounded by a delimiter like / or _
or any other character not part of the search text.
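
As an illustration of the delimiter rule above, the following Python sketch splits a pattern of the form /from/ or /from/to/ into its fields. The name split_pattern is hypothetical and not part of sfk. The sketch ignores escaping and only shows that the first character of the pattern defines the delimiter.

```python
def split_pattern(pattern):
    """Split '/from/' or '/from/to/' into (fromtext, totext or None).

    Hypothetical helper, not sfk code: the first character is taken
    as the delimiter, so '/foo/' and '_foo_' both yield 'foo'.
    """
    if len(pattern) < 3:
        raise ValueError("pattern too short: " + repr(pattern))
    delim = pattern[0]
    if not pattern.endswith(delim):
        raise ValueError("pattern must end with its delimiter " + repr(delim))
    fields = pattern[1:-1].split(delim)
    if len(fields) == 1:
        return fields[0], None
    if len(fields) == 2:
        return fields[0], fields[1]
    raise ValueError("delimiter appears inside the text: " + repr(pattern))


if __name__ == "__main__":
    print(split_pattern("/foo/"))
    print(split_pattern("_foo_"))
    print(split_pattern("/foo*bar/[part2]\\n/"))
```

A search text that itself contains a slash can be passed with another delimiter, for example "_a/b_".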

by default, full text lines containing hits are shown.
use option -pure to show only the found text.

search text can be followed by a totext to reformat output.

subdirectories are included by default
   the sfk default for most commands is to process the given directories,
   as well as all subdirs within them. specify -nosub to disable this.

options
   -nosub        do not include files in subdirectories.
   -verbose      always show which file is currently read.
   -justoffice   search only in office files, not in plain text etc.
   -case         case-sensitive text comparison. default is insensitive.
                 for details type: sfk help nocase
   -text         starts a list of search patterns of the form /src/ or
                 /src/totext/ where / is the separator char, src the text
                 to search for, and totext a mask to reformat output.
                 any separator char can be used which is not part of the
                 search text, i.e. /foo/ or _foo_ both search "foo".
                 -text is not required if a single filename is given.
   -pat          the same as -text, starting a pattern list.
   -bylist x.txt read search patterns from a file x.txt, supporting
                 multiple lines per pattern. (add -full for more.)
   -bylinelist x read /from/to/ or just /from/ patterns from a file x
                 with one pattern per line. (add -full for more.)
                 -by(line)list does not support sfk variables.
                 to use variables in patterns create an sfk script
                 with patterns as parameters. "sfk script" for more.
   -recsize      set input record size for processing (default=100k).
                 xreplace, xfind and xhexfind extend this automatically
                 based on the largest search patterns.
   -firsthit     show only first found pattern match per file.
   -utfin        with -utfout only: search text is already given
                 as UTF-8, do not convert internally for search.
   -tracesel     tell in detail which files are searched or ignored.
   -quiet        do not show progress infos.
   -names        list only names of files containing at least one hit.
   -notnames     list only names of files not containing any hit.
   -justrc       print no search results, just set return code on hits.
   -full         print full help text telling about -bylist pattern files,
                 special character case sensitivity and nested or repeated
                 replace behaviour.

output options
   -utfout       keep raw UTF-8 encoding on output, to use it
                 with further commands requiring UTF-8 data.
   -conlines=n1  show n1 lines of context around search hits. by default
                 only text lines containing one or more hits are shown.
                 all lines together cannot hold more than:
   -conchars=n2  max. number of characters of all context lines together.
                 default is 240 or n1*160. cannot be larger than 32000.
   -conresline   show full result line but no further context (default)
   -sep[arator]  show "---" separator between hits within a file.
   -septext s    use separator text s (supports slash patterns \n etc.)
   -nosep        do not show "---" separator between hits within a file.
   -indent=n     set n chars of indentation for result display.
   -pure         extract only searched data, same as -context=0.
                 you may also set an environment variable:
                 set SFK_CONFIG=xfind:pure,xfindbin:pure
                 use -pure -tofile x to extract binary content as is.
   -fill=c       replace binary null and other unprintable characters
                 with character c. default is a dot "."
   -hex          print output as hex dump instead of plain text.
   -showle       highlight CR/LF line endings in hex dump output
   -nofile       do not insert :file header lines in output.
   -crlf, -lf    for file headers and default totext: force crlf or lf
                 line endings instead of system default
   -filehead s   file header to insert on every matching file.
                 only [file.name] surrounded by text can be used.
                 default is -filehead ":file [file.name]" unless a
                 single file is searched. cannot be used with xhexfind.
                 to get result and name in the same line use [file.name]
                 in the expression, like: sfk xfind -pure -nofile mydir
                 "/foo*bar/[file.name]: [all]\n/"
   -sep s        define separator s between hits in a file
   -rawterm      on output to terminal do not strip codes below 32.
                 null bytes are always stripped.
   -to dir\$file write output files to given path. for details about
                 output file masks, type "sfk help opt" or "sfk run".
   -tofile x     write output data to a single output filename x
                 (which is not interpreted as a mask but taken as is).
   +tofile x     as last parameter (command chaining): write text as
                 displayed on terminal to a file x.
   -more[n]      pause output every 30 or n lines.
   -showhits     list matching and missing search patterns.
   -showjusthit  or -showmiss lists only matching or missing patterns.

return codes for batch files
   0 = no matches, 1 = matches found, >1 = major error occurred.
   see also "sfk help opt" on how to influence error processing.
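
A minimal Python sketch of how a calling script might use these return codes. The function names are hypothetical. It assumes an sfk binary is reachable under the name "sfk", and it uses only the -justrc option and the directory-plus-pattern call form listed in this help text.

```python
import subprocess


def describe_return_code(rc):
    """Map an sfk ofind return code to a label, per the help text above."""
    if rc == 0:
        return "no matches"
    if rc == 1:
        return "matches found"
    return "error"


def has_match(directory, pattern, sfk="sfk"):
    """Run 'sfk ofind <directory> <pattern> -justrc' and report a hit.

    Assumption: an sfk binary is reachable under the name given in 'sfk'.
    """
    result = subprocess.run([sfk, "ofind", directory, pattern, "-justrc"])
    if result.returncode not in (0, 1):
        raise RuntimeError("sfk reported an error, rc=%d" % result.returncode)
    return result.returncode == 1
```

For example, has_match("mydir", "/myword/") would return True if any searched file in mydir contains the word.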

quoted multi line parameters are supported in scripts
   using full trim. type "sfk script" for details.

wildcards and SFK expressions
   SFK Expressions are simple patterns containing literal text,
   wildcards * and ? and character classes in square brackets [].
   basically, the syntax provides extended wildcards but no
   further logic and is not related to regular expressions.

   search patterns are surrounded by a separator character which
   can be anything not contained in the search text, like / or _

   within a pattern /fromtext/totext/ the fromtext may contain:

     *                       - 0 to 4000 characters in the same
                               text line or paragraph, i.e. all
                               bytes not being CR, LF or NULL.
                               4000 is just a default maximum
                               that can be changed by:
     [0.100000 chars]        - 0 to 100000 characters in the same
                               text line or paragraph, i.e. the
                               same as * but with a larger range.
     ?                       - one character.
     ?????                   - same as [5.5 chars] or [5 chars]
     [bytes]                 - 0 to 4000 bytes (with CR,LF,NULL)
                               i.e. it collects stream text
                               across lines, even in binary data
     **                      - the same as [bytes].
     [0.100 bytes]           - 0 to 100 bytes
     [.100000 bytes]         - up to 100000 bytes
     [1.* bytes]             - 1 to default maximum bytes
     [2 chars]               - exactly 2 chars
     [30 bytes]              - exactly 30 bytes
     [byte of aeiou]         - one vowel (a OR A OR e OR ...),
                               case insensitive by default.
                               "aeiou" is a character list.
     [byte of \\\x2f]        - a backslash \ or forw. slash /
     [bytes of \r\n \t]      - whitespace incl. line ends
     [bytes of (\r\n \t)]    - the same, () are optional
     [bytes not \r\n\0]      - up to 4000 bytes as long as no
                               CR, LF or NULL byte appears
     [chars]                 - the same as [bytes not \r\n\0],
                               i.e. collect text in a line
     [char not ( \t)]        - same as [byte not ( \r\n\0\t)],
                               everything not blanks and tabs
     [char not )( \t]        - not brackets, blanks and tabs,
                               same as not (\(\) \t)
     [chars of a-z0-9]       - means a-zA-Z0-9 as search is
                               case insensitive by default
     [chars of \x61-\x7A]    - search a-z but not A-Z, or use
                               option -case for case search
     [eol]                   - end of line by characters:
                               CRLF or LF or CR

     [white]     = chars of (\t )     - 0 or more whitespaces
     [xwhite]    = bytes of (\t \r\n) - same but across lines
     [1 white]   = byte  of (\t )     - 1 whitespace
     [digit]     = byte  of (0-9)     - 1 digit
     [digits]    = bytes of (0-9)     - 0 or more digits
     [hexdigit]  = byte  of (0-9a-f)  - 1 hexadecimal digit
     [hexdigits] = bytes of (0-9a-f)  - 0 or more hex digits

     special keywords that do not count as tokens:
     [skip]   - at the start of a pattern: skip such text
                completely, do not count it as a search hit.
     [keep]   - search also the following text but keep it
                in the input data, without consuming it.
     [ortext] - foo[ortext]bar searches word foo or bar.
                [ortext] is allowed only between literals.

     anchors that have no length of their own:
     [start]  - start of file
     [end]    - end of file
     [lstart] - line start, i.e. start or CRLF or CR or LF
     [lend]   - logical line end, i.e. eol or end of file.
                to replace line ends use [eol] instead.

     how to search or replace special characters:
     -  to search or replace text containing the literal characters
        * ? \ [ ] then these must be escaped like \* \? \\ \[ \]
     -  ( ) are escaped only within character lists, like \( \)
     -  to search or replace the forward slash '/' type \x2f or use
        another char around from/to text, e.g. _fromtext_totext_
     -  parameters with blanks and non trivial characters need double
        quotes "", see also "about Shell Command Characters" below.

     expansion priorities: (highest first)
     if two search parts are side by side, and the same input
     character matches both, then these priorities apply:

       5:  start, end, lstart, lend
       4:  literal text, eol
       3:  whitelist classes: byte of, bytes of
       2:  blacklist classes: chars not, bytes not
       1:  plain wildcards: ?, *, **, byte, bytes, chars

     this means in "/[bytes]foo/" the [bytes] will stop collecting
     characters as soon as "foo" is found, as "foo" is a literal.
     on same or higher priority the right side stops the left side.

     avoid overlapping character groups. for example, [chars][white]
     cannot work, as space and tab are part of chars. to fix this
     extend chars by relevant exclusions: [chars not ( \t)][white]
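
To make the plain wildcards described above more concrete, here is a rough Python sketch that translates literal text, ?, * and ** into an approximately equivalent regular expression. This is only an analogy and an assumption about behaviour: SFK expressions are not regular expressions, the name approx_regex is hypothetical, and bracket expressions and backslash escapes are not handled. Non-greedy repeats stand in for the rule that a following literal stops a wildcard.

```python
import re

CHAR = r"[^\r\n\x00]"   # one "char": anything except CR, LF, NULL
MAXLEN = 4000            # default maximum stated in the help text


def approx_regex(fromtext, maxlen=MAXLEN):
    """Translate literals, ?, * and ** into a rough regex equivalent."""
    out = []
    i = 0
    while i < len(fromtext):
        if fromtext.startswith("**", i):
            out.append(r"[\s\S]{0,%d}?" % maxlen)      # bytes, across lines
            i += 2
        elif fromtext[i] == "*":
            out.append(r"%s{0,%d}?" % (CHAR, maxlen))  # chars, same line
            i += 1
        elif fromtext[i] == "?":
            out.append(CHAR)                           # exactly one char
            i += 1
        else:
            out.append(re.escape(fromtext[i]))         # literal text
            i += 1
    # the help text states that search is case-insensitive by default
    return re.compile("".join(out), re.IGNORECASE)
```

With this sketch, approx_regex("foo*bar") matches "foo and bar" within one line but not across a line break, while approx_regex("foo**bar") also matches across lines.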

   the totext may contain:

     [part 1]            use first text part of the fromtext.
                         e.g. the fromtext /*foo[.100 chars]bar*/
                         contains parts :   1 2         3    4 5
     [part1]             the same (blank is optional).
     [parts 1,2,3]       use parts 1, 2 and 3.
     [parts 1-10]        use parts 1 to 10.
     [strip(part1,\0)]   use part 1 but remove zero bytes.
                         only zero bytes "\0" can be removed.
     [file.name]         full input filename with path
     [file.relname]      input filename without path
     [file.path]         input file's path
     [file.base]         relname without last .extension
     [file.ext]          input filename extension
     [all]               use all parts from fromtext.

     [setvar name]...[endvar]   set variable "name" with data
                                between setvar and endvar.
     [getvar name]              fill in data from variable "name"

     although anchors like lstart, lend count as a separate part
     they need NOT be specified in the totext. this means that
     /[lstart]foo[lend]/bar/ just changes the word "foo".
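
The part numbering used by [part n] can be pictured with a small Python sketch that cuts a fromtext into wildcards, bracket expressions and literal runs, then numbers them from 1. The name number_parts is hypothetical, and this is an approximation, not sfk's parser: it does not treat ? specially and it does not exclude keywords such as [skip] or [keep], which the help text says do not count as tokens.

```python
import re

# one token is: ** or * or a [bracket expression] or a run of literal text
TOKEN = re.compile(r"\*\*|\*|\[[^\]]*\]|(?:\\.|[^*\[\\])+")


def number_parts(fromtext):
    """Return [(part number, token), ...] for a fromtext without delimiters."""
    return list(enumerate(TOKEN.findall(fromtext), start=1))


if __name__ == "__main__":
    for number, token in number_parts("*foo[.100 chars]bar*"):
        print(number, token)
```

For the fromtext *foo[.100 chars]bar* this yields five parts, in line with the numbering shown in the [part 1] description.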

   if replace loses line endings in output
   - when using [eol] in most cases you should add [part...]
     to the output pattern, to copy the actual found line
     separators, or line endings may get lost.

supported slash patterns
   \t    = TAB
   \r    = CR
   \n    = LF
   \x00  = one byte with code 00 hexadecimal
   \0    = short form for \x00
   \q    = a double quote "
   \\    = the backslash character \ itself
   \[    = the bracket open character [
   \]    = the bracket close character ]
   \*    = the literal star character *
   \?    = the literal question mark  ?
   \-    = to use literal "-" in a command
   Within multi line -bylist files:
   \     = slash+blank is changed to a single blank
   Only within "char of" or "byte not" lists:
   \(    = to use literal character "("
   \)    = to use literal character ")"
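
The slash patterns listed above can be sketched as a small Python decoder. The name decode_slash_patterns is hypothetical. The sketch only shows which character each sequence stands for. Inside a real search pattern, sequences like \[ or \* also mark the character as literal instead of as a wildcard or class, which a plain decoder like this cannot express.

```python
import re

SIMPLE = {"t": "\t", "r": "\r", "n": "\n", "0": "\x00", "q": '"'}
SLASH = re.compile(r"\\(x[0-9a-fA-F]{2}|.)", re.DOTALL)


def decode_slash_patterns(text):
    """Expand the slash patterns from the table above into characters."""
    def expand(match):
        code = match.group(1)
        if len(code) == 3:                 # xNN with two hex digits
            return chr(int(code[1:], 16))
        # t r n 0 q map to control chars or a quote; any other escaped
        # character (backslash, brackets, star, blank ...) stands for itself
        return SIMPLE.get(code, code)
    return SLASH.sub(expand, text)
```

For example, decode_slash_patterns(r"\x41\q") returns the two characters A and ".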

SFK expression options
   -showpart(s)  print /from/ part numbers, range statistics
                 and expansion priority points per part.
                 done automatically if a required /to/ text
                 is not given with a command.
   -showbest     if a /from/ pattern finds nothing, use this to
                 see how many parts would match so far, and with
                 up to how many bytes per part. anchors like [lstart]
                 may show a non zero length when matching (CR)LF.
   -showlist     with -bylist, show the internal joined list if
                 commands are spread across multiple lines.
   -showall      show all of the above.
   -xmaxlen=n    set default maximum length for chars or bytes commands,
                 e.g. -xmaxlen=10000 means /foo*bar/ matches with up to
                 10000 characters between foo and bar. the default max
                 length without this option is 4000 characters.

performance notes
 - always use a string literal, or single byte or char, at the start
   of your search expressions, like in /foo*bar/ starting with 'f'.
   Do not use a wildcard like * at the start like in /*foobar/
   when searching huge input data, as your search will slow down by
   factor 256. Use /[lstart]*foobar/ instead.
 - the system may cache output file(s), writing to disk in background
   after sfk has finished. subsequent batch commands may execute slower.

chaining support
   sfk extract output can be sent only to +xed or +xex.
   other commands require an xed conversion step like
   sfk extract ... +xed +view

aliases
   sfk xhexfind is the same as xfind -hex
   to extract unmodified binary data you may use either
   sfk xfind -pure ... -tofile or sfk extract ... -tofile

office file support
   sfk ofind        search in .xml text file contents of
                    office files like .docx .xlsx .ods .odt.
   sfk help office  for more infos and options
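
Office files such as .docx, .xlsx, .ods and .odt are ZIP archives holding .xml files, which is why a search in their contents means a search in those .xml members. The Python sketch below illustrates the idea with the standard zipfile module. It is not how sfk is implemented; the name find_in_office_file and the crude tag stripping are assumptions made for illustration.

```python
import io
import re
import zipfile


def find_in_office_file(source, word):
    """Return names of .xml members of an office file that contain word.

    source is a path or a binary file object. XML tags are removed in a
    crude way before a case-insensitive substring search.
    """
    hits = []
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if not name.lower().endswith(".xml"):
                continue
            xml = archive.read(name).decode("utf-8", errors="replace")
            text = re.sub(r"<[^>]*>", "", xml)
            if word.lower() in text.lower():
                hits.append(name)
    return hits


if __name__ == "__main__":
    # build a tiny stand-in archive in memory; not a valid Word document
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml",
                         "<w:p><w:t>hello myword</w:t></w:p>")
    buffer.seek(0)
    print(find_in_office_file(buffer, "myword"))
```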

see also
   sfk xfind        for more search pattern examples

examples
   sfk ofind mydir "/myword/"
      search office and plain text files in mydir
      containing the word 'myword'.
   sfk ofind mydir "/myword/" -names +copy out
      same as above, but copy the found files
      to a folder 'out'.
   sfk ofind mydir "/foo*bar/"
      search foo followed by bar in the same line.
   sfk ofind -pure mydir "/foo**bar/[part2]\n/"
      search text starting with foo, then several
      text lines, then ending with bar. print
      only the found text between foo and bar.

sfk is a free open-source tool that runs instantly without installation effort. no DLLs, no registry changes - just get sfk.exe from the zip package and use it (binaries for windows, linux and mac are included).
