Fast search for a string in a file

Sent by: Radikal (Q3 Team)
Web : http://www.q3.nu
Email: radikal@q3.nu
Date: 20/01/04

This tip is an example of a quick search in a file, using buffered reads for speed.
In short, the example looks for the first occurrence of a string in a file (if it occurs at all) and reports its position from the beginning of the file.
It works like a search with Pos(Substring, String), except that instead of searching in a string we search in a file, which may be several gigabytes in size, with the advantage of not having to load it into memory all at once. (The example returns the position as an Integer, so positions beyond 2 GB would need an Int64 result.)
To achieve this, we load the file into a memory buffer (8 KB in the example), piece by piece.
The process is as follows:
  • We load a piece of 8 KB and search inside it.
  • If we find the string in that piece, we stop and report where it was found.
  • If the string was not found in that piece, we repeat the process: we load a new piece and search again.
    All this is fine, but we have overlooked a small detail: what happens if the string lies right across the boundary between two of those 8 KB pieces? In that case our search would fail miserably :)
    To avoid this, we rewind the stream a little just before reading the next piece.
    Specifically, we rewind by as many bytes as the length of the search string; this way we are sure to find it even if it straddles two pieces.
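    The boundary problem can be seen in miniature without touching a file. The following is only an illustrative sketch, not the author's routine: ChunkedPos and its parameters are hypothetical names, the "file" is a short string, and the pieces are 4 characters long instead of 8 KB. It assumes PieceSize is larger than Overlap, otherwise the loop would never advance.

```pascal
program BoundaryDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

{ Hypothetical helper: searches Data in pieces of PieceSize characters,
  stepping back Overlap characters before each new piece.
  Returns the 1-based position of Sub in Data, or 0 if no piece contains it.
  Assumes PieceSize > Overlap. }
function ChunkedPos(const Sub, Data: string; PieceSize, Overlap: Integer): Integer;
var
  Start, Found: Integer;
begin
  Result := 0;
  Start := 1;
  while Start <= Length(Data) do
  begin
    { Search only inside the current piece }
    Found := Pos(Sub, Copy(Data, Start, PieceSize));
    if Found > 0 then
    begin
      Result := Start + Found - 1;
      Exit;
    end;
    { Advance one piece, minus the overlap }
    Start := Start + PieceSize - Overlap;
  end;
end;

begin
  { 'def' straddles the pieces 'abcd' and 'efgh' }
  WriteLn(ChunkedPos('def', 'abcdefgh', 4, 0));             { prints 0: missed }
  WriteLn(ChunkedPos('def', 'abcdefgh', 4, Length('def'))); { prints 4: found }
end.
```

    With no overlap the string is never seen whole in any piece; stepping back by the length of the search string makes the third piece 'cdef', which contains it.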
    It could easily be adapted to, for example, count how many times the string occurs in the file, replace one string with another, build your own Grep command, and so on.
    Here is the function together with an example call, all contained in the OnClick handler of any TButton:

     procedure TForm1.Button1Click(Sender: TObject);
     var
      EncontradaEn : integer;
    
    
      function BuscaStringEnFichero(const Fichero: string ;const Cadena: string):integer;
      { Looks for the first occurrence of the string 'Cadena' inside the file
        'Fichero' and returns its position counting from the beginning of the
        file (1-based, like Pos), or -1 if the string was not found.
        'Cadena' must be shorter than CUANTOBUFFER.
        Radikal Q3 for Trucomania}
    
      const
        {We read 8 KB at a time }
        CUANTOBUFFER = 8192;
      var
        Corriente  : TFileStream;
        Almacen    : AnsiString; {byte buffer: one byte per character, even where String is Unicode }
        Donde      : integer;
        Parar      : boolean;
        Posicion   : integer;
      begin
        SetLength(Almacen, CUANTOBUFFER);
        Corriente:=TFileStream.Create(Fichero,fmOpenRead OR fmShareDenyWrite);
        Result:=-1;
        try
          Corriente.Seek(0,soFromBeginning);
          Parar:=FALSE;
          repeat
            {Remember where this chunk starts, before reading it }
            Posicion:=Corriente.Position;
    
            {Restore the full buffer size, then trim it to the bytes actually
             read, so that a short final read never leaves stale data from the
             previous chunk inside the text we search }
            SetLength(Almacen, CUANTOBUFFER);
            SetLength(Almacen, Corriente.Read(Almacen[1],CUANTOBUFFER));
            {Parar (stop) := TRUE when there is nothing more to read }
            Parar:= ( Length(Almacen) < CUANTOBUFFER );
            {Look for the string in the chunk just read }
            Donde:=Pos(Cadena, Almacen);
    
            If Donde <> 0 then begin
              Result:=Donde+Posicion;
              {We found it, so we stop as well }
              Parar:=TRUE;
            end else begin
              {Rewind a little in case the string straddles two
               chunks of CUANTOBUFFER bytes. The offset is negative
               because we move backwards from the current position }
              if not Parar then
                Corriente.Seek(-Length(Cadena),soFromCurrent);
            end;
          until Parar;
        finally
          Corriente.Free;
        end;
      end;
    
     begin
      {Usage example: run the search }
      EncontradaEn:=BuscaStringEnFichero('c:\Ejemplo.txt','BuscaMe');
      {If the string was found we show where; otherwise we say it was not }
      if EncontradaEn <>-1 then begin
         ShowMessage( 'String found at: '+
                      IntToStr( EncontradaEn )
                      );
      end else begin
         ShowMessage( 'Sorry, string not found in the file');
      end;
    
     end;
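
    As an illustration of the "count the occurrences" adaptation mentioned above, here is a minimal sketch under stated assumptions. It is not part of the original tip: CountInStream and CHUNK are hypothetical names, the routine takes any TStream (a TFileStream would work the same way), and it counts every position where the string occurs, overlapping occurrences included. Note one deliberate difference from the search routine: it steps back Length(Needle) - 1 bytes rather than Length(Needle), so an occurrence that ends exactly at the end of a chunk is not counted a second time.

```pascal
program CountDemo;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Classes, SysUtils, StrUtils;

const
  CHUNK = 8192; { bytes read per step, as in the tip }

{ Hypothetical variant: counts every position in Stream at which Needle
  occurs. Returns 0 unless 0 < Length(Needle) < CHUNK. }
function CountInStream(Stream: TStream; const Needle: AnsiString): Integer;
var
  Buffer: AnsiString;
  BytesRead, Found: Integer;
begin
  Result := 0;
  if (Needle = '') or (Length(Needle) >= CHUNK) then
    Exit;
  repeat
    SetLength(Buffer, CHUNK);
    BytesRead := Stream.Read(Buffer[1], CHUNK);
    SetLength(Buffer, BytesRead); { keep only the bytes actually read }
    { Count every match inside this chunk }
    Found := PosEx(Needle, Buffer, 1);
    while Found > 0 do
    begin
      Inc(Result);
      Found := PosEx(Needle, Buffer, Found + 1);
    end;
    { Step back so a match straddling two chunks is seen whole next time }
    if BytesRead = CHUNK then
      Stream.Position := Stream.Position - (Length(Needle) - 1);
  until BytesRead < CHUNK;
end;

var
  S: TStringStream;
begin
  { Two occurrences of 'abc'; the second one straddles the 8192-byte boundary }
  S := TStringStream.Create('abc' + StringOfChar('x', CHUNK - 5) + 'abc');
  try
    S.Position := 0;
    WriteLn(CountInStream(S, 'abc')); { prints 2 }
  finally
    S.Free;
  end;
end.
```

    To count in a file instead, the same function could be given a TFileStream opened with fmOpenRead or fmShareDenyWrite, as in the tip.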
    




    Updated on 20/01/2004: English translation, thanks to Jorge Fco. Pérez Soto (jperezso@puc.cl)