Chinaunix blog
Author's Weibo: weibo.com/manuscola

Category: LINUX

2013-11-18 23:07:31

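In my previous post I went over Go's basic file operations. It then occurred to me that one of the most common ways to use a file is to process it line by line: think of awk, that indispensable Linux tool, or the KMeans++ program I wrote earlier to cluster NBA guard statistics. In C, fgets solves this problem. Here is its interface:

    char *fgets(char *s, int size, FILE *stream);

Of course, you first call fopen to get a FILE* stream, and then call fgets repeatedly to read one line at a time. fgets reads at most size - 1 characters, stops after a newline (which it keeps in the buffer), NUL-terminates the result, and returns NULL at end of file or on error.

Below is a C program that implements a basic cat and supports the -n option, which prefixes each line with its line number: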
    manu@manu-hacks:~/code/c/self/readline$ cat mycat.c
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <errno.h>

    static int num_flag = 0;   /* set when -n is given */

    /* Copy a stream to stdout line by line, optionally numbering lines.
     * A line longer than the buffer comes back from fgets in pieces,
     * and each piece is numbered as if it were a separate line. */
    static void cat(FILE *file)
    {
        char buf[1024];
        int line_no = 1;

        while (fgets(buf, sizeof(buf), file) != NULL) {
            if (num_flag)
                fprintf(stdout, "%5d %s", line_no, buf);
            else
                fputs(buf, stdout);
            line_no++;
        }
    }

    int main(int argc, char *argv[])
    {
        int i, j;
        int file_exist = 0;   /* did we see any file arguments? */
        FILE *file = NULL;

        /* Look for -n; after the loop i == argc if it is absent. */
        for (i = 1; i < argc; i++) {
            if (strcmp(argv[i], "-n") == 0) {
                num_flag = 1;
                break;
            }
        }

        for (j = 1; j < argc; j++) {
            if (j == i)          /* skip the -n argument itself */
                continue;

            file_exist = 1;

            file = fopen(argv[j], "rb");
            if (file == NULL) {
                fprintf(stderr, "%s: error reading from %s: %s\n",
                        argv[0], argv[j], strerror(errno));
                continue;
            }

            cat(file);
            fclose(file);        /* close each file once it has been copied */
        }

        /* No file arguments: read standard input, as cat does. */
        if (file_exist == 0)
            cat(stdin);

        return 0;
    }
What about Go?
Go provides the bufio package. bufio.NewReader() creates a Reader with a default-size buffer; if you want to choose the buffer size yourself, use bufio.NewReaderSize:
    func NewReader(rd io.Reader) *Reader
        NewReader returns a new Reader whose buffer has the default size (4096).

    func NewReaderSize(rd io.Reader, size int) *Reader
        NewReaderSize returns a new Reader whose buffer has at least the
        specified size. If the argument io.Reader is already a Reader with large
        enough size, it returns the underlying Reader.
bufio provides the following read methods:

    func (b *Reader) ReadByte() (c byte, err error)
        ReadByte reads and returns a single byte. If no byte is available,
        returns an error.

    func (b *Reader) ReadBytes(delim byte) (line []byte, err error)
        ReadBytes reads until the first occurrence of delim in the input,
        returning a slice containing the data up to and including the delimiter.
        If ReadBytes encounters an error before finding a delimiter, it returns
        the data read before the error and the error itself (often io.EOF).
        ReadBytes returns err != nil if and only if the returned data does not
        end in delim. For simple uses, a Scanner may be more convenient.

    func (b *Reader) ReadString(delim byte) (line string, err error)
        ReadString reads until the first occurrence of delim in the input,
        returning a string containing the data up to and including the
        delimiter. If ReadString encounters an error before finding a delimiter,
        it returns the data read before the error and the error itself (often
        io.EOF). ReadString returns err != nil if and only if the returned data
        does not end in delim. For simple uses, a Scanner may be more
        convenient.
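ReadByte is the counterpart of C's fgetc: it reads one byte at a time. ReadBytes and ReadString can both read line by line; just pass '\n' as delim. Keep the rule quoted above in mind: if the last line of a file has no trailing newline, its data comes back together with io.EOF, so the caller must use the returned data before acting on the error.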
Here is a simple mycat written in Go:
    manu@manu-hacks:~/code/go/self$ cat mycat.go
    package main

    import (
        "bufio"
        "flag"
        "fmt"
        "io"
        "os"
    )

    var numFlag = flag.Bool("n", false, "number each line")

    func usage() {
        fmt.Fprintf(os.Stderr, "usage: %s [-n] [filename ...]\n", os.Args[0])
    }

    // cat copies r to stdout line by line, optionally numbering the lines.
    func cat(r *bufio.Reader) {
        i := 1
        for {
            // buf, err := r.ReadBytes('\n') works the same way with []byte
            buf, err := r.ReadString('\n')
            // Print whatever was read first: the last line of a file may
            // arrive without '\n', together with io.EOF.
            if len(buf) > 0 {
                if *numFlag {
                    fmt.Fprintf(os.Stdout, "%5d %s", i, buf)
                    i++
                } else {
                    fmt.Fprint(os.Stdout, buf)
                }
            }
            if err != nil {
                if err != io.EOF {
                    fmt.Fprintf(os.Stderr, "%s: %s\n", os.Args[0], err)
                }
                return
            }
        }
    }

    func main() {
        flag.Usage = usage
        flag.Parse()
        if flag.NArg() == 0 {
            cat(bufio.NewReader(os.Stdin))
        }

        for i := 0; i < flag.NArg(); i++ {
            f, err := os.Open(flag.Arg(i)) // same as os.OpenFile(name, os.O_RDONLY, 0)
            if err != nil {
                fmt.Fprintf(os.Stderr, "%s err read from %s : %s\n",
                    os.Args[0], flag.Arg(i), err)
                continue
            }

            cat(bufio.NewReader(f))
            f.Close()
        }
    }
If all you need is to read line by line, the bufio documentation itself says:

    For simple uses, a Scanner may be more convenient.

First, the documentation:
    func NewScanner(r io.Reader) *Scanner
        NewScanner returns a new Scanner to read from r. The split function
        defaults to ScanLines.

    func (s *Scanner) Text() string
        Text returns the most recent token generated by a call to Scan as a
        newly allocated string holding its bytes.

    func (s *Scanner) Err() error
        Err returns the first non-EOF error that was encountered by the Scanner.

    func (s *Scanner) Scan() bool
        Scan advances the Scanner to the next token, which will then be
        available through the Bytes or Text method. It returns false when the
        scan stops, either by reaching the end of the input or an error. After
        Scan returns false, the Err method will return any error that occurred
        during scanning, except that if it was io.EOF, Err will return nil.
How do you use a Scanner?
    // cat prints every line from scanner, then reports any read error.
    func cat(scanner *bufio.Scanner) error {
        for scanner.Scan() {
            fmt.Println(scanner.Text())
            // fmt.Fprintf(os.Stdout, "%s\n", scanner.Text())
        }

        return scanner.Err()
    }
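Why does each call to Scan make Text() return the next line? Because the default split function is ScanLines. If you need to split the input some other way, func (s *Scanner) Split(split SplitFunc) lets you specify a different SplitFunc: one of the predefined ones (ScanBytes, ScanRunes, ScanWords) or one you write yourself.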
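Note that Scan strips the \n line terminator (ScanLines also drops a carriage return right before it). If you print with Fprintf and do not add \n yourself, the lines run together, as shown below:

    fmt.Fprintf(os.Stdout, "%s", scanner.Text())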
    manu@manu-hacks:~/code/go/self$ go run mycat_v2.go test.txt
    this is test file created by goif not existed ,please create this fileif existed, Please write appendhello world,hello gothis is test file created by goif not existed ,please create this fileif existed, Please write appendhello world,hello gomanu@manu-hacks:~/code/go/self$ cat test.txt
    this is test file created by go
    if not existed ,please create this file
    if existed, Please write append
    hello world,hello go
    this is test file created by go
    if not existed ,please create this file
    if existed, Please write append
    hello world,hello go
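The run above shows what happens when the stripped newline is not added back. The sketch below checks what ScanLines actually returns for in-memory input (the helper name lines is illustrative): both "\n" and "\r\n" endings are removed, so printing with fmt.Println, or putting "\n" in the Fprintf format, restores the line breaks.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// lines returns the tokens produced by the default ScanLines split.
func lines(input string) []string {
	s := bufio.NewScanner(strings.NewReader(input))
	var out []string
	for s.Scan() {
		out = append(out, s.Text())
	}
	return out
}

func main() {
	// "\n" and "\r\n" are both stripped; the unterminated last line is kept.
	fmt.Printf("%q\n", lines("unix\nwindows\r\nlast")) // ["unix" "windows" "last"]

	// The newline has to be added back when printing.
	for _, l := range lines("line one\nline two\n") {
		fmt.Println(l)
	}
}
```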
The calling code looks like this:

        f, err := os.Open(flag.Arg(i))
                 ...
        if err := cat(bufio.NewScanner(f)); err != nil {
            fmt.Fprintf(os.Stderr, "%s err read from %s : %s\n",
                os.Args[0], flag.Arg(i), err)
        }
        f.Close()
I recommend Scanner: it is the simpler of the two to use.
References:
1. godoc bufio
2. golang: read text file into string array (and write)