(?-x: - frei

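I was tracing through a regular expression for a bit,

ran into a part I just couldn't make sense of,

and looked it up in the O'Reilly books at the office. Notes below.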

・Text::ParseWords


    ($quote, $quoted, undef, $unquoted, $delim, undef) =
        $line =~ m/^(["'])                                  # a $quote
                   ((?:\\[\000-\377]|(?!\1)[^\\])*)         # and $quoted text
                   \1                                       # followed by the same quote
                   ([\000-\377]*)                           # and the rest
                  |                                         # --OR--
                   ^((?:\\[\000-\377]|[^\\"'])*?)           # an $unquoted text
                   (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                                            # plus EOL, delimiter, or quote
                   ([\000-\377]*)                           # the rest
                  /x;                                       # extended layout

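The part I couldn't figure out was (?-x:$delimiter).

(?-x: is one of the extended constructs:

the x modifier at the very end (the one that allows whitespace and comments) applies to the whole regex, and this switches it off, but only between the (? and the ).

Since $delimiter is a pattern supplied by the caller, that keeps any literal whitespace in it (say, a delimiter that is a single space) from being thrown away as layout.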
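A minimal check of what that buys, assuming a caller passes a single literal space as the delimiter (the sample strings and variable names are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical caller-supplied delimiter: one literal space.
my $delimiter = ' ';

# Plain /x: the interpolated space counts as layout and is discarded,
# so the pattern effectively becomes "ab" and cannot match "a b".
my $with_x = ("a b" =~ /a $delimiter b/x) ? 1 : 0;

# (?-x:...) turns x off for just that group, so the space stays literal.
my $scoped = ("a b" =~ /a (?-x:$delimiter) b/x) ? 1 : 0;

print "plain /x:  $with_x\n";
print "(?-x:...): $scoped\n";
```

This prints 0 for the plain /x version and 1 for the (?-x:...) version.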

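By the way, [\000-\377] is a character class written with octal escapes: every character code from \000 to \377, i.e. 0 to 255 (wider than ASCII, which stops at \177). In practice it means "any character, newline included", which a bare . without /s would not match.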
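A quick comparison against . with and without /s, assuming a plain byte string (a character above 255 would fall outside the octal class); the sample text is made up:

```perl
use strict;
use warnings;

my $text = "line1\nline2";    # made-up sample with an embedded newline

# Without /s, . stops at the newline.
my ($dot)   = $text =~ /^(.*)/;
# With /s, . matches the newline too.
my ($dot_s) = $text =~ /^(.*)/s;
# [\000-\377] spells out codes 0..255 in octal; \n (\012) is among them.
my ($octal) = $text =~ /^([\000-\377]*)/;

printf "%d %d %d\n", length($dot), length($dot_s), length($octal);    # 5 11 11
```

So [\000-\377]* behaves like .* under /s here, which is presumably why it shows up in a pattern that has only /x.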

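Feeding it a line like

"aaa","bbb",

gives results like these, one pass at a time, with each pass working on "the rest" left over from the previous one: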

Pass 1: $quote is ", $quoted is aaa, $unquoted and $delim are empty (undef).

Pass 2: $quote and $quoted are empty (undef), $unquoted is , and $delim is empty.

Pass 3: $quote is ", $quoted is bbb, $unquoted and $delim are empty (undef).

Pass 4: $quote and $quoted are empty (undef), $unquoted is , and $delim is empty.
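To reproduce the four passes, here is a sketch that wraps the same regex in a loop modeled loosely on how parse_line in Text::ParseWords drives it: take "the rest" from $+ after each match and stop once a pass captures nothing. trace_passes is a hypothetical helper name, and the delimiter is an assumption: a whitespace delimiter such as '\s+' reproduces the results above, while ',' does not (the comma then lands in $delim, with an empty $unquoted).

```perl
use strict;
use warnings;

# Hypothetical helper: run the Text::ParseWords regex over $line repeatedly,
# recording ($quote, $quoted, $unquoted, $delim) for each pass.
# undef values are recorded as the string 'undef' so they stay visible.
sub trace_passes {
    my ($delimiter, $line) = @_;
    my @passes;
    while (length $line) {
        # Comments below avoid $ sigils: under /x, variables would still be
        # interpolated into the comment text.
        my ($quote, $quoted, undef, $unquoted, $delim, undef) =
            $line =~ m/^(["'])                            # a quote
                       ((?:\\[\000-\377]|(?!\1)[^\\])*)   # and quoted text
                       \1                                 # followed by the same quote
                       ([\000-\377]*)                     # and the rest
                      |                                   # --OR--
                       ^((?:\\[\000-\377]|[^\\"'])*?)     # an unquoted text
                       (\Z(?!\n)|(?-x:$delimiter)|(?!^)(?=["']))
                                                          # plus EOL, delimiter, or quote
                       ([\000-\377]*)                     # the rest
                      /x;
        # Stop once a pass captures nothing useful (or the match failed).
        last unless $quote || length($unquoted // '') || length($delim // '');
        push @passes, [ map { defined $_ ? $_ : 'undef' } $quote, $quoted, $unquoted, $delim ];
        # $+ holds the highest-numbered group that matched, i.e. "the rest".
        $line = $+;
    }
    return @passes;
}

for my $delimiter ('\s+', ',') {
    print "delimiter: $delimiter\n";
    my $n = 0;
    for my $pass (trace_passes($delimiter, q{"aaa","bbb",})) {
        printf "  pass %d: quote=[%s] quoted=[%s] unquoted=[%s] delim=[%s]\n",
            ++$n, @$pass;
    }
}
```

With '\s+', passes 2 and 4 find no delimiter at the comma, so the lazy unquoted part grows to take the comma and the pass ends on the "(?!^)(?=[\"'])" lookahead or at end of line, leaving $delim as an empty string. With ',', the lazy part stays empty and the comma itself is captured as $delim.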

...... Hmm, I can't even tell whether this regex is any good or not orz

Or rather, I don't really get why it has to be written like this orz

Hmmmmmmmmmmm.