PHP - preg_split regex lookback for multiple matches


The goal of the regex is to split on Unicode whitespace, excluding newline, and to ensure that each newline character stays appended to the preceding non-whitespace token. This mostly works, except that a single whitespace character is left in front of \n when whitespace precedes the newline.

I am using the following regex:

    $data = "the\nquick\n brown fox jumped     \nover the lazy dog.";
    $tokenized = preg_split("~(?<=\n)|\p{Z}+(?!\n)~u", $data, -1, PREG_SPLIT_OFFSET_CAPTURE);

Current result (I have written \n wherever a "\n" character is present):

    Array
    (
        [0] => Array
            (
                [0] => the\n
                [1] => 0
            )

        [1] => Array
            (
                [0] => quick\n
                [1] => 4
            )

        [2] => Array
            (
                [0] =>
                [1] => 10
            )

        [3] => Array
            (
                [0] => brown
                [1] => 11
            )

        [4] => Array
            (
                [0] => fox
                [1] => 17
            )

        [5] => Array
            (
                [0] => jumped
                [1] => 21
            )

        [6] => Array
            (
                [0] =>  \n
                [1] => 31
            )

        [7] => Array
            (
                [0] => over
                [1] => 33
            )

        [8] => Array
            (
                [0] => the
                [1] => 38
            )

        [9] => Array
            (
                [0] => lazy
                [1] => 42
            )

        [10] => Array
            (
                [0] => dog.
                [1] => 47
            )
    )
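The stray ` \n` token at offset 31 looks like a backtracking effect: `\p{Z}+` first takes all five spaces, the `(?!\n)` lookahead then fails, and the engine gives back one space so that the lookahead sees a space instead of the newline. A small check of that reading, using a shortened sample string (an assumption, not the original input):

```php
<?php
// Shortened sample: "jumped" followed by five spaces and a newline.
$sample = "jumped     \nover";

// Same whitespace alternative as in the split pattern above.
preg_match('~\p{Z}+(?!\n)~u', $sample, $m, PREG_OFFSET_CAPTURE);

// Only four of the five spaces are matched, starting at offset 6;
// the fifth space is left over and ends up in front of "\n".
var_dump(strlen($m[0][0])); // int(4)
var_dump($m[0][1]);         // int(6)
```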

Expected result:

    Array
    (
        [0] => Array
            (
                [0] => the\n
                [1] => 0
            )

        [1] => Array
            (
                [0] => quick\n
                [1] => 4
            )

        [2] => Array
            (
                [0] => brown
                [1] => 10
            )

        [3] => Array
            (
                [0] => fox
                [1] => 16
            )

        [4] => Array
            (
                [0] => jumped\n
                [1] => 20
            )

        [5] => Array
            (
                [0] => over
                [1] => 27
            )

        [6] => Array
            (
                [0] => the
                [1] => 32
            )

        [7] => Array
            (
                [0] => lazy
                [1] => 36
            )

        [8] => Array
            (
                [0] => dog.
                [1] => 41
            )
    )

Any advice is appreciated. Thanks.
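One possible direction, offered as a sketch and not a definitive answer: a token such as `jumped\n` is not a contiguous substring of the input (five spaces sit between `jumped` and `\n`), so a single `preg_split` call cannot return it. The sketch below assumes it is acceptable to strip those separators first and then split. The function name `tokenize_keep_newlines` is hypothetical, and `PREG_SPLIT_NO_EMPTY` is an addition not used in the original call. Offsets returned this way refer to the normalized string, so they will not necessarily match the hand-written offsets in the expected result.

```php
<?php
// Hypothetical helper; the two-step approach is an assumption, not the
// original method.
function tokenize_keep_newlines(string $data): array
{
    // Step 1: remove Unicode separators that sit directly before a newline,
    // so "jumped     \n" becomes "jumped\n".
    $normalized = preg_replace('~\p{Z}+(?=\n)~u', '', $data);

    // Step 2: split right after each newline (zero-width) or on runs of
    // separators. \p{Z} does not match "\n", so newlines stay attached
    // to the token in front of them.
    return preg_split(
        '~(?<=\n)|\p{Z}+~u',
        $normalized,
        -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE
    );
}

// Sample input modeled on the string in the question.
$data = "the\nquick\n brown fox jumped     \nover the lazy dog.";

// Keep only the token text (index 0 of each [text, offset] pair).
$tokens = array_column(tokenize_keep_newlines($data), 0);

print_r($tokens);
```

With the sample string this yields the nine tokens `the\n`, `quick\n`, `brown`, `fox`, `jumped\n`, `over`, `the`, `lazy`, `dog.`, with no empty entries.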

