Perl - split command with regex - split numeric and strings


My data looks like this:

20110627 abc dbe efg  217722 1425 1767 0.654504367955466 0.811585416264778 -0.157081048309312  

I am trying to split it in such a way that each numeric value ends up in its own cell and each string ends up in its own cell.

Thus, I want "20110627" in one cell, "abc dbe efg" in another, "0.811585416264778" in another, "-0.157081048309312" in another, etc.

I have the following split command in Perl:

my @fld = split(/[\d+][\s][\w+]/, $_); 

But it doesn't seem to do what I want. Can anyone tell me which regex to use? Thanks in advance.

Edit: Following vks's suggestion, I changed the regex a little to get rid of the whitespace and to take into account that a string might contain commas (,), slashes (/), or dashes (-). The negative sign (-) seems to be taken as a separate token in numbers:

(-?\d+(\.\d+)?)|([\/?,?\.?\-?a-zA-Z\/ ]+)

20110627 b c  217722 1425 1767 0.654504367955466 0.811585416264778 -0.157081048309312
19950725 c  16458 63 91 0.38279256288735 0.552922590837283 -0.170130027949933
19980323 g c /de/ 20130516 - e, inc.  33019 398 197 1.205366607105 0.596626184923832 0.608740422181168
20130516 - e, inc.  24094 134 137 0.556155059350876 0.56860629202291 -0.0124512326720345
19960327 f c /de 38905 503 169 1.29289294435163 0.434391466392495 0.858501477959131

Expected output: "20110627" in one token, "b c" in one token, "-0.170130027949933" in one token, "g c /de/" in one token, "- e, inc." in one token, and so on. (Of course the other values should be in separate tokens as well. In other words, each string goes in one token and each number goes in one token. I cannot write out every single one of them, but I think it is straightforward.)

2nd edit:

Brian found the right regex: /(-?\d+(?:\.\d+)?)|([\/,.\-a-zA-Z]+(?:\s+[\/,.\-a-zA-Z]+)*)/ (see below). Thanks, Brian! I have a follow-up question: I am writing the results of the regex split to an Excel file, using the following code:

use warnings;
use strict;
use Spreadsheet::WriteExcel;
use Scalar::Util qw(looks_like_number);
use Spreadsheet::ParseExcel;
use Spreadsheet::ParseExcel::SaveParser;
use Spreadsheet::ParseExcel::Workbook;

if (($#ARGV < 1) || ($#ARGV > 2)) {
    die("usage: tab2xls tabfile.txt newfile.xls\n");
}

open(TABFILE, $ARGV[0]) or die "$ARGV[0]: $!";

# Declared with "my" so the script compiles under "use strict"
my $workbook  = Spreadsheet::WriteExcel->new($ARGV[1]);
my $worksheet = $workbook->add_worksheet();
my $row = 0;
my $col = 0;

while (<TABFILE>) {
    chomp;
    # Split the line using the number/string pattern as the delimiter
    my @fld = split(/(-?\d+(?:\.\d+)?)|([\/,.\-a-zA-Z]+(?:\s+[\/,.\-a-zA-Z]+)*)/, $_);

    $col = 0;
    foreach my $token (@fld) {
        $worksheet->write($row, $col, $token);
        $col++;
    }
    $row++;
}

The problem is that I get empty cells when I use this code:

> "empty cell" "1000" "empty cell" "empty cell" "abc deg" "empty cell"
> "2500" "empty cell" "empty cell" "1500" "3500"

Why am I getting these empty cells? Is there a way to avoid that? Thanks a lot.

Using the revised requirements to allow "/", ",", "-", etc., here's a regex that captures numeric tokens in capture group #1 and alpha tokens in capture group #2:

(-?\d+(?:\.\d+)?)|([\/,.\-a-zA-Z]+(?:\s+[\/,.\-a-zA-Z]+)*)

(See the regex101 example.)

Breakdown:

(-?\d+(?:\.\d+)?) (capture group #1) matches numbers, with an optional negative sign and optional decimal places (in a non-capturing group).

([\/,.\-a-zA-Z]+(?:\s+[\/,.\-a-zA-Z]+)*) (capture group #2) matches alpha strings with possible embedded whitespace.
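On the empty cells: split treats the pattern as the delimiter, not as the thing to keep. The list it returns therefore contains the text between matches (an empty string or just whitespace here). Because the pattern has capture groups, it also contains both groups for every match, with undef for whichever group did not take part. One way around this is to collect the matches with a global match instead of split. Below is a minimal sketch under that assumption; the helper name tokenize is made up for illustration and is not from the original post.

```perl
use strict;
use warnings;

# Pattern from the answer: group 1 = number, group 2 = string with embedded spaces.
my $token_re = qr/(-?\d+(?:\.\d+)?)|([\/,.\-a-zA-Z]+(?:\s+[\/,.\-a-zA-Z]+)*)/;

# Hypothetical helper: return the tokens of one line, in order.
sub tokenize {
    my ($line) = @_;
    my @tokens;
    while ($line =~ /$token_re/g) {
        # Exactly one of the two groups is defined for each match.
        push @tokens, defined($1) ? $1 : $2;
    }
    return @tokens;
}

# Alternative that keeps split: drop the undef and whitespace-only fields.
sub tokenize_with_split {
    my ($line) = @_;
    return grep { defined($_) && /\S/ } split(/$token_re/, $line);
}

my $line = '20110627 abc dbe efg  217722 1425 1767 0.654504367955466 0.811585416264778 -0.157081048309312  ';
my @fld = tokenize($line);
print join('|', @fld), "\n";
# prints 20110627|abc dbe efg|217722|1425|1767|0.654504367955466|0.811585416264778|-0.157081048309312
```

In the loop from the question, the split line would then become something like my @fld = tokenize($_); and the rest of the writing code can stay as it is.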

