Using regular expressions in Groovy script to retrieve data from html pages

Posted by – March 31, 2008

I have been working with regular expressions in Java. They are very useful for retrieving data based on a document's structure. In my example I'm extracting cellular brands and models from a particular HTML document structure. Take a look at the HTML code below:

<html>
  <head>
     <title>Regular expressions sample data
  </title></head>
  <body>
      <font style="font-size: 8pt;" face="Verdana"><br />
      </font><font face="Verdana" size="1"><b>Audiovox:<br />
      </b></font>
      <a href="http://cellular.com/audiovox9500.shtml">
      <font size="1">CDM 9500</font></a><font size="1">
      |
      </font>
      <a href="http://cellular.com/thera.shtml">
      <font size="1">PDA - (Thera)</font></a><font size="1">
      |
      <a href="http://cellular.com/audiovoxppc6600.shtml">
      PDA - PPC6600 (Harrier)</a>.<br />
      <br />
      </font>
      <font style="font-weight: 700;"
         face="Verdana" size="1">Cyberbank:<br />
      </font>
      <a href="http://cellular.com/cyberbankpoz.shtml">
      <font size="1">CB 0870 BR (PoZ)</font></a><font size="1">
      |
      </font>
      <a href="http://cellular.com/cyberbank_cb880.shtml">
      <font size="1">CB 0880 BR (Triton)</font></a><font size="1">
      |
      <a href="http://cellular.com/cyberbank_x315.shtml">
      CP X315 BR (PoZ EVDO)</a>.<br />
      <br />
      </font><b><font face="Verdana" size="1">Compaq:<br />
      </font></b><font size="1">
      <a href="http://cellular.com/compaq_ipac3700.shtml">
      IPAC-3700</a>.<br />
      <br />
  </font></body>
</html>
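
One way to pull the brands and models out of markup shaped like this is sketched below in plain Java. It is only a minimal sketch under a few assumptions: the class name `BrandModelExtractor` and the method `extract` are made up for illustration, a brand is taken to be any text ending in a colon right before a `<br>` tag, and every link that follows belongs to the most recent brand. It is tuned to this sample page and is not a general HTML parser.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; the name is an assumption, not from the post.
class BrandModelExtractor {

    // Alternative 1: a brand label, i.e. tag-free text ending in ':' right before <br>.
    // Alternative 2: a link; everything between <a ...> and </a> is captured as the label.
    private static final Pattern TOKEN = Pattern.compile(
        ">\\s*(?<brand>[^<>:]+):\\s*<br"
        + "|<a\\s+href=\"[^\"]*\"[^>]*>(?<label>.*?)</a>",
        Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns brand -> list of model names, in document order.
    static Map<String, List<String>> extract(String html) {
        Map<String, List<String>> modelsByBrand = new LinkedHashMap<>();
        String currentBrand = null;
        Matcher m = TOKEN.matcher(html);
        while (m.find()) {
            if (m.group("brand") != null) {
                // A new brand heading starts a new bucket.
                currentBrand = m.group("brand").trim();
                modelsByBrand.putIfAbsent(currentBrand, new ArrayList<>());
            } else if (currentBrand != null) {
                // Strip nested tags such as <font> and collapse whitespace.
                String model = m.group("label")
                    .replaceAll("<[^>]+>", "")
                    .replaceAll("\\s+", " ")
                    .trim();
                modelsByBrand.get(currentBrand).add(model);
            }
        }
        return modelsByBrand;
    }

    public static void main(String[] args) {
        // A shortened fragment of the sample page above.
        String html = "<b>Compaq:<br />\n</b><font size=\"1\">\n"
            + "<a href=\"http://cellular.com/compaq_ipac3700.shtml\">\n"
            + "IPAC-3700</a>.<br />";
        System.out.println(extract(html));
    }
}
```

Since Groovy runs on the same `java.util.regex` classes, the pattern itself should carry over unchanged to a Groovy script, where the `=~` operator gives you the `Matcher`.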
