How to …

2Aug/110

How to find the first position of a number in a string (Teradata)



We had a situation where we had a column of data in our Teradata database that had some ugly data in it.  Someone had created "unique" user_ids by some (seemingly random) combination of the user's lastname and a number.  Like this:

  • Barker01
  • Bradley1
  • Burch32
  • Burch33
  • Smith001
  • Smith24
  • Ming12
  • Flemming1

We needed to extract the last name (and essentially drop the ending numbers).

The way we did this, without writing a UDF, was with the Char2HexInt function.

Numerals, once converted to Hex, always start with 003 (0030-0039).  So, we took the string, converted it to Hex, then found the first occurrence of '003'.

The code below actually gives the position of the final "letter" byte in the string (the byte to the left of the first numeral):

sel (position('003' in char2hexint(stringfield))-1)/4 as num_posn from table1

stringfieldis the field containing the string for which you want to find the first position of a numeral.

The math portion is important...you divide by 4 because each byte, when converted to Hex, now takes up 4 spots.

Thus: 'abc123' becomes '006100620063003100320033', and the first occurrence of 003 starts at byte 13. Take away 1 for 12, divided by 4 gives you 3, which is the position of the last letter. (To get the position of the first number, just add 1 to the result of the "position" code above.)

ONE NOTE: the source table for stringfield must have stringfield in the Unicode character set. If it is in Latin character set, the Char2HexInt will not return the full four-byte values. Instead, 'abc123' becomes '616263313233', which won't work. If you don't have control over the source table, create a volatile table with the field having a Unicode character set.

[Slashdot] [Digg] [Reddit] [del.icio.us] [Facebook] [Technorati] [Google] [StumbleUpon]