Quirky little gotcha when using the mb_* suite of PHP functions, such as mb_substr(), with UTF-8 data. If you don't explicitly set the encoding to UTF-8, the mb_* functions fall back to whatever internal encoding mbstring is configured with. If that happens to be a single-byte encoding, Unicode characters get chopped up incorrectly: the function counts every byte as a separate character, so a two- or three-byte UTF-8 character can be cut in the middle. Found this out the hard way when constructing the excerpt() function for the Francotoile search results page. So, before using the mb_* functions with your UTF-8 data, remember to set the encoding:
mb_internal_encoding("UTF-8");
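To see the difference for yourself, here is a minimal sketch. It assumes PHP 7 or later with the mbstring extension loaded and a source file saved as UTF-8. The excerpt_sketch() helper is hypothetical and only stands in for the idea; it is not the actual Francotoile excerpt() function. ISO-8859-1 is passed explicitly to simulate a wrong single-byte setting.

```php
<?php
// Hypothetical helper for illustration only (not the original excerpt()).
// Truncates $text to at most $maxChars characters and appends "...".
function excerpt_sketch(string $text, int $maxChars): string
{
    // Both calls rely on the internal encoding set via mb_internal_encoding().
    if (mb_strlen($text) <= $maxChars) {
        return $text;
    }
    return mb_substr($text, 0, $maxChars) . '...';
}

mb_internal_encoding('UTF-8');

// Assumes this file is saved as UTF-8, so "ù" is the two bytes C3 B9.
$text = 'Où est la bibliothèque ?';

// With UTF-8 in effect, the cut lands on a character boundary.
$good = mb_substr($text, 0, 2);

// Simulating a single-byte encoding: every byte counts as one character,
// so the cut lands in the middle of "ù".
$bad = mb_substr($text, 0, 2, 'ISO-8859-1');

var_dump(mb_check_encoding($good, 'UTF-8')); // bool(true)
var_dump(mb_check_encoding($bad, 'UTF-8'));  // bool(false)

echo excerpt_sketch($text, 2), "\n";
```

If you would rather not depend on global state, most mb_* functions also accept the encoding as an optional last argument, e.g. mb_substr($text, 0, 2, 'UTF-8').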