Unicode, encoded as UTF-8, is the best way of encoding text documents. Unfortunately, Unicode support in TeX is difficult to understand, so I put together this little script to convert certain UTF-8 characters to the appropriate TeX control sequences. Currently, it handles Czech, Slovak, French, German and the Scandinavian languages.
There are many characters that this script doesn't handle, mostly for Hungarian and Turkish. I will add most of them in time. The output for t, d and l with caron is ugly: it would be better if TeX drew the correct apostrophe-like soft sign rather than having the caron sit on top of the letter. TeX doesn't seem to be able to do that; if it can, please let me know.
As far as I know, Polish support will never be complete, because TeX doesn't handle the ogonek, without which you can't write Polish. Apparently the ogonek is difficult to typeset because it drops below the letter, but then so does the cedilla, and TeX supports that. If I'm wrong on this, please let me know.
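To see which characters a given document would still need, something like the following could be run over the converted text. This is only a sketch: the helper name unhandled_chars is made up for illustration, and it simply reports every non-ASCII character it finds, using charnames::viacode from the core charnames module to look up the Unicode name.

```perl
use strict;
use warnings;
use charnames ();

# Hypothetical helper (not part of the script below): return one line per
# distinct non-ASCII character found in $text, with its code point and name.
sub unhandled_chars {
    my ($text) = @_;
    my %seen;
    # Collect every character outside the ASCII range.
    $seen{$1}++ while $text =~ /([^\x00-\x7F])/g;
    return map {
        sprintf 'U+%04X %s', ord($_), charnames::viacode(ord $_) // '(unnamed)'
    } sort keys %seen;
}

# Example: Hungarian o with double acute and Turkish g with breve.
print "$_\n" for unhandled_chars("l\x{151} da\x{11F}");
```

Anything this prints is a candidate for a new substitution line in the script.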
A couple of things to note. The '-C1' switch tells perl to expect input on STDIN to be in UTF-8; the script won't work without it. If we were reading data from another file handle, we could use the 'open' pragma to do the same thing. The script hasn't told perl that the output will be in UTF-8, because we expect the output to be TeX-compliant ASCII. Hence, if a character above U+00FF does slip through, perl will complain with a 'Wide character in print' warning. Unhandled characters in the Latin-1 range (U+0080 to U+00FF) are written silently as single bytes.
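For a file handle other than STDIN, the decoding can also be requested directly when the file is opened. The sketch below uses an explicit ':encoding(UTF-8)' layer on a three-argument open, which does for one handle what the 'open' pragma sets as a default. The helper name convert_file is hypothetical, and only a single substitution is shown to keep it short.

```perl
use strict;
use warnings;
use charnames ':full';

# Hypothetical helper: read a UTF-8 file and return its contents with
# e-acute replaced by a TeX control sequence.
sub convert_file {
    my ($path) = @_;
    # The layer makes perl decode the bytes into characters, which is
    # what -C1 does for STDIN.
    open(my $in, '<:encoding(UTF-8)', $path)
        or die "Cannot open $path: $!";
    my $out = '';
    while (my $line = <$in>) {
        $line =~ s/\N{LATIN SMALL LETTER E WITH ACUTE}/\\'e/g;
        $out .= $line;
    }
    close($in);
    return $out;
}
```

Without the layer, perl would see the two raw bytes of each accented letter and the \N{...} patterns would never match.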
Finally, here's the script:
#!/usr/bin/perl -C1
use strict;
use warnings;
use charnames ':full';
while (<>) {
### acute accents (carky)
s/\N{LATIN SMALL LETTER A WITH ACUTE}/\\' a/g;
s/\N{LATIN CAPITAL LETTER A WITH ACUTE}/\\' A/g;
s/\N{LATIN SMALL LETTER E WITH ACUTE}/\\' e/g;
s/\N{LATIN CAPITAL LETTER E WITH ACUTE}/\\' E/g;
s/\N{LATIN SMALL LETTER I WITH ACUTE}/\\'{\\i}/g;
s/\N{LATIN CAPITAL LETTER I WITH ACUTE}/\\' I/g;
s/\N{LATIN SMALL LETTER O WITH ACUTE}/\\' o/g;
s/\N{LATIN CAPITAL LETTER O WITH ACUTE}/\\' O/g;
s/\N{LATIN SMALL LETTER U WITH ACUTE}/\\' u/g;
s/\N{LATIN CAPITAL LETTER U WITH ACUTE}/\\' U/g;
s/\N{LATIN SMALL LETTER Y WITH ACUTE}/\\' y/g;
s/\N{LATIN CAPITAL LETTER Y WITH ACUTE}/\\' Y/g;
### carons (hacky)
s/\N{LATIN SMALL LETTER C WITH CARON}/\\v c/g;
s/\N{LATIN CAPITAL LETTER C WITH CARON}/\\v C/g;
s/\N{LATIN SMALL LETTER D WITH CARON}/\\v d/g;
s/\N{LATIN CAPITAL LETTER D WITH CARON}/\\v D/g;
s/\N{LATIN SMALL LETTER E WITH CARON}/\\v e/g;
s/\N{LATIN CAPITAL LETTER E WITH CARON}/\\v E/g;
s/\N{LATIN SMALL LETTER L WITH CARON}/\\v l/g;
s/\N{LATIN CAPITAL LETTER L WITH CARON}/\\v L/g;
s/\N{LATIN SMALL LETTER N WITH CARON}/\\v{n}/g;
s/\N{LATIN CAPITAL LETTER N WITH CARON}/\\v{N}/g;
s/\N{LATIN SMALL LETTER R WITH CARON}/\\v r/g;
s/\N{LATIN CAPITAL LETTER R WITH CARON}/\\v R/g;
s/\N{LATIN SMALL LETTER S WITH CARON}/\\v s/g;
s/\N{LATIN CAPITAL LETTER S WITH CARON}/\\v S/g;
s/\N{LATIN SMALL LETTER T WITH CARON}/\\v t/g;
s/\N{LATIN CAPITAL LETTER T WITH CARON}/\\v T/g;
s/\N{LATIN SMALL LETTER Z WITH CARON}/\\v z/g;
s/\N{LATIN CAPITAL LETTER Z WITH CARON}/\\v Z/g;
## rings (krouzky)
s/\N{LATIN SMALL LETTER U WITH RING ABOVE}/\\accent23u{}/g;
s/\N{LATIN CAPITAL LETTER U WITH RING ABOVE}/\\accent23U{}/g;
##### EXTRA STUFF FOR FRENCH
## circumflex
s/\N{LATIN SMALL LETTER A WITH CIRCUMFLEX}/\\^ a/g;
s/\N{LATIN CAPITAL LETTER A WITH CIRCUMFLEX}/\\^ A/g;
s/\N{LATIN SMALL LETTER E WITH CIRCUMFLEX}/\\^ e/g;
s/\N{LATIN CAPITAL LETTER E WITH CIRCUMFLEX}/\\^ E/g;
s/\N{LATIN SMALL LETTER I WITH CIRCUMFLEX}/\\^{\\i}/g;
s/\N{LATIN CAPITAL LETTER I WITH CIRCUMFLEX}/\\^ I/g;
s/\N{LATIN SMALL LETTER O WITH CIRCUMFLEX}/\\^ o/g;
s/\N{LATIN CAPITAL LETTER O WITH CIRCUMFLEX}/\\^ O/g;
s/\N{LATIN SMALL LETTER U WITH CIRCUMFLEX}/\\^ u/g;
s/\N{LATIN CAPITAL LETTER U WITH CIRCUMFLEX}/\\^ U/g;
## grave accent
s/\N{LATIN SMALL LETTER A WITH GRAVE}/\\` a/g;
s/\N{LATIN CAPITAL LETTER A WITH GRAVE}/\\` A/g;
s/\N{LATIN SMALL LETTER E WITH GRAVE}/\\` e/g;
s/\N{LATIN CAPITAL LETTER E WITH GRAVE}/\\` E/g;
s/\N{LATIN SMALL LETTER U WITH GRAVE}/\\` u/g;
s/\N{LATIN CAPITAL LETTER U WITH GRAVE}/\\` U/g;
## cedilla
s/\N{LATIN SMALL LETTER C WITH CEDILLA}/\\c c/g;
s/\N{LATIN CAPITAL LETTER C WITH CEDILLA}/\\c C/g;
## diaeresis (trema)
s/\N{LATIN SMALL LETTER A WITH DIAERESIS}/\\" a/g;
s/\N{LATIN CAPITAL LETTER A WITH DIAERESIS}/\\" A/g;
s/\N{LATIN SMALL LETTER E WITH DIAERESIS}/\\" e/g;
s/\N{LATIN CAPITAL LETTER E WITH DIAERESIS}/\\" E/g;
s/\N{LATIN SMALL LETTER I WITH DIAERESIS}/\\"{\\i}/g;
s/\N{LATIN CAPITAL LETTER I WITH DIAERESIS}/\\" I/g;
## OE and AE
## braced so that the letters that follow are not read as part of the control word
s/\N{LATIN SMALL LETTER AE}/{\\ae}/g;
s/\N{LATIN CAPITAL LETTER AE}/{\\AE}/g;
s/\N{LATIN SMALL LIGATURE OE}/{\\oe}/g;
s/\N{LATIN CAPITAL LIGATURE OE}/{\\OE}/g;
### German stuff
##umlaut
s/\N{LATIN SMALL LETTER O WITH DIAERESIS}/\\" o/g;
s/\N{LATIN CAPITAL LETTER O WITH DIAERESIS}/\\" O/g;
s/\N{LATIN SMALL LETTER U WITH DIAERESIS}/\\" u/g;
s/\N{LATIN CAPITAL LETTER U WITH DIAERESIS}/\\" U/g;
## sharp s (scharfes s), braced so that a following letter is not absorbed into \ss
s/\N{LATIN SMALL LETTER SHARP S}/{\\ss}/g;
## Scandinavian Extra Stuff
## braced so that the letters that follow are not read as part of the control word
s/\N{LATIN SMALL LETTER A WITH RING ABOVE}/{\\aa}/g;
s/\N{LATIN CAPITAL LETTER A WITH RING ABOVE}/{\\AA}/g;
s/\N{LATIN SMALL LETTER O WITH STROKE}/{\\o}/g;
s/\N{LATIN CAPITAL LETTER O WITH STROKE}/{\\O}/g;
print;
}
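The same idea can also be written with a lookup table instead of one substitution per character. This is not how the script above works, just a sketch of an alternative: the names %tex_for and to_tex are made up, and the table holds only a handful of entries. Any non-ASCII character that is missing from the table is passed through unchanged.

```perl
use strict;
use warnings;

# Hypothetical lookup table: character => TeX control sequence.
# q{} keeps the backslashes literal.
my %tex_for = (
    "\x{E1}"  => q{\'a},      # a with acute
    "\x{ED}"  => q{\'{\i}},   # i with acute, using the dotless i
    "\x{10D}" => q{\v{c}},    # c with caron
    "\x{E7}"  => q{\c{c}},    # c with cedilla
    "\x{DF}"  => q{{\ss}},    # sharp s
    "\x{F8}"  => q{{\o}},     # o with stroke
);

# Replace each non-ASCII character that has a table entry; keep the rest.
sub to_tex {
    my ($text) = @_;
    $text =~ s{([^\x00-\x7F])}{ exists $tex_for{$1} ? $tex_for{$1} : $1 }ge;
    return $text;
}

print to_tex("Stra\x{DF}e"), "\n";   # sharp s followed by a letter
```

Adding a language then means adding rows to the table rather than new lines of code, at the cost of losing the readable \N{...} names unless they are used as the hash keys.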