9/21/09
What exactly is ATSC?
(The original TV technology is called analog. It is also called NTSC (National Television System Committee), after the group that defined it. The NTSC spec was created in 1941, updated for color in 1953, and updated for stereo in 1984. Both of these updates were backward compatible, rendering nobody’s TV set obsolete. But the new digital standard is totally different. The only thing it has in common with NTSC is the 6 megahertz channel width.)
ATSC (Advanced Television Systems Committee) is the name of the technical standard that defines the digital TV (DTV) system the FCC has chosen for terrestrial TV stations. ATSC employs MPEG-2, a data compression standard. MPEG-2 typically achieves a 50-to-1 reduction in data. It achieves much of this by not retransmitting areas of the screen that have not changed since the previous frame.
Digital
cable TV systems and DBS systems like DirecTV have devised their own standards
that differ somewhat from ATSC. Their
high-def set top boxes (STBs) conform to ATSC at their output connectors. Those systems use MPEG-2 or MPEG-4.
ATSC
has 18 different formats. All TVs must
be able to receive all of these formats and display them. The broadcaster chooses the format. Most TV sets will display only 1 or 2 of
these formats, but will convert the other formats into these. All 18 formats are shown in the following
table.
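| Spec | Horizontal pixels | Vertical pixels | Aspect ratio | Monitor interface | Format name | Frames per sec | Fields per sec | Transmitted interlaced |
|------|-------------------|-----------------|--------------|-------------------|-------------|----------------|----------------|------------------------|
| ATSC | 1920 | 1080 | 16:9 | 1080i | 1080 60i | 30 | 60 | yes |
| ATSC | 1920 | 1080 | 16:9 | 1080i | 1080 30p | 30 | 30 | no |
| ATSC | 1920 | 1080 | 16:9 | 1080i | 1080 24p | 24 | 24 | no |
| ATSC | 1280 | 720 | 16:9 | 720p | 720 60p | 60 | 60 | no |
| ATSC | 1280 | 720 | 16:9 | 720p | 720 30p | 30 | 30 | no |
| ATSC | 1280 | 720 | 16:9 | 720p | 720 24p | 24 | 24 | no |
| ATSC | 704 | 480 | 16:9 | 480p | 480 60p | 60 | 60 | no |
| ATSC | 704 | 480 | 16:9 | 480i | 480 60i | 30 | 60 | yes |
| ATSC | 704 | 480 | 16:9 | 480i | 480 30p | 30 | 30 | no |
| ATSC | 704 | 480 | 16:9 | 480i | 480 24p | 24 | 24 | no |
| ATSC | 704 | 480 | 4:3 | 480p | 480 60p | 60 | 60 | no |
| ATSC | 704 | 480 | 4:3 | 480i | 480 60i | 30 | 60 | yes |
| ATSC | 704 | 480 | 4:3 | 480i | 480 30p | 30 | 30 | no |
| ATSC | 704 | 480 | 4:3 | 480i | 480 24p | 24 | 24 | no |
| ATSC | 640 | 480 | 4:3 | 480p | 480 60p | 60 | 60 | no |
| ATSC | 640 | 480 | 4:3 | 480i | 480 60i | 30 | 60 | yes |
| ATSC | 640 | 480 | 4:3 | 480i | 480 30p | 30 | 30 | no |
| ATSC | 640 | 480 | 4:3 | 480i | 480 24p | 24 | 24 | no |
| NTSC | ≈640 | 483 | 4:3 | Note 1 | NTSC | 30 | 60 | yes |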
Note 1: Some people refer to NTSC as 480i.
When converting NTSC to digital, about 640 pixels are required to reproduce the image nicely even though the true resolution of NTSC is roughly 400 pixels horizontal.
The term interlacing refers to the practice of drawing all of the odd numbered lines on the CRT, and then drawing all of the even numbered lines, which are drawn interspersed with the odd numbered lines. For 1080i, the 540 odd numbered lines are one field, and the 540 even numbered lines are the other field. When interlacing is employed, there are always two fields per frame. Progressive scan means that interlacing is not employed.
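As a small illustration of the two fields, here is a sketch in Python that splits a 1080-line frame into its odd and even fields and weaves them back together. The function names are made up for this example, and a frame is modeled simply as a list of scan lines.

```python
def split_into_fields(frame_lines):
    """Split a frame (a list of scan lines) into its two interlaced fields.

    Lines are counted from 1, so the odd field holds lines 1, 3, 5, ...
    and the even field holds lines 2, 4, 6, ...
    """
    odd_field = frame_lines[0::2]
    even_field = frame_lines[1::2]
    return odd_field, even_field


def weave(odd_field, even_field):
    """Interleave the two fields again to rebuild the full frame."""
    frame = []
    for odd_line, even_line in zip(odd_field, even_field):
        frame.append(odd_line)
        frame.append(even_line)
    return frame


frame = [f"line {n}" for n in range(1, 1081)]
odd_field, even_field = split_into_fields(frame)
print(len(odd_field), len(even_field))  # 540 540
```

Weaving only gives back the original frame when both fields were sampled at the same instant, which is not the case for live action shot at 60 fields per second.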
One advantage of interlacing is that, for a given bandwidth, it allows higher resolution (more pixels). Another advantage is that it reduces flicker: A bright white area of the screen will flicker (pulsate rapidly) if that area is drawn only 30 times per second. Drawing 60 fields per second mostly prevents that. Live-action interlaced video is usually captured by a camera that samples the scene 60 times per second, not 30, and the resulting images portray motion much better than one would expect of 30 frames per second. A disadvantage of interlacing is that data compression is not as efficient.
1080i
and 480i are interlaced formats, while 720p and 480p are progressive formats.
The
receiver reduces the 18 formats to 4.
The display monitor only has to deal with at most four formats. Most receivers let you select the output
format, which you must match to what the monitor can do.
If
you look at the second ATSC format in the above table, 1080 30p, you will note
that it is transmitted in progressive format, but the receiver will convert it
into 1080i, an interlaced format. Why? That is because most CRT TV sets must draw
this image interlaced to prevent flicker.
(CRT sets that can draw 1080 lines at 60 frames per second are
very uncommon.)
Presently
there are only four defined interface formats: 480i, 480p, 720p, and
1080i. There could be more, and there
can be monitors that can benefit from something else. But presently such a monitor will have to
have a built-in receiver. (1080p60
and 1080p24 are becoming more common monitor interface formats, but the wisdom
in them can be questioned.)
(The
term “bandwidth” means “minimum required channel size”. Thus if a random binary data stream is fed
through a 2 MHz-wide channel, and if that channel could handle twice that much
data, then the bandwidth of that data stream is said to be 1 MHz.)
The
bandwidth for NTSC is always 6 MHz.
Without data compression, the bandwidth for 1080i would be 300 MHz. With MPEG-2 data compression the bandwidth
varies according to how fast the image changes.
For 480i the bandwidth rarely goes above 1 MHz. For 1080i and 720p the bandwidth rarely goes
above 3 MHz.
Thus
it is possible to put six 480i programs or two 1080i programs in a 6 MHz
channel. The FCC allows this. Thus terrestrial DTV stations have
sub-channels. It is up to the station
managers how many sub-channels to have and what programming will air on those
sub-channels. Note that a sub-channel
showing a static image (e.g. a weather map or bulletin board) requires almost
no bandwidth despite being at high resolution.
ATSC
is an imperfect standard in that occasionally the bandwidth requirement will
exceed the channel size. When this
happens, the picture can get blurry or jumpy.
Jumpiness occurs when frames are deleted. Blurriness is preferred because if momentary
it is less noticeable. Transmission
encoders have improved gradually and hopefully will continue to do so. In the future perhaps they will fail in a
completely unnoticeable manner.
· 1080i and 720p require about the same bandwidth when showing live action: A 1080i image has about twice as many pixels, while 720p shows twice as many frames per second.
· While showing films at 24 frames per second, 720p requires about half the bandwidth of 1080i.
· A common opinion is that 720p is better for sporting events, while 1080i looks better for documentaries, dramas, and most things that come at 24 frames per second.
Unfortunately
the networks are picking one format for all their shows. ABC, ESPN, and FOX have chosen 720p. All other networks are using 1080i. Hopefully some day they will choose the
format according to the content.
You
can find many websites where it is argued that one format is superior. Those who favor 720p are especially
strident. They always overlook the fact
that many images are stills or have little motion, and will look better in
1080i. They go to great lengths to
explain the problems with interlace and flicker. But few people notice these problems
(assuming they are sitting at the correct distance, and assuming that rescaling
hasn’t introduced gross errors).
1080i and 720p are called High Definition TV (HDTV). 480p is called Enhanced Definition TV (EDTV). 480i is Standard Definition TV (SDTV).
Present
DBS systems (DirecTV and Dish Network) have a bandwidth problem: too many channels. These companies have resorted to some
filtering to reduce the bandwidth per program.
This allows them to carry more channels, but it gives the images a
slightly blurry look. They call it
“noise filtering”, but in effect they have reduced the resolution to below
640x480. Exactly what this resolution is
has not been stated (550x400? Nobody
knows.) On a 17 inch TV this problem is
not very noticeable. But the larger the
set is, the more offensive it is. You
might find it to be a compelling reason to put an antenna on your roof.
This
filtering has been applied only to standard-definition channels. The satellite companies claim that the HDTV
channels are uncompromised. (Verifying such claims
is close to impossible.)
DVD
images are usually 720x480 pixels, 24 frames per second. DVD quality is a step up from NTSC because:
1. Digital technology is noise-free.
2. The horizontal resolution is better. NTSC is equivalent to about 400 pixels.
3. When a progressive scan monitor is used, any remaining flicker is eliminated.
4. The colors are better. NTSC has an “overlapping sidebands” problem.
“Overlapping
sidebands” is a compromise in NTSC that works most of the time. It will cause wrong colors to appear when
showing diagonal lines or fabrics with tweed patterns. Special comb filters improve the image
slightly, but DVDs avoid the problem altogether. (Comb filters are only for NTSC.)
Of course, this improvement is lost if the DVD output is converted to NTSC. Many DVD owners have been buying monitors that have component video inputs, thus avoiding NTSC. DVD quality is essentially 480p (EDTV).
2. If 1080p is to 1080i what 480p is to 480i, then 1080p is a 60 frames per second monitor interface format. 1080 60p is becoming a common interface format. But since there are no 1080 60p sources, there is no need for it. (When a 24 frame source is converted to 1080i, no information is lost. A smart monitor can convert this 1080i into a perfect 1080 60p or 120p.)
3. When the maker of a digital display finds a way to improve upon 1080i he will usually say his display does 1080p. The improvement is often just a way to reduce flicker. The sets internally do 1080p but only accept 1080i at the interface.
If you want to be understood unambiguously, you should
refrain from using the term 1080p.
Instead use 1080p60, etc.
When
a 24 frames/second source is converted to 60p, judder (described below) is introduced. A 120Hz display will have to remove that
judder, making conversion to 60p seem counterproductive.
A
justification often used for 1080 60p is that if the STB and the monitor can
both do the conversion to 60p then the viewer can select the one that does it
better.
Live
action 720p looks a little better when it is converted directly to 1080p, not
going through 1080i.
If
Hollywood ever decides to make movies at 60 frames/second then 1080 60p will
become an essential monitor interface format.
But there is presently no indication that they might do this. In fact they tend to consider the flaws in
film to be an artistic enhancement (the “film look”).
In motion compensated processing, a computer in the receiver turns a 24 or 30 frames/sec image into a true 60 or 120 frames/sec image. The motion vectors (described below) are used for creating the missing frames. This is probably the best hope for truly smooth motion for 1080i or films. But it requires the networks and studios to make maximum use of motion vectors, which may or may not happen.
The
motion vectors are not sent over the HDMI interface. So motion compensated processing must be done
by the receiver or the DVD player. The
monitor’s internal motion compensated processing works only with its internal
receiver.
Many
1080p monitors employ motion adaptive de-interlacing, which does not use
the motion vectors. To create the
missing frames, the set first divides the image into regions of motion and
regions without motion. Areas without
motion are de-interlaced by combining with the previous field. For areas with motion the missing scan lines
are created by averaging the adjacent lines above and below. This is all pretty new and some sets are
noticeably better than others. When done
well, motion adaptive de-interlacing produces an image about as good as 1080i
but without the flicker.
But
on the minus side, format conversions sometimes make interlace errors worse.
Theater
film is usually 24 frames per second while TV monitors usually operate at 30 or
60 frames per second. 3:2 pull-down,
also called telecining, is the process of converting a 24 frames per
second image into a 60 frames or fields per second image. It will normally happen one of two ways:
Thus
the TV frames are 3 copies of a film frame, followed by 2 copies of the next
frame, then 3, then 2, 3, 2, 3, 2, etcetera.
This works well. (It has a minor
flaw: During each second, 12 film frames
are stretched slightly in time, while the other 12 are shrunken. The stretched frames dominate slightly,
producing a slightly jerky image when motion is portrayed. This is called 12-cycle judder. Judder is seldom obvious. The only reliable way to see it is to watch the
credits roll up the screen at the end of a program.)
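The 3, 2, 3, 2 repetition can be sketched in a few lines of Python. This is only an illustration: film frames are represented by their index, and the function name is made up for this example.

```python
from itertools import cycle


def pull_down_3_2(film_frames):
    """Repeat film frames alternately 3 times and 2 times."""
    output = []
    for frame, copies in zip(film_frames, cycle([3, 2])):
        output.extend([frame] * copies)
    return output


film_second = list(range(24))        # one second of film, frames 0..23
tv_second = pull_down_3_2(film_second)
print(len(tv_second))                # 60
print(tv_second[:10])                # [0, 0, 0, 1, 1, 2, 2, 2, 3, 3]
```

Twelve frames shown 3 times plus twelve frames shown 2 times add up to exactly 60, which is why the pattern fits one second of film into one second of TV.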
Thus
3:2 pull-down is nearly the same for interlaced and progressive scan. Given the way the brain works, they will look
about the same. Does the progressive
scan version look better? Many people
believe so. This author believes
progressive scan is superior by an increment so small that it is not generally
noticeable. (Keep in mind, less than 5%
of the face of the CRT is lit at any instant.
That it is fully lit is an illusion.)
What
is not in disagreement are the problems of de-interlacing an interlaced
signal. Nearly all DVDs contain data
that is already telecined and interlaced.
To convert an interlaced image into a 60-frames/sec image you can simply
combine successive fields and display them twice. But if you do this with film material, the
following happens:
A
moving object will turn blurry 6 times per second, which is quite
noticeable. To fix this, the
de-interlacer must be smart enough to match fields originally from the same
film frame. That process is called reverse
3:2 pull-down, and sometimes cadence detection. Manufacturers sometimes call it 3:2
pull-down detection or sometimes just 3:2 pull-down (which is
obviously wrong).
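To see why naive de-interlacing of film material misbehaves, here is a rough Python sketch. It assumes the 60 fields of one second follow the same 3, 2, 3, 2 pattern described earlier and that a simple de-interlacer pairs fields (0,1), (2,3), and so on. Both are assumptions made for illustration, and the function names are hypothetical.

```python
def telecine_fields(film_frames):
    """Spread film frames over fields: 3 fields, then 2, then 3, ..."""
    fields = []
    for index, frame in enumerate(film_frames):
        copies = 3 if index % 2 == 0 else 2
        fields.extend([frame] * copies)
    return fields


def weave_pairs(fields):
    """Naive de-interlacing: combine each successive pair of fields."""
    return [(fields[i], fields[i + 1]) for i in range(0, len(fields) - 1, 2)]


fields = telecine_fields(list(range(24)))   # 60 fields per second
woven = weave_pairs(fields)                 # 30 combined frames
mixed = [top != bottom for top, bottom in woven]

# Count the separate occasions on which mixed frames appear.
episodes = sum(
    1 for i, is_mixed in enumerate(mixed)
    if is_mixed and (i == 0 or not mixed[i - 1])
)
print(len(woven), sum(mixed), episodes)     # 30 12 6
```

In this model 12 of the 30 combined frames mix two different film frames, and they arrive in 6 separate episodes per second. Reverse 3:2 pull-down avoids this by only ever combining fields that carry the same film frame number.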
If
you want to watch DVDs on a progressive monitor, make sure you have reverse 3:2
pull-down. De-interlacing is also called
line doubling or line scaling.
The
best monitors show the image 120 times per second. 120 is an exact multiple of 24, 30, and
60. Thus in theory the monitor can show
any program without introducing judder.
But will the monitor actually do this?
Consider the following questions:
The
answer to question 2c is generally no.
Since the motion vectors are not available to the monitor, the DVD
player would have to do the motion compensated processing. But monitors that can accept 1080 120p are
very rare, and 60p will introduce judder.
You will likely have to choose either motion compensated processing or
judder-free display, whichever looks better on your system.
120 Hertz technology is often touted as a fix for the slowness of LCDs. This is a dubious claim. The improvement is usually minuscule, often not discernible.
How 1080i is converted into 720p
The
remainder of this page is nonessential reading.
You should skip it unless you are curious about how ATSC works.
A pixel is just a pixel, right?
For computer monitors each pixel is described by 3 colors (red, green, and blue), with 8, 10, 12, or 16 bits per color. Adjacent pixels are independent.
For ATSC MPEG-2 the colors are represented as Y, Pr, and Pb, which in simplified form are defined as:
Y = Red+Green+Blue, taken as a weighted sum (Y is also called intensity or luminance and is sometimes depicted as white.)
Pr = Red-Y
Pb = Blue-Y (Pr and Pb are the color information, or chrominance. In practice each is also scaled by a constant.)
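For readers who want numbers, the following Python sketch shows one way the conversion can be done. It assumes the ITU-R BT.709 weights that are commonly used for high-definition video, with values in the range 0 to 1. Other formats may use different weights, so treat the constants and function names as assumptions.

```python
# Assumed luminance weights (ITU-R BT.709); not taken from this page.
KR, KB = 0.2126, 0.0722
KG = 1.0 - KR - KB


def rgb_to_ypbpr(r, g, b):
    """Convert RGB (0..1) to Y (0..1) and Pb, Pr (-0.5..+0.5)."""
    y = KR * r + KG * g + KB * b
    pb = (b - y) / (2.0 * (1.0 - KB))
    pr = (r - y) / (2.0 * (1.0 - KR))
    return y, pb, pr


def ypbpr_to_rgb(y, pb, pr):
    """Inverse conversion, as a monitor would eventually perform."""
    r = y + pr * 2.0 * (1.0 - KR)
    b = y + pb * 2.0 * (1.0 - KB)
    g = (y - KR * r - KB * b) / KG
    return r, g, b


print(rgb_to_ypbpr(1.0, 1.0, 1.0))  # white: Y close to 1, Pb and Pr close to 0
print(rgb_to_ypbpr(1.0, 0.0, 0.0))  # pure red: Pr reaches its maximum of 0.5
```

White and gray pixels carry no color information at all in this representation, which is part of why the color channels compress so well.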
There
is only one Pr and Pb pixel for every four Y pixels. Thus 720p has 1280*720=921,600 Y pixels plus 230,400 Pr
pixels plus 230,400 Pb pixels. In the
data realm, the image layout would be a repetition of this:
Although
the color information is at a lower resolution, human eyes can rarely sense
this at the correct sitting distance.
(Computer users sit closer.)
The
Y information is encoded as an 8-bit number.
Pr and Pb likewise are 8-bit numbers.
The monitor will eventually convert YPrPb into RGB. The number of bits per visible pixel averages
out to 12, not 24.
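The pixel counts and the 12-bit average can be checked with a short calculation. This Python sketch assumes the color samples are halved in both directions, which is one way to get one Pr and one Pb pixel per four Y pixels. The function name is made up for this example.

```python
def sample_counts(width, height):
    """Return the number of Y, Pr, and Pb samples in one image."""
    y_samples = width * height
    pr_samples = (width // 2) * (height // 2)   # one per 2x2 group of Y
    pb_samples = pr_samples
    return y_samples, pr_samples, pb_samples


y_samples, pr_samples, pb_samples = sample_counts(1280, 720)
bits_per_sample = 8
average_bits_per_pixel = (
    bits_per_sample * (y_samples + pr_samples + pb_samples) / y_samples
)
print(y_samples, pr_samples, pb_samples)  # 921600 230400 230400
print(average_bits_per_pixel)             # 12.0
```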
A
block is an 8-by-8 array of colorless pixels. Thus a block is 64 8-bit numbers.
A
macro-block is a 16-by-16 array of complete pixels. A macro-block is made up of four Y blocks
plus one Pr block and one Pb block.
A
720p image is 80 macro-blocks wide and 45 macro-blocks high. Each of these 3600
macro-blocks has an address. With each
new frame, only the macro-blocks that change are transmitted.
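The macro-block arithmetic is easy to reproduce. In this Python sketch the row-by-row address numbering is an assumption made for illustration, and rounding up is simply how the sketch handles heights such as 1080 that are not a multiple of 16.

```python
import math

MACROBLOCK_SIZE = 16


def macroblock_grid(width, height):
    """Return (columns, rows) of macro-blocks needed to cover an image."""
    columns = math.ceil(width / MACROBLOCK_SIZE)
    rows = math.ceil(height / MACROBLOCK_SIZE)
    return columns, rows


def macroblock_address(column, row, columns):
    """Hypothetical address: count macro-blocks left to right, top to bottom."""
    return row * columns + column


columns, rows = macroblock_grid(1280, 720)
print(columns, rows, columns * rows)                      # 80 45 3600
print(macroblock_address(79, 44, columns))                # 3599
print(macroblock_grid(1920, 1080))                        # (120, 68)
```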
In
the transmitted data, a row or partial row of consecutive macro-blocks is
called a slice.
Usually
each pixel in a block is subtracted from the same pixel from the previous
frame. Thus a transmitted block is a
block of change values, and gets added to the image in the receiver. But if there is motion, the pixel is subtracted
from a nearby pixel in the previous frame, and a motion vector is
transmitted with each block. The
objective here is to transmit as many zero-valued pixels as possible.
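The idea of subtracting from a shifted block can be shown with a toy example in Python. This is a simplified sketch, not the actual MPEG-2 procedure: the test image, the function names, and the way the motion vector is expressed are all assumptions made for illustration.

```python
def block_at(frame, top, left, size=8):
    """Cut a size-by-size block out of a frame (a list of rows)."""
    return [row[left:left + size] for row in frame[top:top + size]]


def residual(current, previous, top, left, motion=(0, 0), size=8):
    """Change values for one block.

    motion = (dy, dx) says where in the previous frame the block's
    content is to be found, relative to its current position.
    """
    dy, dx = motion
    cur = block_at(current, top, left, size)
    ref = block_at(previous, top + dy, left + dx, size)
    return [[c - r for c, r in zip(cr, rr)] for cr, rr in zip(cur, ref)]


def reconstruct(previous, diff, top, left, motion=(0, 0), size=8):
    """Receiver side: add the change values to the referenced block."""
    dy, dx = motion
    ref = block_at(previous, top + dy, left + dx, size)
    return [[r + d for r, d in zip(rr, dr)] for rr, dr in zip(ref, diff)]


# A 16x16 test pattern that slides 2 pixels to the right between frames.
previous = [[3 * x + 5 * y for x in range(16)] for y in range(16)]
current = [[previous[y][x - 2] if x >= 2 else 0 for x in range(16)]
           for y in range(16)]

plain = residual(current, previous, 4, 4)                     # no vector
compensated = residual(current, previous, 4, 4, motion=(0, -2))
print(plain[0])        # [-6, -6, -6, -6, -6, -6, -6, -6]
print(compensated[0])  # [0, 0, 0, 0, 0, 0, 0, 0]
```

With the right motion vector every change value in the block becomes zero, which is exactly what the encoder is hoping for.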
Next
each block (64 8-bit numbers) is further compressed by the following three processes:
The process is slightly more complicated when interlaced images are sent.
This
description of MPEG-2 has been extremely brief.
There is a more detailed description on the BBC website.
A
discrete cosine transform is a lot like a Fourier transform. A Fourier transform converts a time function
into a frequency function. A DCT converts
a spatial function into a “spatial frequencies” function. It converts 64 pixel values into 64 DCT
coefficients.
In
theory, each DCT coefficient is computed by the formula
Thus
all 64 pixels make a contribution to each DCT coefficient. These transforms are reversible. The receiver must perform an Inverse DCT
on the coefficients to obtain the 64 pixel values.
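For reference, the 8-by-8 DCT and its inverse are commonly written as shown here. The symbols are chosen for this illustration: f(x,y) is a pixel value, F(u,v) is a DCT coefficient, and C is a scaling factor.

```latex
F(u,v) = \frac{1}{4}\, C(u)\, C(v) \sum_{x=0}^{7} \sum_{y=0}^{7}
         f(x,y)\, \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16}

f(x,y) = \frac{1}{4} \sum_{u=0}^{7} \sum_{v=0}^{7}
         C(u)\, C(v)\, F(u,v)\, \cos\frac{(2x+1)\,u\,\pi}{16}\, \cos\frac{(2y+1)\,v\,\pi}{16}

C(k) = \begin{cases} 1/\sqrt{2} & k = 0 \\ 1 & k > 0 \end{cases}
```

The double sum over x and y is what makes every one of the 64 pixels contribute to each coefficient.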
A
complete understanding of what transforms do and how they do it is a
challenging mathematical topic. But
consider this simplification. In the
following diagram, a line of 8 pixels is shown along with their 1-dimensional
DCT coefficients.
Only
two coefficients are non-zero, a considerable reduction in data. Coefficient c0 is called the D.C. coefficient
because it represents the average height of the 8 pixels. In the receiver the other 7 coefficients
would specify the magnitudes of 7 cosine waves.
The receiver will just add together those cosines and the D.C., and the
result is exactly the original 8 pixels.
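The one-dimensional case is small enough to try out. This Python sketch uses the common orthonormal scaling of the DCT, which is an assumption here, so c0 comes out proportional to the average rather than equal to it. The 8 test pixels are made up: a constant level plus a single cosine wave.

```python
import math


def dct_1d(pixels):
    """Forward DCT (type II, orthonormal scaling) of a line of pixels."""
    n = len(pixels)
    coefficients = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        total = sum(p * math.cos((2 * i + 1) * k * math.pi / (2 * n))
                    for i, p in enumerate(pixels))
        coefficients.append(scale * total)
    return coefficients


def idct_1d(coefficients):
    """Inverse DCT: add up the D.C. term and the cosine waves."""
    n = len(coefficients)
    pixels = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coefficients):
            scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
            total += scale * c * math.cos((2 * i + 1) * k * math.pi / (2 * n))
        pixels.append(total)
    return pixels


pixels = [100 + 50 * math.cos((2 * i + 1) * math.pi / 16) for i in range(8)]
coefficients = dct_1d(pixels)
non_zero = [k for k, c in enumerate(coefficients) if abs(c) > 1e-9]
print(non_zero)  # [0, 1]
```

Feeding the coefficients to idct_1d gives back the 8 pixel values, apart from tiny floating-point rounding.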
If
the 8 pixel values were completely random then the coefficients would be too,
and there would be no data reduction.
But common images are not random, so there are usually more zeroes in
the transform than in the pixels.
Here,
for your consideration, is an assortment of 8x8 pixel blocks and their DCT
coefficients:
ATSC
and 8-VSB are defined by document A/53 on the ATSC website. The video data is put through the following
sequence of processes:
All of these steps are reversible. To recover the original data the receiver must reverse them all in reverse order.
This
produces a stream of data consisting of video packets, audio packets, and
ancillary data packets. This stream is
called an MPEG-2 transport stream and is compatible with the streams
produced by DVD, satellite, and cable systems.
Ancillary
data includes
1. Closed caption data for the hearing
impaired
2. PSIP data (explained below)
3. Data-casting (business opportunities
not related to TV broadcasting)
Null packets are added to the data stream as necessary to make the data rate a constant 19.28 megabits per second. The data stream is re-divided into data segments that are all 187 data bytes long. This output is called the payload.
Data randomizer
All of the payload data is randomized by exclusive-ORing it with a pseudo-random data pattern. This is done to keep the spectrum of the transmitted signal flat.
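The reason exclusive-OR is used is that applying the same pattern a second time restores the original data. The Python sketch below demonstrates only that property. It uses Python's general-purpose random number generator with an arbitrary seed as a stand-in, which is NOT the pseudo-random generator that ATSC actually specifies.

```python
import random


def randomize(payload: bytes, seed: int = 0xA53) -> bytes:
    """XOR the payload with a repeatable pseudo-random byte pattern.

    The generator and seed are placeholders for illustration only.
    """
    generator = random.Random(seed)
    return bytes(b ^ generator.randrange(256) for b in payload)


segment = bytes(187)                  # 187 zero bytes, a worst case for the spectrum
scrambled = randomize(segment)
restored = randomize(scrambled)       # the receiver applies the same pattern
print(restored == segment)            # True
```

Even a payload of all zeroes leaves the randomizer looking like noise, which is the point of the exercise.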
Reed-Solomon encoder
Twenty Reed-Solomon bytes are added to the end of each data segment, so the segments now have 207 bytes. These added bytes are for error correction of data corrupted during transmission. This is also called Forward Error Correction (FEC). The receiver can correct up to 10 erroneous bytes per data segment.
Convolutional interleaver
Next the data segments are grouped into groups of 52 segments. The bytes of each segment are moved to different segments, distributed evenly among the group of 52.
Suppose during transmission a long string of consecutive bits is corrupted. When the receiver puts the bytes back in their correct order, this long string is converted into many short errors, which can all be fixed by Reed-Solomon error correction. Thus ATSC is unaffected by most shot noise, a common type of interference. Noise bursts of up to 193 microseconds will be fully corrected.
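One way to see where a figure near 193 microseconds could come from is the following back-of-the-envelope calculation in Python. The reasoning is an assumption, not something spelled out here: it supposes the interleaver spreads a burst perfectly evenly over 52 segments, that each segment can repair 10 bytes, and that each transmitted symbol carries 2 data bits at about 10.76 million symbols per second.

```python
SEGMENTS_PER_GROUP = 52
CORRECTABLE_BYTES_PER_SEGMENT = 10     # Reed-Solomon limit per segment
DATA_BITS_PER_SYMBOL = 2               # before the trellis coder adds a bit
SYMBOL_RATE = 10.76e6                  # symbols per second

burst_bytes = SEGMENTS_PER_GROUP * CORRECTABLE_BYTES_PER_SEGMENT
burst_symbols = burst_bytes * 8 / DATA_BITS_PER_SYMBOL
burst_microseconds = burst_symbols / SYMBOL_RATE * 1e6

print(burst_bytes)                     # 520
print(round(burst_microseconds, 1))    # 193.3
```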
Trellis encoder
Trellis coding is currently the best method known for sending digital data in a channel containing Gaussian (white) noise. The improvement is equivalent to using four times as much transmitter power (assuming the receiver is well designed). First the data stream is divided into symbols. Initially each symbol is two bits. Then the trellis coder recodes the symbol (folding in some previous data) and adds a third bit. With all the data added to the segments, the data rate is now over 32 megabits per second.
VSB modulator
Next the 3 bits of each symbol are converted into an 8-level signal. A 4-symbol sync is added to each segment.
After every group of 312 data segments a 313th segment is inserted. This 313th segment is a fixed pattern that the receiver will look for (to know where a 52-segment group starts). The symbol rate is 10.76 million symbols per second.
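The numbers given so far hang together, as this rough Python calculation shows. It uses only figures quoted on this page (rounded, so the results are approximate), and it assumes that 312 of every 313 segments carry data.

```python
SYMBOL_RATE = 10.76e6          # symbols per second (rounded)
PAYLOAD_BYTES = 187
REED_SOLOMON_BYTES = 20
SYNC_SYMBOLS = 4
DATA_BITS_PER_SYMBOL = 2       # the trellis coder adds a third bit

segment_bytes = PAYLOAD_BYTES + REED_SOLOMON_BYTES              # 207
data_symbols = segment_bytes * 8 // DATA_BITS_PER_SYMBOL        # 828
symbols_per_segment = data_symbols + SYNC_SYMBOLS               # 832

segments_per_second = SYMBOL_RATE / symbols_per_segment
data_segments_per_second = segments_per_second * 312 / 313
payload_rate = data_segments_per_second * PAYLOAD_BYTES * 8     # bits/sec
channel_bit_rate = SYMBOL_RATE * 3                              # bits/sec

print(symbols_per_segment)                 # 832
print(round(payload_rate / 1e6, 1))        # 19.3
print(round(channel_bit_rate / 1e6, 1))    # 32.3
```

The payload works out to a little under 19.3 megabits per second, in line with the rate quoted earlier, and three bits per symbol gives the figure of just over 32 megabits per second.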
Multiplying this 8-level signal by a high frequency sine wave will result in AM modulation. If the 8 levels are as often negative as positive, the resulting AM will have no carrier. To prevent this, a small DC level is added to the 8-level signal. The resulting small carrier is called a pilot.
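The effect of the added DC level can be illustrated numerically. In this Python sketch the eight levels (-7 to +7 in steps of 2), the pilot offset of 1.25, and the simple mapping from a 3-bit value to a level are nominal values often quoted for 8-VSB. Treat them as assumptions here, since this page does not state them.

```python
LEVELS = [-7, -5, -3, -1, 1, 3, 5, 7]   # assumed nominal 8-VSB levels
PILOT = 1.25                            # assumed DC offset that creates the pilot


def symbol_to_level(symbol, pilot=PILOT):
    """Map a 3-bit symbol value (0..7) to a signal level."""
    return LEVELS[symbol] + pilot


without_pilot = [symbol_to_level(s, pilot=0.0) for s in range(8)]
with_pilot = [symbol_to_level(s) for s in range(8)]

print(sum(without_pilot) / 8)   # 0.0  -> no carrier on average
print(sum(with_pilot) / 8)      # 1.25 -> a small carrier remains
```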
Finally, filters are used to remove all but the carrier and the first 6 megahertz of the upper sideband.
The diagrams above show the average signal density for each channel. The mathematics behind amplitude modulation will not be explained here. The diagrams are presented for the benefit of people who already have a working knowledge of AM.
RF up-converter and transmitter
This step is the same for ATSC as for NTSC. The intermediate frequency signal is converted to the final frequency, boosted to a high voltage, and sent to the antenna.
_______________________________________
PSIP Data (Program and System Information Protocol)
PSIP data is ancillary data, either binary data or text (never audio or video data). Some PSIP data is essential, but most is just helpful. PSIP text employs Huffman code, which is a variable length code for characters. (The most common characters get the shortest codes.)
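To show what a variable length code looks like, here is a generic Huffman coder in Python. It builds its code table from the text it is given, which is an assumption made to keep the example self-contained. It does not reproduce the code tables that PSIP itself uses, and the sample text is made up.

```python
import heapq
from collections import Counter


def huffman_code(text):
    """Build a Huffman code table {character: bit string} for a text."""
    counts = Counter(text)
    # Heap entries: (weight, tie breaker, partial code table)
    heap = [(weight, index, {char: ""})
            for index, (char, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    if len(heap) == 1:                       # only one distinct character
        return {char: "0" for char in heap[0][2]}
    next_id = len(heap)
    while len(heap) > 1:
        weight1, _, codes1 = heapq.heappop(heap)
        weight2, _, codes2 = heapq.heappop(heap)
        merged = {char: "0" + code for char, code in codes1.items()}
        merged.update({char: "1" + code for char, code in codes2.items()})
        heapq.heappush(heap, (weight1 + weight2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "evening news at eleven"
codes = huffman_code(text)
encoded = "".join(codes[char] for char in text)
print(len(encoded), "bits instead of", 8 * len(text))
```

The letter "e", which is the most common character in the sample, ends up with one of the shortest codes.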
The data is arranged as tables with optional sub-tables. The four PSIP tables are:
1. Virtual Channel Table – This table lists all the sub-channels and their attributes. This table is transmitted 2.5 times per second. The table includes:
A. The two-part channel number
B. The sub-channel short name (up to 7 characters)
C. The associated NTSC channel number
D. The FCC-issued TSID (a signal ID, not the call sign)
E. The MPEG-2 program number
F. The type of service (TV, audio only, data only)
G. A link to the EITs for this sub-channel
H. All the packet IDs (PIDs)
2. System Time Table – This small table just contains the current time. This table is transmitted once per second.
3. Rating Region Table – This small table names the program rating system in use. In the U.S. this would be the TVPG system. This table is transmitted once per minute.
4. Master Guide Table – This table links to the sub-tables. This table is transmitted 7 times per second. The sub-tables are:
A. Event Information Tables: EIT-0, EIT-1, EIT-2, … EIT-127. Each EIT covers a three-hour period, and describes all the programs (events) for that period. EIT tables 0-3, which are required, will describe 12 hours of events. EIT tables 4-127, which are optional, will extend this description to 16 days. There will be multiple EITs 0-127 if there are multiple sub-channels. The EIT contains the following information for each event:
1. Event start time.
2. Event duration
3. Event title
4. Pointer to an ETT describing the program
5. Program content advisory
6. Caption options
7. Audio options
The required repetition rates are:
· EIT-0 is sent twice per second
· EIT-1 is sent once every 3 seconds
· EIT-2 to EIT-127 are sent once every minute
B. Extended Text Tables – This table will describe a program. Out of the EIT and ETT tables the receiver can build a complete Program Guide for the channel. ETTs can also be used to describe the purpose of a sub-channel.
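The 12-hour and 16-day figures follow directly from the three-hour span of each EIT, as this small Python calculation shows. The helper function is hypothetical, and it measures time from the start of the period covered by EIT-0.

```python
HOURS_PER_EIT = 3
REQUIRED_EITS = 4        # EIT-0 through EIT-3
TOTAL_EITS = 128         # EIT-0 through EIT-127

required_hours = REQUIRED_EITS * HOURS_PER_EIT
total_days = TOTAL_EITS * HOURS_PER_EIT / 24
print(required_hours, total_days)   # 12 16.0


def eit_number(hours_ahead):
    """Which EIT would describe an event starting this many hours
    after the start of the period covered by EIT-0."""
    index = int(hours_ahead // HOURS_PER_EIT)
    if not 0 <= index < TOTAL_EITS:
        raise ValueError("outside the 16 days the EITs can describe")
    return index


print(eit_number(7.5))              # 2
```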
The PSIP standard is document A/65C on the ATSC website.
If you actually read all of this, consider yourself an Honorary Engineer.
This page is part of “An HDTV Primer”, which
starts at www.hdtvprimer.com