How to Tackle a Transcript

From OTPedia

A TEDx transcript is a form of same-language subtitles or captions. Besides containing the words spoken by the speaker, the transcript must be divided into subtitle lines and then spotted (cued, timed) to match the flow of the recorded talk. Like closed captions, TEDx transcripts also contain sound information for Deaf and hard-of-hearing viewers.

TEDx transcripts are created by volunteers who follow a few standards for subtitle length and reading speed. To start contributing transcripts, you must sign up for an Amara account on the TED team. This video tutorial explains how to do that. If you're working on an English transcript, make sure to read the English Style Guide.

IMPORTANT: before you start working on a transcript, make sure that the video is part of the TED team on Amara, using this guide (if you are a TEDx organizer, you can also add your videos using the form linked to at the end of that guide).


How the transcription project works

TEDx talk videos are uploaded to YouTube. Subtitles for those videos are created using an online tool developed by our subtitling partner, Amara, which also handles the organization of the transcription effort. Users who log in can search for untaken transcription tasks using a number of filters and search terms. Once a transcript has been completed, it must be reviewed by another volunteer and then approved. Approved transcripts can be viewed on YouTube. The transcriber and reviewer are also credited for their work on their TED.com profiles.

Why transcribe talks and not translate them directly?

Transcripts are important for a couple of reasons:

  1. Talks reach millions of viewers, but with transcripts (same-language subtitles) it's possible to reach an even wider audience: Deaf and hard-of-hearing viewers.
  2. Many people are not confident enough to translate just by listening, so a written transcript provides an accurate baseline. Speakers sometimes use acronyms, short word forms, idioms or phrases that are not easy to translate, and having everything in written form makes it easier to research and translate them accurately. The transcript is the most important part of translating.

If you have transcribed and/or translated TEDx talks before they were available on the TED team on Amara, please fill in this form. The purpose of the form is to keep track of everyone who contributed in order to credit them properly. Every transcriber/translator is credited on their TED profile. If you have worked on a talk but don't have a TED/Amara profile, see below for how to sign up.

How to sign up

  1. Create a profile on TED and register with Amara, our subtitling partner.
  2. Once your application is approved, find the talk you want to transcribe. Go to the TED team section on Amara, choose the TEDxTalks project, go to the Tasks tab and for the first filter choose “Transcribe.” Once you have found the talk, click “Perform task.”
  3. This document explains how to use the Amara transcription interface.
  4. Consider joining the "I transcribe TEDx talks" Facebook group and the Facebook group for TED translators, and/or a TED translator group for your specific language. You can find the list of groups here. Translators are very friendly and can help you with anything and answer all your questions.

How to find talks to transcribe

You can learn how to use the Amara interface to find talks to transcribe by watching this video tutorial. The general selection of TEDx talks can be found in the TEDxTalks project, while "Best of TEDxTalks" contains a selection of the best talks in several languages. Remember to set the search filter to "Transcribe."

Please make sure you only search within the TED team projects and not in Amara's public search. Do not transcribe TEDx videos that are not part of the TED team, because we may be unable to use your work. Instead, notify us using this form, so that we can add the talk properly. To learn how to make sure that a TEDx talk is officially part of the TED team, use this guide.

Subtitling offline

The Transcription Project is mainly carried out in the interface provided by our transcription/subtitling partner, Amara. Amara serves two functions: assigning transcription tasks to the right people (transcriber, reviewer, approver) and providing a transcription tool. Even if you are used to an offline subtitling solution, you still need to start by going to Amara to find a transcription task and get it assigned to you through the system. Afterwards, if you prefer and have used subtitling software before, you can work offline and then upload the draft to Amara. You will still need to edit the title and description in the online editor.

When reviewing a transcript offline, bear in mind that if you upload a file with a different number of subtitles than the original transcript, the subtitles may become desynchronized. Always check the uploaded file against the previous revision (in the Revisions tab). A good rule of thumb: if the last subtitle is displayed at a similar time in the uploaded revision as in the original transcript, the subtitles should be OK, and you can proceed to editing the title and description and completing the review task online. If the subtitles you uploaded are desynchronized, roll back to the previous revision and use the online editor to add or remove subtitles until their number matches the file you are trying to upload. The position of the added subtitles does not matter, but insert them between existing subtitles so that they get time codes, and fill them with dummy text. Afterwards, the subtitles you upload should be synchronized properly.
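The rule of thumb above can be expressed as a small check. This is a minimal Python sketch, not an Amara feature: it assumes you have already read both revisions into lists of hypothetical (start, end, text) tuples in seconds, and the one-second tolerance is our own assumption standing in for "a similar time."

```python
# Hypothetical data model: each subtitle is a (start_seconds, end_seconds, text) tuple.
# The 1.0-second tolerance is an assumption; the guide only says "a similar time".

def looks_synchronized(original, uploaded, tolerance=1.0):
    """Return True if the uploaded revision has the same number of subtitles
    as the original and its last subtitle starts at a similar time."""
    if not original or not uploaded:
        return False
    if len(original) != len(uploaded):
        return False
    last_original_start = original[-1][0]
    last_uploaded_start = uploaded[-1][0]
    return abs(last_original_start - last_uploaded_start) <= tolerance


original = [(0.0, 2.5, "Hello."), (3.0, 6.0, "Thank you."), (6.5, 8.0, "(Applause)")]
uploaded = [(0.0, 2.4, "Hello."), (3.0, 6.0, "Thank you."), (6.6, 8.0, "(Applause)")]
print(looks_synchronized(original, uploaded))  # True
```

A passing check is only a hint; you should still play the uploaded revision and compare it with the previous one.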

You can find links to various offline and online subtitling tools in the External Links section.

Title and description standard

Each TEDx talk comes with a title and description added by the TEDx organizer. However, these sometimes contain too little or too much information. Sometimes the talk title is missing altogether, for various reasons (the event happened years ago, or the organizers simply didn't title the talks). In those cases it's OK to leave just the speaker's name, but you can also contact the organizer or speaker and ask for a title, or come up with a title yourself.

The language of the title and description should match the language of the talk. Do not put English titles and descriptions on non-English talks. Using a dash instead of the colon is also fine. The word "at" should be translated. Ideally, the description should contain a one- or two-sentence overview of the talk, without links; the speaker's biography can be kept and translated. When transcribing or translating the talk on Amara, leave out the text explaining what the TEDx program is. For example:

On being a young entrepreneur: Christophe Van Doninck at TEDxFlanders

The new media uploader for TEDx organizers changed the way talks are uploaded to YouTube. Titles no longer contain the word "at" and look like this:

On being a young entrepreneur | Christophe Van Doninck | TEDxFlanders

These talks contain a disclaimer in the description that should be kept and translated ("This talk was given at a local TEDx event, produced independently of the TED Conferences."). The speaker's bio can also be included, but the text explaining what the TEDx program is should be left out.

Note: do not add the year/date of the event.

How to divide the text into subtitles

One subtitle is the text displayed on the screen at a given time. One subtitle can contain up to two lines, with a line break in between. To learn more about the technical aspects of subtitles (length, reading speed), watch this video tutorial.

When deciding how to divide the text into subtitles, you should consider the following points (all described in more detail in sections that follow):

1. How long can the subtitle stay on the screen?
Based on this, the text in the subtitle can be shorter or longer (when there is more time, people can read a longer subtitle more easily).

2. Is the subtitle long enough to break it into two lines?
If the subtitle is longer than 42 characters, you should break it into two lines (two lines within the same subtitle).

3. Is the text that I'm entering too long to work as a single subtitle?
If the text you are entering is longer than 84 characters, you should create two subtitles instead.

4. Do the lines and the whole subtitle end neatly in "linguistic wholes"?
You should take care to break the lines and end the subtitles after linguistic wholes, e.g. not after an article.

5. Is the reading speed no more than 21 characters/second?
The maximum reading speed for subtitles is 21 characters per second. If your subtitle exceeds this, consider adjusting the timing. To preserve a good reading speed, you can let the subtitle run a little into the time when the next sentence is spoken (however, don't start the subtitle more than about 100 ms before the equivalent bit of speech is heard). Otherwise, consider compressing/reducing the text (e.g. removing "fluff" like "Well," or "right?", removing repetitions, etc.). For more information on compressing subtitles, see this guide. Remember that a good reading speed is very important, because your transcript will often serve as the starting point for translations, and the equivalent subtitle can become much longer in the target language, raising the reading speed.

6. Am I including stuff that should be considered "noise"?
Broken phrases ("I wanted to--No, this is what I'll talk about"), repetitions ("Thank you, thank you, thank you, thank you") and empty syllables ("erm," "umm" etc.) should not usually be reflected in the subtitle at all (unless they are crucial to what the speaker is trying to convey, e.g. they later refer to how they broke a few phrases at the beginning of their talk due to stress). Also, do not include obvious errors, like when the speaker says "We thinks" instead of "We think." Use the correct form of the word in the subtitle. On rare occasions, if you believe that the change is obvious but technically changes the meaning of the sentence, put it in square brackets (to indicate "editing").

7. Do I really have to cut the sentence up into this many subtitles?
Once you have considered the points above, make sure that you don't cut the speaker's sentences up into too many subtitles. Try to keep whole clauses together in one subtitle. It will be easier for translators to translate bigger chunks of a sentence than smaller ones, since not everything will divide up the same way in the target language as it does in the original. For this reason, provided you can do it without breaking the other rules (e.g. making the subtitle over 84 characters long or pushing the reading speed over 21 characters per second), try to keep bigger parts of one sentence together in one subtitle. Important: don't include part of the next sentence in the same subtitle as the end of the previous sentence (e.g. "this is why./And another idea").

8. Did I include all of the sound information essential to understanding the talk?
Include all of the sound information essential to understanding the talk (e.g. non-verbal sounds that the speaker refers to, off-screen speaker changes), as well as instances of clear laughter and applause from the audience (with the exception of applause heard at the beginning of the talk).

9. Did I include on-screen text?
If possible without overlapping other subtitles and going over the subtitle length and reading speed limits, include on-screen text that is part of the talk (e.g. text on slides or embedded subtitles in a video played on the stage). This will allow this text to be translated into other languages. In order to signify that this is on-screen text and not something the speaker is saying, put the representation of on-screen text between square brackets.

10. Does the subtitle go over a cut in the video for no reason?
If a subtitle is displayed over a cut in the video, it suggests that the consecutive scenes are somehow connected. For this reason, make sure you do not create such connections where there shouldn't be any. Keeping to this rule may be difficult in talks with fast-paced editing, but remember that it matters most when synchronizing changes in the video with changes in the subtitles is crucial to what happens in the talk (e.g. a subtitle that reveals what is on a slide usually should not appear before the slide shows up on the screen).

Cueing/timing the subtitles

Because a TEDx transcript is meant to work as subtitles, the content of the transcript must be broken up into subtitle lines, and these lines must be synchronized with the video. This process is referred to as cueing, spotting or timing. The main objective in timing the subtitles is to present the viewer with a line of text displayed on the screen for a period of time that will be sufficient for them to read and understand the text, i.e. with a reading speed that is no more than 21 characters per second.
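Reading speed is the number of characters divided by the time the subtitle stays on screen. The sketch below assumes that every character counts (letters, spaces and punctuation) but the line break between two lines does not; Amara's own counter may count slightly differently, so treat the result as an approximation.

```python
MAX_CPS = 21.0  # maximum reading speed for English subtitles, per this guide


def reading_speed(text, start, end):
    """Characters per second for one subtitle (times in seconds).

    Assumption: letters, spaces and punctuation all count,
    but the newline separating two lines does not."""
    duration = end - start
    if duration <= 0:
        raise ValueError("subtitle must end after it starts")
    characters = len(text.replace("\n", ""))
    return characters / duration


subtitle = "I adopted a dog, a cat,\nthree mice, and a goldfish."
speed = reading_speed(subtitle, 10.0, 12.5)  # 50 characters over 2.5 seconds
print(f"{speed:.1f} cps", "OK" if speed <= MAX_CPS else "too fast")  # 20.0 cps OK
```

If the result is above the limit, the options are the ones listed earlier: extend the timing slightly or compress the text.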

On the other hand, the subtitles are only one part of the visual content that the viewer must take in at any given time, and for this reason, the subtitle line cannot be too long, because the viewer must be given enough time to look at and comprehend the video. Additionally, hearing viewers watching the talk with subtitles (e.g. translated into their language) must also have enough time to listen to the speaker's voice (the intonation and emotion in the voice / prosodic features) and other ambient sounds.

Line length

A single subtitle in a TEDx transcript may consist of up to 84 characters. A longer subtitle is difficult to read, and some offline players may automatically break it up to form three or more single lines, covering up to half of the screen. A subtitle that is longer than 42 characters should be broken into two lines. Effectively, one subtitle can consist of up to two lines of up to 42 characters each. Maximum subtitle length in non-English subtitles may differ, especially for languages which do not employ the Latin alphabet. Subtitle and line length is displayed for every subtitle in the new editor on Amara. See this section to learn more about line breaking.
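The two length limits can be checked mechanically. This sketch applies the English (Latin-script) limits described above and assumes the two lines of a subtitle are separated by a newline character; the function name is our own, hypothetical choice.

```python
MAX_LINE = 42      # characters per line (Latin-script languages)
MAX_SUBTITLE = 84  # characters per subtitle
MAX_LINES = 2      # a subtitle has at most two lines


def length_problems(subtitle):
    """Return a list of human-readable problems with a subtitle's length.

    Lines are separated by a newline; the line break itself is not counted."""
    lines = subtitle.split("\n")
    problems = []
    if len(lines) > MAX_LINES:
        problems.append(f"{len(lines)} lines (maximum is {MAX_LINES})")
    for number, line in enumerate(lines, start=1):
        if len(line) > MAX_LINE:
            problems.append(
                f"line {number} has {len(line)} characters (maximum is {MAX_LINE})"
            )
    total = sum(len(line) for line in lines)
    if total > MAX_SUBTITLE:
        problems.append(f"subtitle has {total} characters (maximum is {MAX_SUBTITLE})")
    return problems


print(length_problems("This is a very long, verbose piece of prose that"))
# ['line 1 has 48 characters (maximum is 42)']
```

For languages that do not use the Latin alphabet, the constants would need to be replaced with the limits agreed on for that language.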

Subtitle duration

A subtitle should not stay on the screen for more than about 7 seconds. A subtitle cannot stay on the screen for less than approximately 1.12 seconds, even if it only contains a single word, because subtitles with a shorter duration will just be a flash that most viewers will miss. Conversely, a short subtitle should not stay on the screen for too long, because that would prompt the viewer to re-read it. If there is a longer piece of music or applause, have the sound representation (e.g. (Music)) display for 3 seconds and then indicate when the sound is about to end (e.g. (Music ends)).
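The duration limits can be sketched as a simple check, assuming start and end times in seconds. Durations are rounded to milliseconds (the precision of common subtitle formats such as SRT) to avoid floating-point noise; the thresholds are the approximate values from this section, not hard limits enforced by Amara.

```python
MIN_DURATION = 1.12  # seconds; shorter subtitles flash by unnoticed
MAX_DURATION = 7.0   # seconds; "about 7 seconds" in this guide


def duration_problem(start, end):
    """Return None if the on-screen duration is acceptable, else a message."""
    # Round to milliseconds so that e.g. 11.12 - 10.0 is treated as exactly 1.12.
    duration = round(end - start, 3)
    if duration < MIN_DURATION:
        return f"too short: {duration:.2f} s (minimum about {MIN_DURATION} s)"
    if duration > MAX_DURATION:
        return f"too long: {duration:.2f} s (maximum about {MAX_DURATION} s)"
    return None


print(duration_problem(10.0, 11.12))  # None
```

Long sound representations such as (Music) are a special case: as described above, display them for about 3 seconds and mark the end separately rather than stretching one subtitle.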

To learn more about how to manage the reading speed of a subtitle on Amara, watch this tutorial. The duration should reflect the average reading speed, but also allow for a little more reading time for relatively "difficult" items that require more attention from the viewer, e.g. proper names or specialized terminology. Importantly, the reading speeds described above reflect values for English subtitles, and may vary for other languages.

Synchronization

Try to match the duration of the subtitle with the time the speaker is saying the equivalent sentence. However, please remember that the reading speed is more important. If the reading speed is above 21 characters / second and text reduction/compression can't help in shortening the subtitle, you can make the duration overlap a little over the time in the video when the speaker is starting the next sentence. This will allow the viewer to read the subtitle before it disappears off screen, which is more important than strictly timing it with what is being said in the audio. Note that this rule should only be followed in case of reading speed problems when all other strategies (e.g. text reduction, breaking up the subtitle into two shorter ones) have failed. Normally, you should try to synchronize the subtitles with what is being said.

The subtitle should not lag behind the utterance by more than 2 seconds, and such a long lag is usually not necessary. Do not start the subtitle before the speaker says the equivalent sentence (giving the viewer a glimpse of the future can be confusing when other cues, like non-verbal language, do not match what the current subtitle says). It's OK to let the subtitle run a little into the time when the next bit of speech is spoken, if that is necessary for maintaining a good reading speed (no more than 21 characters per second).
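The two timing limits can be sketched as a comparison with the utterance, assuming you know (for example, from listening) when the equivalent speech starts and ends; those speech times are hypothetical inputs, not something Amara provides. The 100 ms lead-in tolerance comes from the reading-speed point in the checklist earlier in this article.

```python
EARLY_TOLERANCE = 0.1  # seconds a subtitle may appear before the speech ("about 100 ms")
MAX_LAG = 2.0          # seconds a subtitle may stay up after the speech ends


def sync_problems(sub_start, sub_end, speech_start, speech_end):
    """Compare a subtitle's timing with the start and end of the utterance
    it transcribes (all times in seconds). Returns a list of messages."""
    problems = []
    lead = round(speech_start - sub_start, 3)  # positive = subtitle appears early
    if lead > EARLY_TOLERANCE:
        problems.append(f"starts {lead:.2f} s before the speech")
    lag = round(sub_end - speech_end, 3)  # positive = subtitle lingers after speech
    if lag > MAX_LAG:
        problems.append(f"stays {lag:.2f} s after the speech ends")
    return problems


print(sync_problems(9.5, 13.0, 10.0, 12.0))  # ['starts 0.50 s before the speech']
```

A subtitle that ends before the speech does is not flagged here, since the reading speed check covers whether it stays up long enough.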

What are line breaks?

One subtitle can be composed of one or two lines. In languages based on the Latin script, the subtitle must be broken into two lines if it's longer than 42 characters (because a longer line is more difficult to read than a subtitle composed of two lines, and some offline and online players may not display longer lines correctly). "Line-breaking" refers to choosing the place where the line is broken, and also, how to end the whole subtitle. To make a line break in Amara, hit Shift+Enter. To learn more about how to break lines on Amara, watch this tutorial. Below, you will find a description of useful line-breaking strategies. Please also follow these guidelines when deciding where to end one subtitle and begin another.

Generally, each line should be broken only after a linguistic "whole" or "unit," no matter if it's the only line in the subtitle, or the first or second line in a longer subtitle. This means that sometimes it's necessary to rephrase the subtitle in order to make it possible to break lines without breaking apart any linguistic units, e.g. splitting apart an adjective and the noun that it refers to. Rules for what kind of linguistic unit can be broken vary by language, but these general guidelines can inspire you to make better line-breaking choices in your subtitles.

Don't end the subtitle with a bit of the next sentence

If the subtitle contains the end of a sentence, try not to include the beginning of the next sentence, and instead, put that beginning into the following subtitle. Examples:

Incorrect:

which is how I solved this.
And what I also noticed

is that the blue light went on.

Correct:

which is how I solved this.

And what I also noticed
is that the blue light went on.

Incorrect:

Somehow, this worked really well
in her garage. When you work

on something big,
you need to accept failure.

Correct:

Somehow, this worked really well
in her garage.

When you work on something big,
you need to accept failure.
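A rough heuristic can flag subtitles like the incorrect ones above: it looks for a sentence-ending punctuation mark followed by more text inside the same subtitle. This Python sketch relies on that assumption only; abbreviations such as "Mr." or "Dr." will produce false positives, so each flagged subtitle still needs a human look.

```python
import re

# A sentence end (. ! ?), optionally followed by a closing quote or bracket,
# then whitespace (including a line break) and more text in the same subtitle.
_SENTENCE_END_THEN_TEXT = re.compile(r"[.!?][\"')\]]*\s+\S")


def starts_next_sentence(subtitle):
    """Heuristic: True if a subtitle ends one sentence and begins another."""
    return bool(_SENTENCE_END_THEN_TEXT.search(subtitle))


print(starts_next_sentence("which is how I solved this.\nAnd what I also noticed"))  # True
print(starts_next_sentence("which is how I solved this."))  # False
```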

When to break subtitles - proportional line length

The possible maximum length of a subtitle depends on how long it can stay on the screen. If your subtitle is over 42 characters long, you need to break it into two lines. In fact, it's a good idea to break the line once it's over 40 characters, but you can go up to the 42-character limit when it's really difficult to make it shorter. Ideally, the two lines of a two-line subtitle should be roughly balanced in length. So, you should break the line like this:

I adopted a dog, a cat,
three mice, and a goldfish.

...and you should not break the line like this:

I adopted a dog,
a cat, three mice, and a goldfish.
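To illustrate what "balanced" means, the sketch below picks the space that makes the two lines closest in length. It is a purely length-based heuristic (our own illustration, not a TED or Amara tool) and ignores linguistic units entirely.

```python
def most_balanced_break(text):
    """Split text at the space that makes the two lines closest in length.

    Only a length heuristic: it knows nothing about linguistic units,
    so the result must always be checked by eye."""
    best = None  # (length difference, first line, second line)
    for i, char in enumerate(text):
        if char != " ":
            continue
        first, second = text[:i], text[i + 1:]
        difference = abs(len(first) - len(second))
        if best is None or difference < best[0]:
            best = (difference, first, second)
    if best is None:
        return text  # no space to break at
    return best[1] + "\n" + best[2]


print(most_balanced_break("I adopted a dog, a cat, three mice, and a goldfish."))
# I adopted a dog, a cat,
# three mice, and a goldfish.
```

Applied to "I can speak over ten modern Romance languages and read Latin pretty well.", the same function separates "Romance" from "languages", which is exactly the kind of break the following section warns against.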

Breaking apart linguistic units for line length

However, it may be difficult to achieve balance in length when trying not to break apart linguistic units. For example, these lines are broken in a way that preserves similar length, but breaks the linguistic unit of the adjective "Romance" modifying the noun "languages":

I can speak over ten modern Romance
languages and read Latin pretty well.

In such cases, it is better to go with something less balanced, but preserve the linguistic unit. However, you should try to make the lines balanced enough so that neither is shorter than 50% of the other - sometimes even at the cost of breaking language units (which is only the last resort). If a line is shorter than 50% of the other line, it can often distract the viewer more than reading a line where a linguistic unit is broken.
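The 50% guideline is easy to check once a line break has been chosen. A minimal sketch, assuming line length is counted in characters including spaces:

```python
def lines_balanced(first, second, min_ratio=0.5):
    """True if the shorter line is at least min_ratio (50%) of the longer one."""
    shorter, longer = sorted((len(first), len(second)))
    return longer == 0 or shorter >= min_ratio * longer


print(lines_balanced("I learned more about Jane Elliott", "on Wikipedia."))  # False
print(lines_balanced("I learned more about", "Jane Elliott on Wikipedia."))  # True
```

The check says nothing about linguistic units; it only tells you when a break is unbalanced enough that another option is worth looking for.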

For example, the lines in this subtitle are not balanced for length (33/13 characters):

I learned more about Jane Elliott
on Wikipedia.

An easy way of making the lines more similar in length would be to put the word "Elliott" in the second line:

I learned more about Jane
Elliott on Wikipedia.

However, this would break apart the proper name "Jane Elliott," which should be avoided at all costs. Proper names are an example of a linguistic unit that should not be divided. In this case, we could consider breaking apart another linguistic unit instead:

I learned more about
Jane Elliott on Wikipedia.

Here, we broke apart the verb and its complement. Some linguistic units "keep together" more strongly than others, so if you need to go against the non-breaking rules, it is better to break apart a looser unit and keep the tighter ones intact. Proper names are one example of a unit that should be broken as rarely as possible (you can find more examples below).

Clean line breaks through compressing

Sometimes it may be necessary to rephrase the line in order to make it possible not to break apart linguistic units. For example, in subtitles translated into English, instead of going with this subtitle:

I learned more about Jane
Elliott on Wikipedia.

...you may be able to rephrase your translation (depending on the context) to say:

I learned more about her on Wikipedia.
Then, I read the Wikipedia article.
I learned more about Jane Elliott.
I learned more about her.

In subtitling, this type of rephrasing is referred to as "compressing." Depending on the context, it may be possible to omit some information if previous subtitles or other sources (a slide, the viewer's general knowledge) are certain to fill in the blanks anyway. This way, you can avoid breaking apart any linguistic units. You can learn more about compressing when transcribing talks in the section below and in this guide (meant for subtitle translation, but most of its rules also apply to transcribing).

Clean line breaks through rephrasing

Of course, rephrasing is not only about making the subtitle so short that it can fit in one line (no longer than 42 characters). Sometimes, it's difficult or impossible to compress so much, but you can change the structure of the subtitle to make it easier to break cleanly. For example:

About Jane Elliott,
I learned more on Wikipedia.

Now, this is not necessarily good English, but the target language that you are translating into may allow this sort of phrasing. If possible, try to rephrase the subtitle to make it break cleanly without the need to sever any linguistic units.

Examples of correct and incorrect line-breaking

These examples show incorrect and correct line breaking for various line lengths. The possible maximum length of a subtitle depends on how long it can stay on the screen, so, unlike in the examples below, line length would normally differ from subtitle to subtitle. For simplicity, the examples show only the line breaks, without grouping the lines into subtitles of up to two lines (how lines are grouped into subtitles depends on the talk).

Spoken sentence:

This is a very long, verbose piece of prose that no one knows and no one shall remember.

Incorrect short line breaks:

This is a
very long, verbose
piece of
prose that
no one knows and
no one shall
remember.

Correct short line breaks:

This is a very long,
verbose piece
of prose
that no one knows
and no one
shall remember.

Incorrect medium line breaks:

This is a very long, verbose
piece of prose that no one
knows and no one shall remember.

Correct medium line breaks:

This is a very long,
verbose piece of prose
that no one knows
and no one shall remember.

Incorrect long line breaks:

This is a very long, verbose piece of prose that
no one knows and no one shall remember.

Correct long line breaks:

This is a very long, verbose piece of prose
that no one knows and no one shall remember.

Simple rules-of-thumb for line-breaking

It is impossible to provide a list of rules that works for all the languages in the world. Line-breaking rules depend largely on the target language's grammar (and morphology), that is, on what kinds of units are "wholes" in a sentence. The list below contains some rules that can be used in English and several Western European languages, and can serve as inspiration when searching for similar rules in your own language.

Synchronizing line breaks

If possible, the line breaks should be synchronized with pauses between (or within) the speaker's utterances, as this will make it feasible to use the standard 250 ms break between subtitles, and make it easier for the viewers to follow what is being said.
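When reviewing timing, it can help to list consecutive subtitles separated by less than the standard 250 ms break, since these may mark places where a subtitle boundary does not fall on a pause. This is a hypothetical review aid, assuming subtitles as (start, end, text) tuples in seconds sorted by start time. A flagged pair is not necessarily wrong: a subtitle may deliberately run into the next sentence to keep the reading speed down.

```python
STANDARD_GAP = 0.25  # seconds between consecutive subtitles


def close_subtitle_pairs(subtitles, min_gap=STANDARD_GAP):
    """Return indexes i where subtitle i+1 starts less than min_gap seconds
    after subtitle i ends (overlaps count too)."""
    flagged = []
    for i in range(len(subtitles) - 1):
        # Round to milliseconds to avoid floating-point noise in the gap.
        gap = round(subtitles[i + 1][0] - subtitles[i][1], 3)
        if gap < min_gap:
            flagged.append(i)
    return flagged


subs = [(0.0, 2.0, "a"), (2.25, 4.0, "b"), (4.1, 6.0, "c")]
print(close_subtitle_pairs(subs))  # [1]
```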

Synchronizing line breaks with long pauses

If the speaker's voice trails off, the subtitle can be displayed over (cover up) the pause, provided that it is possible to adhere to the character length and duration limits. If this "stitch-up" subtitle would have to stay on the screen for too long, or if the subtitle covering up the pause would exceed the character limit, the first part of the broken utterance (before the speaker's voice trails off) can end in an em dash (--) or whatever is used to signify a broken-off utterance in the language you are transcribing. If the following utterance (after the pause) can be considered a new sentence, its first word should begin with a capital letter. If it cannot be considered the beginning of a new sentence, it is sometimes necessary to insert a word in square brackets at the beginning of the line to remind the viewer what the speaker talked about before the pause, e.g.:

SPOKEN:

And there are many things that I like a lot, my books, my iPad...
(3 seconds of applause)
...my bicycle, my cats and my hat collection.

TRANSCRIPT:

And there are many things that I like a lot, my books, my iPad--
(Applause)
My bicycle, my cats and my hat collection.

SPOKEN:

My grandmother liked many things, she read a lot, played games on her iPad...
(3 seconds of applause)
...rode her bicycle, talked to her cats and bought new hats for her collection.

TRANSCRIPT:

My grandmother liked many things, she read a lot, played games on her iPad--
(Applause)
[She] rode her bicycle, talked to her cats and bought new hats for her collection.

Cuts and on-screen changes

Subtitles function almost as an additional layer of editing, because they can connect or divide up cuts and scenes. The transcriber must bear this in mind when synchronizing the subtitles and breaking the lines, and should make sure that the line breaks reflect on-screen changes, preserving the flow of the video. Very often the subtitle will need to reflect on-screen content, such as when the speaker refers directly to visual information presented on a slide, or talks about something in the immediate physical environment (e.g. miming something while describing it).

Cueing and line-breaking for translation

Thanks to the Open Translation Project, every talk has a chance of being translated into many different languages. Keeping the lines within the character limit, ensuring adequate on-screen duration and putting line breaks in the correct places also helps the translators in creating foreign-language subtitles that are easy to follow and carry the original message across. Due to differences between languages, a short subtitle in English may turn out to be quite long in the target language, and vice versa. Even though the translators are able to compress the form of the translated subtitle, e.g. by omitting padding expressions and simplifying the syntax[1], sometimes such compression may be impossible. The most difficult cases are acronyms (e.g. "PTA meeting", "FDA approval"). Because the target language may not have a recognizable acronym for the same concept, the translators must very often use the full form of the name. Even though the translators are able to alter subtitle duration time, most inexperienced volunteers will prefer to keep the original duration. For this reason, it is advisable to use the acronym when the speaker uses it, but to try to make the duration of the subtitle containing the acronym a little longer (e.g. lagging one second), if possible, to allow more on-screen time for the translation of the full form of the name that the acronym refers to.

Spelling and punctuation

This section can suggest some spelling issues to think about, but you should always consult rules applicable to your language. For technical reasons, some generally accepted spelling and punctuation rules may not apply to subtitles. You should consult on this with subtitling professionals in your language and share your findings with other transcribers who work in your language (e.g. by creating an article in your language's OTPedia).

It is important to decide on a spelling and punctuation convention before starting the transcript. For example, TED transcripts of English talks use US spelling and punctuation rules (see the Wikipedia article on American and British English spelling differences). Such choices are also important when working in other languages with several regional variations (e.g. in French or Portuguese). Spelling and punctuation conventions for your language can be found in respected "official" sources, many of which can be found online.

Avoiding character display errors: simple quotes, apostrophes and dashes

Using smart/curly double quotes (“”) is risky, because some players have trouble displaying them correctly. Please use simple, straight ASCII double quotes (") and, for single quotes, the straight apostrophe ('). The rule is the same for apostrophes: use the straight apostrophe (') instead of the typographic/curly apostrophe (’). Instead of an en or em dash (–/—), use a hyphen (-). For other characters in your language, use a simple ASCII equivalent wherever possible (research to find one for your language). This may go against strict typographic conventions, but due to the technical limitations of most subtitle formats, many "proper" characters will simply not be displayed for some users (e.g. when playing talks offline). Note that these rules apply to subtitles only; you can use proper punctuation in titles and descriptions.
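The character replacements above can be automated with Python's built-in str.translate. The table covers the quotes, apostrophes and dashes named in this section; the ellipsis replacement is our own assumption about a "simple ASCII equivalent," and you would extend the table with the characters relevant to your language.

```python
# Typographic characters mapped to the plain ASCII forms recommended above.
_ASCII_EQUIVALENTS = str.maketrans({
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / curly apostrophe
    "\u2013": "-",    # en dash
    "\u2014": "-",    # em dash
    "\u2026": "...",  # horizontal ellipsis (assumption, not named in this guide)
})


def to_plain_punctuation(subtitle):
    """Replace typographic quotes, apostrophes and dashes with ASCII ones."""
    return subtitle.translate(_ASCII_EQUIVALENTS)


print(to_plain_punctuation("\u201cIt\u2019s fine,\u201d she said \u2013 twice\u2026"))
# "It's fine," she said - twice...
```

Since titles and descriptions may keep proper punctuation, a conversion like this would only be applied to subtitle text.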

Commas, colons and semicolons

A subtitle should preferably not end in a colon or semicolon, because these characters are not very visible at the end of the line[2]. The subtitle can end in a comma.

Abbreviations

Subtitles reflect spoken language and thus should not contain elements typical of written language. "E.g." for "for example" and "i.e." for "that is" should not be used in subtitles. Do not add abbreviations of any kind that the speaker did not use, even if they would make a difficult subtitle shorter. The only exceptions are standard abbreviations for units of measurement (e.g. ft for "feet").
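A simple search can catch the two written-language abbreviations mentioned above before you submit a transcript. This sketch only looks for "e.g." and "i.e." (case-insensitive); it does not try to detect other abbreviations the speaker did not use.

```python
import re

# The written-language abbreviations this section rules out of subtitles.
_WRITTEN_ABBREVIATIONS = re.compile(r"\b(?:e\.g\.|i\.e\.)", re.IGNORECASE)


def written_abbreviations(subtitle):
    """Return every "e.g." or "i.e." found in a subtitle."""
    return _WRITTEN_ABBREVIATIONS.findall(subtitle)


print(written_abbreviations("Some fruit, e.g. apples"))  # ['e.g.']
print(written_abbreviations("Take 3 ft of rope"))  # []
```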

Capitalization

Capitalization rules vary from language to language. If the speaker is citing a title in English, or using a word that is capitalized in English, the transcript should conform to the appropriate English spelling rules (British or American). However, if the speaker is citing a title in their first language, the transcript should employ capitalization rules for that language. This also covers cases where the title or proper name is transliterated and does not have an established translation in English (or any language being transcribed).

Spelling in titles

Most words in movie and book titles, and usually in song titles, are capitalized in English (for which words not to capitalize, see Capitalization in Titles at the Writer's Block website). The rules governing the capitalization of article, report and paper titles vary: some sources suggest that the words in such titles should be capitalized according to the rules for book titles[3], while others suggest that only the first word should be capitalized[4]. TEDTalks titles follow the latter convention, with only the first word in the title capitalized (the first word in the talk title is almost always the speaker's first name). If the talk title contains a colon after the speaker's name, the first word after the colon is also capitalized (e.g. "Paul Bloom: The origins of pleasure"[5]).

While book and movie titles are normally written in italics, TED transcripts do not use rich formatting and therefore putting text in italics is not possible. Quotation marks should be used instead (single or double, depending on whether the transcript should conform to British or American spelling rules, respectively). If a speaker forgets a title in English and replaces it with the equivalent from their first language, the English title should be written in square brackets, e.g.:

SPOKEN:

You know, she's like the bear in... "Pu der Bär".

TRANSCRIPT:

You know, she's like the bear in ["Winnie the Pooh"].

One exception to this is when the speaker or somebody in the audience immediately recalls the English title, or when the talk otherwise refers to the speaker's use of a title from a different language. In such cases, using a title from a different language becomes part of the talk, and the original title must be kept.

Capitalization in proper names

Proper names are words used for unique entities. Proper names are capitalized in English. Multi-word proper names usually follow the capitalization rules for book titles, with most of the words capitalized[6].

Many words have a different meaning when capitalized. For example, according to the International Astronomical Union guidelines, the word "sun" should be capitalized when referring to the unique entity in Earth's solar system (i.e. the Sun), but is not capitalized when used as a common noun signifying a star in another system.[7]

Special characters

Em dash

In English, TED and TEDx transcripts use an em dash instead of dots (an ellipsis). An em dash entered as two consecutive hyphens (--) is converted into a proper em dash. On Windows, an em dash (—) can also be inserted into a text file (like a subtitle file) by holding down the Alt key and typing 0151 on the numeric keypad.
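The Alt+0151 shortcut works because 151 is the position of the em dash in the Windows-1252 code page. The Python sketch below confirms this and also shows a hypothetical "--" converter; the paragraph above does not say which tool performs that conversion, so the function is only an illustration.

```python
# Alt+0151 on Windows inserts character 151 of the Windows-1252 code page,
# which is the em dash (Unicode U+2014).
EM_DASH = bytes([151]).decode("cp1252")
assert EM_DASH == "\u2014"


def double_hyphens_to_em_dash(text: str) -> str:
    """Hypothetical converter: replace each '--' with a single em dash."""
    return text.replace("--", EM_DASH)


print(double_hyphens_to_em_dash("I was going to -- well, never mind."))
```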

Accented letters

Many accented letters found in languages that use the Latin alphabet (e.g. ó, ö), as well as commonly used special characters (e.g. ©), can easily be typed on Windows and OS X using keyboard codes. Alternatively, you can insert a special character in a rich-formatted word processor (like LibreOffice Writer) and then copy and paste it into the online or offline subtitling tool that you are using. This method will not work with all special characters. The "Computing with Accents, Symbols and Foreign Scripts" website from Penn State University offers a very useful guide to typing special characters on Windows and OS X.

Importantly, such characters may be necessary even in English-language transcripts, when they appear in proper names without an established English transliteration, e.g. "Jónas Hallgrímsson" (the name of an Icelandic poet).

Spellchecker

Most offline subtitling tools offer a spellchecking feature. In online subtitling tools, web browser plugins can check the spelling of any text entered into a text box. Alternatively, an exported subtitle file can be opened in a word processor with a spellchecking feature. If your word processor does not work with UTF-8-encoded text, open the file in a text editor that supports this encoding, and then copy and paste the text into the word processor. After making changes, copy the text from the word processor and paste it back into the subtitle file opened in the text editor.
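Before pasting an exported file into a spellchecker, it can help to strip the cue numbers and timings so that only the spoken text remains. The Python sketch below assumes a well-formed .srt file (cues separated by blank lines, each starting with a number line and a timing line); the function names are hypothetical.

```python
import re
from pathlib import Path


def read_srt(path: str) -> str:
    """Read a subtitle file as UTF-8 ('utf-8-sig' also drops a leading BOM)."""
    return Path(path).read_text(encoding="utf-8-sig")


def srt_text_lines(srt: str) -> list[str]:
    """Return only the subtitle text, dropping cue numbers and timings.

    Assumes well-formed .srt: cues separated by blank lines, each cue
    starting with a number line followed by a timing line.
    """
    text: list[str] = []
    for cue in re.split(r"\n\s*\n", srt.strip()):
        lines = cue.splitlines()
        text.extend(lines[2:])  # lines[0] = cue number, lines[1] = timing
    return text


sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nHello, everyone.\n\n"
    "2\n00:00:04,000 --> 00:00:06,000\nToday I want to talk\nabout subtitles.\n"
)
print("\n".join(srt_text_lines(sample)))
```

Corrections still have to be copied back into the subtitle file by hand, as described above, so that the timings are preserved.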

Using HTML tags like <i> </i>

You should not use HTML tags in TEDxTalk transcripts, because these tags will not display correctly in the YouTube player. The subtitles that you create for TEDxTalks will be used on YouTube videos, and even though HTML tags may be displayed correctly in some offline players, YouTube users will just see the tag itself, so your subtitle would look like this:

This is how I am using <i>italics</i> for emphasis.

Note that to break a subtitle into two lines, you can simply use Shift-Enter in Amara instead of using an HTML tag.
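To catch stray tags before a transcript is submitted, a simple pattern match is usually enough. The Python sketch below is a hypothetical check; it treats anything shaped like <i>, </i> or <font color="red"> as a tag, so it may occasionally flag unusual text that merely looks like a tag.

```python
import re

# Matches opening or closing tags such as <i>, </i>, <b>, <font color="red">.
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def html_tags_in(subtitle: str) -> list[str]:
    """Return any HTML tags found in a subtitle (an empty list means clean)."""
    return HTML_TAG.findall(subtitle)


print(html_tags_in("This is how I am using <i>italics</i> for emphasis."))
```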

Sound information

Sound representation in a transcript is meant to enable deaf and hard-of-hearing viewers (as well as viewers watching the talk without the sound on) to understand all the non-spoken auditory information that is necessary to comprehend the talk to the same degree that a hearing audience potentially would. In TED transcripts, sound information is enclosed in parentheses, with the first word starting with a capital letter. There are generally two types of sound information used in TED transcripts: sound representation and speaker identification.
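The formatting rule for sound information (parentheses, first word capitalized) is mechanical enough to sketch in a few lines of Python. The helper name is hypothetical, and it does not check the wording itself; see the phrasing rules below for that.

```python
def sound_label(description: str) -> str:
    """Wrap a sound description in parentheses, capitalizing the first word."""
    text = description.strip()
    return f"({text[:1].upper()}{text[1:]})"


print(sound_label("applause"))        # (Applause)
print(sound_label("glasses clink"))   # (Glasses clink)
```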

Duration of the sound representation

The line-length and duration rules for subtitles with sound representation are generally the same as for any other subtitle. However, even if a longer piece of music or a longer round of applause is playing, don't keep the sound representation on screen for more than 3 seconds. It's enough to indicate that the music or applause has started.

If a video consists of several pieces of music and no speech at all, indicate the beginning and end of each piece with (Music) and (Music ends), respectively, so that the audience knows what is going on. Place the (Music ends) subtitle no more than about 1.5 seconds BEFORE the end of the given piece of music (not after). Note that this only applies if there is a pause between the pieces of music; if they flow into one another continuously, you do not need to indicate their boundaries.

Similarly, if the video combines some speaking from the stage, some music, then no music for a while, and then the music comes back, you need to signify again that the music has come back.
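The two timing rules in this section (a 3-second cap for sound cues, and a (Music ends) cue placed no more than about 1.5 seconds before the music stops) can be expressed as simple arithmetic on cue times in seconds. The Python sketch below is hypothetical; starting the (Music ends) cue exactly 1.5 seconds early is just one simple choice within the rule.

```python
MAX_SOUND_CUE = 3.0      # seconds a sound representation may stay on screen
MUSIC_ENDS_LEAD = 1.5    # (Music ends) starts at most this long before the music stops


def sound_cue_end(cue_start: float, sound_end: float) -> float:
    """End the cue when the sound ends, but never later than 3 s after it starts."""
    return min(sound_end, cue_start + MAX_SOUND_CUE)


def music_ends_cue_start(music_end: float) -> float:
    """Start the (Music ends) cue 1.5 s before the music stops (never after it)."""
    return max(0.0, music_end - MUSIC_ENDS_LEAD)


# Applause from 10.0 s to 25.0 s: the (Applause) cue ends at 13.0 s.
print(sound_cue_end(10.0, 25.0))
# A piece of music that stops at 62.0 s: (Music ends) starts at 60.5 s.
print(music_ends_cue_start(62.0))
```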

Phrasing the sound representations

Note that sound representations are not like stage directions (in a script or play): they represent sounds, not the actions that cause the sounds. For example, the sound label should be (Gunshot), not (Dog fires gun).

The sound representations should also be short and have a simple grammatical structure - subject + active verb. For example, the sound representation should say: (Glasses clink), and NOT (Clinking of glasses) or (Glasses clinking).

Indicating a change of language

If a speaker speaks in a language different from the main language of the talk, you should indicate the language but translate the text:

(Arabic) Something.

There may be cases when the foreign-language phrase was not meant to be understood by the audience. For example, the speaker may be quoting something she heard in a foreign language and originally did not understand, and then go on to explain what the phrase meant a few minutes later. In this case, you should consider leaving the foreign phrase untranslated in the transcript.

You can reach out to other volunteers in the OTP community to help you identify parts of the talk in a language that you don't understand (for example, through the I transcribe TEDx talks or I translate TEDTalks Facebook groups or by contacting one of the Language Coordinators for the given language, using this list to find them).

Indicating sentence stress/emphasis

Do not indicate sentence stress (the way a certain word is emphasized in a sentence) with capital letters ("This is NOT what I'm talking about") or italics.

Common sound representation

The most common sound representations in TED/TEDx transcripts are:

Look at some other transcripts in your language to see what people have been using as equivalents of these most common sounds, and use the most common one. Ideally, there should be one sound label for each type of sound throughout the transcripts in a given language, rather than several versions such as (Applause) and (Clapping).
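If you have the text of several existing transcripts in your language, counting the labels they use is a quick way to find the most common one. The Python sketch below is hypothetical; it counts any parenthesized text starting with a capital letter, so it will also pick up cues such as (Audience) or (Video), which you can simply ignore.

```python
import re
from collections import Counter

# A sound label: parentheses around text starting with a capital letter,
# e.g. (Applause) or (Glasses clink); "(...)" is not matched.
SOUND_LABEL = re.compile(r"\(([A-Z][^()]*)\)")


def count_sound_labels(transcripts: list[str]) -> Counter:
    """Count how often each sound label appears across several transcripts."""
    counts: Counter = Counter()
    for transcript in transcripts:
        counts.update(SOUND_LABEL.findall(transcript))
    return counts


existing = [
    "Thank you. (Applause)",
    "(Laughter) That's right. (Applause)",
    "And that was it. (Clapping)",
]
print(count_sound_labels(existing).most_common())
```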

Uncommon sound representation

There are many possible types of sounds that need to be represented in the transcript. For example, at this point in this TEDxKrakow talk[8], the transcript contains the phrase "(Phone rings twice)." The ringing phone is represented in the transcript because the speaker pauses and the slide with the phone becomes prominent. Without the sound representation, a non-hearing viewer might be confused as to why the speaker paused (and why there are no subtitles for spoken utterances at that point) and what the slide with the picture of an old-style telephone was meant to convey. Additionally, the phone ringing is referred to later in the talk, which is another reason why the sound representation must be there. In this particular talk, however, it was important to point out not only that the phone rang, but that it rang twice, because the speaker later contrasted this audio example with a phone ringing only once. The sound information that needed to be represented therefore became "phone rings twice." If the speaker had simply played the sound of a phone ringing, there would be no need to point out that it consisted of two separate rings, and the sound representation would simply be "(Phone rings)."

Speaker sounds

Important sound information can also include sounds made by the speaker, e.g. (Gasping), (Hooting). These sounds must be represented if they are intentional and form an important part of the talk, e.g.:

Do you know how I felt after talking the whole day? (Gasping) I had to take a day off after that.

These types of speaker sounds must also be represented in the transcript if they are later referred to in some way, even if the sound was made accidentally (e.g. if the speaker clears their throat and says "I wish they gave us more water").

Environmental sounds

Some sounds are not an important part of the talk and elicit no visible reaction from the speaker or the audience (e.g. the shutter sound of somebody in the audience taking a picture), so they do not need to be represented in the transcript. The only exception to this rule is when a coincidental sound causes the speaker or the audience to react visibly. For example, if somebody in the audience drops a plastic bottle and the speaker jumps and then laughs, the sound of the bottle falling needs to be represented so that non-hearing viewers understand why the speaker reacted this way.

Screaming, shouting

Sometimes, it may be important to indicate that the speaker is intentionally raising their voice. In such cases, use sound cues like (Screaming) or (Shouting). Do not use capitalization to indicate shouting (e.g. I AM SHOUTING!) or intonation (e.g. I am going to stress THIS word in this sentence).

Speaker identification

Speaker changes need to be represented in the transcript. Additional speakers may appear if the speaker who began the talk is joined by another speaker on stage (e.g. for a question-and-answer session), or if video or audio material featuring spoken utterances is included in the talk. In TED transcripts, speakers are indicated by their full names and a colon the first time they appear, and by their initials (no periods) when they appear again in the same conversation. Consider this example:

Oh, you've got a question for me? Okay. (Applause)

Chris Anderson: Thank you so much for that. You know, you once wrote, I like this quote,
"If by some magic, autism had been eradicated from the face of the Earth, then men
would still be socializing in front of a wood fire at the entrance to a cave."

Temple Grandin: Because who do you think made the first stone spears? The Asperger guy. (...)

CA: So, I wanted to ask you a couple other questions. (...) But if there is someone here
who has an autistic child, or knows an autistic child and feels kind of cut off from them,
what advice would you give them?

TG: Well, first of all, you've got to look at age. (...)

Source: Temple Grandin: The world needs all kinds of minds[9]

Re-identifying speakers

If some time has passed since a given speaker was introduced, when they start speaking again, they need to be re-identified by their full name, not just the initials. For example, if a talk by speaker X features a short video with speaker Y, and the video is paused and then continued five minutes later into the talk, speaker Y must be identified again by their full name when they start speaking in the video again, because without access to sound information, a non-hearing viewer may not be able to tell that it is the same speaker as in the first part of the video.
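The labelling rules in this section (full name and colon on first appearance, initials without periods afterwards, full name again after a long gap) can be sketched in Python. The guide gives no exact gap, so the 60-second reintroduce_after default below is a hypothetical value, as are the function and parameter names.

```python
def speaker_labels(turns: list[tuple[str, float, float]],
                   reintroduce_after: float = 60.0) -> list[str]:
    """Label each speaker turn with a full name or initials.

    turns: (full name, start seconds, end seconds) in talk order.
    reintroduce_after: hypothetical gap (seconds) after which a returning
    speaker is identified by full name again; the guide gives no exact value.
    """
    last_seen: dict[str, float] = {}
    labels = []
    for name, start, end in turns:
        previous_end = last_seen.get(name)
        if previous_end is None or start - previous_end > reintroduce_after:
            label = name  # first appearance, or back after a long gap
        else:
            label = "".join(word[0] for word in name.split())  # initials, no periods
        last_seen[name] = end
        labels.append(label + ":")
    return labels


turns = [
    ("Chris Anderson", 0, 12),
    ("Temple Grandin", 13, 30),
    ("Chris Anderson", 31, 40),
    ("Temple Grandin", 41, 55),
    ("Temple Grandin", 355, 370),  # speaks again five minutes later
]
print(speaker_labels(turns))
```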

Identifying off-camera voices

Any comment from off camera also needs to be identified by the speaker's name. If the comment comes from the audience, it can be identified generically with the word "Audience" used as a sound representation cue, e.g.:

(Audience) I want to add something!

Transcribing on-screen text

In some instances, you may be able to transcribe important text that is displayed in the video (e.g. on a slide). Transcribing text visible in the video makes it possible to translate it into other languages. Put square brackets ([]) around anything in your transcript that represents on-screen text.

Note that transcribed text should obey the reading speed and line length limits (22 characters/second and 84 characters per subtitle, respectively). Do not transcribe on-screen text that is not relevant to the content of the talk or that will not be translated (e.g. the name of the TEDx event).
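The two limits above can be checked with simple arithmetic: characters divided by on-screen seconds gives the reading speed. The Python sketch below is hypothetical and assumes that every character counts (including spaces and punctuation) except the line break between the two lines of a subtitle; your subtitling tool may count slightly differently.

```python
def check_subtitle(text: str, start: float, end: float,
                   max_cps: float = 22.0, max_chars: int = 84) -> list[str]:
    """Return a list of problems with one subtitle (empty if it passes).

    Assumption: every character counts, including spaces and punctuation,
    but the line break between two subtitle lines does not.
    """
    length = len(text.replace("\n", ""))
    duration = end - start
    problems = []
    if length > max_chars:
        problems.append(f"too long: {length} characters (limit {max_chars})")
    if duration <= 0:
        problems.append("end time must be after start time")
    elif length / duration > max_cps:
        problems.append(
            f"too fast: {length / duration:.1f} characters/second (limit {max_cps:g})"
        )
    return problems


print(check_subtitle("Put square brackets around on-screen text.", 0.0, 2.5))
```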

Transcribing videos shown in the talks

Speakers sometimes show videos to emphasize or illustrate their topic. In these cases, the videos are an important part of the speaker's message and should be transcribed as well. Their transcription should start with (Video) and follow the general rules for identifying speakers and other sounds. If the speakers are unknown, use (Man) or (Woman), or (Man1), (Man2) and so on to distinguish between multiple participants.

If the video is in a language different from the language of the TEDx talk and already has embedded subtitles, copy those subtitles into the transcript.

Editing/compressing the talk

When working on subtitles, one is normally required to compress the text, omit certain linguistic items from the original spoken dialog (e.g. padding, emphasizing constructions), and rephrase certain complex syntactic structures to make the subtitle easier to follow (e.g. changing the passive voice into the active voice).[10] In many cases, some degree of editing is necessary to preserve the speaker's intended meaning while keeping within the reading speed and subtitle length standards.

Types of linguistic issues that may need editing

Mistakes that may change the intended message of the talk are especially common in TEDx talks delivered in English by non-native speakers. In every case, however, one needs to be very careful not to alter the speaker's intended meaning while editing the transcript. If there is any doubt as to whether altering part of the original talk may change the intended meaning, it may be preferable to retain the original wording or consult the speaker before making any modifications.

Types of mistakes that may require editing include:

Using square brackets to mark editing

If the correction you are making does not change the meaning of the sentence and instead fixes a simple omission or slip of the tongue, do not use square brackets; simply use the correct phrase (e.g. if the speaker says "She do this often," your subtitle should say "She does this often," not "She [does] this often"). Use square brackets when, on the face of it, the omission or slip of the tongue could change the meaning of the sentence but you are certain that it was not intentional (e.g. a speaker talking about going up who accidentally says "down" at one point).

Examples of changes in transcripts

Incorrect vocabulary

ORIGINAL: (...) they know, from generation to generation, how to protect and prevent the land (...).

EDITED: (...) they know, from generation to generation, how to protect and [preserve] the land (...).
Source: Jadwiga Łopata: Food Sovereignty and the Family Farm[11]
ORIGINAL: These people are in many areas more vulnerable, or sensible (...).

EDITED: These people are in many areas more vulnerable, or [sensitive] (...).
Source: Łukasz Cichocki on the Pan Cogito hotel[12]

Slip of the tongue

ORIGINAL: I'm over and over again (...) intrigued the profound effects such movement lessons may have on us,(...)

EDITED: I'm over and over again (...) intrigued [by] the profound effects such movement lessons may have on us,(...)
Source: Jacek Paszkowski on the Feldenkreis Method[13]
ORIGINAL: They were the first on the market, and they are the leader, that is no doubt.

EDITED: They were the first on the market, and they are the leader, [there is] no doubt.
Source: Marcin Iwiński and Michał Kiciński: Think different - it's still extremely up to date[14]

Multiple syntactic issues, repetition

ORIGINAL:
I was several times asked by journalists
why in Wrocław there is possible some things
which is not possible or would not be possible
in Warsaw or even in Cracow.

EDITED:
I was asked several times by journalists
why some things are possible in Wrocław
which are not or would not be possible
in Warsaw or even in Cracow.
Source: Mirosław Miller: Dream Dealers from Wrocław[15]

What not to edit

Importantly, editing the talk (i.e. not transcribing verbatim) should be limited to cases where preserving the original wording would make the meaning of the talk very difficult or impossible to follow (e.g. because a word-for-word transcription would mean a reading speed over 21 characters/second). There may be words and phrases in the talk that do not conform to the transcriber's standards of style, such as colloquialisms/slang, swear words, and stylistic and grammatical issues that do not prevent viewers from understanding the talk (e.g. double negatives). Changing such words based on the transcriber's preferences or beliefs about grammatical correctness amounts to altering the speaker's style, and should be avoided on ethical grounds. Note: you should always follow the spelling conventions of your language; do not use "phonetic spelling" in your transcripts, even if the speaker sounds "slangy."

External links

Subtitling articles and guidelines

Subtitling tools

Online subtitling tools

Offline subtitling tools

All of the offline tools listed below are freeware. Most of them can also be used to convert between subtitle formats.

Linux
OS X
Windows

Character encoding

Other tools

Playing videos with .srt subtitles

Most offline subtitling tools can also be used to play the video with subtitles. However, stand-alone players are usually more convenient.

For more information on how to play videos with subtitles, including instructions on obtaining subtitles for TEDTalks to play with the videos, see this guide.

Spelling and punctuation

Spelling

Punctuation

References

  1. Karamitroglou, Fotios. Subtitling Standards -- A Proposal. Retrieved 2011-08-03.
  2. Karamitroglou, Fotios. Subtitling Standards -- A Proposal. Retrieved 2011-08-03.
  3. The Mayfield Handbook of Technical and Scientific Writing. Section 9.1. Capitalization. Retrieved 2011-08-03.
  4. Baker, David S. and Lynn Henrichsen. APA REFERENCE STYLE: Articles in Journals. Retrieved 2011-08-03.
  5. Bloom, Paul. The origins of pleasure. Talk delivered at TEDGlobal 2011. Retrieved 2011-08-03.
  6. The Mayfield Handbook of Technical and Scientific Writing. Section 9.1. Capitalization. Retrieved 2011-08-03.
  7. International Astronomical Union. Naming Astronomical Objects. Retrieved 2011-08-03.
  8. Moskal, Paweł. Medical imaging with anti-matter. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.
  9. Grandin, Temple. The world needs all kinds of minds. Talk delivered at TED2010. Retrieved 2011-08-03.
  10. Karamitroglou, Fotios. Subtitling Standards -- A Proposal. Retrieved 2011-08-03.
  11. Łopata, Jadwiga. Food Sovereignty and the Family Farm. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.
  12. Cichocki, Łukasz. Łukasz Cichocki on the Pan Cogito hotel. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.
  13. Paszkowski, Jacek. Jacek Paszkowski on the Feldenkreis Method. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.
  14. Iwiński, Marcin and Michał Kiciński. Think different - it's still extremely up to date. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.
  15. Miller, Mirosław. Dream Dealers from Wrocław. Talk delivered at TEDxKrakow 2010. Retrieved 2011-08-03.