Thursday, September 17, 2009

How YouTube Works

This report is based on the interesting seminar held on Friday, Sep 4th 2009. For security reasons, no one knows exactly how YouTube works except the programmers working for YouTube. Therefore, most of our discussion in the seminar about this largest of video-hosting sites was theoretical and based on subjective opinions. However, brainstorming the process of creating such a great video-hosting site brought us a lot of inspiring ideas.

In the seminar, we discussed six problems that a video-hosting site has to confront and how YouTube solves, or might solve, each of them.

1. Video compression:

According to the latest statistics, 20 hours of video are uploaded to YouTube every minute. Assume each video has a resolution of 640x480 and a frame rate of 30 frames/second, and that the frames are stored as consecutive bitmap images (each pixel represented by 3 bytes). Without compression, YouTube would have to store approximately 2 TB of video per minute. Even if videos compressed as well as a Word document (to ~25% of the original size), YouTube would still have to store about 0.5 TB of losslessly compressed data per minute.
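The estimate above can be checked with a few lines of arithmetic. This is only a sketch of the calculation, assuming decimal terabytes (1 TB = 10^12 bytes); the constant names are ours.

```python
# Assumptions taken from the paragraph above.
WIDTH, HEIGHT = 640, 480
BYTES_PER_PIXEL = 3                      # one byte each for R, G and B
FPS = 30
UPLOADED_SECONDS_PER_MINUTE = 20 * 60 * 60   # 20 hours of video per minute

bytes_per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
bytes_per_second = bytes_per_frame * FPS
raw_bytes_per_minute = bytes_per_second * UPLOADED_SECONDS_PER_MINUTE

raw_tb = raw_bytes_per_minute / 1e12     # decimal terabytes (assumption)
print(f"uncompressed:            {raw_tb:.2f} TB per minute")
print(f"at 25% of original size: {raw_tb * 0.25:.2f} TB per minute")
```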

Therefore, video compression must be applied to reduce the amount of information that has to be stored. As most of you know, any color can be displayed on a monitor by mixing certain amounts of red, green and blue, and a bitmap image likewise stores the color of each pixel as RGB. However, there are other ways to represent a color, and one of these systems is called HSL (Hue, Saturation, Lightness/Luminosity). Since human eyes are more sensitive to brightness than to color, the data representing color can be thrown away for a certain number of pixels in the picture while the brightness of every pixel is kept.
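Because the eye notices brightness detail more than color detail, one way to picture this idea is to keep lightness for every pixel but store hue and saturation only once per 2x2 block. The sketch below uses Python's standard colorsys module (which names the color space HLS); the function names and the 2x2 block size are our own assumptions, and real codecs apply the same idea in a luma/chroma color space rather than HSL.

```python
import colorsys

def subsample_color(pixels, width, height):
    """Keep lightness per pixel; keep (hue, saturation) once per 2x2 block.

    `pixels` is a row-major list of (r, g, b) tuples with components in 0..1.
    """
    lightness = []
    chroma = {}
    for y in range(height):
        for x in range(width):
            h, l, s = colorsys.rgb_to_hls(*pixels[y * width + x])
            lightness.append(l)
            if x % 2 == 0 and y % 2 == 0:
                # The top-left pixel lends its color to the whole block.
                chroma[(x // 2, y // 2)] = (h, s)
    return lightness, chroma

def reconstruct(lightness, chroma, width, height):
    """Rebuild approximate RGB pixels from the subsampled data."""
    out = []
    for y in range(height):
        for x in range(width):
            h, s = chroma[(x // 2, y // 2)]
            out.append(colorsys.hls_to_rgb(h, lightness[y * width + x], s))
    return out

# A 2x2 image made of slightly different reds.
pixels = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.1), (0.8, 0.0, 0.0), (1.0, 0.2, 0.2)]
lightness, chroma = subsample_color(pixels, 2, 2)
stored_values = len(lightness) + 2 * len(chroma)
original_values = 3 * len(pixels)
print(stored_values, "values stored instead of", original_values)
```

For this tiny image, half of the numbers are discarded, and the reconstructed pixels keep their own brightness but share one hue and saturation.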

In a video, there is a lot of temporal redundancy, i.e. parts of the image that stay the same over multiple frames. Hence, we can record only the changes between one frame and the next instead of the whole frame to reduce the amount of information. Many video codecs use this principle to reduce the size of the video while retaining most of its quality. Another way to bring down the video size is to reduce the quality, i.e. the sharpness of the image, the video resolution, etc. As applied in practice, the methods discussed above are lossy compression: since we compress by throwing away information, it is impossible to decompress the result back to the original.
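The idea of recording only the changes between frames can be sketched as follows. This is a toy illustration with frames as flat lists of pixel values and function names of our own choosing. Storing exact differences like this is lossless by itself; real codecs go further (for example by discarding small differences), which is where the loss comes in.

```python
def encode_delta(previous, current):
    """Return [(index, new_value), ...] for the pixels that changed."""
    return [
        (i, new)
        for i, (old, new) in enumerate(zip(previous, current))
        if old != new
    ]

def apply_delta(previous, delta):
    """Rebuild the next frame from the previous frame plus the changes."""
    frame = list(previous)
    for i, value in delta:
        frame[i] = value
    return frame

# A bright two-pixel object moves one pixel to the right.
frame1 = [10, 10, 10, 10, 200, 200, 10, 10]
frame2 = [10, 10, 10, 10, 10, 200, 200, 10]

delta = encode_delta(frame1, frame2)
print(len(delta), "changed pixels stored instead of", len(frame2))
```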

In fact, YouTube uses the FLV format to serve video to users. This format allows a typical compressed video to be further compressed to only 25% of its original size. Of course, that comes with a reduction in resolution and quality. Recently, YouTube has introduced the MP4 format to serve high-definition videos to users.

2. Video distribution:

YouTube has to serve 1.2 billion views daily. So how do they manage to stream the video to viewers at high speed and at the same time keep their Internet bill low?

Firstly, YouTube installs servers in many places around the world. This allows much faster connections between users and servers: instead of establishing a possibly slow connection between two locations on opposite sides of the Earth, a local, and usually much faster, connection is established for uploading and downloading videos.

Secondly, aside from using its own CDN (Content Distribution Network), YouTube partners with other CDNs such as LimeLight, LiveStream (formerly known as Mogulus), etc. to serve its videos. In one instance, YouTube relied on Akamai, another CDN especially experienced in streaming live content, to stream a YouTube concert live to 700,000 concurrent viewers.

Thirdly, YouTube signed peering contracts with ISPs to bring down its Internet bill. Basically, if ISP A and ISP B carry roughly the same amount of each other's traffic, neither has to pay the other for carrying the data.

3. Video fingerprint:

While YouTube is an effective way to share one's own video footage, it can also be used to spread copyrighted content, pornography, defamation and material encouraging criminal conduct. Therefore, a system to detect and delete these kinds of videos as soon as possible is needed. In our discussion, we mainly guessed at how YouTube detects copyrighted content uploaded to its servers within only several minutes, without relying too much on users to flag the videos.

A video fingerprint is a set of features that makes a video unique, so that the video can be recognized easily from the fingerprint alone. Several ideas for creating a video fingerprint were suggested during the discussion. One idea is to compare the uploaded video directly with the copyrighted content to find any similarity. However, this would require keeping a copy of the source on the server, which would consume a lot of resources. Another idea is to store the color ratio of several minutes' worth of frames as the fingerprint. This reduces the amount of data that has to be stored, allows the video to be identified regardless of its size, and helps prevent false recognition. A third suggestion is to store checksums (e.g. CRC32, SHA1, etc.) of known pirated versions of videos and compare them with what the user uploads. A fingerprint produced this way is small compared with the previous ones and practically unique to the file. However, this method will not work if the file is edited or split into multiple parts.
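The trade-off between the last two ideas can be illustrated with a small sketch. The checksum uses Python's standard hashlib; the color-ratio fingerprint is our own simplified reading of the suggestion (the share of red, green and blue per frame), not YouTube's actual method, and all function names are hypothetical.

```python
import hashlib

def checksum_fingerprint(data: bytes) -> str:
    """Exact-match fingerprint: any edit to the file changes it completely."""
    return hashlib.sha1(data).hexdigest()

def color_ratio_fingerprint(frames):
    """Per-frame share of red, green and blue.

    `frames` is a list of frames, each a list of (r, g, b) integer tuples.
    """
    fingerprint = []
    for frame in frames:
        r = sum(p[0] for p in frame)
        g = sum(p[1] for p in frame)
        b = sum(p[2] for p in frame)
        total = (r + g + b) or 1          # avoid dividing by zero on black frames
        fingerprint.append((r / total, g / total, b / total))
    return fingerprint

def fingerprint_distance(fp_a, fp_b):
    """Mean absolute difference between two fingerprints of equal length."""
    diffs = [abs(x - y) for fa, fb in zip(fp_a, fp_b) for x, y in zip(fa, fb)]
    return sum(diffs) / len(diffs)

# A one-frame "video" and the same frame enlarged by repeating every pixel.
small = [[(200, 50, 50), (20, 20, 160)]]
large = [[(200, 50, 50), (200, 50, 50), (20, 20, 160), (20, 20, 160)]]

same_checksum = checksum_fingerprint(b"video bytes") == checksum_fingerprint(b"video bytes!")
ratio_distance = fingerprint_distance(
    color_ratio_fingerprint(small), color_ratio_fingerprint(large)
)
print("checksums match after a one-byte edit:", same_checksum)
print("color-ratio distance after resizing:", ratio_distance)
```

The checksum stops matching after a single-byte change, while the color ratios of the resized frame stay the same, which is why the color-ratio idea can identify a video regardless of its size.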

4. Video thumbnails:

Video thumbnails are the tiny images that give the user a glimpse of the content inside the videos. Most users decide whether a video is worth watching based on what they see in the thumbnail. Therefore, having a good thumbnail that reflects the content is very important and strongly affects the popularity of a video. Currently, YouTube captures the frames at 25%, 50% and 75% of the video to use as thumbnails. However, we all agreed that this method is not guaranteed to give the user the best overview of the video. In this section, we mainly discuss how the thumbnail-capturing system could be improved so that the thumbnail displays the subject matter of the video.
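For reference, the fixed-position approach described above amounts to very little computation, which is part of why it cannot know anything about the content. A minimal sketch, with a hypothetical function name and frame counts chosen for illustration:

```python
def thumbnail_frame_indices(total_frames, positions=(0.25, 0.50, 0.75)):
    """Return the frame indices at the given fractions of the video."""
    if total_frames <= 0:
        raise ValueError("video has no frames")
    return [min(total_frames - 1, int(total_frames * p)) for p in positions]

# A 4-minute clip at 30 frames/second has 7200 frames.
indices = thumbnail_frame_indices(30 * 240)
print(indices)
```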

We can't just pick the frame at a particular position, or pick a frame at random, because that has proven ineffective in practice. Hence, we have to analyze the frames inside the video to pick out the most appropriate one(s). Many suggestions were proposed during the discussion: face recognition (e.g. showing the face of the distinguished person giving a speech in the video), OCR (e.g. finding text, banners, etc. in the video that match the title), or searching for the object that occurs most frequently in the video.

5. Video recommendation:

Video recommendation is a function on YouTube meant to provide viewers with possibly useful videos to watch next. However, all of us agreed that sometimes this feature doesn't help us choose a good video to watch at all. In our discussion, we went over some possible methods to improve the recommendation system.

The first way is to ask users to indicate their preferences in their profiles so that better recommendations can be made based on the information given. The second way is to record what previous users viewed next after watching the same video, then sort out the most popular of those videos to recommend to the user in question. The third way is similar to the second; however, instead of looking at the statistics of only one video, we search for other users who have watched the same videos as the user in question, then look for videos that those users have watched but the user in question hasn't, and recommend them. As we can see, the second method will work better for new users who have yet to watch many videos, and the third method will be more effective for long-time users whose viewing history we can rely on to make our recommendations.
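The second and third methods can be sketched with simple counting. The data layout (viewing sessions as ordered lists, histories as sets of video IDs), the function names and the overlap-based scoring are all assumptions made for illustration, not a description of YouTube's system.

```python
from collections import Counter

def next_video_recommendations(sessions, video, top_n=3):
    """Second method: rank videos by how often they were watched right after `video`."""
    followers = Counter()
    for session in sessions:
        for current, following in zip(session, session[1:]):
            if current == video:
                followers[following] += 1
    return [v for v, _ in followers.most_common(top_n)]

def co_viewer_recommendations(histories, user, top_n=3):
    """Third method: recommend unseen videos watched by users with similar histories."""
    seen = histories[user]
    scores = Counter()
    for other, watched in histories.items():
        if other == user:
            continue
        overlap = len(seen & watched)      # how much this user resembles ours
        if overlap == 0:
            continue
        for candidate in watched - seen:
            scores[candidate] += overlap
    return [v for v, _ in scores.most_common(top_n)]

sessions = [["a", "b", "c"], ["a", "b"], ["a", "c"]]
histories = {
    "u1": {"a", "b"},
    "u2": {"a", "b", "c"},
    "u3": {"a", "d"},
    "u4": {"e"},
}

after_a = next_video_recommendations(sessions, "a")
for_u1 = co_viewer_recommendations(histories, "u1")
print(after_a, for_u1)
```

Note that the second function needs no history for the user at all, while the third returns nothing for a user whose history is empty, which matches the observation about new and long-time users.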

6. Video spammers:

YouTube implemented a video response function that allows users to post a video in response to another user's video. However, this function has been misused by malicious users to post video responses irrelevant to the topic under discussion. Their objectives include, but are not limited to, advertising, increasing the popularity of their own videos, distributing pornography or simply polluting the system. (From this article). Hence, we need to devise a strategy to detect video spammers and take punitive action against them as soon as they pop up.

Our strategy involves compiling a list of 'good' users and 'bad' users based on their history (how many times their videos have violated the terms of service, how many times they have been warned or (temporarily) banned for their actions, how many times they have been flagged by other users, whether they have posted the same spam video elsewhere, etc.). We will then find the common attributes within each of the two groups and use them to classify new users. In the discussion, several characteristics of a video spammer were suggested, e.g. no favorite videos, no friends, not having watched many videos, being a new user, etc.
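One possible way to turn "common attributes of the two groups" into a classifier is a naive Bayes style score, sketched below. This particular technique is our own choice for illustration and was not specified in the seminar; the attribute names, the sample users and the smoothing are all assumptions.

```python
import math

# Hypothetical yes/no attributes suggested in the discussion.
FEATURES = ("no_favorites", "no_friends", "few_videos_watched", "new_account")

def attribute_rates(users):
    """Fraction of a group showing each attribute (add-one smoothing)."""
    n = len(users)
    return {f: (sum(u[f] for u in users) + 1) / (n + 2) for f in FEATURES}

def spam_score(user, good_rates, bad_rates):
    """Log-likelihood ratio: positive means the user resembles the 'bad' group."""
    score = 0.0
    for f in FEATURES:
        p_bad = bad_rates[f] if user[f] else 1 - bad_rates[f]
        p_good = good_rates[f] if user[f] else 1 - good_rates[f]
        score += math.log(p_bad / p_good)
    return score

def make_user(*flags):
    """Build a user record from flags given in FEATURES order."""
    return dict(zip(FEATURES, flags))

good_users = [
    make_user(False, False, False, False),
    make_user(False, False, False, True),
    make_user(False, True, False, False),
]
bad_users = [
    make_user(True, True, True, True),
    make_user(True, True, False, True),
    make_user(True, True, True, True),
]

good_rates = attribute_rates(good_users)
bad_rates = attribute_rates(bad_users)

suspect = make_user(True, True, True, True)
regular = make_user(False, False, False, False)
print(spam_score(suspect, good_rates, bad_rates) > 0)
print(spam_score(regular, good_rates, bad_rates) < 0)
```

A score near zero means the attributes do not separate the user from either group, so such accounts would need more history before any action is taken.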

In conclusion, although we didn't get to know exactly how YouTube works behind the scenes, the seminar educated us about the difficulties YouTube has encountered in the past and is facing now, and about how it has resolved, or could possibly resolve, these problems.

by Hong Dai Thanh and Dang Dung Ha

*We apologize for putting this up more than 2 weeks late.

1 comment:

  1. Very good summary Dai Thanh and Dung Ha. Definitely worth the wait. Keep up the good work :-)
