If you're looking to build real-time audio and video sharing applications for the web or native platforms with WebRTC, or you're simply curious about what WebRTC is and how it works, you're in the right place. In this guide, we will explain what WebRTC is and which components power it under the hood, and then take a quick look at how to build video and audio calling apps using the WebRTC standard.
The basics of WebRTC
Web Real-Time Communication (WebRTC) is an open-source project and specification that allows browser-enabled devices, such as mobile phones and computers, to communicate in real time using media such as voice and video. Because users can send and receive media directly in the browser, WebRTC eliminates the need for third-party plugins or separate apps.
Google Chrome, Mozilla Firefox, Safari, Microsoft Edge, and other Chromium-based browsers support WebRTC natively on both desktop and mobile, so computers and devices running them can use it without extra setup.
WebRTC is especially useful for modern communication because it enables direct, peer-to-peer communication between two or more clients.
Implementing real-time communication with a technology like WebSockets, for example, involves a client-to-server-to-client process. For Client A and Client B to communicate, Client A sends its information to a server S, which relays it to Client B. When Client B has a message to send, it also sends it to the server, which then forwards it to Client A (and to any other clients on the network), and vice versa.
While this works for most real-time use cases, it usually adds some milliseconds of delay, because every message has to pass through the server first.
WebRTC, on the other hand, aims to remove that latency: a server is only needed to set up the connection, after which data flows directly from client to client.
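In practice, Client A first needs a server S to relay some information to Client B: a connection request (an "offer") that describes how to reach Client A, including its IP addresses, port numbers, and the media it wants to exchange. The server that relays these messages is known as a signaling server, while a separate type of server, known as a STUN server, helps each client discover the public IP address and port it can be reached at. If Client B accepts the request, it relays its own information (an "answer") back through the server so that Client A can start the communication process.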
Once both clients have exchanged this information, the server is effectively removed from the process: Client A and Client B can now communicate directly without routing their data through it.
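In WebRTC terms, Client A's connection request is an SDP offer and Client B's reply is an SDP answer. The sketch below shows Client B's side of that exchange using standard `RTCPeerConnection` methods. It assumes a hypothetical `sendSignal` function that sends messages back through the signaling server, and the message shape (`type`, `sdp`, `candidate`) is an illustrative choice, not something WebRTC prescribes.

```javascript
// Client B's side of the handshake: respond to messages relayed by the signaling server.
// `pc` is Client B's RTCPeerConnection. `sendSignal` is a hypothetical, app-defined
// function that delivers a message back to Client A through the server.
async function handleSignal(pc, message, sendSignal) {
  if (message.type === "offer") {
    // Accept Client A's connection details...
    await pc.setRemoteDescription({ type: "offer", sdp: message.sdp });
    // ...then generate our own and send them back as the answer.
    const answer = await pc.createAnswer();
    await pc.setLocalDescription(answer);
    sendSignal({ type: "answer", sdp: pc.localDescription.sdp });
  } else if (message.type === "candidate") {
    // An address/port at which Client A may be reachable (e.g. one discovered via STUN).
    await pc.addIceCandidate(message.candidate);
  }
}
```

Once both descriptions are set and a working candidate pair is found, media flows between the peers without passing through the signaling server.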
How WebRTC works — the underlying components
While there are many underlying concepts behind WebRTC, the following are some of the ideas behind how WebRTC works.
APIs
Identification (signaling)
Data type detection
WebRTC HTML5 APIs
WebRTC uses three main APIs: the MediaStream API, which grants access to the user's camera and microphone from JavaScript; the RTCPeerConnection API, which lets participants connect directly; and the RTCDataChannel API, which allows two-way data transfer between peers.
MediaStream API
WebRTC uses the browser's `navigator.mediaDevices.getUserMedia()` method, called from JavaScript, to access a device's camera and microphone. It is one of the critical elements of WebRTC, because users cannot share audio and video without access to the devices that capture them. The resulting media stream can be rendered locally, for example in a <video> element, and passed on to a peer connection.
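A minimal sketch of requesting camera and microphone access with `getUserMedia()`. The function name `startLocalMedia` and the constraint values are illustrative assumptions; `mediaDevices` is passed in as a parameter (in a browser it is `navigator.mediaDevices`) so the logic can also be exercised outside a browser.

```javascript
// Ask the user for camera + microphone access and show the live preview.
// `mediaDevices` is normally navigator.mediaDevices; `videoElement` is a <video> element.
async function startLocalMedia(mediaDevices, videoElement) {
  // Constraints describe what we want; the browser picks the closest match it can provide.
  const constraints = {
    audio: true,
    video: { width: { ideal: 1280 }, height: { ideal: 720 } },
  };
  try {
    const stream = await mediaDevices.getUserMedia(constraints);
    videoElement.srcObject = stream; // render the local preview
    return stream;
  } catch (err) {
    // NotAllowedError: the user denied permission; NotFoundError: no matching device.
    console.error(`getUserMedia failed: ${err.name}`);
    throw err;
  }
}

// In a browser:
// startLocalMedia(navigator.mediaDevices, document.querySelector("video"));
```

Browsers only expose `getUserMedia()` in secure contexts (HTTPS or localhost), and the user is prompted for permission the first time it is called.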
RTCPeerConnection
The RTCPeerConnection API represents a WebRTC connection between the local device and a remote peer. It provides methods to create the connection, maintain and monitor it, and close it once it is no longer needed. To start an audio or video feed, the tracks of a media stream obtained through the MediaStream API are added to the RTCPeerConnection, which transmits them to the remote peer.
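The caller's side of a call can be sketched with standard `RTCPeerConnection` methods: add the local tracks, create an offer, and hand it to the signaling channel. The function names `startCall` and `handleAnswer`, the `sendSignal` and `onRemoteStream` callbacks, and the message shape are hypothetical choices for illustration.

```javascript
// Caller side: add local tracks, create an SDP offer, and hand it to signaling.
// `pc` is an RTCPeerConnection (e.g. new RTCPeerConnection(config)).
// `sendSignal` and `onRemoteStream` are hypothetical, app-defined callbacks.
async function startCall(pc, localStream, sendSignal, onRemoteStream) {
  // Each audio/video track becomes outgoing media on the connection.
  for (const track of localStream.getTracks()) {
    pc.addTrack(track, localStream);
  }
  // Incoming media from the remote peer arrives as tracks grouped into streams.
  pc.ontrack = (event) => onRemoteStream(event.streams[0]);
  // Forward each ICE candidate as it is discovered (a null candidate means "done").
  pc.onicecandidate = (event) => {
    if (event.candidate) sendSignal({ type: "candidate", candidate: event.candidate });
  };
  const offer = await pc.createOffer();
  await pc.setLocalDescription(offer);
  sendSignal({ type: "offer", sdp: pc.localDescription.sdp });
}

// Called when the remote peer's answer arrives through signaling.
async function handleAnswer(pc, message) {
  await pc.setRemoteDescription({ type: "answer", sdp: message.sdp });
}
```

Calling `pc.close()` ends the connection once the call is over.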
RTCDataChannel
The RTCDataChannel API enables the two-way transfer of arbitrary data, such as text messages, files, or game state, between peers.
The RTCDataChannel API was designed to be similar to the WebSocket API, with a few notable differences. One of them is the transport: instead of a TCP connection, which is prone to bottlenecks because a single lost packet holds up everything sent after it, data channels use the Stream Control Transmission Protocol (SCTP) running over UDP. SCTP lets each channel be configured as reliable or unreliable, and as ordered or unordered.
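To illustrate that configurability, here is a sketch that opens one reliable channel and one lossy channel on an existing connection. `ordered` and `maxRetransmits` are real `createDataChannel()` options; the channel labels, the `onMessage` callback, and the function names are assumptions for this example.

```javascript
// Open two data channels on an existing RTCPeerConnection `pc`:
// - "chat": default settings, i.e. reliable and ordered (lost messages are retransmitted)
// - "positions": unordered, no retransmissions, for data where only the latest value matters
function openDataChannels(pc, onMessage) {
  const chat = pc.createDataChannel("chat");
  const positions = pc.createDataChannel("positions", { ordered: false, maxRetransmits: 0 });

  for (const channel of [chat, positions]) {
    channel.onopen = () => console.log(`${channel.label} open`);
    channel.onmessage = (event) => onMessage(channel.label, event.data);
  }
  return { chat, positions };
}

// The remote peer receives the channels through the datachannel event.
function acceptDataChannels(pc, onMessage) {
  pc.ondatachannel = (event) => {
    const channel = event.channel;
    channel.onmessage = (e) => onMessage(channel.label, e.data);
  };
}
```

Once a channel's `open` event fires, `chat.send("hello")` works much like `WebSocket.send()`.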
In general, working with the APIs WebRTC provides comes down to these steps:
Access the device camera and microphones using the MediaStream `getUserMedia()` API
Create a peer-to-peer link between users' devices using the `RTCPeerConnection` API
Send the captured video and audio tracks over that connection with `RTCPeerConnection`, and use the `RTCDataChannel` API for any other data the peers need to exchange
Optionally, record the audio and video using the `MediaRecorder` interface
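For the optional recording step, a sketch using the `MediaRecorder` interface might look like this. It collects recorded chunks and joins them into a single `Blob` when recording stops. The fixed `durationMs` and the `recordStream` name are assumptions; the recorder constructor is a parameter (defaulting to the browser's `MediaRecorder`) only so the logic can be tested elsewhere.

```javascript
// Record a MediaStream (local or remote) for `durationMs` milliseconds into one Blob.
// `RecorderCtor` is normally the browser's MediaRecorder.
function recordStream(stream, durationMs, RecorderCtor = globalThis.MediaRecorder) {
  return new Promise((resolve, reject) => {
    const recorder = new RecorderCtor(stream);
    const chunks = [];
    // Fired with recorded data, including a final chunk when stop() is called.
    recorder.ondataavailable = (event) => {
      if (event.data && event.data.size > 0) chunks.push(event.data);
    };
    // Combine all chunks into one Blob tagged with the recorder's container format.
    recorder.onstop = () => resolve(new Blob(chunks, { type: recorder.mimeType }));
    recorder.onerror = (event) => reject(event.error ?? event);
    recorder.start();
    setTimeout(() => recorder.stop(), durationMs);
  });
}

// In a browser:
// recordStream(localStream, 5000).then((blob) => { /* upload or download the blob */ });
```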
Identification
For developers to achieve real-time communication between two or more devices, the devices need a way to find and correctly identify one another. WebRTC handles this in two parts: signaling, in which peers exchange connection details through a server of the application's choosing, and NAT traversal, which relies on protocols called STUN and TURN to discover and reach each peer's network address.
WebRTC signaling
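WebRTC signaling refers to the process of setting up, controlling, and terminating a communication session when the need arises. Peers do this by exchanging messages, such as session descriptions (offers and answers) and network candidates, through a signaling server.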
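Because a signaling server only forwards messages between peers, its core logic can be very small. The toy relay below keeps everything in memory for illustration; a real deployment would typically carry these messages over WebSockets or HTTP. The `SignalingRelay` class and the `{ from, payload }` message shape are hypothetical.

```javascript
// A toy, in-memory signaling relay: it forwards offers, answers, and ICE candidates
// between registered peers without inspecting them.
class SignalingRelay {
  constructor() {
    this.peers = new Map(); // peerId -> delivery callback
  }
  join(peerId, onMessage) {
    this.peers.set(peerId, onMessage);
  }
  leave(peerId) {
    this.peers.delete(peerId); // the peer is no longer reachable (session terminated)
  }
  send(from, to, payload) {
    const deliver = this.peers.get(to);
    if (!deliver) return false; // recipient not connected
    deliver({ from, payload });
    return true;
  }
}

// Usage: Alice sends an offer and Bob replies with an answer.
const relay = new SignalingRelay();
const log = [];
relay.join("alice", (msg) => log.push(["alice", msg.payload.type]));
relay.join("bob", (msg) => {
  log.push(["bob", msg.payload.type]);
  if (msg.payload.type === "offer") relay.send("bob", msg.from, { type: "answer", sdp: "answer-sdp" });
});
relay.send("alice", "bob", { type: "offer", sdp: "offer-sdp" });
console.log(log); // [ [ 'bob', 'offer' ], [ 'alice', 'answer' ] ]
```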
Let's go on to dissect some of the terminology related to WebRTC signaling below:
STUN: A STUN (Session Traversal Utilities for NAT) server tells a device on a local network behind a NAT which public IP address and port its traffic appears to come from, so that the device can share that address with its peers.
Network Address Translator (NAT): A NAT lets a router rewrite the addresses and ports in packets so that multiple devices on a private network can share a single public IP address.
TURN: When two peers cannot connect directly, for example because a restrictive NAT or firewall blocks the connection, a server has to act as an intermediary that relays traffic between them. A TURN (Traversal Using Relays around NAT) server plays this role, so developers can use a TURN server to relay media between two or more users when a STUN server is not enough.
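STUN and TURN servers are handed to WebRTC through the `iceServers` field of the configuration passed to `RTCPeerConnection`. In the sketch below, the STUN URL is Google's widely used public server, while the TURN URL and credentials are placeholders you would replace with your own TURN deployment; the `buildRtcConfig` helper is hypothetical.

```javascript
// Build an RTCConfiguration listing one STUN server and one TURN server.
function buildRtcConfig({ turnUrl, turnUser, turnPass }) {
  return {
    iceServers: [
      // STUN: lets each peer discover its public IP address and port behind a NAT.
      { urls: "stun:stun.l.google.com:19302" },
      // TURN: relays media when no direct path can be established.
      { urls: turnUrl, username: turnUser, credential: turnPass },
    ],
    // "all" (the default) lets ICE try direct candidates before falling back to TURN;
    // "relay" would force all traffic through the TURN server.
    iceTransportPolicy: "all",
  };
}

// In a browser:
// const pc = new RTCPeerConnection(buildRtcConfig({
//   turnUrl: "turn:turn.example.com:3478", turnUser: "user", turnPass: "secret" }));
```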
5 reasons to use WebRTC
1. WebRTC removes the need for extra apps.
One of the main reasons behind the creation of WebRTC was to find a way to enable peer-to-peer communication between two or more computers, and thankfully it not only does that, but does so without requiring users to install extra plugins or apps. It's basically Skype or Zoom built into your browser. How awesome is that?
2. WebRTC is embedded into web technologies.
Because WebRTC enables direct client-to-client communication, the server does not have to relay every message, so it uses fewer resources and carries less load.
3. Security
WebRTC encrypts all media and data by default (media with SRTP, data channels with DTLS) and builds on proven network protocols to secure data transfer. Because encryption is mandatory, developers have less need for third-party tools and services to protect the data in transit.
4. Open-source
WebRTC is an open standard with an open-source implementation, backed by top companies such as Google, Microsoft, Facebook, and Apple. Being open-source means a lot of people are actively working to improve it for everyone.
5. Lower latency
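As I mentioned earlier, WebSockets achieve real-time communication via a client-to-server-to-client approach, where every message has to pass through the server before it reaches its recipient. WebRTC removes that extra hop by sending data directly between peers once the connection is set up, which lowers latency.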
3 Challenges with WebRTC
While WebRTC is a fantastic idea that has helped billions of people achieve real-time communication, some challenges developers and companies may face along the way are worth considering.
1. WebRTC uses TCP and UDP protocols.
One of the main challenges with WebRTC comes from the transport protocols it relies on, TCP and UDP, each of which involves a trade-off.
Transmission Control Protocol (TCP) is a connection-oriented protocol that defines how applications exchange data over a network conversation, guaranteeing that the data arrives complete and in order.
TCP is highly beneficial in the sense that data is sent and received with no loss or damage, but it achieves this by retransmitting lost packets and holding back everything behind them until the gap is filled, which adds delay that live audio and video cannot afford. Scalability is a related concern: in a peer-to-peer call, each participant uploads a separate copy of its stream to every other participant, so sending full HD video to more than a handful of people at the same time can quickly exceed a typical upload connection.
WebRTC therefore carries media over the User Datagram Protocol (UDP), which sends packets without waiting for retransmissions and keeps latency low. This also comes at a cost, because UDP permits packet loss (i.e., transmitted data risks arriving incomplete or damaged). Packet loss usually isn't a dire cost for video and audio calling apps, since a few frames lost in transit go largely unnoticed. But for important documents such as PDFs, losing even a few bytes in the transfer process can corrupt the entire file or make it unreadable, which is why file transfers should use a data channel in its default reliable mode, where lost data is retransmitted.
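Even over a reliable channel, files are usually sent as a series of smaller messages. The sketch below splits a file's bytes into numbered chunks and reassembles them, refusing to return a file with gaps rather than silently producing a corrupted one. The 16 KiB chunk size is an assumption (a conservative size often recommended for cross-browser data channel messages), and how the sequence number travels with each chunk (for example, in a small header) is left out.

```javascript
// Split a file's bytes into numbered chunks, then reassemble them on the receiving side.
const CHUNK_SIZE = 16 * 1024; // assumed conservative message size

function splitIntoChunks(bytes) {
  const chunks = [];
  for (let offset = 0, seq = 0; offset < bytes.length; offset += CHUNK_SIZE, seq++) {
    chunks.push({ seq, data: bytes.subarray(offset, offset + CHUNK_SIZE) });
  }
  return chunks;
}

function reassemble(chunks, totalChunks) {
  // Put chunks back in order by sequence number, whatever order they arrived in.
  const ordered = new Array(totalChunks);
  for (const { seq, data } of chunks) ordered[seq] = data;
  // A missing chunk means the file would be corrupted, so fail loudly instead.
  for (let i = 0; i < totalChunks; i++) {
    if (!ordered[i]) throw new Error(`missing chunk ${i}`);
  }
  const size = ordered.reduce((n, d) => n + d.length, 0);
  const out = new Uint8Array(size);
  let offset = 0;
  for (const d of ordered) {
    out.set(d, offset);
    offset += d.length;
  }
  return out;
}
```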
2. Browser compatibility
Browser compatibility is one area you should consider if you're looking to build large-scale applications that involve WebRTC and want them to work for all of your users.
As I mentioned earlier in this article, browsers such as Chrome, Firefox, Opera, Microsoft Edge, and Safari (from version 11) support WebRTC out of the box, while older browsers such as Internet Explorer do not support it natively and need additional plugins or extensions for WebRTC to work.
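One practical way to deal with uneven support is to feature-detect the APIs before showing calling features. A minimal sketch, assuming you only need to know whether the three APIs discussed in this article are present; the `env` parameter defaults to `globalThis` (the `window` object in a browser), and `showUnsupportedBrowserNotice` in the usage comment is a hypothetical helper.

```javascript
// Report which core WebRTC-related APIs the current environment exposes.
function detectWebRTCSupport(env = globalThis) {
  const nav = env.navigator || {};
  return {
    peerConnection: typeof env.RTCPeerConnection === "function",
    getUserMedia: !!(nav.mediaDevices && typeof nav.mediaDevices.getUserMedia === "function"),
    mediaRecorder: typeof env.MediaRecorder === "function",
  };
}

// Example: show a fallback message instead of the call UI when support is missing.
// const support = detectWebRTCSupport();
// if (!support.peerConnection || !support.getUserMedia) showUnsupportedBrowserNotice();
```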
3. No standard signaling protocol
Aside from browser compatibility, another downside worth noting is that WebRTC has no standard signaling protocol. The specification leaves signaling up to the application, so different companies and individuals building real-time applications with WebRTC have no uniform way to implement it and must each work out their own approach, for example exchanging messages over WebSockets or HTTP.
Conclusion
There are many ways to build real-time voice and video calling features into chat apps. You can choose to develop and manage these features entirely from scratch, but that would take considerably more time than using a managed WebRTC service such as CometChat, which empowers you to build video and voice calling solutions for your business.
With an array of SDKs and plugins supporting technologies and frameworks such as JavaScript, PHP, Python, and more, CometChat helps reduce the time it takes to get a secure, stable, and efficient WebRTC application up and running. Try CometChat for FREE and see for yourself!
About the author
Oluwaseun is a full-stack developer and technical writer with over four years of experience building enterprise software with Laravel, JavaScript (and its frameworks), and Python. He hopes to simplify technical concepts through his articles on various programming fields.
Oluwaseun Raphael Afolayan
CometChat