Application for collecting data from the Telegram (part 2: Chat participants)

vadimkolobanov

Vadim Kolobanov

Posted on October 31, 2021

Hello everyone! In the second part, we will continue to collect chat data from the "secure and twice encrypted" Telegram server.

Before reading the article, I strongly recommend reading the first part. In it, we created a Telegram developer account and set up our project.

Right now our project is just one ".py" file, a settings file and a session file.
As a well-known saying goes:


A chain is only as strong as its weakest link.


Therefore, we will make a beautiful, strong project.

In this part, we will get the list of users from a Telegram chat and see what information about its participants Telegram gives us when we are not the chat administrator but joined as a regular user.

Go to PyCharm, my friends.

In order not to get confused in our code in the future, we will create several files in the project directory:

Users.py
links.txt

We will write all the code from this part in a separate file, Users.py. This will greatly simplify our work in the future.

Let's import this file into our main project, which we wrote in the first part. (At the end of the article I will post the code from both parts, so don't worry.)

import Users

In addition, for further work with chat users, we will need a couple more imports from our Telethon library.

from telethon.tl.functions.channels import GetParticipantsRequest
from telethon.tl.types import ChannelParticipantsSearch

For clarity and convenience, let's install the tqdm library in our project. It lets us show a readable progress bar in the console (a graphical strip showing how far our export has progressed).

Run the pip install tqdm command.

Then import the library class into our project:

from tqdm import tqdm

All imports in main file:

import configparser
from telethon import TelegramClient
import Users
from telethon.tl.functions.channels import GetParticipantsRequest
from telethon.tl.types import ChannelParticipantsSearch
from tqdm import tqdm

We still have one file that we have not explained yet, links.txt. We will use it as the list of links to the chats we want to parse, but more on that a little later.

In our file Users.py let's create an async function:

async def dump_all_participants(channel,
                                ChannelParticipantsSearch,
                                client,
                                GetParticipantsRequest,
                                tqdm):

channel is the Telegram chat that we pass to our function.

ChannelParticipantsSearch and GetParticipantsRequest are the imports we added above. They are the Telethon classes that our function needs.

tqdm is our progress bar library.

client is the connection that we created in the first part. We can't do anything without it :)

Now let's set up reading chat links from our links.txt file.

In our main file Update.py, inside the main function from the last part, we will write this code:

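async def main():
    with open("links.txt", "r") as f:
        for line in f:  # stops at the end of the file instead of looping forever
            url = line.strip()
            if not url:
                continue  # skip blank lines
            try:
                channel = await client.get_entity(url)
                await Users.dump_all_participants(channel, ChannelParticipantsSearch,
                                                  client, GetParticipantsRequest, tqdm)
            except Exception as e:
                # report the problem instead of silently ignoring it
                print(f"Could not process {url}: {e}")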

Here, for each link read from the file, we call our dump_all_participants function from Users.py and pass our arguments to it.
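The link-reading step can also be isolated into a small helper, which makes it easy to check on its own. This is only a minimal sketch, not the code used in this article: the name read_links is hypothetical, the links in the demo are made up, and it assumes links.txt holds one link per line.

```python
import os
import tempfile


def read_links(path):
    """Return the non-empty lines of a text file, stripped of whitespace.

    Hypothetical helper (not part of Telethon): assumes one chat link per line.
    """
    links = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:          # iteration ends at the end of the file
            url = line.strip()  # drop the trailing newline and spaces
            if url:             # ignore blank lines
                links.append(url)
    return links


if __name__ == "__main__":
    # Small demonstration with a temporary file and made-up links.
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "links.txt")
        with open(demo_path, "w", encoding="utf-8") as f:
            f.write("https://t.me/example_chat_one\n\nhttps://t.me/example_chat_two\n")
        print(read_links(demo_path))
```

Each returned link could then be passed to client.get_entity in turn, exactly as the main function does.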

Moving on....

Go to the file Users.py, where we created our dump_all_participants function, and write the initial settings for the Telethon request.

async def dump_all_participants(channel,
                                ChannelParticipantsSearch,
                                client,
                                GetParticipantsRequest,
                                tqdm):
    print('Get info from channel', channel.title)
    OFFSET_USER = 0   #start user position
    LIMIT_USER = 200  #the maximum number of records transmitted 
                      #at a time, but not more than 200
    ALL_PARTICIPANTS = []  
    FILTER_USER = ChannelParticipantsSearch('') #filter user


Let's create an infinite while loop:

while True:

Inside our loop, the magic begins.

    participants = await client(GetParticipantsRequest(channel,
                   FILTER_USER, OFFSET_USER, LIMIT_USER, hash=0))
    if not participants.users:
        break  # an empty page means there are no more users
    ALL_PARTICIPANTS.extend(participants.users)
    OFFSET_USER += len(participants.users)  # move on to the next page



We sent Telegram a request, built from its class and our constants, for the list of users of the chat we are interested in, which is stored in the channel variable.

As a result, participants.users holds a list of objects that contain information about each user, and we collect them in ALL_PARTICIPANTS.

Next, we write the collected information to a text file:
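The offset-based paging used in the loop can be illustrated without touching Telegram at all. The sketch below is an illustration under assumptions: fetch_page is a made-up stand-in for the GetParticipantsRequest call, collect_all is a hypothetical name, and FAKE_USERS is invented data.

```python
FAKE_USERS = [f"user_{i}" for i in range(450)]  # invented data, not real users


def fetch_page(offset, limit):
    """Hypothetical stand-in for one server request: returns at most `limit` items."""
    return FAKE_USERS[offset:offset + limit]


def collect_all(limit=200):
    """Keep requesting pages until an empty page signals the end."""
    offset = 0
    collected = []
    while True:
        page = fetch_page(offset, limit)
        if not page:
            break
        collected.extend(page)
        offset += len(page)  # advance by what was actually received
    return collected


if __name__ == "__main__":
    users = collect_all()
    print(len(users))  # 450, fetched as pages of 200, 200 and 50
```

Advancing the offset by the number of items actually received, rather than by the limit, keeps the loop correct even when a page comes back shorter than requested.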

with open(f"{channel.title}.txt", "w", encoding="utf-8") as file:

Be sure to specify the encoding to avoid problems with nicknames that contain non-ASCII characters.
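To see why the encoding argument matters, here is a small self-contained sketch. The sample names and the file name are invented for illustration. The point is only that text written with encoding="utf-8" can be read back unchanged, whatever the platform's default encoding is.

```python
import os
import tempfile

# Invented sample names containing non-ASCII characters.
names = ["Вадим", "José", "李雷", "🙂 smile"]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "names.txt")
    # Explicit encoding: does not depend on the platform default.
    with open(path, "w", encoding="utf-8") as file:
        for name in names:
            file.write(f"First_Name: {name}\n")
    # Read the file back with the same encoding and recover the names.
    with open(path, "r", encoding="utf-8") as file:
        restored = [line.rstrip("\n").split(": ", 1)[1] for line in file]

print(restored == names)
```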

Now let's iterate over our users in a loop, wrapping the list in tqdm to get the progress bar:

    for participant in tqdm(ALL_PARTICIPANTS):
        try:
            # write one line per participant
            file.write(f" ID: {participant.id} "
                       f" First_Name:{participant.first_name}"
                       f" Last_Name: {participant.last_name}"
                       f" Username: {participant.username}"
                       f" Phone: {participant.phone}"
                       f" Channel: {channel.title} \n")
        except Exception as e:
            print(e)
print('The end!')
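The line format used in the loop can be pulled out into a helper so it is easy to test without a live connection. This is a sketch under assumptions: format_participant is a hypothetical name, and a SimpleNamespace object with invented values stands in for the user objects Telethon returns. Fields that are not available to us (for example a hidden phone number) are expected to be None, and the sketch shows them as a dash.

```python
from types import SimpleNamespace


def format_participant(participant, channel_title):
    """Build one output line; missing (None) fields are shown as '-'."""
    def show(value):
        return "-" if value is None else value

    return (f"ID: {participant.id} "
            f"First_Name: {show(participant.first_name)} "
            f"Last_Name: {show(participant.last_name)} "
            f"Username: {show(participant.username)} "
            f"Phone: {show(participant.phone)} "
            f"Channel: {channel_title}\n")


if __name__ == "__main__":
    # Invented stand-in for a Telethon user object.
    user = SimpleNamespace(id=1, first_name="Ann", last_name=None,
                           username="ann_example", phone=None)
    print(format_participant(user, "Example chat"), end="")
```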

Here is the result, using the first Python chat I found as an example:
Example

In the root of our project, we now have a file named {channel name}.txt that contains information about the users.
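Because the file name is built from the chat title, titles containing characters such as "/" or ":" may not be valid file names on every system. One possible way to guard against that is sketched below. safe_filename is a hypothetical helper, and the set of replaced characters is an assumption rather than a complete rule.

```python
import re


def safe_filename(title, replacement="_"):
    """Replace characters that are commonly invalid in file names.

    Hypothetical helper: the character set below is an assumption
    covering common Windows and Unix restrictions, not an exhaustive list.
    """
    cleaned = re.sub(r'[\\/:*?"<>|\x00-\x1f]', replacement, title).strip(" .")
    return cleaned or "chat"  # fall back if nothing usable is left


if __name__ == "__main__":
    print(safe_filename("Python / Django: Q&A?") + ".txt")
```

With such a helper, the open call could use safe_filename(channel.title) instead of channel.title directly.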

Users
To run the parser, insert a link to a chat or group into links.txt. You can find this link in the chat description.
Links
Important! You will not be able to parse a channel unless you are its administrator. To parse a chat or a group, you need to be a member of it.

There can be many links to chats. It is important to put each one on a new line.

Perfect! We just downloaded the list of chat users. We found out their unique identifiers (IDs), which will help us later to find the messages of a particular user in this chat (or in any other). We will talk about exporting messages from the chat in part 3.

Thank you all for your attention! Write about your impressions in the comments.

By tradition, the full code.


File Update.py


import configparser
from telethon import TelegramClient
import Users
from telethon.tl.functions.channels import GetParticipantsRequest
from telethon.tl.types import ChannelParticipantsSearch
from tqdm import tqdm
config = configparser.ConfigParser()
config.read("config.ini")

api_id: str = config['Telegram']['api_id']
api_hash = config['Telegram']['api_hash']
username = config['Telegram']['username']
client = TelegramClient(username, api_id, api_hash)
client.start()
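async def main():
    with open("links.txt", "r") as f:
        for line in f:  # stops at the end of the file instead of looping forever
            url = line.strip()
            if not url:
                continue  # skip blank lines
            try:
                channel = await client.get_entity(url)
                print(url)
                await Users.dump_all_participants(channel,
                                                  ChannelParticipantsSearch,
                                                  client,
                                                  GetParticipantsRequest, tqdm)
            except Exception as e:
                # report the problem instead of silently ignoring it
                print(f"Could not process {url}: {e}")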
with client:
    client.loop.run_until_complete(main())

File Users.py


async def dump_all_participants(channel,
                                ChannelParticipantsSearch,
                                client,
                                GetParticipantsRequest, tqdm):

    print('Get info from channel', channel.title)

    OFFSET_USER = 0
    LIMIT_USER = 200
    ALL_PARTICIPANTS = []
    FILTER_USER = ChannelParticipantsSearch('')
    # Request participants page by page until an empty page comes back
    while True:
        participants = await client(GetParticipantsRequest(channel,
                       FILTER_USER, OFFSET_USER, LIMIT_USER, hash=0))
        if not participants.users:
            break
        ALL_PARTICIPANTS.extend(participants.users)
        OFFSET_USER += len(participants.users)
    # Write the file once, after all pages have been collected
    with open(f"{channel.title}.txt", "w", encoding="utf-8") as file:
        for participant in tqdm(ALL_PARTICIPANTS):
            try:
                file.write(f" ID: {participant.id} "
                           f" First_Name:{participant.first_name}"
                           f" Last_Name: {participant.last_name}"
                           f" Username: {participant.username}"
                           f" Phone: {participant.phone}"
                           f" Channel: {channel.title} \n")
            except Exception as e:
                print(e)
    print('The end!')

I wish you success!

Write me on Face....oh...Meta

My Twitter

Become a patron
