Show HN: Chatistics – Turn Telegram, WhatsApp, Messenger Chatlogs to DataFrames (masterscrat.github.io)
119 points by MasterScrat on Jan 16, 2020 | 25 comments



Holy cow dude. I was working on the same thing in my limited free time, but only for me and my buddies. You know the craziest part? I also called it Chatistics. I guess it's not the furthest stretch, but somehow this is super surreal to me right now.

I got the idea when I downloaded the history of my friend group's 5-year-old group chat, and kind of just winged it from there.

If you're at all looking to collaborate, shoot me a message. I'll definitely keep an eye on your project.

I haven't started many personal projects (and finished even fewer), but again, this is a super weird experience for me haha. Good job, man!


I also did this! Didn't call it Chatistics, though.


Direct link to repo: https://github.com/MasterScrat/Chatistics

I started this project years ago just to create word clouds, but it turned out to be useful for many more things!

If you are interested in support for other platforms (Discord, Slack...), just create or subscribe to the relevant issue; we'll use them as a simple voting system: https://github.com/MasterScrat/Chatistics/issues


Cool, great to see people getting their data back from the cloud!

Some related links:

- https://github.com/fabianonline/telegram_backup : tool for incremental Telegram exports into an sqlite database; I've been using it for several years with no issues. From a quick glance, the code in Chatistics isn't incremental, so it's going to require redownloading everything? Perhaps it makes sense to adapt the telegram_backup output database to Chatistics and benefit from existing tools.

- https://github.com/karlicoss/fbmessengerexport : my own script for incremental FB Messenger exports, also outputs in an sqlite database
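The incremental approach both tools take boils down to remembering the newest message already stored for each chat and only asking the platform for messages after it. A minimal sketch with Python's built-in `sqlite3`, assuming message IDs increase over time within a chat; the table layout and the `fetch_messages_after(chat_id, last_id)` callable are hypothetical stand-ins for a real platform client, not the schema or API of telegram_backup or fbmessengerexport:

```python
import sqlite3


def init_db(conn):
    # Hypothetical schema: one row per message, unique per (chat, message id).
    conn.execute(
        "CREATE TABLE IF NOT EXISTS messages ("
        " chat_id TEXT, msg_id INTEGER, sender TEXT, text TEXT,"
        " PRIMARY KEY (chat_id, msg_id))"
    )


def sync_chat(conn, chat_id, fetch_messages_after):
    """Download only messages newer than the stored ones; return how many were added."""
    (last_id,) = conn.execute(
        "SELECT COALESCE(MAX(msg_id), 0) FROM messages WHERE chat_id = ?", (chat_id,)
    ).fetchone()
    new = fetch_messages_after(chat_id, last_id)  # hypothetical platform call
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO messages VALUES (?, ?, ?, ?)",
        [(chat_id, m["id"], m["sender"], m["text"]) for m in new],
    )
    conn.commit()
    return conn.total_changes - before


# Usage with a fake in-memory "server" standing in for the real API.
server = {"c1": [{"id": i, "sender": "Alice", "text": f"msg {i}"} for i in range(1, 6)]}


def fake_fetch(chat_id, last_id):
    return [m for m in server[chat_id] if m["id"] > last_id]


conn = sqlite3.connect(":memory:")
init_db(conn)
print(sync_chat(conn, "c1", fake_fetch))  # 5
print(sync_chat(conn, "c1", fake_fetch))  # 0, nothing new to download
```

Re-running the sync costs one small query per chat instead of a full redownload, which is also what makes it gentler on rate limits.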


Also, I've got a Python package that's kind of my personal API [0], so I just tried to demonstrate how a similar analysis would look for me using the my.fbmessenger module [1]. Not as pretty as in Chatistics, but just a few lines of code. The hardest bit, of course, is encapsulating all the complexity of message processing in the `my` package.

[0] https://github.com/karlicoss/my

[1] https://beepb00p.xyz/mypkg.html#messenger_stats


Something I really want someone to build (if you're interested let me know) is a true personal relationship manager with support for all conversations on all platforms.

Email, SMS, WhatsApp, Discord, Messenger, some obscure website chat platform, what have you...

The ability to cross-link profiles and truly manage contact lists across all those platforms as a single object. So Emma on Discord is the same contact in the app as Emma on Messenger, on SMS, etc.

All this can easily be stored in Google Contacts as metadata, and would allow the user to back up and index their chat conversations and quickly search through previous messages across all platforms, as well as back up all photos and files sent through those platforms as a single stream.
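One way to model the cross-linking described above is to key every platform identity by a `(platform, handle)` pair and map it to a single contact record, so messages from any of those identities can be merged into one timeline. A minimal Python sketch; `Contact`, `ContactBook`, and the message fields are hypothetical names for illustration, and syncing to Google Contacts is not shown:

```python
from dataclasses import dataclass, field


@dataclass
class Contact:
    name: str
    # Every (platform, handle) pair known to belong to this person.
    identities: set = field(default_factory=set)


class ContactBook:
    def __init__(self):
        self._by_identity = {}  # (platform, handle) -> Contact

    def link(self, contact, platform, handle):
        """Record that `handle` on `platform` belongs to `contact`."""
        key = (platform, handle)
        contact.identities.add(key)
        self._by_identity[key] = contact

    def resolve(self, platform, handle):
        """Return the contact behind a platform handle, or None if unknown."""
        return self._by_identity.get((platform, handle))

    def timeline(self, contact, messages):
        """Merge messages sent by any of the contact's identities, oldest first."""
        return sorted(
            (m for m in messages if (m["platform"], m["sender"]) in contact.identities),
            key=lambda m: m["timestamp"],
        )


book = ContactBook()
emma = Contact("Emma")
book.link(emma, "discord", "emma#1234")
book.link(emma, "messenger", "emma.smith")
messages = [
    {"platform": "messenger", "sender": "emma.smith", "timestamp": 2, "text": "see you"},
    {"platform": "discord", "sender": "emma#1234", "timestamp": 1, "text": "hi"},
    {"platform": "sms", "sender": "+15550100", "timestamp": 3, "text": "unrelated"},
]
print([m["text"] for m in book.timeline(emma, messages)])  # ['hi', 'see you']
```

The hard part in practice is deciding when two handles are the same person; here that is left to explicit `link` calls.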


So, Trillian / Jitsi ?


Not quite. Those are write-only and aren't contact-centric. Also, nowadays they look like Slack clones; I can't even find the cross-platform chat functionality anymore.


I'm not sure what you mean, since AFAIK they don't have any "native" chat; everything is cross-chat? Write-only? Contact-centric would be a nice goal, but the best you can do is probably Android-style "Contacts"?


https://github.com/MasterScrat/Chatistics, or using Python to find out you text more about poop than you thought. Your experience may vary; try it out.


I like it. I generated the word cloud for the biggest WhatsApp chat I have and it revealed that interjections prevail in the conversation =)


Could anyone explain how to use this for a WhatsApp chat? My output is always this:

  Cannot infer own-name from less than {min_conversations} conversations. Please provide your username manually with the --own-name argument.')
  Exception: Cannot infer own-name from less than 2 conversations. Please provide your username manually with the --own-name argument.


The problem is that Chatistics needs to establish who you are ("you" as in "the person whose chat logs were exported") before it can parse anything.

This is because for fields such as `conversationWithName`, it needs to know which of the parties is "you" vs who is an interlocutor.

In general, this is easy enough to figure out: look at multiple conversations, and find out who is always part of them. However, if you export a single discussion, then this can't be inferred!

In that situation, as eecom pointed out (and as indicated by the script output), you need to specify your username with the `--own-name` argument, using exactly the same name that appears in your WhatsApp account.
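The inference described above amounts to intersecting the participant sets of all exported conversations. With a single exported chat, both sides appear in "every" conversation, so nobody can be singled out, which is why a minimum count is enforced. A minimal Python sketch; `infer_own_name` and the list-of-sets input are illustrative assumptions, not Chatistics' actual code, and the threshold of 2 is taken from the error message quoted above:

```python
def infer_own_name(conversations, min_conversations=2):
    """Return the one participant present in every conversation.

    `conversations` is a list of sets of participant names (an illustrative
    input format, not Chatistics' internal representation).
    """
    if len(conversations) < min_conversations:
        raise ValueError(
            f"Cannot infer own-name from less than {min_conversations} conversations. "
            "Please provide your username manually with the --own-name argument."
        )
    # The exporter takes part in every exported chat, so intersect all participant sets.
    common = set(conversations[0]).intersection(*conversations[1:])
    if len(common) != 1:
        raise ValueError(f"Could not single out one common participant: {sorted(common)}")
    return common.pop()


chats = [{"Alice", "Me"}, {"Bob", "Me"}, {"Carol", "Dave", "Me"}]
print(infer_own_name(chats))  # Me
```

Even with two or more chats the result can be ambiguous, for example when the same two people are in every exported chat, so passing the name explicitly is the reliable option.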


I had the same error. Did you try to provide your name as "--own-name <your-name>"? That fixed it for me.


I entered it like this and still got the same output: `python parse whatsapp --own-name "myname"`


Wow! I was recently working on Telegram chat analysis. Thanks for sharing, it looks promising; I'll be happy to contribute as well.


Would love something like this for email threads.


Is there any way to use Telegram's own export output?


Yes, this^ please. The other way actually risks getting your account taken down for flooding the API. And if not, you will at least run into errors because of undisclosed limits. Obviously this is only relevant if you have a lot of chats and an old account.

Telegram Desktop can export to JSON, so it's just a matter of importing these JSON files. It does slow down when needed, but it can export 1k chats in minutes if you don't export media.
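Importing such an export could look roughly like the sketch below. It assumes the JSON export is a single `result.json` with chats under `chats.list`, each holding a `messages` array whose entries carry `type`, `date`, `from`, and `text`, and that `text` may be either a plain string or a list mixing strings and formatted-entity objects. That layout is an assumption based on typical exports and may differ between client versions; this is not part of Chatistics, and apart from `conversationWithName` (mentioned earlier in the thread) the output column names are illustrative:

```python
import json


def flatten_text(text):
    """Join a message's `text`, which may be a str or a list of str/entity dicts."""
    if isinstance(text, str):
        return text
    return "".join(p if isinstance(p, str) else p.get("text", "") for p in text)


def parse_telegram_export(data):
    """Turn a parsed result.json (assumed layout) into one dict per message."""
    rows = []
    for chat in data.get("chats", {}).get("list", []):
        for msg in chat.get("messages", []):
            if msg.get("type") != "message":
                continue  # skip service entries (joins, pins, calls, ...)
            rows.append({
                "conversationWithName": chat.get("name"),
                "sender": msg.get("from"),
                "timestamp": msg.get("date"),
                "text": flatten_text(msg.get("text", "")),
            })
    return rows


def load_telegram_export(path):
    """Read a result.json file from disk and parse it."""
    with open(path, encoding="utf-8") as f:
        return parse_telegram_export(json.load(f))
```

The resulting list of dicts can be passed straight to `pandas.DataFrame(rows)` to get a DataFrame.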


I don't understand what you mean?


You can export Telegram chats using the official desktop client in HTML or JSON format.

https://telegram.org/blog/export-and-more


Telegram has an export functionality.


I wasn't aware of that! We will investigate, although since there's already a way to export from that platform, this won't be a priority...

> The other way is actually risking your account being taken down for flooding the API.

Have you heard about such problems? Are there published API limits?

I don't use Telegram much, but for what it's worth, I know people who used Chatistics to export tens of thousands of messages without a problem.


It may be helpful if you link to the specific API export you are referencing.


I don't think it's possible via the API, but some of the desktop clients allow you to export JSON files in a nice format. For macOS, you need to download the App Store version, as the other (nicer) one does not include this functionality.



