Understanding writing style in social media with a supervised contrastively pre-trained transformer

Abstract

We introduce the Style Transformer for Authorship Representations (STAR) to detect and characterize writing style in social media. The model is trained on a large, heterogeneous corpus derived from public sources, comprising 4.5 × 10^6 authored texts from 70k authors, using a Supervised Contrastive Loss to minimize the distance between texts written by the same individual. This pretext pre-training task yields competitive zero-shot performance on PAN challenges for attribution and clustering. We also attain promising results on PAN verification challenges using STAR as a feature extractor. Finally, we present results from our test partition on Reddit, where, using a support base of 8 documents of 512 tokens each, we can discern authors from sets of up to 1616 authors with at least 80% accuracy. We share our pre-trained model on Hugging Face at AIDA-UPM/star, and our code is available at jahuerta92/star.
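To make the training objective concrete, the following is a minimal sketch of a supervised contrastive loss over a batch of text embeddings labeled by author. It is not the authors' implementation: the function names, the temperature value, and the pure-Python formulation are assumptions chosen for illustration, and the loss follows the commonly used formulation in which each anchor is pulled toward all other texts by the same author and pushed away from the rest of the batch.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def supervised_contrastive_loss(embeddings, author_ids, temperature=0.1):
    """Illustrative supervised contrastive loss (hypothetical helper).

    embeddings: list of equal-length float vectors, one per text.
    author_ids: list of labels; texts sharing a label are positives.
    temperature: assumed value, not taken from the paper.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and author_ids[j] == author_ids[i]]
        if not positives:
            continue  # an anchor with no same-author text contributes nothing
        # Temperature-scaled cosine similarity to every other text in the batch.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Log-sum-exp with the maximum subtracted for numerical stability.
        peak = max(sims.values())
        log_denominator = peak + math.log(
            sum(math.exp(s - peak) for s in sims.values()))
        # Average negative log-probability of the positives for this anchor.
        total += -sum(sims[p] - log_denominator for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0
```

With toy two-dimensional embeddings, labels that agree with the geometry (nearby vectors share an author) give a lower loss than labels that pair distant vectors, which is the behavior the pre-training objective relies on.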

Cite This Study

Huertas‐Tato et al. studied this question.

synapsesocial.com/papers/68e6d055b6db64358764deee https://doi.org/10.1016/j.knosys.2024.111867
