Website: https://impersona-website.vercel.app
This repository contains the code to replicate the experiments in the paper "IMPersona: Enabling Individual Impersonation for LLMs" and to create custom IMPersonas for LLMs. We currently support data exported from iMessage only (Facebook Messenger data can be converted into the same format; see below).
Supported inference methods:
- ICL (sample chats in the prompt)
- Memory (baseline memory; recommended for basic testing)
- Hierarchical Memory (best performance, but much slower inference)

Supported model providers:
- OpenAI (prompting), Anthropic (prompting), Together (prompting, finetuning)
- Local custom models (prompting, finetuning)

Recommended models:
- Training-free: Claude-3.5-Sonnet
- Training: Llama-3.1-8B-Instruct
- Clone and navigate to the repository.

  ```shell
  git clone https://github.com/princeton-nlp/p-impersona
  cd p-impersona
  ```
- Create a virtual environment and install dependencies.

  ```shell
  python -m venv .venv
  source .venv/bin/activate
  pip install -r requirements.txt
  ```
- Download the imessage-exporter. If using Homebrew on a Mac, run:

  ```shell
  brew install imessage-exporter
  ```

  If using cargo:

  ```shell
  cargo install imessage-exporter
  ```

  You may need to install the Xcode developer tools first:

  ```shell
  xcode-select --install
  ```

  You may otherwise install it manually:

  ```shell
  git clone https://github.com/ReagentX/imessage-exporter
  cd imessage-exporter
  cargo run --release
  ```
- Export your iMessage data from your Mac and/or iPhone.

  ```shell
  # Export from your Mac
  imessage-exporter --format txt --export-path imessage_export_mac
  ```

  If your Mac is missing many conversations, follow these steps to export from your iPhone:
- Back up your iPhone
  - Connect your iPhone to your Mac via USB and unlock your phone
  - Click "Trust This Computer" if prompted
  - Open Finder and navigate to your iPhone under Locations
  - Make sure "Encrypt local backup" is turned OFF
  - Click "Back Up Now" and wait for the backup to complete (10-30 minutes)
  - Once complete, you can eject and disconnect your iPhone
- Find your backup location

  ```shell
  # List backups (most recent will be last)
  ls -latr /Users/$(id -un)/Library/Application\ Support/MobileSync/Backup/
  ```

  - Note the most recent backup folder (e.g., `00004540-000107C20F90003H`)
- Export messages from your backup

  ```shell
  # Replace <backup_path_here> with your backup folder path
  imessage-exporter --format txt --export-path imessage_export_ios --db-path /Users/$(id -un)/Library/Application\ Support/MobileSync/Backup/<backup_path_here>
  ```
- Merge data (if you exported from both sources)

  ```shell
  python merge_imessage_exports.py
  ```

  This will save the combined data in `imessage_export_mac_and_ios`.
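The exact merge behavior is defined in `merge_imessage_exports.py`; as a rough sketch of the idea (assumed logic, not the repo's actual implementation), merging can be a filename-level union of the two export folders that keeps the larger copy of any conversation present in both:

```python
import shutil
from pathlib import Path

def merge_exports(mac_dir: str, ios_dir: str, out_dir: str) -> None:
    """Union of two export folders; on filename collisions keep the larger file.

    Hypothetical helper illustrating the merge idea -- the repo's script is
    the authoritative implementation.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src_dir in (Path(mac_dir), Path(ios_dir)):
        for src in src_dir.glob("*.txt"):
            dst = out / src.name
            # Copy if new, or if this copy contains more data than the existing one
            if not dst.exists() or src.stat().st_size > dst.stat().st_size:
                shutil.copy2(src, dst)
```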
If you use Facebook Messenger, you can export your Messenger data too.
- Download data from Facebook Messenger
  - Go to Messenger for web
  - Click on your profile picture in the corner
  - Click on "Privacy and Safety"
  - Click on "End-to-end encrypted chats"
  - Click on "Message storage"
  - Click on "Download secure storage data"
  - Click on "Download file" (this may take a few minutes to start downloading)
- Move the downloaded file to this repo's root directory and name it `messenger_export.zip`
- Unzip the file

  ```shell
  unzip messenger_export.zip -d messenger_export
  ```
- Convert Messenger data to iMessage format

  ```shell
  python convert_messenger_data.py --me_participant <your_name_here>
  ```

  - This will automatically move the converted data to `data/imessage_export`, so you don't need to do anything else after running the script.
  - You can remove the `messenger_export` folder after running the script.
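As a conceptual sketch of what such a conversion involves, the snippet below maps one Messenger thread into iMessage-export-style text blocks. It assumes the classic Messenger JSON shape (`sender_name`, `timestamp_ms`, `content`); the secure-storage format that `convert_messenger_data.py` actually handles may differ:

```python
import json
from datetime import datetime, timezone

def messenger_to_lines(raw_json: str, me: str) -> list[str]:
    """Convert one Messenger thread JSON into imessage-exporter-style text blocks.

    Assumed input shape: {"messages": [{"sender_name", "timestamp_ms", "content"}, ...]}.
    """
    thread = json.loads(raw_json)
    lines = []
    # Messenger lists newest-first; replay oldest-first like a chat log
    for msg in sorted(thread["messages"], key=lambda m: m["timestamp_ms"]):
        if "content" not in msg:  # skip reactions, photos, etc.
            continue
        when = datetime.fromtimestamp(msg["timestamp_ms"] / 1000, tz=timezone.utc)
        sender = "Me" if msg["sender_name"] == me else msg["sender_name"]
        lines.append(f"{when:%b %d, %Y %I:%M:%S %p}\n{sender}\n{msg['content']}\n")
    return lines
```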
- Move your export into `data/imessage_export`.

  ```shell
  # Choose either the mac-only or the merged export
  mv imessage_export_mac_and_ios data/imessage_export
  # OR
  mv imessage_export_mac data/imessage_export
  ```
- [Highly Recommended] The above export identifies users by phone number. To use names instead, you will need to export your contacts as a `.vcf` file. Some users have reported issues with their Mac and phone contacts not syncing: use whichever Contacts app contains the up-to-date information.
  - On your Mac: Go to `Contacts (Mac App) -> Select All (Cmd + A) -> File -> Export -> Export vCard`.
  - On your phone: Go to `Contacts (Phone App) -> Lists -> Long Press All Contacts -> Export -> AirDrop to Mac`.
  - Rename and move the file to `data/contacts.vcf`.
- Run the following command to process the data files for training. The terminal will prompt you to input any contacts that were left out of the vcf file, so pay attention to the output.

  ```shell
  python process_imessage.py
  ```
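For intuition about what processing involves, the sketch below parses exported text into structured messages. It assumes each message in the export is a timestamp line, a sender line, then the message text, with blank lines between messages (an assumed layout; `process_imessage.py` is the authoritative parser):

```python
def parse_export(text: str) -> list[dict]:
    """Parse imessage-exporter-style txt output into message dicts.

    Assumed layout per message: timestamp line, sender line, message text,
    with a blank line separating messages.
    """
    messages = []
    for block in text.strip().split("\n\n"):
        lines = block.split("\n")
        if len(lines) < 3:
            continue  # skip malformed fragments
        messages.append({
            "timestamp": lines[0].strip(),
            "sender": lines[1].strip(),
            "text": "\n".join(lines[2:]).strip(),
        })
    return messages
```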
- Create a `.env` file in the root directory. Add your API key for OpenAI, named `OPENAI_API_KEY`, as well as keys for any other APIs that you plan to use.

  ```shell
  # .env
  OPENAI_API_KEY=<your_api_key>
  ANTHROPIC_API_KEY=<your_api_key>
  TOGETHER_API_KEY=<your_api_key>
  ```
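For reference, keys in a `.env` file become environment variables at runtime. A stdlib-only sketch of that loading step (many projects use python-dotenv's `load_dotenv()` instead, and the repo may as well):

```python
import os

def load_env(path: str = ".env") -> None:
    """Set KEY=VALUE pairs from a dotenv-style file into os.environ.

    setdefault means real environment variables are never overridden.
    """
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue  # skip blanks, comments, and malformed lines
            key, value = line.split("=", 1)
            os.environ.setdefault(key.strip(), value.strip())
```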
Training is only necessary if you want to create finetuned IMPersonas. If you wish only to interact with prompting-based IMPersonas, you may skip this step. If you have the resources to do so, we recommend training locally and on the full dataset for best results. TODO: better configuration for low-memory settings.
Option 1: Use Hugging Face
- We provide a script, `IMPersona/train.py`, that trains a finetuned IMPersona locally. You may need to install additional dependencies. To finetune Llama-3.1-8B-Instruct on the full dataset, run the following command:

  ```shell
  python IMPersona/train.py \
    --model_name meta-llama/Llama-3.1-8B-Instruct \
    --dataset_path ./data/impersona_imessage_0buffer_BFull.json \
    --output_dir ./output \
    --num_epochs 3 \
    --learning_rate 1e-4 \
    --use_lora \
    --lora_r 8 \
    --lora_alpha 32 \
    --lora_dropout 0.05 \
    --format_template llama
  ```
- Note: The default effective batch size is 8. If you need to reduce the batch size, you may increase the gradient accumulation steps to compensate.
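The trade-off in that note is simple arithmetic: the effective batch size is the per-device batch size times the gradient accumulation steps, so halving one and doubling the other yields equivalent gradient updates:

```python
def effective_batch_size(per_device_batch: int, grad_accum_steps: int) -> int:
    """Gradients are averaged over this many examples per optimizer step."""
    return per_device_batch * grad_accum_steps

# A low-memory setup (batch 2, accumulation 4) matches the default effective size of 8
assert effective_batch_size(2, 4) == effective_batch_size(8, 1) == 8
```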
Option 2: Use LLaMA-Factory (Recommended for low-memory settings)
- Follow the instructions here to install LLaMA-Factory.
- Use the LLaMA-Factory scripts for LoRA training and inference (the training sets are already in the proper format)
If Using Together API
- In the command line, run the following with your API key:

  ```shell
  export TOGETHER_API_KEY=<your_api_key>
  ```
- Run the following commands to check and upload the dataset to Together. Keep track of the file IDs generated: you will need them to submit fine-tuning jobs.

  ```shell
  together files check data/<your_name_here>_impersona_imessage_0buffer_BFull_together_format.jsonl
  ```

  If you cannot find the file ID, run the following to see a list of files:

  ```shell
  together files list
  ```
- Submit fine-tuning job(s) to the Together API.

  ```shell
  together fine-tuning create \
    --training-file <file_id> \
    --model meta-llama/Meta-Llama-3.1-8B-Instruct-Reference \
    --lora \
    --suffix <your_name_here>-BFull \
    --n-epochs 3 \
    --batch-size 8 \
    --learning-rate 0.0001
  ```
- Check the status of your fine-tuning jobs with the following command.

  ```shell
  together fine-tuning list
  ```
- After the job has finished, keep track of the value in the model output name field. This is how you will call your model.
- If you need to see the model output name again, you can list it with the following command.

  ```shell
  together fine-tuning list
  ```
- Run the following command to create a memory bank for your IMPersona. This is necessary for the memory inference setting and requires an OpenAI API key.

  ```shell
  python process_memory.py
  ```

  Note: You may encounter errors in the terminal during this step. If you do, just rerun the script.
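As a toy illustration of the idea behind a memory bank: group your messages into chunks (e.g., by day) so each chunk can later be summarized into a retrievable memory. The actual `process_memory.py` pipeline uses the OpenAI API and is more involved; this sketch only shows the chunking step:

```python
from collections import defaultdict

def build_memory_chunks(messages: list[dict]) -> dict[str, list[str]]:
    """Group "sender: text" lines by date so each day's chat can be
    summarized into one retrievable memory entry.

    Assumes each message dict has "date", "sender", and "text" keys
    (an illustrative schema, not the repo's actual one).
    """
    chunks = defaultdict(list)
    for msg in messages:
        chunks[msg["date"]].append(f'{msg["sender"]}: {msg["text"]}')
    return dict(chunks)
```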
- [Optional] To visualize the memories created, feel free to run the following web UI:

  ```shell
  python memory_visualizer.py
  ```
- The `run_impersona_chat.py` script allows you to converse with your IMPersona in a chat-room-style UI. Simply run the following command, fill in the parameters, and start chatting!

  ```shell
  python run_impersona_chat.py
  ```
Alternatively, to chat with your IMPersona in the terminal, you may use `run_impersona.py`. For example, to run Claude with ICL and memory, run the following command:

```shell
python run_impersona.py --model_name claude-3-5-sonnet-20241022 --impersonation_name <your_name_here> --memory --icl
```
- [Optional] For those operating the Human or Not IMPersona task, you may use the `run_impersona_web.py` script to interface with the IMPersona. The training pipeline is optimized for this usage. The script takes in a full conversation in the format given by the web interface (https://impersona-web.vercel.app/) and outputs a response with `<|>` serving as the message delimiter. It takes the same arguments as `run_impersona.py`. For example, to run Llama-BFull + BM, run the following command:

  ```shell
  # llama-BFull + BM
  python run_impersona_web.py --model_name <finetuned_model_name_here> --memory --impersonation_name <your_name_here>
  ```